An input lag investigation


@Brunnis: Thanks a lot for the tests! If we concentrate on the max_swapchain=2 cases, which is where the low latency is, I really can’t understand where the extra frame of lag on GL is coming from: with max_swapchain=2, both drivers wait for vsync immediately after sending the frame to the GPU… Why, oh why is dispmanx faster? That shouldn’t be happening :frowning:


What if this extra lag is actually built into the BCM driver? I know that it is not present when using the VC4 driver (although that is not really proof that the BCM driver is to blame). Just speculating here…


@Brunnis: do you mean that with KMSDRM+GLES (using VC4) on the Pi the lag is the same as plain DISPMANX? Or that KMSDRM+GLES has one frame less of latency than DISPMANX+GLES? (always max_swapchain=2)


Finally got time to do some more tests.

On Ubuntu 17.10 with up-to-date cores and identical Frame Delay settings, Nestopia and QuickNES react statistically identically.

All results are with the SMB1 Mario jump from half-height. Real NES hardware reacts on Frame 2, for a response time of ~1.5 to ~2.5 frames, depending on input timing and Mario’s screen position. My MacBook Air LCD screen appears to add nearly one frame of lag.

Low latency controller LCD QuickNES frame delay 8
~13.33 (3.33 frames)

Low latency controller LCD Nestopia frame delay 8
~12 (3 frames)

…and tests on my CRT PC rig below. Since I’m able to view the tube refresh pattern outright, comparing to real hardware is much easier, so I measure the difference in frames from a real NES. As before, with the proper equipment, this number can be zero.

Low latency controller CRT QuickNES frame delay 14 IN DELTA (difference from real hardware)
0 frames

Low latency controller CRT Nestopia frame delay 14 IN DELTA (difference from real hardware)
0 frames

My theory as to why these results differ from my other tests is either some change to RetroArch since the last update to my Ubuntu installation, or an accidental difference in Frame Delay settings. On my Mid-2012 MacBook Air, I can’t hit such high levels of Frame Delay with some cores. Nestopia is definitely more CPU-intensive than QuickNES.

As an experiment, I ran the same test with QuickNES at a higher Frame Delay setting:

Low latency controller LCD QuickNES frame delay 12
~9.875 (2.47 frames)

You can see that the higher Frame Delay lets me frequently (though not always) shave off an extra frame. The window of time to get an ideally responsive action is ever so slightly wider.

Sorry for the bad science. But what would emulation lag debates be without unvalidated hearsay masquerading as fact? :slight_smile:


Yes, KMS + VC4 + GLES or KMS + VC4 + GL has the same input lag as plain Dispmanx, i.e. one frame faster than when using the BCM driver, at any given max_swapchain setting. So, in my test results further up, the expected results for VC4 would be ~6 frames at max_swapchain=3 and ~5 frames at max_swapchain=2.

At least this was true when I tested VC4 a long time ago (2016).

No worries! Good of you to retest. :slight_smile:


It’s bugging the crap out of me that no one seems to have tried to measure the difference in input lag between Canoe and RetroArch on the SNES Mini… Also, just comparing Canoe to RetroArch on PC and trying to get a handle on the input lag of the SNES Mini (minus that added by the display) in its default state. There’s some info available, but I think it’s too thin.

EDIT: Oh, and I should probably try to find some time to rectify this situation. :grin:


I’ve now tried (not measured yet) RetroArch on the SNES Mini. It’s not just something people say, there really does appear to be a pretty obvious increase in input lag compared to using the built-in emulator (Canoe). Will be interesting to see what the numbers say.


Canoe indeed feels quite snappy, but as you’ve demonstrated, there’s nothing inherently latent in RetroArch, so if there’s a difference, hopefully we can figure out where the issue lies and correct it.


Yep, exactly what I was thinking.

I’ve now performed some tests, but it will be a few days to a week before I can perform the remaining tests using RetroArch on PC. However, the SNES Mini testing is done and it’s pretty interesting:

Note: Canoe is the name of the built-in emulator in the SNES Mini.

SNES Mini + Canoe + Super Mario World: 5.65 frames

SNES Mini + RetroArch (snes9x2010) + Super Mario World: 8.34 frames

Difference: ~2.7 frames (45 ms)

The results above are each based on an average of 25 samples, recording the controller in front of the screen at 240 FPS and pressing the jump button. These tests were run on a 22" Samsung LED TV, so they’re not directly comparable to my other tests. I believe this Samsung TV has 1 frame of input lag, but please just use the results above to compare the difference between using Canoe and RetroArch.
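
For anyone who wants to redo the arithmetic, here is a minimal sketch of the unit conversions behind these numbers. It assumes a 60 Hz display (the SNES actually runs slightly faster, so treat the millisecond figures as approximate), and the function names are just illustrative, not anything from RetroArch.

```python
CAMERA_FPS = 240.0   # recording rate used for these tests
DISPLAY_HZ = 60.0    # assumption: 60 Hz output, so 4 camera frames per displayed frame


def camera_frames_to_display_frames(camera_frames, camera_fps=CAMERA_FPS, display_hz=DISPLAY_HZ):
    """Convert a count of high-speed camera frames into displayed frames."""
    return camera_frames * display_hz / camera_fps


def display_frames_to_ms(frames, display_hz=DISPLAY_HZ):
    """Convert displayed frames into milliseconds."""
    return frames * 1000.0 / display_hz


# Averages reported above, in displayed frames.
canoe = 5.65
retroarch = 8.34

diff = retroarch - canoe
print(f"difference: {diff:.2f} frames = {display_frames_to_ms(diff):.1f} ms")
```

With these inputs the difference works out to about 2.69 frames, or roughly 45 ms, which matches the figure quoted above.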

RetroArch and Canoe themselves should both produce a response on the third frame after receiving input, since that’s how Super Mario World behaves even on a real console. We know for a fact that snes9x2010 doesn’t add any extra lag on top of this. I think it’s pretty safe to assume that Canoe and RetroArch perform the same here.

Since we’re also using the same gamepad and the emulators run in the same environment, RetroArch should be able to come much closer to Canoe. My guess is that Nintendo is doing something similar to using max_swapchain_images=2 on the SNES Mini. This may be possible, even on such weak hardware, with a very optimized emulator. max_swapchain_images does nothing on the SNES Mini running RetroArch, since it’s not implemented in the RetroArch driver for the Allwinner SoC. However, implementing it would probably be pointless, since it’s highly unlikely to perform well using snes9x2010.

Even so, there’s still another 1.7 frames of input lag that “shouldn’t” be there. My guess is that those can be removed, but the question is how. The SNES Mini uses the Sunxi video driver in RetroArch, right? What do we know about this code? Who would be able to look into the behavior of this code?
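
To make the budget explicit, here is a rough sketch of how the 1.7 frames falls out. The one-frame saving attributed to a max_swapchain_images=2 style scheme is my guess from above, not a measured value, and the helper name is hypothetical.

```python
# Averages reported in this post, in 60 Hz frames.
CANOE_FRAMES = 5.65
RETROARCH_FRAMES = 8.34

# Assumption (a guess, not a measurement): a max_swapchain_images=2 style
# scheme would account for one frame of the gap.
SWAPCHAIN_SAVING_FRAMES = 1.0


def unexplained_lag(slow, fast, explained):
    """Return the part of the gap (in frames) not covered by known contributors."""
    return (slow - fast) - sum(explained)


gap = RETROARCH_FRAMES - CANOE_FRAMES
rest = unexplained_lag(RETROARCH_FRAMES, CANOE_FRAMES, [SWAPCHAIN_SAVING_FRAMES])
print(f"total gap: {gap:.2f} frames, still unexplained: {rest:.2f} frames")
```

If other contributors get identified later (for example something on the input side), they can be added to the `explained` list to see what is still left over.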

Another thing that might be worth looking into is if there’s something that RetroArch does differently on the input side.


Canoe may also do some tricks on a per-title basis. The command line options for it have a number of latency-affecting options, like GL fencing and threaded video, though they didn’t seem to have much/any effect when I tried them (I put 2 copies of the 240p test suite in, one with stock options and one with a smattering of latency-related options and did the manual lag test with each one and experienced no significant difference between them). That doesn’t mean they’re necessarily nonfunctional for the official titles, though.


Hey higan, on August 8th 2016 you tested Vulkan vs. OpenGL latency and the Vulkan results were bad; do you know if that’s better now?


Has there been any testing with the Android operating system in comparison to PC Linux and/or Windows? I’ve read anecdotes about Android being inherently more laggy than running the same software on a PC. We have parity in RetroArch and its cores, and I’m just wondering whether this is misinformation or possibly outdated information?


I know with my Shield ATV, the latency is pretty good. The big issues are with audio latency, which has been a big issue forever. You’ll notice Android doesn’t have a lot of music production apps like you find on iOS and it’s because the audio latency is not just bad but inconsistent.

There’s also a garbage collector that you can’t control, which can cause latency spikes and a/v desyncs unexpectedly.


Sorry if this is a subject that has been covered before, but has the RetroArch team considered automatic Frame Delay strategies?

Certainly, an emulator isn’t going to use exactly the same number of cycles each frame to allow for perfect delay prediction, but at least for systems 16-bit and under, a sample reading would be in the same ballpark.

For example, the frame delay could be set based on observed performance within the first second or two of emulation, with padding of a millisecond or three to be on the safe side.

Or, if real-time raising and lowering of this delay is considered to be fraught with danger, maybe a per-emulator performance requirement “rating” coupled with a short CPU benchmark could be utilized?
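
To illustrate the first idea, here is a minimal sketch of how a delay could be picked from a short sampling window. This is not how RetroArch does anything today; the function name, the 2 ms default padding, and the 15 ms cap are assumptions for illustration, and the sample timings are made up.

```python
import math


def suggest_frame_delay(frame_times_ms, refresh_hz=60.0, padding_ms=2.0, max_delay_ms=15):
    """Suggest a whole-millisecond frame delay from observed emulation times.

    frame_times_ms: how long the core took to emulate each sampled frame.
    padding_ms:     safety margin so an unusually slow frame still makes vsync.
    max_delay_ms:   assumed upper bound on the setting (hypothetical cap).
    """
    if not frame_times_ms:
        return 0  # nothing observed yet, so stay on the safe side
    period_ms = 1000.0 / refresh_hz
    worst_ms = max(frame_times_ms)  # budget for the slowest frame seen
    headroom_ms = period_ms - worst_ms - padding_ms
    return max(0, min(max_delay_ms, math.floor(headroom_ms)))


# Hypothetical samples from the first second or two of emulation.
samples_ms = [3.1, 3.4, 2.9, 4.0]
print(suggest_frame_delay(samples_ms))
```

Using the worst observed frame rather than the average is a deliberately conservative choice: a delay that is too high causes dropped frames, which is worse than leaving a millisecond or two on the table.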


@Brunnis: I did the plain sunxi driver some time ago. It was a PITA to do, it uses specific sunxi ioctls and such… I also did the sunxi (well, general fbdev) context init code for GLES to work on those crappy Allwinner boards. I am looking at the sunxi driver, and sadly it seems I used a triple buffer scheme there, instead of immediately waiting for vsync. I suppose I did that for performance reasons, before I learned more about input lag thanks to your studies…

Is RetroArch on the SNES Mini using the sunxi driver (sunxi_gfx.c) or the mali-fbdev context+GLES renderer (mali_fbdev_ctx.c)?

I don’t know how the page flip is done on the mali fbdev driver anyway: I seem to simply call “egl_swap_buffers(&mali->egl);” and then I suppose the underlying EGL implementation does whatever it needs to, but it’s closed-source so I don’t know what it’s doing.


No idea, sorry. I probably won’t test that again in the near future.

Ahh, so you wrote that code! Awesome. :smile: To be honest, I have only guesses in regards to how RetroArch on the SNES Mini works. I have only seen the precompiled RetroArch and cores in the SNES Mini hack repositories. So, I don’t know who compiles it and with what settings. Does anybody else know? @hunterk?

In regards to double/triple buffering: I believe there must be something else going on as well. The difference between Canoe and RetroArch is almost 3 frames. Just changing to double buffering will only remove one frame of input lag, and it will probably make performance tank.


@KMFDManic would be a good person to consult about the builds for the SNES Mini.

edit: otherwise, this seems to be the primary online community for SNES Mini RetroArch mods


Let’s make this a stickied topic since it’s such an important thread.



Thanks for the tip!

Awesome, thanks!