An input lag investigation

@Brunnis: do you mean that with KMSDRM+GLES (using VC4) on the Pi the lag is the same as plain-DISPMANX? Or that KMSDRM+GLES has one frame less of latency than DISPMANX+GLES? (always max_swapchain=2)

Finally got time to do some more tests.

On Ubuntu 17.10 with up-to-date cores and identical Frame Delay settings, Nestopia and QuickNES react statistically identically.

All results are with the SMB1 Mario jump from half-height. Real NES hardware reacts on Frame 2, for a response time of ~1.5 to ~2.5 frames, depending on input timing and Mario’s screen position. My MacBook Air LCD screen appears to add nearly one frame of lag.

Low latency controller LCD QuickNES frame delay 8
13
15
15
11
13
13
~13.33 (3.33 frames)

Low latency controller LCD Nestopia frame delay 8
10
10
11
12
14
13
~12 (3 frames)

…and tests on my CRT PC rig below. Since I’m able to view the tube refresh pattern outright, comparing to real hardware is much easier, so I measure the difference in frames relative to a real NES. As before, with the proper equipment, this number can be zero.

Low latency controller CRT QuickNES frame delay 14 IN DELTA (difference from real hardware)
0 frames

Low latency controller CRT Nestopia frame delay 14 IN DELTA (difference from real hardware)
0 frames

My theory for the discrepancy with my other tests is either some change to RetroArch since the last update of my Ubuntu installation, or an accidental difference in Frame Delay settings. On my Mid-2012 MacBook Air, I can’t hit such high levels of Frame Delay with some cores. Nestopia is definitely more CPU-intensive than QuickNES.

As an experiment, I ran the same test with QuickNES at a higher Frame Delay setting:

Low latency controller LCD QuickNES frame delay 12
13
10
10
8
8
9
11
10
~9.875 (2.47 frames)

You can see that the higher Frame Delay lets me frequently (though not always) shave off an extra frame. The window of time for getting an ideally responsive action is ever so slightly wider.
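For anyone who wants to check the averages, here is a minimal sketch of the arithmetic. It assumes the raw numbers in the lists above are 240 FPS camera frames, which the 4:1 ratio between the posted averages and the frame figures implies; the function names are made up for illustration.

```python
from statistics import mean

CAMERA_FPS = 240  # assumed recording rate of the camera
GAME_FPS = 60     # frame rate the emulated game runs at


def camera_to_game_frames(camera_frames):
    """Convert a count of camera frames into game frames."""
    return camera_frames * GAME_FPS / CAMERA_FPS


def summarize(samples):
    """Return (average in camera frames, average in game frames)."""
    avg_camera = mean(samples)
    return avg_camera, camera_to_game_frames(avg_camera)


# Samples from the QuickNES LCD run at frame delay 12, as posted above.
quicknes_fd12 = [13, 10, 10, 8, 8, 9, 11, 10]
avg_camera, avg_game = summarize(quicknes_fd12)
print(f"{avg_camera:.3f} camera frames = {avg_game:.2f} game frames")
```

With only six to eight samples per run, the spread between individual presses is larger than the difference between some of the averages, so small gaps should be read with caution.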

Sorry for the bad science. But what would emulation lag debates be without unvalidated hearsay masquerading as fact? :slight_smile:

Yes, KMS + VC4 + GLES or KMS + VC4 + GL has the same input lag as plain Dispmanx, i.e. one frame faster than when using the BCM driver and at any given max_swapchain setting. So, in my test results further up, the expected results for VC4 would be ~6 frames at max_swapchain=3 and ~5 frames at max_swapchain=2.

At least this was true when I tested VC4 a long time ago (2016).

No worries! Good of you to retest. :slight_smile:

It’s bugging the crap out of me that no one seems to have tried to measure the difference in input lag between Canoe and RetroArch on the SNES Mini… The same goes for comparing Canoe to RetroArch on PC, and for getting a handle on the input lag of the SNES Mini in its default state (minus the lag added by the display). There’s some info available, but I think it’s too thin.

EDIT: Oh, and I should probably try to find some time to rectify this situation. :grin:

I’ve now tried (not measured yet) RetroArch on the SNES Mini. It’s not just something people say, there really does appear to be a pretty obvious increase in input lag compared to using the built-in emulator (Canoe). Will be interesting to see what the numbers say.

Canoe indeed feels quite snappy, but as you’ve demonstrated, there’s nothing inherently latent in RetroArch, so if there’s a difference, hopefully we can figure out where the issue lies and correct it.

Yep, exactly what I was thinking.

I’ve now performed some tests, but it will be a few days to a week before I can perform the remaining tests using RetroArch on PC. However, the SNES Mini testing is done and it’s pretty interesting:

Note: Canoe is the name of the built-in emulator in the SNES Mini.

SNES Mini + Canoe + Super Mario World: 5.65 frames

SNES Mini + RetroArch (snes9x2010) + Super Mario World: 8.34 frames

Difference: ~2.7 frames (45 ms)

The results above are each based on an average of 25 samples, recording the controller in front of the screen at 240 FPS and pressing the jump button. These tests were run on a 22" Samsung LED TV, so they’re not directly comparable to my other tests. I believe this Samsung TV has 1 frame of input lag, but please just use the results above to compare the difference between using Canoe and RetroArch.

RetroArch and Canoe themselves should both produce a response on the third frame after receiving input, since that’s how Super Mario World behaves even on a real console. We know for a fact that snes9x2010 doesn’t add any extra lag on top of this. I think it’s pretty safe to assume that Canoe and RetroArch perform the same here.

Since we’re also using the same gamepad and the emulators run in the same environment, RetroArch should be able to come much closer to Canoe. My guess is that Nintendo is doing something similar to using max_swapchain_images=2 on the SNES Mini. This may be possible, even on such weak hardware, with a very optimized emulator. max_swapchain_images does nothing on the SNES Mini running RetroArch, since it’s not implemented in the RetroArch driver for the Allwinner SoC. However, implementing it would probably be pointless, since it’s highly unlikely to perform well using snes9x2010.

Even so, there’s still another 1.7 frames of input lag that “shouldn’t” be there. My guess is that those can be removed, but the question is how. The SNES Mini uses the Sunxi video driver in RetroArch, right? What do we know about this code? Who would be able to look into the behavior of this code?

Another thing that might be worth looking into is if there’s something that RetroArch does differently on the input side.

Canoe may also do some tricks on a per-title basis. The command line options for it have a number of latency-affecting options, like GL fencing and threaded video, though they didn’t seem to have much/any effect when I tried them (I put 2 copies of the 240p test suite in, one with stock options and one with a smattering of latency-related options and did the manual lag test with each one and experienced no significant difference between them). That doesn’t mean they’re necessarily nonfunctional for the official titles, though.

Hey higan, on August 8th 2016 you tested Vulkan vs. OpenGL latency, and the Vulkan results were bad; do you know if that’s better now?

Has there been any testing with the Android operating system in comparison to PC Linux and/or Windows? I’ve read anecdotes about Android being inherently more laggy than running the same software on a PC. We have parity in RetroArch and its cores, and I’m just wondering if this is either misinformation or possibly outdated information?

I know with my Shield ATV, the latency is pretty good. The big issues are with audio latency, which has been a big issue forever. You’ll notice Android doesn’t have a lot of music production apps like you find on iOS and it’s because the audio latency is not just bad but inconsistent.

There’s also a garbage collector that you can’t control, which can cause latency spikes and a/v desyncs unexpectedly.

Sorry if this is a subject that has been covered before, but has the RetroArch team considered automatic Frame Delay strategies?

Certainly, an emulator isn’t going to use exactly the same number of cycles each frame to allow for perfect delay prediction, but at least for 16-bit and older systems, a sample reading would be in the same ballpark.

For example, the frame delay could be set based on observed performance within the first second or two of emulation, with padding of a millisecond or three to be on the safe side.

Or, if real-time raising and lowering of this delay is considered to be fraught with danger, maybe a per-emulator performance requirement “rating” coupled with a short CPU benchmark could be utilized?
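To make the first idea concrete (setting the delay from observed performance plus padding), here is a rough sketch. It is not how RetroArch works today; `suggest_frame_delay` is a hypothetical helper, the 3 ms padding and the 15 ms cap are assumptions, and the sample frame times are invented for illustration.

```python
FRAME_PERIOD_MS = 1000.0 / 60.0  # ~16.67 ms per frame at 60 Hz


def suggest_frame_delay(frame_times_ms, padding_ms=3.0, max_delay_ms=15):
    """Suggest a whole-millisecond frame delay from observed emulation times.

    frame_times_ms: how long each emulated frame took during a warm-up
    period, in milliseconds. The slowest observed frame plus the padding
    must still fit into one frame period after the delay.
    """
    if not frame_times_ms:
        return 0  # nothing measured yet, so stay on the safe side
    worst_ms = max(frame_times_ms)
    headroom_ms = FRAME_PERIOD_MS - worst_ms - padding_ms
    # Clamp to the allowed range and round down to whole milliseconds.
    return max(0, min(max_delay_ms, int(headroom_ms)))


# Hypothetical warm-up measurements for a light 8-bit core.
observed = [1.8, 2.1, 2.4, 2.0]
print(suggest_frame_delay(observed))
```

Using the worst observed frame rather than the average is deliberately conservative: a single frame that overruns the remaining time causes a visible stutter, which is worse than giving up a millisecond of latency.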

@Brunnis: I wrote the plain sunxi driver some time ago. It was a PITA to do; it uses sunxi-specific ioctls and such… I also wrote the sunxi (well, general fbdev) context init code for GLES to work on those crappy Allwinner boards. I am looking at the sunxi driver, and sadly it seems I used a triple-buffer scheme there, instead of immediately waiting for vsync. I suppose I did that for performance reasons, before I learned more about input lag thanks to your studies…

Is RetroArch on the SNES Mini using the sunxi driver (sunxi_gfx.c) or the mali-fbdev context+GLES renderer (mali_fbdev_ctx.c)?

I don’t know how the page flip is done on the mali fbdev driver anyway: I seem to simply call “egl_swap_buffers(&mali->egl);” and then I suppose the underlying EGL implementation does whatever it needs to, but it’s closed-source so I don’t know what it’s doing.

No idea, sorry. I probably won’t test that again in the near future.

Ahh, so you wrote that code! Awesome. :smile: To be honest, I have only guesses in regards to how RetroArch on the SNES Mini works. I have only seen the precompiled RetroArch and cores in the SNES Mini hack repositories. So, I don’t know who compiles it and with what settings. Does anybody else know? @hunterk?

Regarding double/triple buffering: I believe there must be something else going on as well. The difference between Canoe and RetroArch is almost 3 frames. Just changing to double buffering will only remove one frame of input lag, and it will probably make performance tank.

@KMFDManic would be a good person to consult about the builds for the SNES Mini.

edit: otherwise, this seems to be the primary online community for SNES Mini RetroArch mods https://www.reddit.com/r/miniSNESmods

Let’s make this a stickied topic since it’s such an important thread.

Thanks for the tip!

Awesome, thanks!

As you may know from my previous post, it has bothered me that the SNES Mini hasn’t been very well researched in terms of input lag. Especially not how RetroArch on the Mini performs. I’ve also wanted to see good comparisons to RetroPie and RetroArch on PC using the same display, to get a more complete picture. This post is my attempt at improving the knowledge on these things.

What I will do is test input lag using Super Mario World and the following setups:

  • SNES Mini using both the built-in emulator (Canoe) and RetroArch
  • RetroPie on Raspberry Pi 3 using default settings as well as various input lag reducing settings
  • RetroArch on PC (Windows) on a high-end desktop PC with all known input lag reducing settings enabled/maximized

Importantly, I will be using the same gamepad for all tests: The SNES Mini wired controller. For the Raspberry Pi and PC tests, I’ll be using Raphnet’s awesome low-latency adapter to connect the controller to a regular USB port.

Let’s begin with a detailed list of specifications before we proceed to the results.

Test method & hardware/software setup

Test method

Super Mario World (NTSC) was used for the tests. The test scene was the very beginning of the level “Yoshi’s Island 2”:

(Screenshot of the test scene: Super Mario World (U) [!])

I used my iPhone 8 to record videos of the monitor and controller at 240 FPS. I then counted the frames from the button appearing pressed down until the character on screen reacted (jumped), using the excellent iPhone app “Is It Snappy?” by Chad Austin. The results presented further down are based on 25 samples for each test case and the result is presented as number of frames of input lag at 60 FPS (i.e. the framerate the game runs at, not 240 FPS camera frames). Below is a screenshot of one of the recorded videos.

(Screenshot from one of the recorded test videos)

Note: It would have been better to have an LED connected to the jump button. However, I’m not about to take the soldering iron to my SNES Mini controller. Besides, a previous comparison I did showed a minimal difference (0.05 frames) in the average measured input lag between using an LED and not using an LED.

Common hardware

  • Gamepad: Original SNES Mini wired controller
  • Gamepad USB adapter: Raphnet Technologies Classic Controller to USB Adapter V2 (model number ADAP-1XWUSBMOTE_V2) - Firmware version 2.1.0. This adapter has a hard coded 1000 Hz USB polling rate (fastest rate the USB standard allows). The rate at which the adapter polls the controller was also set to 1000 Hz (again, the fastest setting available).
  • Monitor: Samsung UE22H5005 (22" 1080p LCD TV). 1280x720 resolution was used for all tests, so that the results are comparable (720p is the resolution used by the SNES Mini). The same TV settings and the same HDMI input were used for all tests.

SNES Mini

  • 4:3 aspect ratio, no border
  • Hakchi 2.21f
  • retroarch-clover 1.0c (RetroArch 1.4.1)
  • snes9x-2010

RetroPie

  • Raspberry Pi 3
  • Original Raspberry Pi PSU
  • RetroPie 4.3 (default image, with no updates applied)
  • snes9x-2010 (this is the default SNES emulator)

RetroArch PC

  • Core i7-6700K @ 4.4 GHz
  • 16 GB DDR4-2667
  • GeForce GTX 1080
  • Windows 10 Version 1709 (OS version 16299.192)
  • Nvidia GPU driver 388.13
  • RetroArch nightly from November 12 2017
  • snes9x2010

RetroArch settings

  • SNES Mini: Default settings. There’s not really anything to modify that will improve the situation. The only possible change would be video_frame_delay, but that’s a very demanding setting so not really suitable for the SNES Mini’s weak hardware.
  • RetroPie: I tested both default settings and the known settings that affect input lag. The results chart below indicates which settings were modified for each test case.
  • RetroArch PC: The setup was optimized for the minimum input lag possible, using every bit of computational power afforded by the overclocked i7:
    • video_threaded = "false"
    • video_hard_sync = "true"
    • video_hard_sync_frames = "0"
    • video_frame_delay = "14"
    • video_smooth = "false"
    • video_fullscreen = "true"
    • video_windowed_fullscreen = "false"
    • video_scale_integer = "false"

Finally, just to be clear, vsync was enabled for all tests on all platforms.

Photo of the hardware

Here’s a photo showing most of the hardware used (but not the desktop PC):

Regarding input lag of the Samsung TV

I didn’t use my trusty, low-latency HP Z24i for these tests, since it doesn’t have HDMI (which is required by the SNES Mini). So, to make all measurements comparable, I instead used the Samsung UE22H5005 LCD TV for all tests. From both my own previous tests as well as testing done by Prad.de, we have strong evidence that the HP Z24i has negligible input lag (less than 1 ms). This may come as a shock to you, if you’re one of those who believe that all LCD displays must have a heap of input lag, but this HP monitor is not the only monitor in existence that has virtually no input lag (although the list of such displays isn’t very long).

In order to get a handle on how much input lag the Samsung TV has, I’ve run the RetroArch PC tests on both the Samsung and the HP display so that we can compare the difference. The HP display was tested at native 1920x1200 and the Samsung was tested at 1280x720. The results (average measured input lag for the Super Mario World test case):

  • Samsung UE22H5005: 4.6 frames
  • HP Z24i: 3.54 frames

Difference: 1.06 frames (17.67 ms)

So, given these results, we can assume the Samsung TV adds ~1 frame of total input lag to the figures presented in the chart below. In other words, to get how each system performs without taking the display into account, subtract 1 frame from the result.
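As a quick sanity check of that subtraction, here is a small sketch using only figures already posted in this thread. It treats the HP Z24i as a zero-lag reference, which is an approximation (it is reported above as less than 1 ms), and the names `FRAME_MS` and `to_ms` are just illustrative.

```python
FRAME_MS = 1000.0 / 60.0  # one frame at 60 FPS, ~16.67 ms


def to_ms(frames):
    """Convert a number of 60 FPS frames into milliseconds."""
    return frames * FRAME_MS


# RetroArch PC averages measured on each display (frames).
samsung_pc = 4.6
hp_pc = 3.54

# Lag attributed to the Samsung TV, assuming the HP adds none.
display_lag = samsung_pc - hp_pc
print(f"Samsung display lag: {display_lag:.2f} frames ({to_ms(display_lag):.2f} ms)")

# SNES Mini averages reported earlier in the thread (frames, on the Samsung).
measured = {
    "SNES Mini + Canoe": 5.65,
    "SNES Mini + RetroArch (snes9x2010)": 8.34,
}
for name, frames in measured.items():
    without_display = frames - display_lag
    print(f"{name}: {without_display:.2f} frames ({to_ms(without_display):.1f} ms) without display")
```

The same helper converts any figure in the chart to milliseconds; it does not account for measurement noise, so differences of a tenth of a frame or so should not be over-interpreted.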

As a side note, the result measured on the HP screen (3.54 frames) is the lowest input lag I’ve ever seen measured for emulated Super Mario World. Given the test scene used and given the fact that Super Mario World is designed to respond to input on the third frame after receiving said input, a real SNES on a CRT will, at best, achieve an average input lag of 3.3 frames. That means we’re some 0.2-0.3 frames or 3-5 ms behind the real thing.

The test results

All results in the chart below are reported as number of frames at 60 FPS, since that is the frame rate at which Super Mario World runs. So, to convert the figures to milliseconds, simply multiply them by 16.67.

Result analysis

First of all, remember that the monitor I’ve tested on has ~1 frame of input lag. So, to get the result of each system without taking the monitor into account, simply subtract 1 from all of the results.

We can see that the SNES Mini with its default emulator (Canoe) is pretty fast. A real SNES on a CRT would achieve ~3.3 frames in our test case and the SNES Mini achieves ~4.6 frames if we remove the Samsung TV’s input lag. That’s just ~1.3 frames (~22 ms) behind the real thing. That’s pretty awesome and a job well done by Nintendo, especially given the low computational performance of the Mini’s hardware. The real problem for most people will be that their TVs add quite a lot of input lag on top of this.

We can also see that the default RetroPie is painfully slow at 8 frames (7 if we remove the Samsung TV’s input lag). Remember that 8 frames is what we achieve with this comparably fast TV (1 frame of input lag is pretty much as fast as TVs go currently) and a very fast input method. Most people will use standard USB gamepads with standard USB polling rates (125 Hz) and TVs that add 2 or more frames of lag. The average RetroPie user running a stock setup on their TV might therefore have a total input lag of ~10 frames (167 ms). That’s definitely very noticeable and quite distracting. Please note that a game with less built-in lag than Super Mario World might reduce that figure by 1-2 frames, but it’s still not looking very good.

It’s interesting to see how the RetroPie setup reacts when we, one by one, apply the known input lag reducing settings. Combining them all, we can actually match the SNES Mini. However, this is slightly misleading, as there are a few drawbacks to using these settings. Using the Dispmanx video driver means you lose the ability to use shaders as well as the on-screen text (for example when saving). The video_max_swapchain_images=2 setting is also very demanding and many SNES games will not run at full speed with it enabled. You probably can use it together with the other input lag reducing settings for select 8-bit and 16-bit games, but it would be a bit cumbersome to set up and in that case I’d recommend switching to a more powerful platform (such as x86) instead. Choosing the middle ground of using the Dispmanx driver and disabling threaded video is certainly possible. This works perfectly for NES/SNES and will put you within a frame of the SNES Mini, given a fast enough input device.

We also finally get some hard numbers for how RetroArch performs on the SNES Mini and it’s not pretty. It’s around 2.7 frames (45 ms) slower than Canoe and the difference is definitely noticeable. Exactly why RetroArch is this much slower is something I’ll leave to others to figure out, but my guess is that the difference doesn’t have to be this big. “Someone” should probably look into the video backend and possibly the input handling.

Last but not least, RetroArch on PC manages to edge out all other systems/setups. The difference is mainly thanks to the high performance allowing us to use frame delay to shave off an additional 14 ms (0.84 frames) and arrive at near-console performance.

I’ll end with a caveat: The Mini was tested with a single game. Some games inherently have less input lag than Super Mario World (such as Super Metroid, which responds on the second frame) and such differences affect all tested platforms. However, Nintendo could also be using per-game settings for Canoe that affect input lag. For example, it’s possible that they use additional buffering for games that are computationally harder to emulate, to keep the framerate high, which might in turn add lag. This is speculation at this point and will have to be the subject of a possible future test, but I’ll leave it here as something to keep in mind.

Thanks for reading another lengthy post! :smile:

great analysis as usual!

this bit troubles me. as i support it, i make a point of using stock retropie. i also have a ~5 year old samsung HDTV that probably has crappy response rates, yet i’ve completed super mario world and didn’t notice any lag. that’s not scientific, but 167+ms?! next time i will do a semi-scientific test and record my button presses & the screen with my iphone recording in slow-mo. there’s got to be something going on, here.