Let’s make this a stickied topic since it’s such an important thread.
Thanks for the tip!
As you may know from my previous post, it has bothered me that the SNES Mini hasn’t been very well researched in terms of input lag. Especially not how RetroArch on the Mini performs. I’ve also wanted to see good comparisons to RetroPie and RetroArch on PC using the same display, to get a more complete picture. This post is my attempt at improving the knowledge on these things.
What I will do is test input lag using Super Mario World and the following setups:
- SNES Mini using both the built-in emulator (Canoe) and RetroArch
- RetroPie on Raspberry Pi 3 using default settings as well as various input lag reducing settings
- RetroArch on PC (Windows) on a high-end desktop PC with all known input lag reducing settings enabled/maximized
Importantly, I will be using the same gamepad for all tests: The SNES Mini wired controller. For the Raspberry Pi and PC tests, I’ll be using Raphnet’s awesome low-latency adapter to connect the controller to a regular USB port.
Let’s begin with a detailed list of specifications before we proceed to the results.
Test method & hardware/software setup
Super Mario World (NTSC) was used for the tests. The test scene was the very beginning of the level “Yoshi’s Island 2”:
I used my iPhone 8 to record videos of the monitor and controller at 240 FPS. I then counted the frames from the button appearing pressed down until the character on screen reacted (jumped), using the excellent iPhone app “Is It Snappy?” by Chad Austin. The results presented further down are based on 25 samples for each test case and the result is presented as number of frames of input lag at 60 FPS (i.e. the framerate the game runs at, not 240 FPS camera frames). Below is a screenshot of one of the recorded videos.
Note: It would have been better to have an LED connected to the jump button. However, I’m not about to take the soldering iron to my SNES Mini controller. Besides, a previous comparison I did showed a minimal difference (0.05 frames) in the average measured input lag between using an LED and not using an LED.
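To make the counting method above concrete, here is a minimal sketch of how camera-frame counts turn into the 60 FPS figures reported later. The function names and the sample counts are hypothetical, made up purely for illustration; they are not measurements from these tests.

```python
CAMERA_FPS = 240  # iPhone slow-motion capture rate used for the recordings
GAME_FPS = 60     # frame rate Super Mario World runs at


def camera_to_game_frames(camera_frames, camera_fps=CAMERA_FPS, game_fps=GAME_FPS):
    """Convert a count of camera frames into game frames."""
    return camera_frames * game_fps / camera_fps


def average_lag(samples):
    """Mean input lag in game frames over a list of camera-frame counts."""
    converted = [camera_to_game_frames(s) for s in samples]
    return sum(converted) / len(converted)


# Hypothetical camera-frame counts (button pressed -> Mario jumps), illustration only.
example_samples = [16, 18, 17, 19, 20]
print(f"{average_lag(example_samples):.2f} game frames")
```

At 240 FPS, one camera frame is a quarter of a game frame (about 4.17 ms), so that is the resolution of any single sample. Averaging many samples is what gives the finer-grained figures.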
- Gamepad: Original SNES Mini wired controller
- Gamepad USB adapter: Raphnet Technologies Classic Controller to USB Adapter V2 (model number ADAP-1XWUSBMOTE_V2) - Firmware version 2.1.0. This adapter has a hard coded 1000 Hz USB polling rate (fastest rate the USB standard allows). The rate at which the adapter polls the controller was also set to 1000 Hz (again, the fastest setting available).
- Monitor: Samsung UE22H5005 (22" 1080p LCD TV). 1280x720 resolution was used for all tests, so that the results are comparable (720p is the resolution used by the SNES Mini). The same TV settings and the same HDMI input was used for all tests.
- 4:3 aspect ratio, no border
- Hakchi 2.21f
- retroarch-clover 1.0c (RetroArch 1.4.1)
- Raspberry Pi 3
- Original Raspberry Pi PSU
- RetroPie 4.3 (default image, with no updates applied)
- snes9x-2010 (this is the default SNES emulator)
- Core i7-6700K @ 4.4 GHz
- 16 GB DDR4-2667
- GeForce GTX 1080
- Windows 10 Version 1709 (OS version 16299.192)
- Nvidia GPU driver 388.13
- RetroArch nightly from November 12 2017
- SNES Mini: Default settings. There’s not really anything to modify that will improve the situation. The only possible change would be video_frame_delay, but that’s a very demanding setting so not really suitable for the SNES Mini’s weak hardware.
- RetroPie: I tested both default settings as well as the known settings that affect input lag. The results chart below indicates which settings were modified for each test case.
RetroArch PC: The setup was optimized for the minimum input lag possible, using every bit of computational power afforded by the overclocked i7:
- video_threaded = false
- video_fullscreen = "true"
- video_windowed_fullscreen = "false"
Finally, just to be clear, vsync was enabled for all tests on all platforms.
Photo of the hardware
Here’s a photo showing most of the hardware used (but not the desktop PC):
Regarding input lag of the Samsung TV
I didn’t use my trusty, low-latency HP Z24i for these tests, since it doesn’t have HDMI (which is required by the SNES Mini). So, to make all measurements comparable, I instead used the Samsung UE22H5005 LCD TV for all tests. From both my own previous tests as well as testing done by Prad.de, we have strong evidence that the HP Z24i has negligible input lag (less than 1 ms). This may come as a shock to you, if you’re one of those who believe that all LCD displays must have a heap of input lag, but this HP monitor is not the only monitor in existence that has virtually no input lag (although the list of such displays isn’t very long).
In order to get a handle on how much input lag the Samsung TV has, I’ve run the RetroArch PC tests on both the Samsung and the HP display so that we can compare the difference. The HP display was tested at native 1920x1200 and the Samsung was tested at 1280x720. The results (average measured input lag for the Super Mario World test case):
- Samsung UE22H5005: 4.6 frames
- HP Z24i: 3.54 frames
Difference: 1.06 frames (17.67 ms)
So, given these results, we can assume the Samsung TV adds ~1 frame of total input lag to the figures presented in the chart below. In other words, to get how each system performs without taking the display into account, subtract 1 frame from the result.
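The subtraction above can be sketched in a few lines. This assumes, as stated earlier, that the HP Z24i itself contributes essentially no lag, so the whole difference is attributed to the Samsung TV. The variable and function names are mine.

```python
FRAME_MS = 1000 / 60  # duration of one frame at 60 FPS, about 16.67 ms


def frames_to_ms(frames):
    """Convert a lag figure in 60 FPS frames to milliseconds."""
    return frames * FRAME_MS


samsung_frames = 4.6  # RetroArch PC measured on the Samsung UE22H5005
hp_frames = 3.54      # RetroArch PC measured on the HP Z24i

# Assumption: the HP display adds ~0 lag, so the difference is the TV's lag.
display_lag_frames = samsung_frames - hp_frames
print(f"{display_lag_frames:.2f} frames, {frames_to_ms(display_lag_frames):.2f} ms")
```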
As a side note, the result measured on the HP screen (3.54 frames) is the lowest input lag I’ve ever seen measured for emulated Super Mario World. Given the test scene used and given the fact that Super Mario World is designed to respond to input on the third frame after receiving said input, a real SNES on a CRT will, at best, achieve an average input lag of 3.3 frames. That means we’re some 0.2-0.3 frames or 3-5 ms behind the real thing.
The test results
All results in the chart below are reported as number of frames at 60 FPS, since that is the frame rate at which Super Mario World runs. So, to convert the figures to milliseconds, simply multiply them by 16.67.
First of all, remember that the monitor I’ve tested on has ~1 frame of input lag. So, to get the result of each system without taking the monitor into account, simply subtract 1 from all of the results.
We can see that the SNES Mini with its default emulator (Canoe) is pretty fast. A real SNES on a CRT would achieve ~3.3 frames in our test case and the SNES Mini achieves ~4.6 frames if we remove the Samsung TV’s input lag. That’s just ~1.3 frames (~22 ms) behind the real thing. That’s pretty awesome and a job well done by Nintendo, especially given the low computational performance of the Mini’s hardware. The real problem for most people will be that their TVs add quite a lot of input lag on top of this.
We can also see that the default RetroPie is painfully slow at 8 frames (7 if we remove the Samsung TV’s input lag). Remember that 8 frames is what we achieve with this comparatively fast TV (1 frame of input lag is pretty much as fast as TVs go currently) and a very fast input method. Most people will use standard USB gamepads with standard USB polling rates (125 Hz) and TVs that add 2 or more frames of lag. The average RetroPie user running a stock setup on his TV might therefore have a total input lag of ~10 frames (167 ms). That’s definitely very noticeable and quite distracting. Please note that a game with less built-in lag than Super Mario World might reduce that figure by 1-2 frames, but it’s still not looking very good.
It’s interesting to see how the RetroPie setup reacts when we, one by one, apply the known input lag reducing settings. Combining them all, we can actually match the SNES Mini. However, this is slightly misleading, as there are a few drawbacks to using these settings. Using the Dispmanx video driver means you lose the ability to use shaders as well as the on-screen text (for example when saving). The video_max_swapchain_images=2 setting is also very demanding and many SNES games will not run at full speed with it enabled. You probably can use it together with the other input lag reducing settings for select 8-bit and 16-bit games, but it would be a bit cumbersome to set up and in that case I’d recommend switching to a more powerful platform (such as x86) instead. Choosing the middle ground of using the Dispmanx driver and disabling threaded video is certainly possible. This works perfectly for NES/SNES and will put you within a frame of the SNES Mini, given a fast enough input device.
We also finally get some hard numbers for how RetroArch performs on the SNES Mini and it’s not pretty. It’s around 2.7 frames (45 ms) slower than Canoe and the difference is definitely noticeable. Exactly why RetroArch is this much slower is something I’ll leave to others to figure out, but my guess is that the difference doesn’t have to be this big. “Someone” should probably look into the video backend and possibly the input handling.
Last but not least, RetroArch on PC manages to edge out all other systems/setups. The difference is mainly thanks to the high performance allowing us to use frame delay to shave off an additional 14 ms (0.84 frames) and arrive at near-console performance.
I’ll end with a caveat: The Mini was tested with a single game. Some games inherently have less input lag than Super Mario World (such as Super Metroid, which responds on the second frame) and such differences affect all tested platforms. However, Nintendo could also be using per-game settings for Canoe that affect input lag. For example, it’s possible that they use additional buffering for games that are computationally harder to emulate, to keep the framerate high, which might in turn add lag. This is speculation at this point and will have to be the subject of a possible future test, but I’ll leave it here as something to keep in mind.
Thanks for reading another lengthy post!
great analysis as usual!
this bit troubles me. as i support it, i make a point of using stock retropie. i also have a ~5 year old samsung HDTV that probably has crappy response rates, yet i’ve completed super mario world and didn’t notice any lag. that’s not scientific, but 167+ms?! next time i will do a semi-scientific test and record my button presses & the screen with my iphone recording in slow-mo. there’s got to be something going on, here.
Thanks for this investigation! Are you aware that there’s a newer RetroArch available? https://github.com/KMFDManic/NESC-SNESC-Modifications/releases
I use that one and it’s definitely better than the clover one, especially if you mix in a few of your recommended settings from RetroArch PC.
Yes, please do test. I recommend using the awesome iPhone app “Is It Snappy?” to analyze the recorded video. The app is made for this kind of input lag analysis and is very convenient.
Regarding RetroPie lag: I have tested most RetroPie versions for the past couple of years and on the HP Z24i, the Samsung LCD TV used in this test as well as my Samsung 50" plasma. The results always add up to the same input lag. It makes sense when compared to the PC as well. If you take the PC results and start adding up the extra lag from disabling the lag reducing settings, you get this (using results from my chart):
PC base: 4.6 frames + 0.84 frames (video_frame_delay) + 1 frame (video_max_swapchain_images) + 1 frame (dispmanx - for whatever reason, the stock BCM video driver on the Pi has one frame of extra lag compared to dispmanx and most PC video drivers. The VC4 driver fixes this, so that's a possible future improvement.) + 0.5 frames (video_threaded) = ~8 frames
So, I really don’t think there’s anything strange going on here. Would still be interesting to see your tests.
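For anyone who wants to check the sum, here is the same stack-up as a small sketch. The figures are the ones quoted above; the dictionary labels are my own shorthand.

```python
# Lag contributions in 60 FPS frames, as quoted in the stack-up above.
contributions = {
    "PC base (all lag-reducing settings on)": 4.6,
    "video_frame_delay not used": 0.84,
    "video_max_swapchain_images at default": 1.0,
    "stock BCM video driver instead of dispmanx": 1.0,
    "video_threaded enabled": 0.5,
}

total = sum(contributions.values())

for name, frames in contributions.items():
    print(f"{frames:5.2f}  {name}")
print(f"{total:5.2f}  total (~{total * 1000 / 60:.0f} ms)")
```

The total comes out at 7.94 frames, which rounds to the ~8 frames measured for stock RetroPie.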
Thanks, I’ll see if I can make an additional test with the updated RetroArch. I don’t see anything particularly interesting in the GitHub logs, though (that would have an effect on input lag). The only setting of the ones I listed that will have an effect on the SNES Mini is, unfortunately, video_frame_delay and that will make performance tank quickly. video_threaded is disabled by default. I should also mention that video_max_swapchain_images won’t have an effect on the SNES Mini, as it’s not implemented in the RetroArch driver (I’ve actually measured this and confirmed it has no effect).
Could you make a similar graph showing purely the emulation lag (input+cpu+display), with all shared quantities (your TV’s extra frame + SMW input processing) removed?
I have to think there are still a lot of people out there who would take a quick glance at your graph and think “oh, emulation adds, like, 5 frames of lag over The Real Thing!”
And to corroborate your findings, my SNES clone (SNESOAC? SNOAC?) and emulation rig with Retroarch and Snes9x react on the same CRT frame in Super Mario World
Sure! The graph below removes all sources of input lag that are either external (monitor), added by the game design, or otherwise part of how both the emulated system and a real console works:
- TV lag: 1.06 frames
- Super Mario World: 2 frames (extra compared to a game that responds on the next frame)
- Scanout of frame to display (until character is rendered): 0.8 frames
- Average time between receiving input and starting new frame: 0.5 frames
Total: 4.36 frames
The resulting graph is below. What it shows is basically how much extra input lag each system/configuration exhibits compared to The Real Thing (on a CRT). Don’t know if it makes things more or less clear, though…
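As a rough sketch of the subtraction behind that graph: the baseline components are the ones listed above, and the two measured values are figures quoted earlier in the thread (8 frames for stock RetroPie, 4.6 frames for RetroArch on PC, both on the Samsung TV). The function and key names are mine.

```python
# Lag sources shared by the emulated systems and a real console setup (frames at 60 FPS).
baseline = {
    "tv_lag": 1.06,
    "super_mario_world": 2.0,
    "scanout_until_character": 0.8,
    "average_wait_for_next_frame": 0.5,
}
baseline_total = sum(baseline.values())

# A real SNES on a CRT has everything in the baseline except the TV lag.
real_snes_on_crt = baseline_total - baseline["tv_lag"]


def extra_lag(measured_frames):
    """Lag added by the emulation setup itself, beyond the shared baseline."""
    return measured_frames - baseline_total


print(f"baseline: {baseline_total:.2f} frames")
print(f"real SNES on CRT: {real_snes_on_crt:.2f} frames")
print(f"stock RetroPie extra: {extra_lag(8.0):.2f} frames")
print(f"RetroArch PC extra: {extra_lag(4.6):.2f} frames")
```

The RetroArch PC result works out to about 0.24 frames of extra lag, consistent with the 0.2-0.3 frame gap to real hardware mentioned in the original post.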
Got a 7-year-old Samsung TV that has 14ms of lag, so you never know.
I had to name the source hdmi channel “PC” to remove any kind of processing that could slow down things (it’s written in the TV manual to do that).
There are also some command line switches that can affect input latency on Canoe, the built-in emulator.
-no-lowlatency Render in a separate thread, to accommodate “slow” titles.
-lowlatency Render on the main thread to reduce input latency.
-no-cpurender Use the old GPU code for rendering
-cpurender Use the CPU for rendering
-glFinish Graphics option to reduce latency on mali400, but may degrade framerate
-no-glFinish Opposite of the above option, which became default as of 1.9.1201
That is a good idea. I know it is possible to track how long a frame takes to process. And that padding could be a configurable number just like base frame delay is now. I’d even be okay with it being on by default with a 3ms padding. Since anywhere frame delay would cause problems, it would automatically set it low/off anyway.
Concerning RetroArch, are these results similar in other cores?
Does BSNES or SNES9x (recent version) show other results for instance?
There are some cores that suffer from high input lag like Mednafen Saturn. So i would like to find the fastest cores per system.
I wonder what type of meager hardware setup would allow for the ideal settings to reduce input without video_frame_delay=14 due to the fact that it is the most expensive of the lag reduction techniques. The reason this would be the ideal setup in my mind, is that it would allow for Pixellate.cg or sharp-bilinear.cg to do some interpolation, like the SNES Mini when doing non-integer scaling, while still being an affordable piece of hardware to put on a TV without sacrificing a gaming PC budget. Interpolation for non-integer 1080p that limits artifacting is 2nd only to input lag for me in terms of enjoyable experiences. What a blessing that feature is on the SNES Mini.
Now I am not familiar with RetroPie, but I do know the “dispmanx” setting is unique to the Raspberry Pi. I assume that on an x86-based PC there is no similar choice available, due to being “limited”(?) to running the OpenGL renderer? A cheap PC would be even better if that meant a free OS.
Can someone do this type of testing for G-SYNC monitors?
I’m always getting conflicting information on what settings you should be using for G-SYNC.
At first, I was under the impression that you just flip v-sync to off in RA and none of the other settings like hard gpu sync/video frame delay matter anymore because they rely on v-sync. But then I got told otherwise so I’m really not sure what the definite settings are.
Does anyone know? Could we have some testing with G-SYNC? It would ideally be the best way to get the least input lag on a non-CRT monitor. I would test myself but I don’t have a high-speed camera.
Is there any chance we could do comparisons between the newly added D3D11/12 drivers and OpenGL/Vulkan on Windows too? D3D11 has a lot higher maximum FPS for me on an Intel PC with nvidia GPU vs. Vulkan/GL.
The bsnes-* and bsnes-mercury-* cores as well as the regular snes9x core perform the same latency wise. They’re more demanding than snes9x-2010, though, so it will be harder to use all the latency reducing settings to full effect (primarily frame delay).
I’d like to, but I need to stop myself now. I keep coming back to this stuff because I find it interesting, but I really don’t have the time anymore. It’s pretty easy to do the testing for anyone that has a 240 FPS camera (that’s the minimum I’d use), though, so hopefully someone can step up and do it.
@Twinaphex On another forum I read that to achieve minimal latency with D3D11, a so-called “waitable swap chain” has to be implemented. Apparently it can reduce latency by up to a frame. See this link:
Do you know whether this is used with the current D3D11 version on the buildbot?
Thanks for another great test, Brunnis. I’m impressed by how thoroughly you run them and share the results. I have a SNES Mini myself, so it’s really good to know how it stacks up.
With regards to the D3D11 driver, it seems it’s putting a lot less load on the system compared to GL. I have a small Atom setup which I can now run with CRT-Easymode-Halation using the D3D11 driver, whereas with GL it will slow down to an unplayable level. Latency wise it feels really good (much much better than the old D3D9 driver). Not sure though if it’s fully on par with GL and hard GPU sync on.
Super interested to hear, as I have a small atom setup (original compute stick) and it would be nice to see improvements performance wise as the latency fixes don’t all work due to the weak (but still better than the raspberry pi 3) chipset.
I wonder what type of meager hardware setup would allow for the ideal settings to reduce input without video_frame_delay=14 due to the fact that it is the most expensive of the lag reduction techniques
The closest (laziest?) way I’ve found to measure performance for systems is by searching for CPUs or devices on Geekbench and comparing Single Core scores. Good enough to get you in the ballpark
A Raspberry Pi 3 is just under 500. An NES/SNES Mini (AllWinner R16) is just above 300.
Today’s high-end CPUs come in around 5000.
Let’s say a score of 5000 is enough to get you able to use a Frame Delay of 14 on non-complex emulators (Snes9x? Sure. Higan? No. Moore’s Law failed us. Sorry). That leaves ~2ms of time to actually DO the computation for emulation.
In theory, a Frame Delay of 12 would then mean ~4ms of time (double!) for computation.
A Frame Delay of 8 gives double the time again. In other words, 1/4 the CPU power needed compared to the highest delay setting. 5000 / 4 = ~1250 Single Core Score in Geekbench
Of course, there’s lots of overhead and other things the OS needs to do, so this won’t be perfectly linear. Not to mention differing CPU needs for each emulator or console. There’s no magic static setting that anyone can point to.
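Here is a back-of-the-envelope sketch of that reasoning. It assumes, as above, that a Geekbench single-core score of 5000 is just enough for a Frame Delay of 14, and that the required score scales inversely with the time left for emulation. Both are rough guesses rather than measured facts, and the function names are made up for illustration.

```python
FRAME_MS = 1000 / 60  # one frame at 60 FPS, about 16.67 ms


def emulation_budget_ms(frame_delay_ms, frame_ms=FRAME_MS):
    """Time left in a frame to run the emulator after waiting frame_delay_ms."""
    return frame_ms - frame_delay_ms


def required_score(frame_delay_ms, reference_delay_ms=14, reference_score=5000):
    """Estimated single-core score needed, scaled from a hypothetical reference.

    Assumes the score needed is inversely proportional to the time budget,
    which ignores OS overhead and per-emulator differences.
    """
    reference_budget = emulation_budget_ms(reference_delay_ms)
    return reference_score * reference_budget / emulation_budget_ms(frame_delay_ms)


for delay in (14, 12, 8, 0):
    print(f"frame delay {delay:2d} ms: budget {emulation_budget_ms(delay):5.2f} ms, "
          f"score ~{required_score(delay):.0f}")
```

With the exact 16.67 ms frame time the budgets are about 2.7, 4.7 and 8.7 ms rather than the rounded 2, 4 and 8, so the estimate for a Frame Delay of 8 lands a bit above 1250. Treat either number as a ballpark only.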