An input lag investigation

Thanks for this investigation! Are you aware that there’s a newer RetroArch available? https://github.com/KMFDManic/NESC-SNESC-Modifications/releases

I use that one and it’s definitely better than the clover one, especially if you mix in a few of your recommended settings from RetroArch PC.

Thanks!

Yes, please do test. I recommend using the awesome iPhone app “Is It Snappy?” to analyze the recorded video. The app is made for this kind of input lag analysis and is very convenient.

Regarding RetroPie lag: I have tested most RetroPie versions over the past couple of years, on the HP Z24i, on the Samsung LCD TV used in this test, and on my Samsung 50" plasma. The results always add up to the same input lag. They also make sense when compared to the PC: if you take the PC results and add up the extra lag from disabling the lag-reducing settings, you get this (using results from my chart):

PC base: 4.6 frames

+ 0.84 frames (video_frame_delay)
+ 1 frame (video_max_swapchain_images)
+ 1 frame (dispmanx - for whatever reason, the stock BCM video driver on the Pi has one frame of extra lag compared to dispmanx and most PC video drivers. The VC4 driver fixes this, so that's a possible future improvement.)
+ 0.5 frames (video_threaded)

= ~8 frames

So, I really don’t think there’s anything strange going on here. Would still be interesting to see your tests.
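
For reference, this is roughly how those lag-reducing options look in retroarch.cfg on a PC setup (illustrative values only, not a universal recommendation; the frame delay in particular has to be tuned to what your CPU and core combination can sustain):

    # Lag-reducing settings referenced above (PC, GL driver); values are examples.
    video_driver = "gl"
    video_vsync = "true"
    video_threaded = "false"            # threaded video adds roughly 0.5 frames of lag
    video_max_swapchain_images = "2"    # 2 instead of 3 saves about 1 frame
    video_frame_delay = "14"            # 14 ms is about 0.84 frames, CPU permitting
    video_hard_sync = "true"            # GL only; keeps the driver from queuing frames
    video_hard_sync_frames = "0"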

Thanks, I’ll see if I can make an additional test with the updated RetroArch. I don’t see anything particularly interesting in the GitHub logs, though (nothing that would have an effect on input lag). The only setting of the ones I listed that will have an effect on the SNES Mini is, unfortunately, video_frame_delay, and that will make performance tank quickly. video_threaded is disabled by default. I should also mention that video_max_swapchain_images won’t have an effect on the SNES Mini, as it’s not implemented in the RetroArch driver (I’ve actually measured this and confirmed it has no effect).


Could you make a similar graph showing purely the emulation lag (input+cpu+display), with all shared quantities (your TV’s extra frame + SMW input processing) removed?

I have to think there are still a lot of people out there who would take a quick glance at your graph and think “oh, emulation adds, like, 5 frames of lag over The Real Thing!”

And to corroborate your findings, my SNES clone (SNESOAC? SNOAC?) and emulation rig with Retroarch and Snes9x react on the same CRT frame in Super Mario World :slight_smile:

Sure! The graph below removes all sources of input lag that are either external (monitor), added by the game design, or otherwise part of how both the emulated system and a real console work:

TV lag: 1.06 frames
Super Mario World: 2 frames (extra compared to a game that responds on the next frame)
Scanout of frame to display (until character is rendered): 0.8 frames
Average time between receiving input and starting new frame: 0.5 frames

Total: 4.36 frames

The resulting graph is below. What it shows is basically how much extra input lag each system/configuration exhibits compared to The Real Thing (on a CRT). Don’t know if it makes things more or less clear, though… :smile:

Awesome, thanks!


Got a 7-year-old Samsung TV that has 14 ms of lag, so you never know.
I had to name the HDMI source channel “PC” to remove any kind of processing that could slow things down (the TV manual says to do that).

There are also some command line switches that can affect input latency on Canoe, the built-in emulator.

-no-lowlatency Render in a separate thread, to accommodate “slow” titles.
-lowlatency Render on the main thread to reduce input latency.
-no-cpurender Use the old GPU code for rendering
-cpurender Use the CPU for rendering
-glFinish Graphics option to reduce latency on mali400, but may degrade framerate
-no-glFinish Opposite of the above option, which became default as of 1.9.1201

maybe more:

That is a good idea. I know it is possible to track how long a frame takes to process, and that padding could be a configurable number, just like the base frame delay is now. I’d even be okay with it being on by default with a 3 ms padding. Anywhere frame delay would cause problems, it would automatically set it low or turn it off anyway.
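
A minimal sketch of that idea (hypothetical code, not RetroArch’s actual implementation; the function name and the 3 ms padding are just placeholders for the values discussed above):

    // Hypothetical sketch of automatic frame delay.
    // Idea: measure how long the core needs per frame, then delay the start of
    // the next frame so that emulation plus a safety padding still fits before vsync.
    #include <algorithm>

    constexpr double FRAME_PERIOD_MS = 1000.0 / 60.0; // ~16.67 ms for a 60 Hz display
    constexpr double PADDING_MS      = 3.0;           // configurable safety margin

    double compute_frame_delay(double measured_core_time_ms)
    {
        // Time left in the frame after the core has run, minus the padding.
        double delay = FRAME_PERIOD_MS - measured_core_time_ms - PADDING_MS;
        // If the core is too slow for any delay, fall back to 0 (delay off).
        return std::max(0.0, delay);
    }

    int main()
    {
        double core_time_ms = 2.5; // e.g. a rolling average measured at runtime

        // On fast hardware this lands near the manually tuned values (~11 ms here);
        // on weak hardware it automatically drops to a low value or zero.
        double frame_delay_ms = compute_frame_delay(core_time_ms);
        (void)frame_delay_ms;
        return 0;
    }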

Concerning RetroArch, are these results similar in other cores?

Do bsnes or Snes9x (a recent version) show different results, for instance?

There are some cores that suffer from high input lag, like Mednafen Saturn. So I would like to find the fastest cores per system.

I wonder what type of meager hardware setup would allow for the ideal settings to reduce input lag without video_frame_delay=14, since it is the most expensive of the lag reduction techniques. The reason this would be the ideal setup in my mind is that it would allow Pixellate.cg or sharp-bilinear.cg to do some interpolation, like the SNES Mini does when doing non-integer scaling, while still being an affordable piece of hardware to put on a TV without sacrificing a gaming PC budget. Interpolation for non-integer 1080p that limits artifacting is second only to input lag for me in terms of an enjoyable experience. What a blessing that feature is on the SNES Mini.

Now, I am not familiar with RetroPie, but I do know the “dispmanx” setting is unique to the Raspberry Pi. I assume that on an x86-based PC there is no similar choice available, due to being “limited”(?) to the OpenGL renderer? A cheap PC would be even better if that meant a free OS.

Can someone do this type of testing for G-SYNC monitors?

I’m always getting conflicting information on what settings you should be using for G-SYNC.

At first, I was under the impression that you just flip v-sync off in RA and that none of the other settings like hard GPU sync/video frame delay matter anymore because they rely on v-sync. But then I was told otherwise, so I’m really not sure what the definitive settings are.

Does anyone know? Could we have some testing with G-SYNC? It would ideally be the best way to get the least input lag on a non-CRT monitor. I would test myself but I don’t have a high-speed camera.


Is there any chance we could do comparisons between the newly added D3D11/12 drivers and OpenGL/Vulkan on Windows too? D3D11 has a lot higher maximum FPS for me on an Intel PC with nvidia GPU vs. Vulkan/GL.


The bsnes-* and bsnes-mercury-* cores as well as the regular snes9x core perform the same latency-wise. They’re more demanding than snes9x-2010, though, so it will be harder to use all the latency-reducing settings to full effect (primarily frame delay).

I’d like to, but I need to stop myself now. I keep coming back to this stuff because I find it interesting, but I really don’t have the time anymore. It’s pretty easy to do the testing for anyone that has a 240 FPS camera (that’s the minimum I’d use), though, so hopefully someone can step up and do it.

@Twinaphex On another forum I read that to achieve minimal latency with D3D11, a so-called “waitable swap chain” has to be implemented. Apparently it can reduce latency by up to a frame. See this link:

Reduce latency with DXGI 1.3 swap chains

Do you know whether this is used with the current D3D11 version on the buildbot?
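
For anyone curious what the article boils down to, here is a minimal sketch of a waitable swap chain (illustrative only; the helper function names are made up and this is not RetroArch’s actual D3D11 driver code). The swap chain is created with the frame-latency-waitable flag, the maximum frame latency is set to 1, and each frame blocks on the waitable object before polling input and rendering:

    // Minimal sketch of a DXGI 1.3 "waitable swap chain" (Windows 8.1+).
    // Error handling omitted for brevity; link with dxgi.lib.
    #include <windows.h>
    #include <d3d11.h>
    #include <dxgi1_3.h>

    HANDLE create_waitable_swapchain(ID3D11Device* device, HWND hwnd,
                                     IDXGISwapChain2** out_swapchain)
    {
        IDXGIDevice*   dxgi_device = nullptr;
        IDXGIAdapter*  adapter     = nullptr;
        IDXGIFactory2* factory     = nullptr;
        device->QueryInterface(IID_PPV_ARGS(&dxgi_device));
        dxgi_device->GetAdapter(&adapter);
        adapter->GetParent(IID_PPV_ARGS(&factory));

        DXGI_SWAP_CHAIN_DESC1 desc = {};
        desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
        desc.SampleDesc.Count = 1;
        desc.BufferUsage      = DXGI_USAGE_RENDER_TARGET_OUTPUT;
        desc.BufferCount      = 2;
        desc.SwapEffect       = DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL;
        // The key part: ask for a frame-latency waitable object.
        desc.Flags            = DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT;

        IDXGISwapChain1* swapchain1 = nullptr;
        factory->CreateSwapChainForHwnd(device, hwnd, &desc, nullptr, nullptr, &swapchain1);
        swapchain1->QueryInterface(IID_PPV_ARGS(out_swapchain));

        // Only let one frame be queued, then grab the waitable handle.
        (*out_swapchain)->SetMaximumFrameLatency(1);
        return (*out_swapchain)->GetFrameLatencyWaitableObject();
    }

    void run_frame(IDXGISwapChain2* swapchain, HANDLE frame_latency_handle)
    {
        // Block until DXGI is ready to accept a new frame, so input is read
        // as late as possible and frames don't pile up in a queue.
        WaitForSingleObjectEx(frame_latency_handle, 1000, TRUE);

        // ... poll input, run the core, render ...

        swapchain->Present(1, 0); // vsync on
    }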

Thanks for another great test, Brunnis. I’m impressed by how thoroughly you do them and share the results. I have a SNES Mini myself also, so it’s really good to know how it stacks up :smiley:

With regards to the D3D11 driver, it seems it’s putting a lot less load on the system compared to GL. I have a small Atom setup which I can now run with CRT-Easymode-Halation using the D3D11 driver, whereas with GL it will slow down to an unplayable level. Latency-wise it feels really good (much, much better than the old D3D9 driver). Not sure, though, if it’s fully on par with GL with hard GPU sync on.

Super interested to hear, as I have a small Atom setup (original Compute Stick) and it would be nice to see performance improvements, since not all the latency fixes work due to the weak (but still better than the Raspberry Pi 3) chipset.

I wonder what type of meager hardware setup would allow for the ideal settings to reduce input lag without video_frame_delay=14, since it is the most expensive of the lag reduction techniques

The closest (laziest?) way I’ve found to estimate performance for systems is by searching for CPUs or devices on Geekbench and comparing single-core scores. Good enough to get you in the ballpark.

A Raspberry Pi 3 is just under 500. An NES/SNES Mini (AllWinner R16) is just above 300.

Today’s high-end CPUs come in around 5000.

Let’s say a score of 5000 is enough to let you use a Frame Delay of 14 on non-complex emulators (Snes9x? Sure. Higan? No. Moore’s Law failed us. Sorry). That leaves ~2 ms of time to actually DO the computation for emulation.

In theory, a Frame Delay of 12 would then mean ~4ms of time (double!) for computation.

A Frame Delay of 8 gives double the time again. In other words, 1/4 the CPU power needed compared to the highest delay setting. 5000 / 4 = ~1250 Single Core Score in Geekbench

Of course, there’s lots of overhead and other things the OS needs to do, so this won’t be perfectly linear. Not to mention differing CPU needs for each emulator or console. There’s no magic static setting that anyone can point to.
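
To make the arithmetic explicit, here is a quick illustrative calculation (assuming the ~16.7 ms NTSC frame period and that the required single-core score scales inversely with the time left after the frame delay, the same simplification as above, so the numbers come out a bit higher than the round figures):

    // Rough headroom estimate for different frame delay settings.
    #include <cstdio>

    int main()
    {
        const double frame_period_ms   = 1000.0 / 60.0;          // ~16.67 ms
        const double reference_score   = 5000.0;                 // assumed to manage frame delay 14
        const double reference_time_ms = frame_period_ms - 14.0; // ~2.67 ms of emulation time

        const int delays[] = { 14, 12, 8, 0 };
        for (int delay : delays)
        {
            double available_ms = frame_period_ms - delay;
            // A slower CPU can use this delay if it still finishes within available_ms.
            double needed_score = reference_score * (reference_time_ms / available_ms);
            std::printf("frame delay %2d ms -> %5.2f ms for emulation, "
                        "roughly a %4.0f single-core score needed\n",
                        delay, available_ms, needed_score);
        }
        return 0;
    }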

@rafan - I have mentioned that bit about waitable swap chains to aliaspider. He might implement it either through hooking up the swap chain setting or the GPU hard sync setting, whichever of the two is best.

All these measurements are fantastic but they lack the varying human factor.
I owned an NES back in the early 90s and played Sega at friends’ houses, but we quickly moved to a Pentium 133 MHz and never had another TV console till I acquired a PS2 in 2002, then went back to PC.

So from the late 90s till a year ago, it was all emulation for me, and I did not own a single “retro” console till 2017. Lately I have amassed several retro consoles (NES, SNES, Sega, N64, PS1, PS2), EverDrives, and a Sony CRT to compare with emulation.
LAG, which was once a non-issue and something I was completely oblivious to for almost 20 years, became something I very much notice, and it expresses itself in how well I play these games.

BUT, before I played these games on the real hardware with zero lag, I was so adapted to them on the emulator that I played like a ‘pro’ without being bothered by the lag, or dying.
After spending some time playing these games on the hardware and CRT, going back to emulation the lag is very noticeable indeed and clearly diminishes performance; it almost feels like playing inside a dream… if you know what I mean.

BUT (again), and here’s my point: the BRAIN can adapt to the lag of emulation, compensate for the delay, and make you play like a pro again, completely removing the “in a dream” sensation when going back from hardware+CRT to emulation.
In my experience, this adaptation to lag takes maybe a day or a couple of hours of attentive playing.

Of course zero loop lag from an emulator would be ideal, but under 100 ms is totally adaptable, hence playable, thanks to our easily fooled brains. :slightly_smiling_face:

There are some games where you can’t really adapt to the lag if they depend on very fast reactions, like Punch Out. Beating Mike Tyson with high input lag is near impossible. For the vast majority of games, though, 100 ms can be adapted to.

Only in very specific situations is this true. But in an action game the brain can’t see the future, know when an enemy is going to shoot at you, and have you react before it even happens. There’s no way in hell you’d be able to get through a fast paced game like Punch Out, or the super fast, perfect platforming of a game like Gimmick, with input lag.

Only in very specific situations is this true.
There’s no way in hell you’d be able to get through a fast paced game like Punch Out

You are right, but games that do not require super reflexes, like platformers, are definitely playable.

Modern games on the PS4, Xbox One, PC, etc… have much more lag than the CRT generation ever had. The games obviously are not the same and require less reflex timing than old NES, SNES, and Sega games, and the developers are very well aware of that.
Modern platformers, or the “new wave of indie 8-bit games”, are less challenging than games of the CRT generation, especially the NES, and also take into consideration the inherent lag of PC gaming on an LCD (or phone, Pi, mini PC, whatever…).

Emulation will always have lag; that is the nature of it. Super-reflex games that require zero lag, as in competitions or speedruns, are still played on the real hardware and a CRT.

Try to play guitar with a 100 ms delay… you’ll be kicked out of the band in no time.
The auditory system in the brain is several orders of magnitude more sensitive to time shift than the visual system; lucky us gamers. :smile: