An input lag investigation

Hello! I made a short comparison video (only with favorable input timing). Details are in the description and the pinned comment.

https://youtu.be/NyrcPyZtfMg

And another one to illustrate the concept of favorable and unfavorable input timing:

https://youtu.be/YOMIV6PAyR0

Well of course RetroArch is going to be the fastest with Run Ahead = 1.

The question is, how fast is it without it, with only GPU sync ON?


In theory, on the Xperia, disabling Run Ahead adds +16 ms, disabling Frame Delay +12.5 ms, and disabling Hard GPU Sync +32 ms.
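Reading the three figures as milliseconds (an assumption) and treating the penalties as independent and additive (a simplification), the worst case with all three features off can be tallied like this:

```python
# Extra latency (ms) quoted above for switching each feature off.
# Assumptions: the figures are milliseconds and the penalties simply add up.
PENALTY_MS = {
    "run_ahead": 16.0,
    "frame_delay": 12.5,
    "hard_gpu_sync": 32.0,
}


def added_latency_ms(disabled):
    """Sum the quoted penalties for the features that are switched off."""
    return sum(PENALTY_MS[name] for name in disabled)


print(added_latency_ms(["run_ahead"]))  # 16.0
print(added_latency_ms(PENALTY_MS))     # 60.5 (all three off)
```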

Vulkan with “Max Swapchain Images” set to 2 should provide the same latency as Hard GPU Sync Frames set to 0 in GL, without the increased CPU cost (I think it was just under 20%).

It would be interesting to see if that’s working. :smirk:


I can't seem to modify “max swapchain images”. It is locked to 3 on my Android phone. Is it better or worse than 2? Thanks!

3 is worse, 1 additional frame of lag.

That sucks. Sticking to GL hard sync then.

I’d like to share some input latency tests I did recently, with the focus on which video renderer shows better results: D3D11 vs GL. Unfortunately I can’t test Vulkan…

I used a 120 FPS smartphone camera (8.3 ms/frame… yeah, the results are not super accurate) and took the average value of 5-10 tries (jumps) per test. The test game was Little Samson with the Nestopia core and Run-Ahead enabled. V-Sync is ON for all tests except the last one (#5), resulting in constant frame pacing and perfectly smooth scrolling.
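The averaging step can be written out as below. The camera frame counts in the example are made up for illustration; a single try can only be resolved to one camera frame (8.3 ms at 120 FPS), which is why averaging several tries helps.

```python
from statistics import mean


def latency_ms(frame_counts, camera_fps):
    """Average number of camera frames between the button press and the
    on-screen reaction, converted to milliseconds."""
    frame_ms = 1000.0 / camera_fps  # 8.33 ms at 120 fps, 4.17 ms at 240 fps
    return mean(frame_counts) * frame_ms


# Hypothetical counts for five jumps (not the data behind the results below).
print(round(latency_ms([8, 8, 9, 8, 7], camera_fps=120)))  # 67
```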

#1 GL = 117ms

#2 GL + Hard GPU Sync = 67ms

#3 D3D11 - mode A = 67ms

  • This mode means: Windows 7 - Aero is disabled
  • there is a permanent tear-line in a fixed position at the top of the screen, slightly wobbling

#4 D3D11 - mode B = 100ms

  • this mode is enabled by pressing ALT+ENTER once the game is running. Pressing ALT+ENTER again switches back to mode A
  • this mode is always active if you have Aero enabled (with Enable Desktop Composition = ON)

#5 D3D11 + RTSS Sync = 67ms

  • for this test I disabled RetroArch’s V-Sync Option and used the Scanline Sync function of RTSS
  • like in test #3 there is a permanent tear-line, but it can be moved and hidden via RTSS
  • downside: fast-forward can't be used while Scanline Sync is ON, and you have to start RTSS manually each time you start RetroArch

Conclusion

#2, #3 and #5 have the same latency of 67ms. #4 is 2 frames behind and #1 is the worst with 3 frames behind.
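Converting the millisecond results into frames, assuming a 60 Hz display, reproduces the frame counts in the conclusion (within the 8.3 ms resolution of the camera):

```python
FRAME_MS = 1000.0 / 60  # one frame at 60 Hz, about 16.67 ms

# Averaged results from the tests above (ms).
results_ms = {"#1": 117, "#2": 67, "#3": 67, "#4": 100, "#5": 67}


def frames_behind(ms, best_ms):
    """Latency difference expressed in 60 Hz frames."""
    return (ms - best_ms) / FRAME_MS


best = min(results_ms.values())
for test, ms in results_ms.items():
    print(f"{test}: {ms} ms, {frames_behind(ms, best):.1f} frames behind")
```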

Performance Test

I was also interested to see how performance-intensive the “low latency modes” #2, #3 and #5 are. For that, I limited my maximum CPU clock speed to 80% (via the Windows power options), which led to 2.1 GHz. I used Beetle PSX HW for this test, and fast-forward shows that it can run at 79 FPS with this setup. Now the results:

#2 and #3 drop to 40 FPS, while #5 can keep 60 FPS. This shows that #2 and #3 need quite a bit of CPU overhead to run properly at the 60 FPS target. RTSS, on the other hand, can keep 60 FPS even with a small fast-forward overhead of 63 FPS, and it doesn't drop to 40/30/20 FPS when the CPU can't maintain full speed. So this would be the best solution for weaker systems that can barely run a core/game at full speed but still want lower input lag.
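One way to read the fast-forward numbers is as a per-frame time budget. This is a rough model, assuming the fast-forward speed is a fair proxy for how long the core needs per frame: at 79 FPS the core uses about 12.7 ms of the 16.7 ms available at 60 FPS, leaving about 4 ms, and at 63 FPS less than 1 ms is left. That would fit the observation that a sync method with its own CPU cost can push the frame rate below 60 FPS.

```python
TARGET_FPS = 60


def frame_time_ms(fps):
    return 1000.0 / fps


def headroom_ms(fast_forward_fps, target_fps=TARGET_FPS):
    """Spare time per frame at the target rate, assuming the fast-forward
    speed reflects how long the core needs to emulate one frame."""
    return frame_time_ms(target_fps) - frame_time_ms(fast_forward_fps)


for ff in (79, 63):  # fast-forward speeds mentioned above
    print(f"{ff} FPS fast-forward: {headroom_ms(ff):.2f} ms spare per frame")
```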

Specs

  • RetroArch 1.8.5
  • Laptop: Lenovo N581
  • OS: Windows 7 - 64-bit
  • Screen Resolution: 1366x768
  • CPU: Intel Core i5-3230M
  • IGP: Intel HD Graphics 4000
  • RAM: 4GB, DDR3-1600

Thank you for those tests Ortega, exactly what I was searching for :slight_smile: . Do you know if mode B is the default setup that you get under Windows 10 (with desktop composition always on)? Have you tested Frame Delay to gain additional milliseconds? Also, can you confirm that turning off Run-Ahead increases the lag by multiples of 16 ms?

I'm not familiar with Windows 10 and whether it has a similar input lag issue to Windows 7. Run-Ahead should work as intended, as you can confirm with a quick RetroArch test:

pause game > press & hold action/jump button > press frame advance until the action is displayed
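In principle this also answers the multiples-of-16 ms question: each frame of Run-Ahead hides one frame of the game's internal lag, about 16.7 ms at 60 Hz, so turning it off adds back a whole number of frames. A minimal sketch, with the internal lag taken from the frame-advance test (the 2-frame game in the example is hypothetical). The saving is capped at the internal lag, since running ahead further cannot remove lag that isn't there (in practice it causes visibly skipped frames instead).

```python
FRAME_MS = 1000.0 / 60  # assumption: 60 Hz content


def runahead_saving_ms(run_ahead_frames, internal_lag_frames):
    """Latency removed by Run-Ahead: one frame time per frame run ahead,
    but never more than the game's own internal lag."""
    hidden = min(run_ahead_frames, internal_lag_frames)
    return hidden * FRAME_MS


# A hypothetical game with 2 frames of internal lag (counted with frame advance):
for n in range(4):
    print(n, round(runahead_saving_ms(n, 2), 1))
```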

Testing Frame Delay in addition is a good idea, so I just did that with the above-mentioned “low input lag modes” #2, #3 and #5. The requirement for an acceptable Frame Delay value is absolutely no stutter or audio crackling.

The first test is with Nestopia, a core with low CPU requirements, with Run-Ahead = 1 and Second Instance = ON. The highest usable Frame Delay values show barely a difference:

  • #2 GL + Hard GPU Sync = 12ms
  • #3 D3D11 - mode A = 13ms
  • #5 D3D11 + RTSS Sync = 13ms

There is a noticeable difference when using a much more CPU-intensive core, Beetle PSX HW (in software render mode). I ran the test in a game scene where the fast-forward overhead was 88 FPS:

  • #2 GL + Hard GPU Sync = 2ms
  • #3 D3D11 - mode A = 3ms
  • #5 D3D11 + RTSS Sync = 5ms

So, combined with Frame Delay, the RTSS Scanline Sync method can offer the lowest average input lag on my system.
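The Beetle PSX HW numbers can be sanity-checked with a simple model. Frame Delay makes RetroArch wait the given number of milliseconds after V-Sync before running the core, so the wait plus the core's own frame time has to fit into one 16.7 ms refresh. Using the 88 FPS fast-forward speed as the estimate of the core's cost (an assumption), the ceiling comes out at about 5.3 ms. That is close to the RTSS result, which may suggest the other two modes lose a few milliseconds to their own sync overhead. The `safety_ms` parameter is a hypothetical margin, not a RetroArch setting.

```python
def max_frame_delay_ms(fast_forward_fps, refresh_hz=60.0, safety_ms=0.0):
    """Rough upper bound for Frame Delay: the core still has to finish its
    frame after waiting, so delay + core time must fit in one refresh.
    `safety_ms` is a hypothetical margin for sync/driver overhead."""
    frame_ms = 1000.0 / refresh_hz
    core_ms = 1000.0 / fast_forward_fps  # time per emulated frame (estimate)
    return max(0.0, frame_ms - core_ms - safety_ms)


print(f"{max_frame_delay_ms(88):.1f} ms")  # 5.3 ms for the Beetle PSX HW scene
```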


What is this about Direct3D 11 mode A and mode B? I play on Windows 7 with Aero always disabled; is that what you call mode A?

The 2 fullscreen modes that I refer to as “mode A” and “mode B” are only available on Windows 7 with Aero disabled and when using the D3D10/11 video renderer in RetroArch.

“Mode A” is the default fullscreen mode, with low input latency but with a visible tear-line at the top of the screen. Now if you press ALT+ENTER on your keyboard while a game is running, it switches to “mode B”. This mode has higher input latency (~2 more frames) but has no visible tear-line.

I assume that pressing ALT+ENTER actually switches from “Windowed Fullscreen Mode” to “Exclusive Fullscreen Mode”, which has a different V-Sync behavior. However, when I check “Settings > Video > Fullscreen Mode”, the option “Windowed Fullscreen Mode” is always ON… and changing it to OFF doesn't make a difference either.

Alt+Enter is a de-facto standard shortcut in many applications to toggle between windowed and fullscreen mode. Works in many games too.

Let’s necro this… :skull:

Got a new shiny phone so I can do high framerate recording.
My PC is an old i5-3570K @ 4 GHz with an NVIDIA GTX 770 on Win7 x64.
Monitor is an LG 32GK850G.
Xbox One gamepad over USB.

RA is using Exact Sync, Gsync is On.
Vulkan is using max swapchain 2 (supposed to be the fastest).
240fps recording (1 frame = 4.167ms)


FCEUMM runahead 1 in smb

glcore hard sync 7 5 6 5 8 8 9 7 6 7

6.8 = 28ms

no hard sync 7 7 5 7 7 9 6 7 7 7

6.9 = 29ms

vulkan 9 7 4 9 7 8 10 6 7 6

7.3 = 30ms

vulkan no shader 7 5 5 8 5 8 7 8 6 9

6.8 = 28ms
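Averages alone hide how noisy these counts are. A quick sketch using Python's `statistics` module adds the standard error of each mean, using the counts above. For these four runs it comes out at roughly 1.3 to 2.3 ms, so the gaps of up to about 2 ms between the setups are within the noise.

```python
from math import sqrt
from statistics import mean, stdev

FRAME_MS = 1000.0 / 240  # 240 fps camera: one frame is about 4.167 ms

runs = {
    "glcore hard sync": [7, 5, 6, 5, 8, 8, 9, 7, 6, 7],
    "no hard sync": [7, 7, 5, 7, 7, 9, 6, 7, 7, 7],
    "vulkan": [9, 7, 4, 9, 7, 8, 10, 6, 7, 6],
    "vulkan no shader": [7, 5, 5, 8, 5, 8, 7, 8, 6, 9],
}


def summarize(counts):
    """Mean latency and standard error of the mean, both in ms."""
    m = mean(counts) * FRAME_MS
    sem = stdev(counts) / sqrt(len(counts)) * FRAME_MS
    return m, sem


for name, counts in runs.items():
    m, sem = summarize(counts)
    print(f"{name:18s} {m:5.1f} ms +/- {sem:.1f} ms (standard error)")
```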

So I guess the explanation for the lack of difference is that I'm using G-Sync.
It's recorded at the bottom of my LG monitor, which was tested at 6.4 ms of lag (so the worst-case position); for the Xbox One gamepad over USB, I saw 6.9 ms in a test.

That adds up nicely: 6.9 + 16.67 + 6.4 = 30ms average if you want to think like that. :thinking:
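The same back-of-the-envelope sum in code, using the values quoted above and treating the three parts as independent and additive (a simplification: it ignores, for instance, where in the refresh cycle the button press lands):

```python
def latency_budget_ms(controller_ms, internal_frames, display_ms, fps=60.0):
    """Controller lag + emulator frames left after Run-Ahead + display lag."""
    return controller_ms + internal_frames * 1000.0 / fps + display_ms


# Values quoted above: 6.9 ms pad, 1 frame of internal lag, 6.4 ms monitor.
print(f"{latency_budget_ms(6.9, 1, 6.4):.1f} ms")  # 30.0 ms
```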


I also tried the main MAME core in RA, to see whether it has 1 extra frame of lag compared to standalone MAME.
The test is the Unibios boot settings menu (which is supposed to react in 1 frame in MAME), measured at the bottom of the monitor.

windowed (aero enabled):

MAME 6 10 10 10 7 12 10 10 10

9.4 = 39ms

RA 13 13 13 11 13 10 12 11 11 10

11.7 = 49ms

fullscreen:

MAME 7 7 7 8 10 7 4 8 9 5 10 8

7.5 = 31ms

RA (hard sync 0) 12 8 10 7 10 8 8 11 8 7

8.9 = 37ms

gl no hard sync 10 11 11 11 9 8 10 10 10 9

9.9 = 41ms

vulkan 10 10 11 10 11 11 10 8 10 8

9.9 = 41ms

Remember that G-Sync is on for both RA-MAME and standalone MAME (except in the windowed tests), with lowlatency enabled in mame.ini for both.
FCEUmm showed no difference between hard sync 0 and no hard sync, so I would ignore the slight advantage of having it enabled here.
So RA has about +10 ms of extra lag, something that could be improved in theory.
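Expressed in frames, assuming roughly 60 Hz output, the gap between standalone MAME and the RetroArch MAME core in the averages above is about 0.6 of a frame, so less than the full extra frame that was being checked for (the 4.2 ms camera resolution leaves some uncertainty either way):

```python
FRAME_MS = 1000.0 / 60  # assumption: roughly 60 Hz output

# (standalone MAME, RetroArch MAME core) averages from the tests above, in ms
pairs_ms = {
    "windowed": (39, 49),
    "fullscreen, gl no hard sync": (31, 41),
    "fullscreen, vulkan": (31, 41),
    "fullscreen, gl hard sync 0": (31, 37),
}


def extra_frames(mame_ms, ra_ms):
    """RetroArch's extra latency over standalone MAME, in 60 Hz frames."""
    return (ra_ms - mame_ms) / FRAME_MS


for setup, (mame, ra) in pairs_ms.items():
    print(f"{setup}: +{ra - mame} ms = {extra_frames(mame, ra):.2f} frames")
```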


A bit more testing, still with FCEUmm and SMB (Run-Ahead 1 frame).

gsync frame delay 12 glcore 6 6 7 10 9 9 7 4 8 8

7.4 = 31ms

So, frame delay doesn’t seem to do much with gsync…


Now testing with G-Sync turned off: 120 Hz, V-Sync swap interval 2.
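With a swap interval of 2, each emulated frame stays on screen for two refreshes, so a 120 Hz display shows the content at 60 FPS with the usual 16.67 ms per frame. A small sketch of that relationship:

```python
def presented_fps(refresh_hz, swap_interval):
    """Each emulated frame stays on screen for `swap_interval` refreshes."""
    return refresh_hz / swap_interval


def content_frame_ms(refresh_hz, swap_interval):
    """How long one emulated frame is displayed, in milliseconds."""
    return 1000.0 * swap_interval / refresh_hz


print(presented_fps(120, 2), round(content_frame_ms(120, 2), 2))  # 60.0 16.67
```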

gsync off glcore (hard sync off) 20 22 21 19 20 19 20 23 18 19

20.1 = 84ms

gsync off glcore (hard sync 0 frame) 11 9 8 9 9 9 9 10 7 6

8.7 = 36ms

gsync off vulkan swapchain1 10 12 9 11 11 9 7 10 7 9

9.5 = 40ms

gsync off vulkan swapchain2 9 11 8 7 8 11 10 7 6 9

8.6 = 36ms

gsync off vulkan swapchain3 12 9 12 12 6 9 9 5 10 9

9.3 = 39ms

  • outside of G-Sync, hard sync 0 with GL works as intended
  • Vulkan is fine: 4 ms is within the margin of error, and it doesn't use as much CPU power as GL hard sync does
  • the Vulkan swapchain images setting doesn't seem to do much (with NVIDIA drivers?)
  • G-Sync seems faster than the standard sync methods, unless you add Frame Delay on top of standard V-Sync, but that only gives a small gain for something a bit unreliable (it can be game-dependent and CPU-intensive)

Did some more tests.

In short, D3D11 is similar to glcore and Vulkan with G-Sync: around 30 ms.

More surprising are the windowed-mode results (bottom of the screen, Win7 with Aero enabled, G-Sync enabled for fullscreen mode only, same FCEUmm SMB with Run-Ahead 1 frame to get a minimal internal lag of 1 frame):

windowed glcore default 11 10 9 11 9 8 10 9 12 11

10 = 42ms

windowed hardsync 0 frame (in case it does anything for nvidia drivers) 7 12 8 8 8 9 10 8 11 11

9.2 = 38ms

windowed vulkan 9 12 7 9 9 10 9 10 9 7

9.1 = 38ms

That's around 40 ms, while fullscreen without G-Sync was giving 84 ms (probably triple buffering with the default NVIDIA driver settings)…
Aero isn't so bad in the end, most probably because my monitor's default refresh rate is 120 Hz (or because G-Sync helps with screen composition).


So does that mean that with my G-Sync monitor, I don't need Frame Delay?

Yes, you can drop it back to 0.


Something must have changed, because Duke Nukem 3D on Beetle Saturn with a G-Sync monitor felt way more laggy to me than the same game on my real Saturn with a CRT. That said, I am still on RA version 1.9.6.

I can't measure the lag on the real machine. All I can say is that it didn't feel as sluggish. I could feel the difference, is what I'm saying, and it was definitely not a placebo.

I even tested the beetle emulator on a CRT monitor and it wasn’t much different than the gsync monitor, so the extra lag doesn’t seem to be from the screen.

I think the emulator is adding about 3 frames of lag on top of everything. I'm basing this on Sonic Jam and EWJ2 compared with the same games on Genesis with Genesis Plus GX. All games have 1 frame of lag on Genesis Plus GX but 3 or 4 frames on Beetle Saturn.
