An input lag investigation

That sucks. Sticking to GL hard sync then.

I’d like to share some input latency tests I did recently, with the focus on which video renderer shows better results: D3D11 vs GL. Unfortunately I can’t test Vulkan…

I used a 120FPS smartphone camera (8.3ms/frame… yeah the results are not super accurate) and took the average value of 5-10 tries (jumps) per test. The test game was Little Samson with the Nestopia core and Run-Ahead enabled. V-Sync is ON for all tests but the last one (#5), resulting in constant frame-pacing and perfectly smooth scrolling.

#1 GL = 117ms

#2 GL + Hard GPU Sync = 67ms

#3 D3D11 - mode A = 67ms

  • This mode means: Windows 7 - Aero is disabled
  • there is a permanent tear-line in a fixed position at the top of the screen, slightly wobbling

#4 D3D11 - mode B = 100ms

  • this mode is enabled by pressing ALT+ENTER after the game runs. Pressing ALT+ENTER again switches back to mode A
  • this mode is always active if you have Aero enabled (with Enable Desktop Composition = ON)

#5 D3D11 + RTSS Sync = 67ms

  • for this test I disabled RetroArch’s V-Sync Option and used the Scanline Sync function of RTSS
  • like in test #3 there is a permanent tear-line, but it can be moved and hidden via RTSS
  • downside: fast-forward can’t be used while Scanline Sync is ON, and you will have to start RTSS manually each time you start Retroarch

Conclusion

#2, #3 and #5 have the same latency of 67ms. #4 is 2 frames behind and #1 is the worst with 3 frames behind.
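
The frame counts in this conclusion can be reproduced with a quick calculation. This is only a minimal sketch: it assumes a 60Hz display (about 16.67ms per frame), which the post does not state explicitly, and the names `RESULTS_MS` and `frames_behind` are made up for illustration.

```python
# Measured averages from the tests above, in milliseconds.
RESULTS_MS = {
    "#1 GL": 117,
    "#2 GL + Hard GPU Sync": 67,
    "#3 D3D11 - mode A": 67,
    "#4 D3D11 - mode B": 100,
    "#5 D3D11 + RTSS Sync": 67,
}

# Assumption: 60Hz refresh rate, so one frame lasts 1000/60 ms.
FRAME_MS = 1000 / 60


def frames_behind(latency_ms, best_ms):
    """Return how many display frames a result lags behind the best one."""
    return (latency_ms - best_ms) / FRAME_MS


best = min(RESULTS_MS.values())
for name, ms in RESULTS_MS.items():
    print(f"{name}: {ms}ms, {frames_behind(ms, best):.1f} frames behind")
```

With a 120FPS camera each measurement is only accurate to roughly half a display frame, so the rounded frame counts are the meaningful part.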

Performance Test

I was also interested to see how performance intensive the “low latency modes” #2, #3 and #5 are. For that, I limited my max CPU clock speed to 80% (via Windows energy options), which led to 2.1GHz. I used Beetle PSX HW for this test and “fast forward” shows that it can run at 79FPS with this setup. Now the results:

#2 and #3 drop to 40FPS while #5 can keep the 60FPS. This shows that #2 and #3 need quite some CPU overhead to properly run at the 60 FPS target. On the other hand, RTSS can keep the 60FPS even with a small “fast forward” overhead of 63FPS and it doesn’t drop to 40/30/20FPS when the CPU can’t maintain fullspeed. So this would be the best solution for weaker systems that can barely run a core/game at fullspeed and don’t want to miss out on some lower input lag.
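
One way to read these numbers is as CPU headroom over the 60FPS target. This is a rough sketch, assuming the fast-forward rate is a fair proxy for how fast the core can run uncapped; the function name `headroom_percent` is hypothetical.

```python
TARGET_FPS = 60  # full-speed target used in the post


def headroom_percent(fast_forward_fps, target_fps=TARGET_FPS):
    """Spare emulation speed above the target, as a percentage."""
    return (fast_forward_fps / target_fps - 1) * 100


# 79FPS: uncapped rate in the Beetle PSX HW test (#2 and #3 still dropped to 40FPS)
# 63FPS: the small overhead at which RTSS (#5) reportedly still held 60FPS
for fps in (79, 63):
    print(f"{fps}FPS fast-forward -> {headroom_percent(fps):.1f}% headroom")
```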

Specs

  • RetroArch 1.8.5
  • Laptop: Lenovo N581
  • OS: Windows 7 - 64-bit
  • Screen Resolution: 1366x768
  • CPU: Intel Core i5-3230M
  • IGP: Intel HD Graphics 4000
  • RAM: 4GB, DDR3-1600

Thank you for those tests Ortega, exactly what I was searching for :slight_smile: . Do you know if mode B is the default setup that you get under Windows 10 (with desktop composition always on)? Have you tested Frame Delay to gain additional milliseconds? Also, can you confirm that turning off Run-Ahead increases the lag by multiples of 16ms?

I’m not familiar with Windows 10 and whether it has a similar input lag issue as Windows 7. Run-Ahead should work as intended, as you can confirm in a quick RetroArch test with this method:

pause game > press & hold action/jump button > press frame advance until the action is displayed

That’s a good idea to test Frame Delay in addition, so I just did that with the above mentioned “low input lag modes” #2, #3 and #5. The requirement for an acceptable frame delay value is absolutely no stutter/audio crackling.

First test is with Nestopia, so a low CPU-intensive core. Run-Ahead = 1 and Second Instance = ON. The results show barely a difference:

  • #2 GL + Hard GPU Sync = 12ms
  • #3 D3D11 - mode A = 13ms
  • #5 D3D11 + RTSS Sync = 13ms

There is a noticeable difference when using a much more CPU-intensive core, Beetle PSX HW (in software render mode). I ran the test in a game scene where the fast-forward overhead was 88FPS:

  • #2 GL + Hard GPU Sync = 2ms
  • #3 D3D11 - mode A = 3ms
  • #5 D3D11 + RTSS Sync = 5ms

So combined with Frame Delay, the RTSS Scanline Sync method can offer the lowest average input lag on my system.
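
The Beetle PSX HW values are in line with a simple back-of-the-envelope bound: Frame Delay can only use the time the core leaves idle in each frame. The sketch below assumes a 60Hz refresh and that the fast-forward rate reflects the per-frame emulation time. Both are assumptions, and `max_frame_delay_ms` is an illustrative name, not a RetroArch function.

```python
REFRESH_HZ = 60  # assumption: 60Hz display


def max_frame_delay_ms(fast_forward_fps, refresh_hz=REFRESH_HZ):
    """Rough upper bound on Frame Delay: frame period minus emulation time."""
    frame_period_ms = 1000 / refresh_hz
    emulation_ms = 1000 / fast_forward_fps
    return max(frame_period_ms - emulation_ms, 0.0)


# Beetle PSX HW scene from the post: fast-forward reached 88FPS.
bound = max_frame_delay_ms(88)
print(f"Theoretical ceiling: {bound:.1f}ms")

# Frame Delay values that ran without stutter in the post.
measured = {
    "#2 GL + Hard GPU Sync": 2,
    "#3 D3D11 - mode A": 3,
    "#5 D3D11 + RTSS Sync": 5,
}
for name, delay in measured.items():
    print(f"{name}: {delay}ms ({delay / bound:.0%} of the ceiling)")
```

Under these assumptions the RTSS result sits close to the ceiling, while the other two modes leave more of the idle time unused.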


What is this story about DirectX 11 mode A and mode B? I play under Windows 7 with Aero always disabled, is that what you call mode A?

The 2 fullscreen modes that I refer to as “mode A” and “mode B” are only available on Windows 7 with Aero disabled and when using the D3D10/11 video renderer in RetroArch.

“Mode A” is the default fullscreen mode, with low input latency but with a visible tear-line at the top of the screen. Now if you press ALT+ENTER on your keyboard while a game is running, it switches to “mode B”. This mode has higher input latency (~2 more frames) but has no visible tear-line.

I assume that pressing ALT+ENTER actually switches from “Windowed Fullscreen Mode” to “Exclusive Fullscreen Mode”, which has a different v-sync behavior. Though when I check “Settings > Video > Fullscreen mode” the option “Windowed Fullscreen Mode” is always ON… changing it to OFF also doesn’t make a difference.

Alt+Enter is a de-facto standard shortcut in many applications to toggle between windowed and fullscreen mode. Works in many games too.

Let’s necro this… :skull:

Got a new shiny phone so I can do high framerate recording.
My PC is an old i5-3570k@4GHZ with an nvidia GTX770 on win7 x64.
Monitor is LG 32gk850g.
Xbox one Gamepad in USB.

RA is using Exact Sync, Gsync is On.
Vulkan is using max swapchain 2 (supposed to be the fastest).
240fps recording (1 frame = 4.167ms)


FCEUMM runahead 1 in smb

glcore hard sync 7 5 6 5 8 8 9 7 6 7

6.8 = 28ms

no hard sync 7 7 5 7 7 9 6 7 7 7

6.9 = 29ms

vulkan 9 7 4 9 7 8 10 6 7 6

7.3 = 30ms

vulkan no shader 7 5 5 8 5 8 7 8 6 9

6.8 = 28ms

So, I guess the explanation for the lack of difference is I’m using gsync.
It’s recorded on the bottom of my LG monitor tested for 6.4ms lag (so worst lag case), for the xbone gamepad in usb I see 6.9ms from a test.

That adds up nicely: 6.9 + 16.67 + 6.4 = 30ms average if you want to think like that. :thinking:
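
For anyone who wants to redo the arithmetic, here is a small sketch that converts the camera frame counts above into milliseconds and compares them with that budget. It assumes the 16.67 term stands for one frame at 60Hz; the names `SAMPLES` and `average_latency_ms` are only illustrative.

```python
from statistics import mean

CAMERA_FPS = 240
MS_PER_CAMERA_FRAME = 1000 / CAMERA_FPS  # about 4.167ms

# Camera frames counted between button press and on-screen reaction.
SAMPLES = {
    "glcore hard sync": [7, 5, 6, 5, 8, 8, 9, 7, 6, 7],
    "glcore no hard sync": [7, 7, 5, 7, 7, 9, 6, 7, 7, 7],
    "vulkan": [9, 7, 4, 9, 7, 8, 10, 6, 7, 6],
    "vulkan no shader": [7, 5, 5, 8, 5, 8, 7, 8, 6, 9],
}


def average_latency_ms(frame_counts):
    """Mean camera-frame count converted to milliseconds."""
    return mean(frame_counts) * MS_PER_CAMERA_FRAME


# Budget from the post: gamepad + one frame at 60Hz (assumed) + monitor lag.
budget_ms = 6.9 + 1000 / 60 + 6.4

for name, counts in SAMPLES.items():
    print(f"{name}: {average_latency_ms(counts):.1f}ms (budget {budget_ms:.1f}ms)")
```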


And I tried to check the Mame main core in RA, to see if it has 1 extra frame of lag or not vs stand-alone.
Test is Unibios boot settings menu (supposed to react in 1 frame in MAME), bottom of the monitor.

windowed (aero enabled):

MAME 6 10 10 10 7 12 10 10 10

9.4 = 39ms

RA 13 13 13 11 13 10 12 11 11 10

11.7 = 49ms

fullscreen:

MAME 7 7 7 8 10 7 4 8 9 5 10 8

7.5 = 31ms

RA (hard sync 0) 12 8 10 7 10 8 8 11 8 7

8.9 = 37ms

gl no hard sync 10 11 11 11 9 8 10 10 10 9

9.9 = 41ms

vulkan 10 10 11 10 11 11 10 8 10 8

9.9 = 41ms

So, remember it’s with G-sync too for both RA-MAME and stand-alone (except for the windowed tests), lowlatency enabled for both in mame.ini.
FCEUMM didn’t show a difference between hard sync 0 and no hard sync, so I would ignore the slight advantage it shows when enabled here.
+10ms of lag for RA it is then, something that could be improved in theory.


A bit more testing with FCEUMM and smb still. (run ahead 1 frame)

gsync frame delay 12 glcore 6 6 7 10 9 9 7 4 8 8

7.4 = 31ms

So, frame delay doesn’t seem to do much with gsync…


Now Testing with gsync turned off, 120hz vsync swap interval 2.

gsync off glcore (hard sync off) 20 22 21 19 20 19 20 23 18 19

20.1 = 84ms

gsync off glcore (hard sync 0 frame) 11 9 8 9 9 9 9 10 7 6

8.7 = 36ms

gsync off vulkan swapchain1 10 12 9 11 11 9 7 10 7 9

9.5 = 40ms

gsync off vulkan swapchain2 9 11 8 7 8 11 10 7 6 9

8.6 = 36ms

gsync off vulkan swapchain3 12 9 12 12 6 9 9 5 10 9

9.3 = 39ms

  • outside of gsync, hard sync 0 with gl is working as intended
  • vulkan is fine, 4ms is within a margin of error and it’s not using as much cpu power as gl hard sync does
  • vulkan swapchain images setting doesn’t seem to do much (with nvidia drivers?)
  • gsync seems faster than other standard sync methods, unless you go and add frame delay on top of standard vsync, but it gives a small gain for something a bit unreliable (can be game-dependent, CPU intensive)
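
The “margin of error” remark can be checked roughly with the standard error of the mean. This is a hedged sketch that treats each of the 10 tries as an independent sample, which is an assumption, and uses made-up helper names (`standard_error`, `difference_with_error_ms`).

```python
from math import sqrt
from statistics import mean, stdev

MS_PER_CAMERA_FRAME = 1000 / 240  # 240fps recording

# Camera frame counts from the gsync-off runs above.
hard_sync_0 = [11, 9, 8, 9, 9, 9, 9, 10, 7, 6]
vulkan_swapchain1 = [10, 12, 9, 11, 11, 9, 7, 10, 7, 9]


def standard_error(samples):
    """Standard error of the mean, in the same unit as the samples."""
    return stdev(samples) / sqrt(len(samples))


def difference_with_error_ms(a, b):
    """Return (difference of means, combined standard error), both in ms."""
    diff = (mean(b) - mean(a)) * MS_PER_CAMERA_FRAME
    err = sqrt(standard_error(a) ** 2 + standard_error(b) ** 2) * MS_PER_CAMERA_FRAME
    return diff, err


diff_ms, err_ms = difference_with_error_ms(hard_sync_0, vulkan_swapchain1)
print(f"vulkan - gl hard sync: {diff_ms:.1f}ms, standard error {err_ms:.1f}ms")
```

If the difference is smaller than about twice the combined standard error, 10 tries per setup are not enough to tell the two configurations apart.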

Did some more tests.

In short, d3d11 is similar to glcore and vulkan with gsync, around 30ms.

More surprising are the windowed mode results, measured at the bottom of the screen. This is on win7 with aero enabled, gsync enabled for fullscreen mode only, and the same fceumm smb test with run ahead 1 frame (to get a 1 frame minimal internal lag):

windowed glcore default 11 10 9 11 9 8 10 9 12 11

10 = 42ms

windowed hardsync 0 frame (in case it does anything for nvidia drivers) 7 12 8 8 8 9 10 8 11 11

9.2 = 38ms

windowed vulkan 9 12 7 9 9 10 9 10 9 7

9.1 = 38ms

Around 40ms while fullscreen without gsync was giving 84ms (probably triple buffering with default nvidia drivers settings)…
Aero isn’t so bad in the end, most probably because my default monitor refresh is 120hz (or gsync helping with screen composition).


So, it means that with my Gsync monitor, I don’t need Frame Delay ?

Yes, you can drop it back to 0.


Something must have changed because Duke Nukem 3D on Beetle Saturn + gsync monitor felt way more laggy to me than the same game on my real Saturn + CRT. Although i am still on RA version 1.9.6.

I can’t measure the lag on the real machine. All I can say is that it didn’t feel as sluggish. What I’m saying is that I could feel the difference, and it was definitely not a placebo.

I even tested the beetle emulator on a CRT monitor and it wasn’t much different than the gsync monitor, so the extra lag doesn’t seem to be from the screen.

I think the emulator is adding about 3 frames of lag on top of everything. I’m basing this on Sonic Jam and EWJ2 vs the same games on Genesis with GenesisplusGX. All games have 1 frame of lag on GenesisplusGX but 3 or 4 frames on Beetle Saturn.


I’m not disputing that the Saturn might have higher input lag than the Genesis. But the Saturn emulator definitely has higher input lag than the real console.

Also, the input lag in the Mupen core seems to be on par with the real console. Some games have up to 4 frames of lag but others like Smash Bros have zero lag, which means the emulator doesn’t add any additional lag.

But i haven’t found a Saturn game that has less than 3 frames of lag on Beetle Saturn, which makes me think it adds that amount in all games no matter what.

This is exactly what I was thinking earlier today, I’m not sure the console has this amount of input lag. It would be a definitive way to tell the difference.

I am confused now, I thought vulkan was supposed to have the lowest input lag of all the video drivers? Your tests show GL is more responsive and DX11 is as well?

Weird.

it’s all very driver-dependent. vulkan and d3d11 are usually similar.

If you have Vsync off in RetroArch with Vulkan, you don’t get proper frame pacing, so you need to leave it on. I don’t think there’s any benefit turning it off for other video drivers either. On Nvidia cards people recommend setting Vsync on globally in conjunction with Gsync, so that games never go above your refresh rate and tear (I usually leave in-game Vsync on too unless that particular game has some issue with it on). However, doing that will disable fast forwarding in the D3D drivers in RA, so you can set the retroarch.exe profile to use “Application Controlled” to get around that. I’ve never seen recommendations for AMD hardware VRR settings, so not sure what’s best there.

Borderless window is usually bad for Gsync support too, since Nvidia’s windowed mode Gsync often doesn’t work right or at all. Exclusive or flip model fullscreen modes are the best with Nvidia hardware.

V-Sync is required in retroarch for proper VRR.