An input lag investigation

Yeah, I bought one a good while ago. It’s really nice in most ways (look, feel, 250 Hz USB polling by default), except for one important aspect: D-pad sensitivity. I noticed immediately when playing Street Fighter II that when rocking your thumb left and right, there’s a very high likelihood of performing an involuntary jump or crouch. This phenomenon is not nearly as likely to occur on my 8bitdo controllers or my original SNES Mini controllers.

I’ve not noticed any performance degradation, but I’ve not run any formal tests on it. I would guess that if there is any measurable performance impact, it would only be seen while any button/stick is being pressed. I guess there might also be some risk that certain devices don’t like being polled at 1kHz. It would be nice if this could become a new default for RetroPie, but it certainly needs thorough testing.

Good info. Thanks.

@Brunnis

I have not noticed any d-pad sensitivity issues so far.

Recently completed Super Castlevania for SNES.

Yeah, it could of course be that my sample is particularly sensitive.


This could be the same problem you are describing; Level1online mentions it in his review of both the US version and the Japanese Famicom.

I myself have two J versions and don’t have this problem.

So is there any chance waterbox save states could be implemented in the MAME core to eliminate all input lag?


Hello! I’m doing some measurements right now (RetroArch, Windows 10, LCD…) with a custom LED SNES Classic controller + Raphnet adapter, my test ROM (NES) and Xperia 960fps HD video. I will give you my conclusions later (French via Google Translate, sorry).


Hello! I made a short comparison video (only with favorable input timing). Details in the description and pinned comment.

https://youtu.be/NyrcPyZtfMg

and another to illustrate the concept of favorable and unfavorable input timing

https://youtu.be/YOMIV6PAyR0

Well of course RetroArch is going to be the fastest with Run Ahead = 1.

Question is, how fast is it without it and only GPU sync ON?


In theory, without Run-Ahead it’s +16 Xperia frames (the 960fps recording, so roughly one display frame), without Frame Delay +12.5, without Hard GPU Sync +32.

Vulkan with “max swapchain images” on 2 should provide the same latency as Hard Sync to 0 in gl without the increased cpu cost (I think it was just under 20%).

It would be interesting to see if that’s working. :smirk:


I can’t seem to modify “max swapchain”. It is locked to 3 on my Android phone. Is it better or worse than 2? Thanks!

3 is worse, 1 additional frame of lag.

That sucks. Sticking to GL hard sync then.

I’d like to share some input latency tests I did recently, with the focus on which video renderer shows better results: D3D11 vs GL. Unfortunately I can’t test Vulkan…

I used a 120FPS smartphone camera (8.3ms/frame… yeah, the results are not super accurate) and took the average value of 5-10 tries (jumps) per test. The test game was Little Samson with the Nestopia core and Run-Ahead enabled. V-Sync is ON for all tests but the last one (#5), resulting in constant frame-pacing and perfectly smooth scrolling.
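To put that accuracy caveat in numbers, here is a minimal sketch (my own illustration, not the poster’s tooling) of how a 120FPS camera quantizes the readings — each counted camera frame is only accurate to about 8.33ms:

```python
# A 120FPS camera exposes one frame every 1000/120 ≈ 8.33ms, so any
# single press-to-reaction reading is uncertain by about one camera frame.
CAMERA_FPS = 120
MS_PER_FRAME = 1000 / CAMERA_FPS  # ~8.33ms resolution per reading

def frames_to_ms(frame_count):
    """Convert a counted number of camera frames into milliseconds."""
    return frame_count * MS_PER_FRAME

# e.g. a press-to-jump gap of 8 camera frames:
print(round(frames_to_ms(8)))  # ~67ms, the "low latency" ballpark below
```

Averaging 5-10 jumps per test, as done here, shrinks that per-reading uncertainty somewhat.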

#1 GL = 117ms

#2 GL + Hard GPU Sync = 67ms

#3 D3D11 - mode A = 67ms

  • this mode applies on Windows 7 with Aero disabled
  • there is a permanent tear-line in a fixed position at the top of the screen, slightly wobbling

#4 D3D11 - mode B = 100ms

  • this mode is enabled by pressing ALT+ENTER after the game runs. Pressing ALT+ENTER again switches back to mode A
  • this mode is always active if you have Aero enabled (with Enable Desktop Composition = ON)

#5 D3D11 + RTSS Sync = 67ms

  • for this test I disabled RetroArch’s V-Sync Option and used the Scanline Sync function of RTSS
  • like in test #3 there is a permanent tear-line, but it can be moved and hidden via RTSS
  • downside: fast-forward can’t be used while Scanline Sync is ON, and you will have to start RTSS manually each time you start RetroArch

Conclusion

#2, #3 and #5 have the same latency of 67ms. #4 is 2 frames behind and #1 is the worst with 3 frames behind.
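The “frames behind” figures follow directly from the ~16.67ms frame period at 60Hz; a quick sketch of that arithmetic (the 67ms baseline is the best result measured above):

```python
# At 60Hz, one display frame lasts 1000/60 ≈ 16.67ms.
FRAME_MS = 1000 / 60

def frames_behind(measured_ms, baseline_ms=67):
    """How many whole display frames a result trails the 67ms baseline."""
    return round((measured_ms - baseline_ms) / FRAME_MS)

print(frames_behind(100))  # mode B (#4): 2 frames behind
print(frames_behind(117))  # plain GL (#1): 3 frames behind
```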

Performance Test

I was also interested to see how performance-intensive the “low latency modes” #2, #3 and #5 are. For that, I limited my max CPU clock speed to 80% (via Windows energy options), which led to 2.1GHz. I used Beetle PSX HW for this test, and “fast forward” shows that it can run at 79FPS with this setup. Now the results:

#2 and #3 drop to 40FPS while #5 can keep the 60FPS. This shows that #2 and #3 need quite some CPU overhead to properly run at the 60FPS target. On the other hand, RTSS can keep the 60FPS even with a small fast-forward headroom of 63FPS, and it doesn’t drop to 40/30/20FPS when the CPU can’t maintain full speed. So this would be the best solution for weaker systems that can barely run a core/game at full speed and don’t want to miss out on lower input lag.

Specs

  • RetroArch 1.8.5
  • Laptop: Lenovo N581
  • OS: Windows 7 - 64-bit
  • Screen Resolution: 1366x768
  • CPU: Intel Core i5-3230M
  • IGP: Intel HD Graphics 4000
  • RAM: 4GB, DDR3-1600

Thank you for those tests Ortega, exactly what I was searching for :slight_smile: . Do you know if mode B is the default setup that you get under Windows 10? (with desktop composition always on) Have you tested Frame Delay to gain additional milliseconds? Also, do you confirm that turning off Run-Ahead increases the lag by multiples of 16ms?

I’m not familiar with Windows 10 and whether there is a similar input lag issue as with Windows 7. Run-Ahead should work as intended, as you can confirm in a quick RetroArch test with this method:

pause game > press & hold action/jump button > press frame advance until the action is displayed

That’s a good idea to test Frame Delay in addition, so I just did that with the above mentioned “low input lag modes” #2, #3 and #5. The requirement for an acceptable frame delay value is absolutely no stutter/audio crackling.

First test is with Nestopia, so a core with low CPU requirements. Run-Ahead = 1 and Second Instance = ON. The results show barely a difference:

  • #2 GL + Hard GPU Sync = 12ms
  • #3 D3D11 - mode A = 13ms
  • #5 D3D11 + RTSS Sync = 13ms

There is a noticeable difference when using a much more CPU-intensive core, Beetle PSX HW (in software render mode). I ran the test in a game scene where the fast-forward headroom was 88FPS:

  • #2 GL + Hard GPU Sync = 2ms
  • #3 D3D11 - mode A = 3ms
  • #5 D3D11 + RTSS Sync = 5ms

So combined with Frame Delay, the RTSS scanline sync method can offer the lowest average input lag on my system.
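Those Frame Delay values line up with a simple headroom model: if fast-forward throughput approximates pure per-frame emulation speed (an assumption on my part, not a documented formula), the usable Frame Delay is roughly the frame period minus the emulation time. A rough sketch:

```python
FRAME_MS = 1000 / 60  # ~16.67ms target frame period at 60Hz

def max_frame_delay(fast_forward_fps):
    """Estimate Frame Delay headroom from uncapped (fast-forward) FPS,
    assuming fast-forward FPS reflects pure emulation throughput."""
    emulation_ms = 1000 / fast_forward_fps  # time to emulate one frame
    return int(FRAME_MS - emulation_ms)     # remaining budget, rounded down

print(max_frame_delay(88))  # ~5ms, close to the measured #5 result above
```

The lightweight Nestopia core runs far faster than real time, which is why it tolerates the much larger 12-13ms delays.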


What is this story of Direct3D 11 mode A and mode B? I play under Windows 7 with Aero systematically deactivated; is that what you call mode A?

The 2 fullscreen modes that I refer to as “mode A” and “mode B” are only available on Windows 7 with Aero disabled and when using the D3D10/11 video renderer in RetroArch.

“Mode A” is the default fullscreen mode, with low input latency but with a visible tear-line at the top of the screen. Now if you press ALT+ENTER on your keyboard while a game is running, it switches to “mode B”. This mode has higher input latency (~2 more frames) but has no visible tear-line.

I assume that pressing ALT+ENTER actually switches from “Windowed Fullscreen Mode” to “Exclusive Fullscreen Mode”, which has a different v-sync behavior. Though when I check “Settings > Video > Fullscreen Mode”, the option “Windowed Fullscreen Mode” is always ON… changing it to OFF also doesn’t make a difference.

Alt+Enter is a de-facto standard shortcut in many applications to toggle between windowed and fullscreen mode. Works in many games too.

Let’s necro this… :skull:

Got a new shiny phone so I can do high framerate recording.
My PC is an old i5-3570K @ 4GHz with an NVIDIA GTX 770 on Win7 x64.
Monitor is an LG 32GK850G.
Xbox One gamepad over USB.

RA is using Exact Sync, Gsync is On.
Vulkan is using max swapchain 2 (supposed to be the fastest).
240fps recording (1 frame = 4.167ms)


FCEUMM runahead 1 in smb

glcore hard sync 7 5 6 5 8 8 9 7 6 7

6.8 = 28ms

no hard sync 7 7 5 7 7 9 6 7 7 7

6.9 = 29ms

vulkan 9 7 4 9 7 8 10 6 7 6

7.3 = 30ms

vulkan no shader 7 5 5 8 5 8 7 8 6 9

6.8 = 28ms
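The averages above can be reproduced with a small sketch (my own helper, at 240fps each counted frame is 1000/240 ≈ 4.167ms):

```python
from statistics import mean

MS_PER_CAM_FRAME = 1000 / 240  # ~4.167ms per frame of the 240fps recording

def avg_latency_ms(frame_counts):
    """Average a list of counted camera frames and convert to milliseconds."""
    return round(mean(frame_counts) * MS_PER_CAM_FRAME)

# glcore hard sync samples from above:
print(avg_latency_ms([7, 5, 6, 5, 8, 8, 9, 7, 6, 7]))  # 28 (i.e. 6.8 frames)
```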

So, I guess the explanation for the lack of difference is that I’m using G-Sync.
It’s recorded at the bottom of my LG monitor, which tested at 6.4ms lag (so the worst-case spot); for the Xbox One gamepad over USB I see 6.9ms from a test.

That adds up nicely: 6.9 + 16.67 + 6.4 = 30ms average if you want to think like that. :thinking:
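That budget can be written out as a simple sum (my framing of the poster’s own numbers):

```python
# Rough end-to-end latency budget at 60Hz, using the figures measured above:
gamepad_ms = 6.9      # Xbox One pad over USB (measured)
frame_ms = 1000 / 60  # one display frame of emulator processing (~16.67ms)
panel_ms = 6.4        # LG monitor response at the bottom of the screen

total = gamepad_ms + frame_ms + panel_ms
print(round(total))  # ~30ms, matching the measured average
```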


And I tried to check the MAME main core in RA, to see if it has 1 extra frame of lag or not vs stand-alone.
Test is the UniBios boot settings menu (supposed to react in 1 frame in MAME), bottom of the monitor.

windowed (aero enabled):

MAME 6 10 10 10 7 12 10 10 10

9.4 = 39ms

RA 13 13 13 11 13 10 12 11 11 10

11.7 = 49ms

fullscreen:

MAME 7 7 7 8 10 7 4 8 9 5 10 8

7.5 = 31ms

RA (hard sync 0) 12 8 10 7 10 8 8 11 8 7

8.9 = 37ms

gl no hard sync 10 11 11 11 9 8 10 10 10 9

9.9 = 41ms

vulkan 10 10 11 10 11 11 10 8 10 8

9.9 = 41ms

So, remember it’s with G-Sync too for both RA-MAME and stand-alone (except for the windowed tests), with lowlatency enabled for both in mame.ini.
FCEUMM didn’t show a difference between hard sync 0 and no hard sync, so I would ignore the slight advantage for having it enabled here.
+10ms of lag for RA it is then, something that could be improved in theory.
