Let's talk about Latency (and Game Mode)

On my Windows machine I am using RTSS scanline sync and I am pleased with the result.

Basically, you turn vsync off in RetroArch and in your GPU control panel; RTSS then injects variable amounts of delay into the OpenGL/Vulkan/DirectX pipeline to hold the horizontal tear line in a fixed position at the top/bottom of the frame. In practice it's hidden completely in the blanking area, so you never see it, and you can adjust its position manually with hotkeys while you're playing the game.

The catch is that the game has to have consistent render times from frame to frame; otherwise you'll briefly see the tear line appear before RTSS hides it away again.
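If it helps to picture the mechanism, here's a rough Python sketch of the idea. This is not RTSS's actual implementation (RTSS hooks the graphics API internally), and the scanline numbers and raster query are my assumptions for a 1080p60 signal:

    import time

    REFRESH_HZ = 60
    TOTAL_SCANLINES = 1125   # assumed: ~1080 visible lines plus blanking
    TARGET_LINE = 1100       # park the tear line inside the blanking area

    def current_scanline():
        # Stand-in for a real raster-position query to the driver: derive
        # the scanout position from the wall clock, assuming 60 Hz.
        frame_phase = (time.perf_counter() * REFRESH_HZ) % 1.0
        return int(frame_phase * TOTAL_SCANLINES)

    def present_with_scanline_sync(swap_buffers):
        # With vsync off, delay the buffer swap until the raster passes
        # TARGET_LINE, so any tear lands in the blanking interval where
        # it is invisible.
        while current_scanline() < TARGET_LINE:
            pass   # a real implementation would spin or sleep more precisely
        swap_buffers()

    present_with_scanline_sync(lambda: print("swapped during blanking"))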

I use it with RetroArch's Nestopia core and standalone Dolphin, on a GTX 1070 under Windows 7. I also use it for other PC games.


What is the benefit of RTSS over regular vsync? In theory, doesn't it increase latency, since it waits for the frame, just like vsync would?

From the reading, it sounds like vsync, except that if the game cannot deliver the frames in time, it stops acting like vsync and shows tearing instead. So there's no slowdown caused by syncing. I guess that is the benefit, right?


In theory, I'd imagine a latency-optimised implementation of vsync with no prerender queue would only add up to a maximum of 1 frame of latency, since that is the longest you can wait for the current refresh interval to finish before the frame is swapped with the next one.

Anyway, scanline sync feels identical to vsyncless to me in terms of latency. So if you can't feel any difference between vsync on or off, you don't need to bother with scanline sync.

I suppose the amount of latency added by scanline sync depends on how much delay it has to inject to shift the tear line to the edge and out of sight. For instance, if your tear line is already near the edge, it might only need around 2 ms of delay to move it that final bit, as in the rough numbers below.
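Rough numbers, assuming 60 Hz and ~1125 total scanlines per frame (visible lines plus blanking):

    REFRESH_HZ = 60
    TOTAL_SCANLINES = 1125                    # assumed 1080p60 total, incl. blanking
    frame_ms = 1000 / REFRESH_HZ              # ~16.67 ms: also vsync's worst-case wait
    ms_per_line = frame_ms / TOTAL_SCANLINES  # ~0.015 ms of delay per scanline shifted

    # A tear line ~135 scanlines short of the blanking area needs about:
    print(135 * ms_per_line)                  # ~2.0 ms of injected delay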

I've seen the "GPU Hard Sync" option in RetroArch and will have a play around with it later tonight, as it sounds like it could potentially be as good as scanline sync.

The Nvidia driver also has something called "Fast Sync", which might work in a similar way, but I think it only helps when the game renders at an fps above the refresh rate, which obviously isn't the case with emulation.


For reference, here are the RTSS settings I'm using (it's the version that comes bundled with MSI Afterburner).

Alright, had a play with this. I found that both vsync and GPU Hard Sync must be enabled for it to work, and yes, it definitely feels like a big improvement in latency versus hard sync disabled. I have Hard Sync Frames set to 0; the matching config entries are sketched below.
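For reference, I believe these are the relevant retroarch.cfg entries; the key names are from memory, so double-check them:

    video_vsync = "true"              # vsync must stay on for hard sync to work
    video_hard_sync = "true"          # GPU Hard Sync
    video_hard_sync_frames = "0"      # sync frames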

Subjectively, I would say scanline sync still feels a just-noticeable difference more responsive than hard sync. But it's very close, and hard sync has the advantage of not risking tearing if render times become inconsistent. In that scenario I would expect hard sync to drop a frame instead?

In any case, for now I'll switch to hard sync and see if there are any other issues with it. I am only using the Nestopia core at the moment.

edit: after further playtesting with SMB1 and Zelda 2, I feel like hard sync and scanline sync are pretty much neck and neck.

edit2: I noticed there's a RetroArch setting under Power Management called "Frame Rest" which says it's meant to be used with scanline sync, but I don't really understand it: the description says it reduces vsync CPU usage, yet scanline sync is meant to be used with vsync disabled.


@pneumatic I don't think RTSS or other tools can reduce RetroArch's latency any further. Have you measured your results, for example with a high-speed phone camera? Differences of 1 frame of latency are too small to be judged consistently by the naked eye.

I made some measurements (without "Frame Delay") a few years ago of what I believe is the best-case scenario for RetroArch. You can see them here: An input lag investigation

Other than the "Frame Delay" feature, one thing that should, in theory, reduce input lag slightly further is enabling VRR.


I wish I had the tools to do this, but all I can do right now is jump around in Mario and go by "feel", which isn't very accurate.

I have G-Sync on my PC monitor, but I'm using an HTPC TV for retro games, and it only supports a fixed 60 Hz.

My EVGA 1070 died yesterday: a puff of smoke and a PSU auto-shutdown. I'm still salty about it, but at least it didn't take other parts with it. I put an old R9 380 in there, and it feels like the same amount of latency.

The TV itself doesn't have much internal latency (a Samsung 768p plasma), and I'm happy with the feel even without Game Mode; it feels quite responsive to me as long as I run hard sync or scanline sync.

I will have to read the thread you linked and get up to speed before I can comment further.

If I had to guess, I'd say you're right and that RetroArch's latency with hard sync is already maximally optimised.

Ouch! Sorry about your GPU.

You don’t need to read the whole thread. It’s huge, and earlier posts are full of outdated information.

Many modern smartphones have slow-motion camera modes. Anything as fast as 240 fps should be enough to measure the latency between the button press and the action on screen. Check whether your phone camera has a slow-motion mode. Don't forget that Super Mario Bros has 2 frames (I think) of internal input lag, though. EDIT: it's only 1 frame of lag for SMB after all.

Regardless, if I were you, I would play in Game Mode. And if you use Vulkan or other drivers, you should set Max Swapchain Images to 2.


Hey, it does! I didn't know that. But it's only 120 fps… so I guess that means it will be inaccurate by around 8 ms or so? It might still be useful… I'll have to play around with it. Thanks for the tip.
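The arithmetic behind that 8 ms figure:

    camera_fps = 120
    ms_per_camera_frame = 1000 / camera_fps   # ~8.3 ms
    # Each reading is quantised to ~8.3 ms, so a single measurement can be off
    # by roughly one camera frame; averaging several runs tightens it up.
    print(ms_per_camera_frame)                # 8.333...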

I would, but the colours in that mode are garish, and it feels quite responsive in Movie mode anyway. If I recall correctly, the lower-end Samsungs used a cheaper MediaTek processor which, for whatever reason, has less lag than the higher-end models. I think the Windows mouse cursor is rendered with something like hard vsync, and it doesn't feel rubber-bandy in Movie mode.


Yes, with a 120 fps recording and careful frame-by-frame analysis, you can get a pretty good ballpark for the input lag, I think.


I believe Nyquist is involved here :smiley:


The results are in!

Keep in mind this is with my old R9 380, which might be adding latency since it's slower with the CRT shader; I'll retest next week when the 3060 arrives. The wireless WaveBird may be adding some too, but these are the conditions I play under, so I want to know "buttons to pixels".

I start counting from 0 when the button is fully depressed, i.e. the first camera frame with the button depressed = frame 0. I repeated this 6 times per mode and took the average; the counts below are 120 fps camera frames, with the conversion to milliseconds sketched after the results.

Vsyncless
9F, 9F, 8F, 8F, 7F, 7F = 66ms

Scanline sync
11F, 11F, 10F, 10.5F*, 10F, 11F = 88ms
* because image contained a blend of 2 frames due to plasma subfields

Hard vsync
14F, 13F, 12F, 12F, 13F, 12F = 105ms

Normal vsync (off in Radeon Settings, on in RetroArch)
14F, 15F, 15F, 15F, 16F, 16F = 126ms
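For anyone checking the arithmetic, each total is the average camera-frame count converted at 1000/120 ≈ 8.33 ms per frame, rounded:

    MS_PER_CAMERA_FRAME = 1000 / 120   # ~8.33 ms per slow-motion camera frame

    def buttons_to_pixels_ms(frame_counts):
        # Average the per-run counts and convert camera frames to milliseconds.
        return sum(frame_counts) / len(frame_counts) * MS_PER_CAMERA_FRAME

    print(buttons_to_pixels_ms([9, 9, 8, 8, 7, 7]))          # ~66.7 ms ("66ms")
    print(buttons_to_pixels_ms([11, 11, 10, 10.5, 10, 11]))  # ~88.2 ms ("88ms")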

Trivia: the little white specular reflection on the A button was critical for knowing when the button was actually depressed; otherwise it was too ambiguous.


Interesting that RTSS is giving you lower input lag than Hard Vsync. Not sure why that happens.

Why not test Vulkan with Max Swapchain Images set to 2? That should give you the lowest possible input lag without the performance penalty of Hard Vsync.

I've never tested OpenGL much because I never had reason to use it, so I don't know much about its lag.

Here are my results for Vulkan with max swapchain = 2.

Vsyncless
8F, 7.5F, 8F, 9F, 7F, 7F = 64ms

Scanline sync
11F, 10.5F, 10F, 11F, 10.5F, 11F = 88ms

Normal vsync (off in Radeon Settings, on in RetroArch)
10F, 11F, 10F, 11F, 11F, 10F = 87ms

Normal vsync but with "Radeon Anti-Lag" enabled in Radeon Settings
11F, 10.5F, 10F, 10F, 11F, 10F = 86ms

Looks like Vulkan vsync is the way to go!

I'll have to retest with the 3060 next week, but this is looking pretty promising. 88 ms would mean my TV has about 55 ms of latency in Movie mode after deducting the 2 frames of SMB internal lag (88 − 2 × 16.7 ≈ 55), which seems about right for a low-end Samsung TV of that era (2013). I know, I know, I should use Game Mode… but I hate it; the colours are just not to my liking. I'd definitely use it for competitive gaming, though.

The main takeaway for me personally is that as long as button-to-pixels latency is no more than 100 ms, I'm generally satisfied. I can still feel some latency at 100 ms, but it's OK for casual play. An extra frame or two on top of that is where I start to feel it intruding into the experience, even in casual games. If a game doesn't involve any timing challenges, I might tolerate more if it were the only way to get a smooth framerate.


Yeah, this is the important thing. As long as you get it below your personal threshold, you’re all set.

I’d be curious to see your game mode results. Also, don’t forget about runahead, since it’ll shave off entire frames at a time :wink:


I have done the same test as proposed here: https://www.youtube.com/watch?v=tnD56BI-ZGA

But with RetroArch instead of a MiSTer. I got 3 frames of lag with a desktop PC and a regular TV, and 3–4 frames with my notebook. I used:

  • Video driver: Vulkan
  • Max Swapchain Images: 3
  • Input Poll Type: Early
  • Game mode: Enabled
  • A wired (not wireless + cable) USB controller

Everything else was left at default. If I set Max Swapchain Images to 2, I reduce the input lag by a whole frame on both devices.
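If it helps, I believe that maps to something like this in retroarch.cfg (key names from memory, so treat them as approximate; I'm not sure which key the game mode toggle uses, so I've left it out):

    video_driver = "vulkan"
    video_max_swapchain_images = "3"   # setting this to "2" drops one more frame of lag
    input_poll_type_behavior = "0"     # 0 = early, 1 = normal, 2 = late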


That's not a good test for the slow-motion camera method, because that specific "Lag Test" menu has 3 whole frames of internal input lag (assuming it's the SNES version). You can verify that with the frame-step feature: press K to advance frame by frame and see how many frames the game takes to react to input.

A better test, and the one I usually use, is "Horiz/Vert Stripes", which has 0 frames of internal lag. Of course, you can still use the "Lag Test" one, but you need to be careful to always subtract the internal game lag from your measurements.

@pneumatic Speaking of internal lag, I’m sorry, but I believe Super Mario Bros on the NES has only 1 frame of internal lag, and not 2. My bad.

Anyway, good to know that using Vulkan has reduced lag for you!


Just did 2 runs of measurements in Game Mode and got the same result as without it. I know that can't be right, as I recall using Game Mode back in 2014 for GTA IV online matches, and I could definitely feel the improvement then. I checked that all picture processing was turned off. One possibility is that my GTA IV days were before I changed the model number in the service menu to unlock extra colour controls, so maybe doing that somehow neutered Game Mode. Another is that the Xbox 360 was outputting a different signal type, like 4:4:4 vs 4:2:0 vs RGB. The TV processor downscales internally to 4:2:2 even when the signal is 4:4:4, and I can't use RGB because the colours are bad in that mode too. Anyway, I've lost motivation to troubleshoot this further.


Finally got around to measuring the 3060. The results were OK for OpenGL, basically the same as the R9 380, but with Vulkan it has a couple of issues:

1. I couldn't get it to disable vsync, so I couldn't test RTSS scanline sync at all.
2. The lowest latency with vsync was 20 ms higher than on the R9 380.

Seems like AMD drivers work better with Vulkan?

Overall I am not that impressed with the 3060, as I'm also having issues with shader-compilation stutter in Dolphin that wasn't present on the 1070.

Probably the only notable thing in the above table is that the NVCP OpenGL triple-buffering setting actually lowered the latency when using vsync + hard GPU sync. However, I know that will affect frame pacing, so I'd still slightly prefer scanline sync and leave triple buffering on.


That’s a nice set of measurements! I’m wondering, have you subtracted the game’s internal input lag? If not, what game are you using?

Vulkan should be the go-to for reducing input lag as much as possible. It’s a bit strange that Nvidia gives you such high latency values. Are you sure you set max swapchain images to 2?

Depending on the driver and OS, the Max Swapchain Images setting might not be honored. If you launch RetroArch from the command line, you can check the log to see the effective number of swapchain images in use; it should be 2 for the minimum possible lag.
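For example, something like this in a Unix-like shell (an assumption on my part; the exact log wording varies by video driver, so just look for lines mentioning the swapchain):

    retroarch --verbose 2>&1 | grep -i swapchain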
