An input lag investigation

Terry · 11 April 2018 12:07

Yes, if you can’t use OpenGL for hard_sync, RivaTuner is better than nothing.

I think I’m pretty lagless with my setup, using hard_sync=0, look-ahead=1, 2 or 3 (depending on the game) and a 120Hz CRT PC monitor (native game resolution via crt_emudrivers) with black frame insertion. I still managed to add a frame delay of 3 with my modest AMD A10-7850K APU (at 120Hz I can use less than half the frame delay you could normally use at 60Hz, and of course using the new look-ahead feature lowers this number further).

I know it’s kind of overkill, but I think the frame delay could help with any possible gamepad lag, mostly over Bluetooth, since when wired I’ve set it to a 1000Hz polling rate. But maybe I’m wrong and should not add frame delay?

Dwedit · 11 April 2018 01:37

Maybe this YouTube video from Microsoft has a little bit of information about DX12 in Windows 10, present modes, and latency. Maybe seek to 12 minutes in if you don’t want to watch the whole thing.

Sir_Kevith · 11 April 2018 16:43

I tried a little experiment last night to try and get low-lag vsync without Rivatuner by turning off vsync and looking at the behavior of the tear line. If it travels up, then your framerate is higher than your refresh rate, and with vsync on you will get more lag. So I decided to adjust a combination of settings to get that tear line to slowly travel down, by getting the framerate to just below the refresh rate. Once the tear line was slowly traveling downwards, I enabled vsync and recorded.
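The link between frame rate, refresh rate and tear-line direction described here can be illustrated with a minimal sketch. It assumes scan-out takes the whole refresh period (vertical blanking is ignored) and that frames are flipped at perfectly even intervals, which a real setup will not quite manage. The function names are made up for the illustration.

```python
def tear_drift_per_frame(fps, refresh_hz):
    """Tear-line movement per rendered frame, as a fraction of screen height.

    Positive means the tear moves down the screen, negative means up.
    Assumption: scan-out spans the whole refresh period (no vblank modelled).
    """
    return refresh_hz / fps - 1.0


def tear_positions(fps, refresh_hz, frames=5):
    """Screen position (0.0 = top, 1.0 = bottom) of the tear for each flip."""
    return [((k / fps) * refresh_hz) % 1.0 for k in range(1, frames + 1)]


if __name__ == "__main__":
    for fps in (60.5, 59.9):
        drift = tear_drift_per_frame(fps, 60.0)
        direction = "down" if drift > 0 else "up"
        print(f"{fps} fps on 60 Hz: tear moves {direction} "
              f"by {abs(drift) * 100:.2f}% of the screen per frame")
```

A frame rate slightly below the refresh rate gives a small positive drift, which corresponds to the slowly descending tear line aimed for in the experiment.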

I enabled audio_sync first, as this seems the best way to control the framerate, and then adjusted the frame delay and audio buffer to try and lower the performance in as small increments as possible. I would go back and forth adjusting these settings until the tear line was very slowly traveling downwards. One thing I noticed is that controlling the framerate in this way isn’t that reliable. The tear line jumps around too much even if it’s slowly travelling down; it didn’t look nearly as controlled as when capping the framerate with Rivatuner.

When I measured the lag, the results were mixed, as I expected due to the jumpy tear line. I used Castlevania Bloodlines as a test because I could test it on real hardware as well. Hitting the attack button has 1 frame of lag on real hardware, and using this method I would have anywhere from 1-3 frames of lag even if the game looked perfectly smooth. So the results are better than nothing, but Rivatuner works better overall for this.

Anyway, if anyone knows of a more accurate way to control the framerate within Retroarch other than audio_sync, I would be willing to experiment again. I didn’t have much success with this method on a SNES core either.

Video of the result: https://photos.app.goo.gl/0R93T7O60579xz8l2

Video of real hardware: https://photos.app.goo.gl/stEwmo7dmZnZxu573

Dwedit · 11 April 2018 16:57

Which driver? D3D (9, 10, 11, 12?) or OpenGL?

Sir_Kevith · 11 April 2018 17:02

Sorry, using D3D9 since my card doesn’t support OpenGL in Windows 10 very well. I can’t get exclusive fullscreen to work with my setup with anything other than D3D9 and I needed to see the tear line.

Terry · 12 April 2018 06:19

Try setting your refresh rate as close to 60Hz as possible; I think the sound sync works best when the refresh rate is 60. What tool do you use to set your resolution for the PVM? Did you try crt_emudriver or CRU?

rafan · 16 April 2018 09:47

Hi,

Since this is really the investigation thread for these kinds of issues, I thought I would flag the high (internal) latency of PPSSPP here as well.

Is it known that PPSSPP has 6 frames of internal latency? I’m just wondering whether this is a known fact, and whether somebody has an explanation for it.

https://forums.libretro.com/t/ppsspp-6-frames-of-internal-latency/

Ryunam · 17 April 2018 08:29

Just a heads-up to all the readers of this thread: I submitted a new pull request yesterday consisting of a new “Latency” menu placed under the Settings tab, where all settings affecting video, audio and input latency are grouped together.

This was done in an effort to make it simpler to strike the right balance between all the different configurations on a per-core basis, reducing the amount of menu navigation required.

If you download the latest nightly, you will be able to find all the familiar settings under this new category. Here’s what it contains so far:

Max Swapchain Images
Hard GPU Sync
Hard GPU Sync Frames
Frame Delay
Audio Latency (ms)
Poll Type Behavior
Run-Ahead to Reduce Latency
Number of Frames to Run Ahead
Runahead Use Second Instance

Here is a picture of the end result.

rafan · 17 April 2018 10:14

Wow, that’s an excellent improvement, very convenient. Thanks

On a technical note, is there any way to show which of the options in the latency menu are not doing anything for the current / active core?

What I mean is this:

Runahead does not work for cores that do not have save state ability, like PX68K, BlueMSX and others.
Hard GPU sync and max swapchain images do nothing when the D3D11 driver is active

To prevent any needless tweaking / testing with these options, it would be very nice if that could be made visible to the user.

Tromzy · 17 April 2018 10:19

Awesome idea Ryunam ! (And very cool wallpaper BTW !)

Tatsuya79 · 17 April 2018 15:20

For info about each option you have the sub-labels, and sometimes some additional explanation via the “select” button.

Dwedit · 17 April 2018 15:20

I actually tried to display status messages at the bottom of the screen indicating if the core did not support savestates, but couldn’t get that working, so I took it out.

Dwedit · 17 April 2018 15:26

So I thought of another possibility for reducing Input Lag. Sometimes, the game does the input polling very early in the frame, but takes a while before it actually does anything with the value.

For example, Super Mario Bros reads the joypad during vblank time, but doesn’t use the joypad data until scanline 33.

So the idea is to put off reading input until the game actually uses the input, rather than the first time the game reads the controller. Then do a RAM poke to change the variable or variables that hold the joypad state.

Not sure if it would matter though, since a NES emulator takes microseconds to get from vblank period to scanline 33 anyway.

(Also runahead dwarfs these things anyway)
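The deferred-read idea can be shown with a toy model. This is only a sketch under loud assumptions: `ToyCore`, `run_frame`, `press_at` and the `late_latch` flag are hypothetical names, not part of any real core or of the libretro API, and the timeline is simplified so that step 0 stands for the game's early read during vblank and step 33 for the point where the game logic consumes the value.

```python
STEPS_PER_FRAME = 262  # toy timeline; an NTSC NES frame has 262 scanlines


class ToyCore:
    """Hypothetical stand-in for an emulated game (not a real core API)."""

    def __init__(self, use_step=33):
        self.ram = {"joypad": 0}   # variable the game stores its pad state in
        self.use_step = use_step   # step at which the game acts on the input

    def run_frame(self, read_pad, late_latch=False):
        """Return the joypad value the game logic actually acts on.

        read_pad(step) gives the host-side controller state at that step.
        """
        acted_on = None
        for step in range(STEPS_PER_FRAME):
            if step == 0:
                # The game's own early read, done during vblank.
                self.ram["joypad"] = read_pad(step)
            if step == self.use_step:
                if late_latch:
                    # Frontend samples again and pokes the RAM variable
                    # just before the game uses it.
                    self.ram["joypad"] = read_pad(step)
                acted_on = self.ram["joypad"]
        return acted_on


def press_at(step_pressed):
    """Controller that is released until `step_pressed`, then held."""
    return lambda step: 1 if step >= step_pressed else 0


if __name__ == "__main__":
    pad = press_at(10)  # button goes down between the read and the use
    print("normal read sees press:", ToyCore().run_frame(pad) == 1)
    print("late latch sees press: ", ToyCore().run_frame(pad, True) == 1)
```

In this model a press that lands between the early read and the point of use is only picked up in the same frame when the late latch is enabled. How much wall-clock time that window covers in a real emulator is a separate question, as the post itself points out.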

Ryunam · 17 April 2018 16:26

That is very interesting and I imagine it would likely provide more noticeable results with other systems / cores, where the wait time between vblank time and “input use” might be longer.

fastfade · 20 April 2018 15:01

Big thread, I haven’t read all the posts yet, sorry, but I wanted to chime in with a simple little test I just did… This is how it looked for me:

Bilinear + Vsync = 5-6 frames (83-100 ms)
Integer + Vsync = 4-5 frames (67-83 ms)
Integer - Vsync = 2-3 frames (33-50 ms)

Hardware:
Acer Chromebook 14 (2016)
8Bitdo SFC30 gamepad (Bluetooth Xinput)
Moto X 2015 phone (120fps SlowMo)

Software:
Retroarch 1.7.1 32bit (Windows)
Wine staging 32bit (Windows 7)
GalliumOS 64bit (Xubuntu)

I did a quick test with my 8Bitdo Bluetooth gamepad, recording the screen with my Moto X 2015 phone camera and watching the built-in software framerate counter in RetroArch. I don’t have the best equipment or method here but with this simple test video I could still get some fairly objective results studying the video using VLC frame-by-frame playback.
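For anyone repeating this kind of camera test, the conversion between counted frames and milliseconds can be sketched as follows. It assumes a 60 Hz display (which the millisecond figures in the results imply) and the 120 fps slow-motion mode mentioned in the hardware list; the function names are made up for the example.

```python
def frames_to_ms(frames, rate_hz=60.0):
    """Duration of `frames` frames at `rate_hz`, in milliseconds."""
    return frames * 1000.0 / rate_hz


def camera_to_game_frames(camera_frames, camera_fps=120.0, game_hz=60.0):
    """Convert frames counted in the slow-motion video to emulated frames."""
    return camera_frames * game_hz / camera_fps


if __name__ == "__main__":
    for label, low, high in [("Bilinear + Vsync", 5, 6),
                             ("Integer + Vsync", 4, 5),
                             ("Integer - Vsync", 2, 3)]:
        print(f"{label}: {frames_to_ms(low):.0f}-{frames_to_ms(high):.0f} ms")
    # One camera frame at 120 fps is the smallest step that can be resolved.
    print(f"Resolution: {frames_to_ms(1, 120.0):.1f} ms per camera frame")
```

At 120 fps each camera frame covers about 8.3 ms, so differences smaller than that cannot be told apart with this method.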

I did not use hard GPU/CPU sync, did not change any polling options or disable WiFi or anything; this is how it runs on this Chromebook with this wireless gamepad. Of course I turned down all the extra frame delay options in the settings to 1 or 0 frames, but otherwise no special tweaks… I could not get the native Linux build to work on my system, it didn’t even start up, so I am using the Windows build through Wine instead, which runs perfectly! Input options are set to Raw and Xinput.

Sorry for not editing the video, it’s just an upload of the recording captured on the phone, and it’s 4x slow motion… (11 minutes might be slow to watch). If you download the video you can use, for example, VLC frame-by-frame playback to see the difference more easily, like I did. Otherwise you’ll have to take my word on those results.

It looks like using this wireless controller adds less than one frame of lag, in fact when I compared with the keyboard input I could not really tell any difference from filming it. Basically my tests looked the same using the built-in laptop keyboard buttons or the bluetooth gamepad, I did not expect that!

I’ve tried laggy bluetooth keyboards and mice in the past that had significant input latency compared to wired or dedicated RF dongles. I guess these newer bluetooth devices make it practically placebo for me here. There’s probably some extra lag using bluetooth, but I could not detect it by recording with my phone at least. It must be less than one frame, maybe up to 8ms extra if it’s polling at 125Hz, but I don’t know; I would need a faster camera I guess. Either way I’d say the difference is practically insignificant in the total input latency. Maybe the built-in laptop keyboard has extra lag too?
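The "up to 8ms at 125Hz" estimate is just the polling interval. A small sketch, assuming button presses arrive at random times relative to the polls and ignoring any radio or driver latency on top (the helper names are made up):

```python
def polling_interval_ms(poll_hz):
    """Time between two consecutive polls, in milliseconds."""
    return 1000.0 / poll_hz


def added_latency_ms(poll_hz):
    """(average, worst case) wait between a button press and the next poll."""
    interval = polling_interval_ms(poll_hz)
    return interval / 2.0, interval


if __name__ == "__main__":
    for hz in (125, 1000):
        avg, worst = added_latency_ms(hz)
        print(f"{hz} Hz: average {avg:.1f} ms, worst case {worst:.1f} ms")
```

Even the worst case at 125Hz stays below one camera frame at 120 fps, which fits with the difference not showing up in the recording.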

Anyway it looks like Vsync is adding 2 frames and Bilinear is adding a third frame on top. Now 5-6 frames of latency sounds like a lot but compared to most modern games it’s normal, for me it’s OK! However it is objectively worse than what we had in the 80’s and 90’s so of course if it can be reduced all the better!

Turning off Bilinear looks better in some games, since you see the pixel art perfectly, but the picture can get a bit noisy and harder on the eyes in others. The filter is cleaner and more comfortable to look at in general, although a little blurry… Personally, for a default setting, I would leave it on when playing.

As for the Vsync, having it enabled looks so perfect in those old 2D side scrolling games so I can’t really recommend turning that off either in general, the screen tearing and stuttering can be quite distracting, it’s up to you!

Hmm, this also makes me think, would it be possible to implement some sort of frame interpolation, like doubling 60fps to 120fps like that soap opera effect on modern TVs? There are some software players also able to do this, of course this as any post processing will add some lag, but for those who can run 120Hz it might be cool! However I think it might be more difficult to do well compared to standard video playback…

When using OpenGL it could be nice to have an option to use real triple buffering. This uses two backbuffers which are constantly swapped and overwritten: one backbuffer is being actively drawn to and the other one is locked and ready for display. Compare that to double buffering, which only has one backbuffer that is locked and cannot be overwritten; by design that’s how it’s supposed to work.

With double buffering, it doesn’t matter whether you could run at 200fps or 2000fps (where each frame takes 0.5ms to render). The next frame will be 17ms old every time, because drawing was started right after the previous frame was shown (when the frontbuffer and backbuffer get swapped).

With real OpenGL triple buffering (not a triple frame buffer queue) the next frame will be 0.5ms old, because it’s constantly overwriting and updating the dual backbuffers every 0.5ms or however fast it can render. The downside is that with lower framerates triple buffering will give less smooth motion because it is not synced with the refresh rate. However, with old games running at very high framerates the juddering effect is probably negligible!

Double buffering runs like clockwork because there’s only one frontbuffer which is used by the screen and one backbuffer which is used for storing the next frame. So it has to wait for the front/back buffers to swap each frame. Each frame is therefore started exactly 16.7ms apart (at 60fps) which gives as stable motion as possible.

Triple buffering might judder because the frames being displayed on the screen are not 16.7ms old each time (at 60fps). Because it uses dual backbuffer slots only one frame is locked/finished and the other slot is free. It does not have to wait, it keeps overwriting/swapping each of the two backbuffer slots in the background. The screen is being drawn using the third slot, the frontbuffer. Two locked slots, one free, that’s why you need three (triple) to keep rendering constantly without waiting/syncing with the screen.

So with triple buffering you usually get a little motion judder, but it does effectively lower the input lag, as long as the possible framerate is significantly higher than the refresh rate. And it also eliminates tearing, like standard Vsync.
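The frame-age comparison between double and triple buffering can be sketched with a simple timing model. It is an idealised illustration, not how any particular driver behaves: render time is constant and shorter than one refresh, rendering starts at the previous swap, and "age" is measured from the moment rendering of the displayed frame began (on the assumption that input is sampled then). Measured that way, the triple-buffered frame comes out between one and two render times old, slightly above the "0.5ms old" figure, which counts from when the frame was finished. The function names are hypothetical.

```python
import math


def double_buffer_age_ms(render_ms, refresh_hz=60.0):
    """Age of the displayed frame with double-buffered vsync."""
    period = 1000.0 / refresh_hz
    if render_ms >= period:
        raise ValueError("model assumes the frame renders within one refresh")
    # Rendering starts right after the previous swap; the finished frame
    # then waits in the single backbuffer until the next vblank.
    return period


def triple_buffer_age_ms(render_ms, refresh_hz=60.0):
    """Age of the displayed frame with two constantly overwritten backbuffers."""
    period = 1000.0 / refresh_hz
    if render_ms >= period:
        raise ValueError("model assumes the frame renders within one refresh")
    # Frames are rendered back to back from t = 0; the vblank is at t = period.
    completed = math.floor(period / render_ms)     # frames finished by vblank
    start_of_newest = (completed - 1) * render_ms  # when the newest one began
    return period - start_of_newest


if __name__ == "__main__":
    for render_ms in (5.0, 0.5):  # roughly 200 fps and 2000 fps
        print(f"render {render_ms} ms: "
              f"double {double_buffer_age_ms(render_ms):.1f} ms, "
              f"triple {triple_buffer_age_ms(render_ms):.1f} ms")
```

In this model the double-buffered age is pinned to the refresh period no matter how fast the renderer is, while the triple-buffered age shrinks with render time, which is the point being made above.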

Triple buffering would be an option somewhere between double buffered Vsync and the new Runahead method. Actually, triple buffering is similar to runahead in that it renders multiple frames in the background and finally picks the best one to display while discarding the rest; if I understand it right, the difference is that one works in parallel and the other in serial. Runahead is probably the ultimate method if you have enough parallel processing power for it. But I think triple buffering could be a very good choice to have on slower computers, and it would be compatible with all games, like standard Vsync.

It’s pretty interesting that the Raspberry Pi gets more lag than Windows, could it be simply because of difference in processing speed? What about an older Pentium III, P4 or Athlon XP, does Windows still have less input lag then?

(I’m rocking a Pentium N3160 on this system. I think it’s around 3 times as fast as a Pi3, but it’s hard to compare x86 to ARM directly.)

Dwedit · 20 April 2018 15:39

So if we wanted to add in the low-lag rolling sync feature, we’d need to add in partial screen updates, and delays until specific scanlines? Any other features that would be needed to pull it off?

It would also be next to impossible for the low lag rolling screen update feature to be used at the same time as runahead.

hunterk · 20 April 2018 15:43

I think we might be able to do the rolling sync thing with cores that are already paced with libco. I think ParaLLEl-N64 might be a good test-case, since it uses libco and has the scanline-based Angrylion renderer.

And yeah, I think it and runahead are pretty fundamentally incompatible. Runahead should work fine with the vsync-less 15 kHz support, though.

Twinaphex · 20 February 2020 21:21

or we could try the waterbox approach for savestates (https://github.com/TASVideos/BizHawk/tree/master/waterbox) and have runahead work with all cores

Dwedit · 20 April 2018 17:16

What exactly is “waterbox”? I can’t find any documentation (the readme.txt there is threadbare and doesn’t tell me anything), and I can’t find anything on a google search.

Twinaphex · 20 February 2020 21:20

There is not much. They basically explained it to me once in #bizhawk on freenode, but it seems I don’t have the logs.

Basically waterbox saves contain the whole emulator state, not just the game, so it can be used even with emulators that don’t have such functionality.