An input lag investigation

No wonder my lag tests differ between Mupen64-libretro and the standalone emulators (Project64 and Mupen64Plus, regardless of plugin). Doom 64 and the Quake games are easier to play in Mupen64-libretro with no vsync. The last time I tested, the standalone emulators seemed to add about 4 frames.

[QUOTE=Tatsuya79;47205]If you’re interested in N64 emulation, there is this Mupen + GlideN64 libretro core that’s quite great except for a really high input lag. When the frame buffer is ON, we get about 80 ms of lag (probably more than 100 ms in real conditions).

If you feel like checking where the input polling occurs (as you should be the expert for that :slight_smile: ) the github and issue are there: https://github.com/loganmc10/GLupeN64/issues/55[/QUOTE] I’ll have to decline the offer for the time being. I really don’t have much time right now and I’ve already spent a lot of it on investigations like this lately. I haven’t posted in a while, but I have been busy with some stuff that I’m not ready to show just yet: I’m testing the new (and still experimental) VC4 OpenGL driver for the Raspberry Pi to determine whether it has any positive effect on input lag. While I have been able to do some tests, I have also run into what seems to be a bug in the driver. I’m currently awaiting feedback on that, but unfortunately there’s no timetable.

Ok, hope you’ll have success with that.

https://www.phoronix.com/scan.php?page=news_item&px=VC4-Job-Shuffling

Damn, those are some big performance boosts!

Is there a definitive setup for the lowest latency and highest accuracy? I know those two goals are often at odds, but I think it would be great to have a definitive guide for “Lowest Latency” and another for “Highest Accuracy”.

I’ll see if I can write a wiki page on this topic soon, but here’s a quick general guide:

[B]Linux[/B]

Important: Run RetroArch from an X-less terminal. This requires a working DRM video driver, which most modern systems appear to have. See https://github.com/libretro/RetroArch/wiki/KMS-mode
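For reference, here’s roughly what that looks like on a systemd-based distro (just a sketch; the display manager name and whether you need sudo depend on your setup):

[CODE]
# Switch to a text console (e.g. Ctrl+Alt+F2) and stop the display
# manager so no X server is holding the GPU:
sudo systemctl stop lightdm   # or gdm/sddm, depending on your distro

# Launch RetroArch directly; --verbose logs which video context it picked:
retroarch --verbose
[/CODE]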

Important #2: You may get performance issues unless you set your CPU to its maximum frequency. This happens because the CPU’s power management decides the CPU is idle enough to be downclocked. On Ubuntu, you can run sudo cpufreq-set -g performance to prevent this (see the example below). You may want to put this in a startup script.
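For example (a sketch; the package name and sysfs paths assume a typical Ubuntu install with the standard cpufreq driver):

[CODE]
# Install the userspace tool once:
sudo apt-get install cpufrequtils

# Set the performance governor (targets CPU 0 by default;
# repeat with -c <core> for the other cores):
sudo cpufreq-set -g performance

# Or set it for all cores at once via sysfs:
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance | sudo tee "$g"
done
[/CODE]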

In retroarch.cfg set:

video_driver = "gl"
video_vsync = true
video_threaded = false
video_max_swapchain_images = 2
video_frame_delay = (see description further down)

[B]Windows[/B]

In retroarch.cfg set:

video_driver = "gl"
video_vsync = true
video_threaded = false
video_fullscreen = true
video_windowed_fullscreen = false
video_hard_sync = true
video_frame_delay = (see description further down)

[B]Note on video_max_swapchain_images setting[/B]

When using the OpenGL (“gl”) video driver, this setting switches between using two or three buffers for rendering. Without going into details, a setting of 3 allows the emulator to run ahead and prepare the next frame before the current one has even been shown. This improves performance (i.e. makes framerate hiccups less likely), especially on slow hardware, but increases input lag by one whole frame in the general case.

So, the general rule is to use a setting of 2 if the system can handle it. It will shave off one frame of input lag compared to the default setting of 3. Please also note that a setting of 2 forces vsync on.

For OpenGL, this setting currently only applies under Linux KMS/DRM. On the Windows side, video_hard_sync = true achieves a similar result.

[B]Note on video_frame_delay setting[/B]

This setting delays the start of each emulated frame by the specified number of milliseconds, counted from vsync. This sounds bad, but it actually improves input lag, since it pushes input polling and rendering closer to when the frame will actually be displayed. For example, setting video_frame_delay = 10 shaves off 10 ms of input lag.

The general rule here is to use the highest value possible that doesn’t cause framerate or audio issues. This is highly system dependent. The faster your system is and the less demanding the emulator is, the higher you can push this setting. On my Core i7-6700K, I can put this setting at 12-13 ms when using snes9x2010, but not nearly as high when using bsnes-mercury-balanced.

Please note that the frame delay value can’t be higher than a frame period (which is 16.67 ms at 60 Hz); roughly speaking, the usable maximum is the frame period minus the worst-case time the core needs to emulate and render a frame. I believe the GUI caps this setting to a maximum value of 15.

I would also advise playing with this setting last. It takes a bit of trial and error to find a good value, and unless you’re willing to make per-game settings (see the sketch below), you might not be able to find one that works well in all situations while still giving a worthwhile improvement.
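If you do go the per-game route, RetroArch’s override files can carry just this one setting. A sketch, assuming a default (non-portable) Linux install where overrides live under ~/.config/retroarch/config/; the core and game names below are only examples:

[CODE]
# Create a game override that raises the frame delay for one title only
# (the file name must match the content name, minus its extension):
mkdir -p ~/.config/retroarch/config/"Snes9x 2010"
echo 'video_frame_delay = "12"' > ~/.config/retroarch/config/"Snes9x 2010"/"Super Mario World.cfg"
[/CODE]

RetroArch loads the override automatically when that content starts, leaving your main retroarch.cfg untouched.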

[B]A general note on GPU drivers[/B]

Input lag can vary depending on GPU driver, so it’s not possible to guarantee a certain input lag without testing the particular combination of hardware and GPU driver. For example, I have measured different input lag when just upgrading from one GPU driver version to another.

[B]Note on Raspberry Pi[/B]

The Raspberry Pi is sort of a special case. In general, it’s too slow to use anything other than the default value for video_frame_delay (which is 0). Also, unless you’re using the DispManX driver or OpenGL via the experimental open source driver (VC4), the video_max_swapchain_images setting has no effect.

In retroarch.cfg set:

video_driver = "dispmanx" (use "gl" if you require 3D acceleration or shaders; with the default GPU driver, this adds one frame of input lag compared to the DispManX driver)
video_vsync = true
video_threaded = false
video_frame_delay = 0

The settings above are what I recommend for everyone using the default Raspberry Pi GPU driver. I have some comments coming up regarding the experimental OpenGL driver.

If you’re using DispManX with the default GPU driver or OpenGL with the experimental GPU driver (VC4), you can try setting video_max_swapchain_images = 2. It will reduce input lag by one frame, but the framerate will suffer unless you’re running some very lightweight stuff. It seems to work better with DispManX than with OpenGL on VC4, probably thanks to lower overhead. If you want to try video_max_swapchain_images = 2 with the DispManX driver, please make sure you’ve rebuilt RetroArch after October 17, 2016, since this setting wasn’t enabled in the DispManX driver before then (see the build steps below).
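If you built from source, updating is the usual pull-and-rebuild (generic steps; RetroPie users would instead update RetroArch through the RetroPie-Setup script):

[CODE]
cd RetroArch
git pull
./configure
make -j4
sudo make install
[/CODE]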

Also, I would highly recommend adding the setting force_turbo=1 to /boot/config.txt when using video_max_swapchain_images = 2 (see the example below). This forces the Raspberry Pi’s CPU to run at maximum frequency at all times and has been shown to provide much better performance, since the Pi otherwise occasionally tries to downclock to 600 MHz.
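For example (assuming the stock Raspbian layout, where the firmware config lives at /boot/config.txt):

[CODE]
# Append the setting and reboot for it to take effect:
echo "force_turbo=1" | sudo tee -a /boot/config.txt
sudo reboot
[/CODE]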

[B]Regarding accuracy vs. input lag[/B]

There’s no real correlation between the two, except that accuracy usually comes with a performance penalty (i.e. frame rendering times increase). This, in turn, makes it less likely that you can use video_max_swapchain_images = 2 and high video_frame_delay numbers. I’d choose the emulator(s) I prefer/need for the games I play and then tweak the above mentioned settings to their optimal values.

Thank you very much for this Brunnis, much appreciated. I do have a couple of questions though. Obviously these settings are for V-Sync On, but what settings, if any, would change with V-Sync Off, which is how I run because of having a G-Sync monitor? The way I understand it (and hopefully I understand correctly), video frame delay has zero effect when running with V-Sync Off. But what about the max swapchain images setting? Does it have any effect with V-Sync Off?

Thanks again very much, and I hope you get around to writing up a wiki page for this soon. I would also like to copy/paste this information over to the Launchbox forums, with your permission and all credit to you of course. Or you can just post over there yourself if you prefer; I am sure we would have some appreciative people over there for this information.

Nice summarization.

I would say to ignore the video_frame_delay setting at first; it’s too unreliable and gives a smaller gain. It’s just for advanced users who want to experiment (and make per-game configs).

@Brunnis: thanks a lot! Could you also give us some advice about the hardware one should pick (especially the GPU)? Also, I read about recent monitor features like “FreeSync”, which seem to let you disable vsync without getting tearing. Are those relevant? Would they enable switching to single buffering and getting one less frame of lag? Thanks again for all your contributions!!

Also, as you said, on the Raspberry Pi the DispManX driver is necessary to remove one frame of lag. Is this additional frame of lag absent on a regular Linux PC, even when using OpenGL and pixel shaders? Does anyone know if the RPi driver will eventually support KMS? Apart from input lag, the RPi is good enough for me as far as emulation is concerned. But input lag is a major issue :slight_smile:

Edit: heh, I asked the same question as lordmonkus :slight_smile:

[QUOTE=lordmonkus;49147]Thank you very much for this Brunnis, much appreciated. I do have a couple of questions though. Obviously these settings are for V-Sync On, but what settings, if any, would change with V-Sync Off, which is how I run because of having a G-Sync monitor? The way I understand it (and hopefully I understand correctly), video frame delay has zero effect when running with V-Sync Off. But what about the max swapchain images setting? Does it have any effect with V-Sync Off?

Thanks again very much, and I hope you get around to writing up a wiki page for this soon. I would also like to copy/paste this information over to the Launchbox forums, with your permission and all credit to you of course. Or you can just post over there yourself if you prefer; I am sure we would have some appreciative people over there for this information.[/QUOTE]

Do you think using a G-Sync monitor makes the video frame delay setting useless? Because I have a G-Sync monitor too, and I always set the video frame delay to 7 or higher, which sometimes results in crackling sound when the value is too high.

Other than that, nice summary Brunnis. :slight_smile: It’s too bad the DispManX video driver on the Pi can’t use shaders… (By the way, is the DispManX driver enabled in official Pi Lakka nightlies?)

Thank you so much for the detailed explanation of the settings! I’m looking forward to your information about the experimental GL driver. Can you give any brief information on whether or not it’s worth it?

Also, does Lakka automatically adjust most of these settings that you mentioned to optimal values? (and Linux KMS mode, etc)

Would an ODROID-XU4 have lower latency than the RPi3?

[QUOTE=Brunnis;49145]Note on video_max_swapchain_images setting

When using the OpenGL (“gl”) video driver, this setting switches between using two or three buffers for rendering. Without going into details, a setting of 3 allows the emulator to run ahead and prepare the next frame before the current one has even been shown. This improves performance (i.e. makes framerate hiccups less likely), especially on slow hardware, but increases input lag by one whole frame in the general case.

So, the general rule is to use a setting of 2 if the system can handle it. It will shave off one frame of input lag compared to the default setting of 3.[/QUOTE]

Just two quick questions concerning this setting: would a value of 1 offer any benefit in terms of responsiveness? And does this setting even work with the OpenGL driver under Windows?

[QUOTE=Tromzy;49161]Do you think using a G-Sync monitor makes the video frame delay setting useless? Because I have a G-Sync monitor too, and I always set the video frame delay to 7 or higher, which sometimes results in crackling sound when the value is too high.

Other than that, nice summary Brunnis. :slight_smile: It’s too bad the DispManX video driver on the Pi can’t use shaders… (By the way, is the DispManX driver enabled in official Pi Lakka nightlies?)[/QUOTE]

It’s not that I think it’s useless. I just seem to recall reading somewhere that it was a setting tied to V-Sync On. I could be entirely wrong though.

Thanks for the detailed post Brunnis.

After some testing I noticed that Mednafen Saturn has significantly more lag than most other cores. I tested Sonic on the Genesis and the Saturn version in Sonic Jam, using the same settings, and there’s a big difference; I’d say the Saturn version is barely playable even with the best possible settings.

Is this a known problem with the core? I have a real Saturn and it’s not that bad.

Also, is video_max_swapchain_images = “1” better than “2” if the system can handle it?

@GemaH Game consoles prior to the 5th generation (Saturn, PSX, N64) lacked full video frame buffers and hence were able to output frames to the display faster. For most 60 FPS games, one additional frame of latency is to be expected on these newer consoles. Mednafen Saturn has 2 additional frames of latency; the second extra frame may be unavoidable, as it’s probably inherent to the hardware, but I’m not sure. For all I know it could be a design decision in Mednafen (à la how the SNES cores used to be), or an error causing additional frame buffering, etc. From what I know of the Saturn’s hardware, which apart from how VDP1 operates is limited, I always assumed the Saturn wasn’t inherently any laggier than the PSX, but I’m clearly not the expert here and am personally willing to trust the Mednafen author’s call on this one, though it would definitely be nice if she or someone else could confirm it for us. I did try to confirm it myself, but after 30 minutes or so of looking through the source and some tech docs I gave up, as it would’ve required more time than I have available at the moment. Who knows, maybe VDP2 adds a frame of latency on top of what VDP1 already adds when it pushes the complete frame to the display; if that’s the case, there’s nothing that can be done about it.

Edit: I’m an idiot, it could just as well be the software (i.e. the game), in which case there’s definitely nothing that can be done about it. As I said many pages back in this thread, to understand where latency is coming from you need to know A) how the hardware works, B) how the emulator works (both the core and RetroArch), and C) how the game works. I completely forgot C. I made the above post under the assumption that the source of the extra frame of latency for most games (compared to direct ports on the PSX, like Rockman 8 for example) is either the hardware or Mednafen’s implementation, but that’s only true if games made on the Saturn follow the same programming patterns as games on the PSX or other similar hardware, and while I think that’s likely, I have no basis for the claim. The standard graphics libraries that Sega provided (which I’d assume most games used) could be adding a frame of latency over what similar games on the PSX had. However, as of right now I’ve yet to come across a game with a minimum response time of less than 4 frames when frame-stepping Mednafen Saturn, whereas on the PSX I know for a fact that many 60 FPS games were programmed in a way that gave them a minimum response time of 3 frames, both on actual hardware and when frame-stepping Mednafen PSX.

TL;DR: no clue, but it’s definitely worth investigating, or at least prodding some more knowledgeable people about.

Also, I just checked Sonic Jam, and wow, you’re right, it’s pretty laggy. Sonic 1 has a minimum response time of 2 frames, both on hardware and via frame stepping in genesis_plus_gx (via RetroArch). Sonic 1 in Sonic Jam, frame-stepped in Mednafen Saturn (via RetroArch), has a minimum response time of 5 frames! I can’t say for certain, but I believe one of the extra frames is from the emulated hardware (double buffering); one from either the emulated hardware (maybe VDP2?), the emulator (vblank/active ordering or extra buffering? this is the only case that could possibly be eliminated), or the software (graphics library?); and one from the software. Basically, no matter what, the game is always going to be quite a bit laggier on the Saturn. Now add in the extra frames of latency associated with emulating a Saturn on PC hardware and you’re getting into no-man’s-land lag territory.

Hey e-Tank, thanks for the detailed response.

I’m not very knowledgeable about the inner workings of these things; I’m just an end user with a CRT PC monitor, a CRT TV + real consoles, an LCD TV, and a lot of free time for testing.

It’s just that after Brunnis’ latest post, the lag problem in Mednafen Saturn only became more noticeable to me. Although, to be fair, Mednafen Saturn is probably the most demanding core, so I can’t use the same options with all cores; I have to make some sacrifices, like using Hard GPU Sync Frames “1” instead of “0” (I have an i5 4670). The minimum input lag I could manage is almost as bad as it was with SSF. I remember Yabause being a bit better about this; I need to re-test it. It’s just that I can never run it at full speed in software mode for some reason; I always have to use frameskip.

[QUOTE=lordmonkus;49147]Obviously these settings are for V-Sync On, but what settings, if any, would change with V-Sync Off, which is how I run because of having a G-Sync monitor? The way I understand it (and hopefully I understand correctly), video frame delay has zero effect when running with V-Sync Off.[/QUOTE] I actually don’t know enough about how that’s implemented to be able to say for sure. Maybe someone else can chime in on that?

A video_max_swapchain_images setting of 2 will actually force vsync on, as RetroArch will be made to wait until the buffer flip has been performed before generating another frame. I’ve added a note about that in my previous post.

Sure, go ahead and post it over there (and preferably link back to the original post). :slight_smile:

[QUOTE=Tatsuya79;49149]Nice summarization.

I would say to ignore the video_frame_delay setting at first; it’s too unreliable and gives a smaller gain. It’s just for advanced users who want to experiment (and make per-game configs).[/QUOTE] Thanks. I’ve added a note about that to my previous post.

You’re welcome! Giving hardware recommendations is pretty hard, actually. One really needs to test both hardware and the GPU driver to know for sure. For minimum input lag (to be able to use video_max_swapchain_images = 2 and a bit of frame delay), I would go for an Intel x86 system and probably use the integrated graphics (mainly to get a low power system that’s durable and easy to cool). Either a Core model or perhaps the upcoming Apollo Lake based systems (some info here). I’m kind of in the process of evaluating hardware for my next build, so I will get back to you on that front. My next step will be testing input lag performance of my Core i7-6700K under Linux KMS/DRM.

I’ve not really read up on the nitty-gritty technical details of G-Sync and FreeSync, but I’d assume that they need two framebuffers as well. The difference is that when rendering to a framebuffer completes, it can be scanned out immediately without waiting for vsync. In the case of RetroArch, where we want (need) to output a consistent 60 FPS, I don’t really see the benefit from an input lag perspective. If someone has a different view on this, I’d be happy to hear it.

It probably depends on GPU drivers on Linux as well, but on my Broadwell laptop with integrated Intel graphics, OpenGL has one frame less input lag than OpenGL with the closed source driver on the Pi. In other words, it matches the input lag of DispManX on the Pi. Setting video_max_swapchain_images = 2 on the Broadwell system removes another frame of input lag and makes it even faster than the Pi with DispManX.

The reason DispManX is one frame slower than the Broadwell system with OpenGL and video_max_swapchain_images = 2 is that DispManX on the Pi is hardcoded to use three framebuffers (i.e. video_max_swapchain_images = 3); RetroArch’s DispManX driver doesn’t support the video_max_swapchain_images setting. I did rewrite the DispManX driver to support it, but it turns out even the Raspberry Pi 3 is too slow to run every SNES game at full speed with video_max_swapchain_images = 2, so I decided not to push the updated code.

Yes, KMS is already supported with the experimental open source driver called VC4. I’ll write a separate post about that soon.

As I wrote further up in this post, I’m not sure. If I were to guess, I’d say that it helps even with G-Sync, but I will let someone else confirm that.

I’d prefer to hold on to any preliminary info for now. :slight_smile: Hopefully I’ll have something soon.

No idea what the current situation with Lakka is, sorry.

Well, it’s definitely faster from a processing performance point of view. So it should be better equipped to use video_max_swapchain_images = 2 and maybe frame delay. But it also depends on the GPU driver, so testing would be needed to confirm that.

No, that will not have any effect when using OpenGL. Any setting below 2 is the same as 2 and any setting above 3 is the same as 3.

I’ve been reading some things about FreeSync vs. G-Sync, and each apparently has a small latency advantage over the other in certain circumstances. It seems G-Sync also has a buffer inside the monitor that can be a source of latency in some cases. I think the consensus suggests that FreeSync might be preferable for emulation purposes, while G-Sync is better for typical asynchronous gaming (i.e., where game logic is not synced to frame timing).

Frame delay should still matter on these variable refresh monitors because you’re still trying to hit the ~60 fps target. If audio sync were disabled and it were just running as fast as possible, it wouldn’t matter, but as long as there’s still a set interval on the frames, you’ll want to use frame delay to get as close to that interval as possible.

@Brunnis Thanks again, I have copied and pasted your post here over to a thread on the Launchbox forums while linking back directly to your post and giving you 100% credit for the information.

@hunterk Thanks for your info as well. Good to know that frame delay still is a useful setting even with V-Sync off.

I wish I had the hardware to accurately measure and test with G-Sync. I know it’s one of those things that not many people have in their setups and right now there is little information on it when it comes to emulation. All I can say from my experience with it so far is that it is very good even while still trying to work out 100% optimized settings.

@hunterk

Does max_swapchain_images have any effect in Windows? The blog post about it only mentions Linux.