An input lag investigation

Using a wireless Xbox 360 controller -> terrible input lag. I'm not surprised if you're using a wireless controller.

I tested the GBA core against my real GBA hardware, and different settings can give different latency results. I used Pokemon Emerald and Super Mario Advance as a baseline, scrolling the text selection with the D-Pad. On real hardware (both a GBA SP and an NDS), scrolling up and down registers 4 frames after the button press. I measured with a camera that records at 60fps.

Here are the specs for my desktop: AMD FX 8350 @ 4.0 GHz, 8 GB RAM, GeForce GTX 750 Ti, Acer AL2016W monitor over VGA at its full native 1050p 16:10 resolution, Windows 10.

In exclusive fullscreen without V-Sync (or any related options), it is 4 frames, exactly like the real hardware.

In exclusive fullscreen with V-Sync alone, it goes up to 5-6 frames.

With Hard Sync, it stays at 5 frames.

In windowed mode, regardless of V-Sync settings or Frame Delay, it goes up to at least 6 frames.

I have been testing the Sega Genesis and SNES cores with the 240p test suite. While the Genesis core still responds very well with V-Sync, the SNES core gains a delay once V-Sync is applied. Do note that the manual lag test doesn't determine the actual frame delay; it only tells you how the input should feel in play. I also tested the Genesis 240p test suite on my HDTV, which is two frames slower than my laptop screen, and I usually get excellent reflex results unless I turn on V-Sync. My laptop has AMD graphics, though, so V-Sync behavior may vary with drivers.

With the SNES core on the desktop, the manual lag test comes out around 0.6-1.0 frames with V-Sync in exclusive fullscreen, but stays around 0.5 with Hard Sync. I find Hard Sync to be less effective in windowed mode, where the lag test comes out around 2 frames.

I also want to note that for N64 emulation, I find the Mupen64Plus libretro core a bit more responsive; I tested Doom 64 and Quake 2. I feel more input latency in those games on Project64, and even with the slowest plugin, the Angrylion RDP, I still see big latency there. On RetroArch there is still a delay, but it improves slightly in windowed mode, and fullscreen without V-Sync improves the most, though some delay remains. I don't know how these games play on the real console, but they feel more responsive in their PC counterparts and open-source engines.

Edit: Tested Doom 64 by rotating the player several times.

  • RetroArch fullscreen without V-Sync: 8 frames
  • RetroArch windowed: 10 frames
  • Project64 fullscreen: 12 frames
  • Project64 windowed: 13 frames

I mainly used GLideN64, as well as Glide64 and Jabo's. At least RetroArch has better latency.

Perhaps I am being naive, but isn't there a way to timestamp the actual input being received by the libretro API, and then timestamp the corresponding output from the libretro display buffer? Doing this in software would eliminate the uncertainty from LCD response time, input lag from wireless controllers, etc. Then we could see the lag on a core-by-core basis and troubleshoot further.
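To make the idea concrete, here is a minimal sketch of what such a software-side probe could look like in a libretro frontend, assuming you control the input_state and video_refresh callbacks the frontend hands to the core. The names probe_input_state, probe_video_refresh, real_input_state and real_video_refresh are purely illustrative; this is not an existing RetroArch feature:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include "libretro.h"

/* The frontend's real callbacks, wrapped by the probe (illustrative names). */
extern int16_t real_input_state(unsigned port, unsigned device,
                                unsigned index, unsigned id);
extern void real_video_refresh(const void *data, unsigned width,
                               unsigned height, size_t pitch);

static int64_t press_time_us = -1;   /* when the probe button was first seen pressed */
static uint64_t frames_since_press;  /* frames delivered by the core since then */

static int64_t now_us(void)
{
   struct timespec ts;
   clock_gettime(CLOCK_MONOTONIC, &ts);
   return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Passed to the core via retro_set_input_state(). */
int16_t probe_input_state(unsigned port, unsigned device,
                          unsigned index, unsigned id)
{
   int16_t state = real_input_state(port, device, index, id);
   if (device == RETRO_DEVICE_JOYPAD && id == RETRO_DEVICE_ID_JOYPAD_A &&
       state != 0 && press_time_us < 0)
   {
      press_time_us = now_us();      /* timestamp the moment the core sees the press */
      frames_since_press = 0;
   }
   return state;
}

/* Passed to the core via retro_set_video_refresh(). */
void probe_video_refresh(const void *data, unsigned width,
                         unsigned height, size_t pitch)
{
   real_video_refresh(data, width, height, pitch);
   if (press_time_us >= 0)
   {
      frames_since_press++;
      /* A real test would also diff frame contents to detect when the press
         becomes visible on screen, then log the count and reset. */
      printf("frame %llu, %lld us after the core saw the press\n",
             (unsigned long long)frames_since_press,
             (long long)(now_us() - press_time_us));
   }
}
```

This only measures the core's own contribution (from the input poll to the frame the core hands back), so LCD response time, controller polling and compositor behaviour are excluded by design, which is exactly the point.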

Themaister and Squarepusher, are you listening?


For the best input lag on Windows, enable "Disable Desktop Composition" in RetroArch or switch to the Windows Classic theme; I gain one or two frames with this option. I don't know about Windows 10 … it is a reason not to move to Windows 10 …

[QUOTE=wasabi;37714]For the best input lag on Windows, enable "Disable Desktop Composition" in RetroArch or switch to the Windows Classic theme; I gain one or two frames with this option. I don't know about Windows 10 … it is a reason not to move to Windows 10 …[/QUOTE]Or disable windowed fullscreen mode and bypass the compositor that way instead.

Perhaps someone could give the bsnes-mercury implementation of retro_run() a sanity check? To me it seems obscure compared to, for example, the nestopia core.

The nestopia implementation seems logical: 1. poll input, 2. update the emulator, 3. update video…

The bsnes-mercury and snes9x cores look weird in comparison.
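For reference, the ordering I mean looks roughly like this; a bare-bones sketch of the shape of a low-latency retro_run(), not code taken from nestopia or any other core. The callback pointers are the usual ones a core stores, and emulator_run_frame, framebuffer and the FB_* constants are placeholders:

```c
void retro_run(void)
{
   /* 1. Poll input as late as possible, right before the emulated frame
         that will consume it. */
   input_poll_cb();
   int16_t a_pressed = input_state_cb(0, RETRO_DEVICE_JOYPAD, 0,
                                      RETRO_DEVICE_ID_JOYPAD_A);

   /* 2. Run exactly one frame of emulation using that input. */
   emulator_run_frame(a_pressed);   /* placeholder for the core's frame step */

   /* 3. Hand the finished frame and audio to the frontend. */
   video_cb(framebuffer, FB_WIDTH, FB_HEIGHT, FB_PITCH);
   audio_batch_cb(audio_samples, audio_frame_count);
}
```

If a core instead polls input long before the emulated frame actually consumes it, or holds a finished frame internally for one more retro_run() call before passing it to video_cb, that is an extra frame of latency right there, and it is the kind of thing a sanity check of retro_run() would catch.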

Bump bump bump it up!

Perhaps a recap would be good. We are curious about two things:

  • How can Windows 10 be 1 frame faster than Linux in KMS mode (e.g. Lakka)?
  • Why are the SNES cores a couple of frames slower than all other cores?

History repeats itself. As already said, the lag has been discussed so often, and once again this topic seems to be dying :frowning:

This is all very interesting, but do we have numbers for the latency of the actual original hardware on a typical CRT from the era to compare against? Otherwise, how will we know if we are close to the original or miles away? For example, if the original SNES had higher latency than the NES, it seems unreasonable to expect the emulators to have the same latency.

I can't even tell there is any input lag anymore with V-Sync on, Hard GPU Sync with 0 sync frames, and the highest Frame Delay setting I can get (of course, don't use V-Sync if you don't need it). That's even on a 47-inch LCD with 8-9 ms response time. Obviously there is lag compared to the original hardware, but testing the Nestopia core on the LCD against real hardware on a CRT, I could not distinguish any lag by eye or feel. The Frame Delay setting was the cherry on top.
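For anyone wanting to reproduce this, the corresponding retroarch.cfg entries look something like the following, as I understand the settings; the exact Frame Delay value is machine-dependent, so treat the number as an example rather than a recommendation:

```
# Wait for vertical sync (no tearing, at the cost of a little latency)
video_vsync = "true"

# Hard GPU Sync: the CPU waits for the GPU each frame; 0 sync frames is the strictest
video_hard_sync = "true"
video_hard_sync_frames = "0"

# Milliseconds to wait after vsync before running the core's frame.
# Higher shaves latency but causes stutter if the machine can't keep up.
video_frame_delay = "10"
```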

I'm going to try hooking the PC up to the SD CRT TV when I get the converter; that's the setup I'm most curious to compare. It would eliminate the need for any shaders, which would allow an even higher Frame Delay setting, plus it eliminates LCD lag. Not sure if it would still have screen tearing and need V-Sync, though.

I've only tested Mednafen PSX thoroughly, and it's been unplayable since version 1.3.2 (it might be the core's fault, however). As stated before, I didn't do any measurements to back up my impressions, but I've been playing games for over 30 years, and I trust them.

[QUOTE=Girgl;38898]I've only tested Mednafen PSX thoroughly, and it's been unplayable since version 1.3.2 (it might be the core's fault, however). As stated before, I didn't do any measurements to back up my impressions, but I've been playing games for over 30 years, and I trust them.[/QUOTE] I'm curious what you are calling unplayable and which games specifically you are testing with. Mednafen PSX runs pretty damn sweet for me. I am hesitant to call it perfect, because there is always room for some sort of improvement, but mine is running pretty close to it. Maybe you have some setting that is causing issues, or maybe your display is the problem; not saying you do, just raising it as a possibility.

I am playing games like Castlevania, Rayman and the Mega Man X series with no issues, and everything plays and feels extremely nice. I too have been playing games for well over 30 years, and I trust my feelings too. As a point of reference, I recently upgraded from a run-of-the-mill 22" 60Hz monitor to an Asus ROG G-Sync gaming monitor with extremely low input lag, and it made a fairly big difference. Not saying games went from unplayable to amazing, because even before the new monitor things were extremely playable, but the difference was noticeable.

@Girgl: Mednafen PSX plays well enough. If you have input lag issues, use the settings I posted above. I don't know if it's the "best" PS1 emulator, but it's certainly not "unplayable". Unplayable to me means input lag, anything slower than the original framerate, or no audio at all.

Yeah, I agree, "unplayable" may be harsh. But in Bomberman World, for example (and there are many more), the input takes what feels like half a second to register. Mednafen PSX used to run great on my machine, and I didn't change any settings or swap any of the machine's parts; the only things that changed are the versions of RetroArch and Mednafen PSX. But I will try your suggestions.

I tried compiling Lakka: I changed its Linux package to v4.4.9, applied the real-time kernel patches, configured it with RT_FULL (hard realtime), a 1000 Hz timer, etc., and ran it on my NUC with the NES and SNES cores.
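For reference, the kernel options I'm referring to look roughly like this in the kernel .config (assuming a PREEMPT_RT-patched 4.4 kernel; option names can differ between versions):

```
# Full preemption ("hard realtime") from the PREEMPT_RT patch set
CONFIG_PREEMPT_RT_FULL=y

# 1000 Hz timer tick for finer-grained scheduling
CONFIG_HZ_1000=y
CONFIG_HZ=1000

# High-resolution timers (usually already enabled)
CONFIG_HIGH_RES_TIMERS=y
```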

It unfortunately made no measurable difference in input latency compared to the official Lakka builds.

I did, however, notice another thing: with the Nestopia core I can set the Frame Delay setting in RetroArch to the max value of 15 and it still runs well, shaving about a frame of input latency off, so I can get 4 frames of latency on NES using Lakka (I don't know if this high setting works for all NES games, though).

I also compared wireless and wired Xbox 360 controllers, and there was no measurable difference in input latency using a 120fps camera.

Edit: Somehow Windows 10 achieves 4 frames of latency on NES by simply enabling Hard Sync, which for some reason is better than Linux with KMS, even with the hard real-time patches applied and enabled.

@Girgl I just tested Bomberman World and it seems as responsive as any other game for me, with no perceivable input lag at all. I am not a Bomberman fan myself, so my playtime with it, and my sense of how it should play, is extremely limited, but the character felt like he moved and dropped bombs responsively.

[QUOTE=larskj;39122]I tried compiling Lakka: I changed its Linux package to v4.4.9, applied the real-time kernel patches, configured it with RT_FULL (hard realtime), a 1000 Hz timer, etc., and ran it on my NUC with the NES and SNES cores.

It unfortunately made no measurable difference in input latency compared to the official Lakka builds.

I did, however, notice another thing: with the Nestopia core I can set the Frame Delay setting in RetroArch to the max value of 15 and it still runs well, shaving about a frame of input latency off, so I can get 4 frames of latency on NES using Lakka (I don't know if this high setting works for all NES games, though).

I also compared wireless and wired Xbox 360 controllers, and there was no measurable difference in input latency using a 120fps camera.

Edit: Somehow Windows 10 achieves 4 frames of latency on NES by simply enabling Hard Sync, which for some reason is better than Linux with KMS, even with the hard real-time patches applied and enabled.[/QUOTE]

There’s some interesting stuff here. May I suggest that you try the BFS scheduler patch with Linux/KMS and test again? Also, Linux with KMS uses GLES/EGL, and it seems many implementations use triple buffering, depending on the context. That would explain the extra frame delay.

[QUOTE=vanfanel;39184]There’s some interesting stuff here. May I suggest that you try the BFS scheduler patch with Linux/KMS and test again? Also, Linux with KMS uses GLES/EGL, and it seems many implementations use triple buffering, depending on the context. That would explain the extra frame delay.[/QUOTE]

I made two changes that might explain why I could set the frame delay setting higher:

  • I changed the build to optimize for speed instead of size (-O2 instead of -Os); see the snippet after this list. Does anyone know why a generic x64 Lakka build is set to optimize for size?

  • I changed the generic Linux kernel from v4.4.2 to v4.4.9, with the hard real-time patches applied from https://www.kernel.org/pub/linux/kernel/projects/rt/4.4/
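The optimization change amounts to something like this in the build flags; where exactly Lakka sets this depends on the tree, so take this as illustrative only:

```
# Before: optimize for binary size
CFLAGS="-Os"

# After: optimize for speed
CFLAGS="-O2"
```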

The only reason I updated to 4.4.9 was to get the real time patches to apply cleanly.

Sure, I can give the BFS scheduler a go. Would you expect it to be better than the regular RT patch?

If you want the image I can upload it somewhere. (there shouldn’t be any problems distributing it, right?)

I just reverted to the generic Lakka image to try it again, and I can no longer get away with setting the Frame Delay setting high without getting lots of stutter.

Edit: vanfanel, can you please elaborate on the triple buffering you suspect? And how can we verify it?

@larskj: I suspect GLES/EGL is triple-buffered in some cases because, for example, when I wrote the plain dispmanx driver for the Raspberry Pi, I noticed that while it used noticeably less CPU than the dispmanx/GLES driver, it would start showing audio stuttering at around 70% CPU usage, whereas the dispmanx/GLES driver showed no sound stuttering up to a good 90% CPU usage. So I went and implemented a triple-buffering mechanism for the plain dispmanx driver and got no stuttering at a good 90% CPU usage, just like the dispmanx/GLES driver: no CPU was being wasted waiting, as with a double buffer, so it could be put to good use for emulation itself, but it added a one-frame delay. I don't know how to verify whether other EGL contexts use double or triple buffering; I have tried googling with no luck. If someone knows, it would be very good to know.
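To illustrate why triple buffering helps throughput but costs a frame, here is a rough sketch of the flip loop I mean; generic C-style pseudocode, not the actual dispmanx driver code (framebuffer_t and queue_page_flip are made-up names):

```c
enum { NUM_BUFFERS = 3 };

static framebuffer_t buffers[NUM_BUFFERS];  /* hypothetical buffer type */
static int draw_idx = 0;                    /* buffer the emulator is rendering into */

void present_frame(void)
{
   /* Queue the freshly rendered buffer; the display flips to it on the
      next vsync.  With three buffers this call never has to block,
      because there is always a spare buffer left to draw into. */
   queue_page_flip(&buffers[draw_idx]);     /* hypothetical flip call */

   /* Keep emulating immediately into the next free buffer.  The frame
      just queued appears one vsync later, while the frame currently on
      screen was queued the vsync before that, which is where the extra
      frame of latency comes from.  With double buffering the CPU would
      instead block here until the flip completes, wasting CPU time but
      saving that frame. */
   draw_idx = (draw_idx + 1) % NUM_BUFFERS;
}
```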

Also, yes, I expect scheduler changes to have more impact than the RT patches (for better or worse). As far as I know, the RT patches let you set realtime priority for a task so it doesn't get interrupted by other tasks running in the system, but I don't think that's related to input latency in an emulator that uses only a fraction of the CPU and polls input each frame. Maybe I am mistaken; these topics are a bit obscure to me. The scheduler, however, determines how often a given task gets CPU time, and I can see a relation to input latency there.
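For what it's worth, the "realtime priority for a task" part is just the standard POSIX scheduling call; a minimal sketch is below (whether it actually helps an emulator that already meets its frame deadlines is exactly the open question):

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Give the calling process (e.g. the emulator) realtime FIFO priority
   so it preempts normal tasks.  Needs root or CAP_SYS_NICE; on a
   PREEMPT_RT kernel even most kernel work can be preempted by it. */
int set_realtime_priority(int priority)
{
   struct sched_param param;
   memset(&param, 0, sizeof(param));
   param.sched_priority = priority;   /* 1..99 for SCHED_FIFO */

   if (sched_setscheduler(0, SCHED_FIFO, &param) != 0)
   {
      fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));
      return -1;
   }
   return 0;
}
```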

I have a 4.4.9 kernel running on my Pi2 with BFS, but I don't have a precision camera to reproduce your experiments.

However, that extra frame… I still think the GLES/EGL context using triple buffering is to blame.