An input lag investigation

I compared GL vs. Vulkan on my 970 with 368.39 drivers and found I could get perfectly smooth scrolling in Dragon’s Curse in PCE-Fast with both. But Symphony of the Night in Mednafen-PSX-Software was noticeably stuttery with Vulkan compared to GL. Since one core is much more performance heavy I figured it could be performance related, but Vulkan is supposed to have less overhead so…IDK.

I don’t think anyone’s done the comparison to higan. I did some limited testing a long time ago, but I’d need to redo those tests before presenting any findings.

The fixes have been enabled by default for a while now, so you don’t need to do anything other than download the cores. :slight_smile: The fixes are completely unrelated to the frame delay feature. Frame delay is an external (frontend) feature and will lower the input lag further.

[QUOTE=e-tank;44628]@Brunnis: i want to start by thanking you for doing these tests and confirming that this issue your patch addresses is indeed not theoretical but actually causes ~1 frame input delay. years ago i brought this up with a few emu authors (you can see how one of those discussions went here: https://web.archive.org/web/20160325102948/https://code.google.com/p/genplus-gx/issues/detail?id=274 , although to his credit ekeeke did change his mind, eventually (ty ekeeke!), you can see the result of the patch required to change the emu’s behaviour here: https://bitbucket.org/eke/genesis-plus-gx/commits/ec554b4b702d337168dfcaf4b4a6248062e2db5b ), albeit w/o any concrete proof to convince ppl, and pretty much everyone told me to piss off. i always hoped to get around to acquiring the hardware in order to do these kinds of tests but never had the time, energy, etc. to get it done, so really thank you so much for this. that said your fix kind of does have to do with vsync.

the real issue here is that in order to minimize input latency, when the guest polls input the host should poll input as close to (or better yet after, which we can now achieve in retroarch thanks to the frame_delay parameter) where the guest machine would be in real time. in other words, ideally when the guest polls input you’d have the host sync, giving us realtime >= emutime, and then poll hardware on the host. now, this wouldn’t be very practical to do, but by being clever we can at least minimize the amount of time between realtime and emutime when the guest polls input, which brings me to…

the reason your fix works so well is that retroarch is designed primarily to use vsync in order to sync realtime = emutime. with your fix the guest resumes emulation at the start of its vblank instead of the start of its active frame. and since most games on the guest poll input either in vblank or shortly thereafter, you’ve effectively cut the time between when the guest polls input and when the host does by ~1 frame, since for most games emutime will only be slightly > realtime when the game/guest polls.

higan on the other hand syncs realtime = emutime using audio. for example, if you set the audio latency to 8ms in higan (8ms is the lowest my hardware can achieve in retroarch using just audio_sync without audio issues) then once the guest has built up 8ms worth of audio samples it uses that to have the host sync realtime = emutime. using this setup, if higan were to poll input on the host on demand, then it would only ever do so at most 8ms ahead of realtime (emutime > realtime). if this is what higan’s doing, your patch would have no effect on standalone higan (not higan/bsnes in retroarch, that’s different). however, if i understand correctly, according to byuu higan only polls input on the host once per frame at the start of the emulated active frame. if this is true then your fix would reduce input latency in higan by some amount. i don’t think it’d be a full frame’s worth on avg but i’m not sure, i’m too tired at the moment to do the back of the envelope calculation needed to give a ballpark answer…

anyway, the frame step method you describe is meaningless without context. yes it can help pinpoint problems, but you also need to have a good idea of how the emulator works, particularly when it polls input and when / how it syncs realtime = emutime, and also a basic understanding of how the game handles and responds to input.

that said i’m willing to admit i could be wrong, regardless i’m happy to discuss the topic further, well when i’m not so tired and can find the time at least (i’ll try!).

also, i’d like to comment on byuu’s article sometime as from my understanding he’s most def wrong on a few issues.[/QUOTE] Thanks for a detailed response. :slight_smile: Interesting to see the “historic” links. However, I’m afraid you might be overcomplicating the explanation of this particular fix. The reason the fix works is not that it changes the relationship between when the host and the guest poll. What the fix does is change when the emulator actually uses the input it reads from the host system. Before my fix, the emulator would read the input from the host system, but it would never use that input to produce a frame until the next time the emulator main loop was called. Vsync or no vsync, the calls to the emulator main loop occur 60 times per second, with a periodicity of 16.7 ms. Splitting the input read and the execution of game logic between two calls to the emulator main loop will add almost a full frame of extra input lag, vsync or no vsync (if we assume that the emulator main loop execution time is small compared to the frame period).

So, vsync or no vsync, higan still runs the main loop 60 times per second, with a periodicity of 16.7 ms, which means that reading and caching the input in one call to the main loop and using it in the next will add the same amount of input lag as it does when using vsync. Only disabling all synchronization (i.e. letting the emulator run free at uncapped framerate) would remove the effect of my fix.
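A minimal Python sketch of the timing argument above, under simplifying assumptions that are mine rather than anything from the thread: the frontend calls the core's main loop once per 60 Hz period, the host polls input at the start of each call, and the call finishes its frame `render_ms` later. The only thing that changes between the two cases is whether input polled in call n drives the frame from call n (`deferred=False`, the lagfix ordering) or the frame from call n+1 (`deferred=True`, the old ordering). `lag_ms` and its parameters are hypothetical names.

```python
import math


def lag_ms(press_ms: float, period_ms: float = 1000 / 60,
           render_ms: float = 2.0, deferred: bool = True) -> float:
    """Time from a button press until the end of the first frame that reflects it.

    Simplified model: main-loop call n starts at n * period_ms, polls host
    input immediately, and finishes its frame render_ms later.
    """
    # First call whose input poll happens at or after the press.
    n_poll = math.ceil(press_ms / period_ms)
    # Old ordering: the polled input is only consumed by the *next* call.
    n_frame = n_poll + 1 if deferred else n_poll
    return n_frame * period_ms + render_ms - press_ms


if __name__ == "__main__":
    for press in (0.5, 5.0, 12.0, 16.0):
        old = lag_ms(press, deferred=True)
        new = lag_ms(press, deferred=False)
        print(f"press at {press:5.1f} ms: old {old:5.2f} ms, lagfix {new:5.2f} ms")
```

In this model the old ordering is always exactly one call period slower, regardless of when the button is pressed. Real measurements will differ somewhat, since they also depend on where in the emulated frame the game actually reads its input.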

Should be pretty simple to test though. I can just disable vsync in RetroArch and test with and without my fix and see if the effect remains. I’ll do it when I can find some time for it. :slight_smile:

My observations/tests are with vsync on, though. Maybe same root cause…

Yep. I wouldn’t really expect triple buffering to have any effect in RetroArch, though, since we’re running capped and vsynced at 60 FPS. We just trigger a call to the emulator’s main loop, render into the frame buffer and then we wait the rest of the frame period until we swap the front and back buffers and call the emulator’s main loop again. There’s really no need for a third buffer in this case. However, with a “correct” implementation of triple buffering, there shouldn’t be any negative effect of having it enabled either (except slightly higher memory usage). The older AMD driver 16.5.2.1 had the expected triple buffering behavior, while the newer 16.7.3 driver doesn’t. Peculiar…

Thanks for testing. Interesting to hear that the issue appears on Nvidia hardware as well.

I’ve been reading https://www.reddit.com/r/emulation/comments/4w4101/complete_guide_to_latency_in_emulators_by_the for the past hour and came across this:

I replied here:

I’m actually quite tempted to give it a try. :stuck_out_tongue: As I wrote to byuu, it would be fun to be right and otherwise good to learn what error I made in my thinking.

EDIT: The hypothesis before doing any work on a higan fix is that the improvement will be approximately: frame_period - frame_render_time. This means that the slower the emulation runs, the smaller the improvement to input lag will be. Since current versions of higan don’t offer performance or balanced versions, you will need a rather beefy computer to fully reap the benefits of the fix. Hypothetically. For example, if free-running emulation FPS is 200 FPS, frame render time is 5 ms. 16.67 ms - 5 ms = 11.67 ms improvement. But we’ll see (maybe). :stuck_out_tongue:
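The hypothesis is simple arithmetic. Here is a small Python helper (hypothetical name) that reproduces the example, assuming a 60 Hz frame period and taking frame render time as the inverse of the free-running FPS:

```python
def expected_improvement_ms(free_run_fps: float, refresh_hz: float = 60.0) -> float:
    """Hypothesized lagfix gain: frame_period - frame_render_time, floored at 0."""
    frame_period_ms = 1000.0 / refresh_hz
    frame_render_ms = 1000.0 / free_run_fps
    return max(frame_period_ms - frame_render_ms, 0.0)


print(round(expected_improvement_ms(200), 2))  # 11.67, same as the 200 FPS example
print(expected_improvement_ms(60))             # 0.0: no headroom, no gain
```

The floor at zero is my own addition. If emulation can't run faster than real time, this model predicts no benefit at all.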

Finally we will get to the bottom of this :slight_smile: Can’t wait to see how it turns out with a patch on the official Higan code. Both of you seem to be confident, and one of you will be proven wrong. Anyway thanks to both of you for your work on this topic :slight_smile:

[QUOTE=Dinofly;44757]Finally we will get to the bottom of this :slight_smile: Can’t wait to see how it turns out with a patch on the official Higan code. Both of you seem to be confident, and one of you will be proven wrong. Anyway thanks to both of you for your work on this topic :)[/QUOTE] Yep, should be fun whichever way it goes. :smiley: I have already made a Windows x64 build of higan with the lagfix, but it’s currently untested (except for having confirmed that nothing obvious is broken). I won’t be able to test the build with my LED rigged controller until tonight at the earliest.

EDIT: I received a comment from a user on byuu’s forum, which explains the situation with higan pretty well. I now have a cursory understanding of how higan syncs to audio. Sorry e-tank, I think you might be right on this one and byuu as well. Will be interesting to see if there’s any effect at all from using the fix. Either way, it could be interesting to compare RetroArch to higan on the same machine to see how they compare. I’ll be back…

Thanks for your answers and work, Brunnis.

Would it be too hard to write a “Brunnis fix” for Libretro’s Genesis Plus GX core?

re patching higan: i was wrong when i said it would have no effect, what i meant to say is i don’t think it will be significant. i believe there will be a small difference simply because without your patch higan holds onto a frame it could otherwise push for a small amount of time. worst case, the emu syncs to realtime in between that interval, adding $AUDIOLATENCY ms worth of latency, but those cases aren’t going to happen every frame or even every other frame. my argument is essentially the same as koubiack’s in that reddit thread Brunnis linked to.

regardless i’m interested in seeing actual results, i’m so glad someone is finally doing these kinds of tests, thank you! if i can make a suggestion when doing the tests, set the audio latency as low as you can (i believe setting it to 0 in higan, which works on my system, simply has the audio system assign whatever is the lowest value it can have) and use the optimal audio driver for your system.

timing-wise, setting maximum runspeed to 1x achieves the same effect as using vsync: it makes 1 frame (active portion + vblank) on the host correspond directly to 1 frame on the guest. when using audio sync this is no longer the case, since the points where the host syncs no longer fall on clean frame boundaries on either the host or the guest, which is why at this point it’s easier to think in terms of realtime vs emutime. so it would have to be audio sync only.
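To make the realtime vs emutime picture more concrete, here is a rough Python model of audio-synced emulation. The assumptions are mine, not taken from higan's code. The audio device plays continuously from t=0 without underruns, so buffered audio equals emutime - realtime. The emulator advances in fixed chunks of emulated time, each costing a fixed amount of real time. It blocks whenever the buffer exceeds the latency target. The function name and parameters are hypothetical. The sketch shows the bound described earlier in the thread: emutime can only run ahead of realtime by roughly the audio latency plus one chunk.

```python
def simulate_audio_sync(latency_ms: float, chunk_ms: float,
                        cost_ms: float, duration_ms: float) -> float:
    """Return the largest observed lead of emulated time over real time (ms).

    Hypothetical model: buffered audio == emutime - realtime, and emulating
    chunk_ms of guest time (and audio) takes cost_ms of real time.
    """
    assert cost_ms < chunk_ms, "model assumes emulation runs faster than real time"
    realtime = 0.0
    emutime = 0.0
    max_lead = 0.0
    while realtime < duration_ms:
        buffered = emutime - realtime
        if buffered > latency_ms:
            # Block until the audio device has drained the buffer to the target.
            realtime = emutime - latency_ms
        # Emulate one chunk: emulated time jumps ahead, real time advances by the cost.
        emutime += chunk_ms
        realtime += cost_ms
        max_lead = max(max_lead, emutime - realtime)
    return max_lead


print(simulate_audio_sync(latency_ms=8.0, chunk_ms=1.0, cost_ms=0.1, duration_ms=1000.0))
```

Unlike the vsync case, nothing in this loop is aligned to frame boundaries. Where the host blocks depends only on how much audio is buffered.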

it already has it, that bug report and patch address the same issue. the snes is not unique in how it operates. as you can see in the discussion, the main reason the dev was reluctant to do so (which is that the active frame height of the guest can change mid-frame) is the same reason byuu opposes it. byuu’s argument is that since higan doesn’t use vsync to sync realtime = emutime, it’s not worth the trouble, and any gain would only be minimal on his end.

As I mentioned on the previous page, higan’s performance in the context of vsync is still worth investigating, both with and without the runloop reorganization, even if the GUI option for vsync is gone.

no i agree, i’m not trying to take byuu’s side on this here, but basically all i care about is that we finally got these fixes downstream in the libretro cores where it’s needed the most and that we now have the means to test, pinpoint problems, and experiment, i’m very grateful to Brunnis and you and the other core maintainers for this. thank you

also, not that it matters, but i disagree with byuu in that i think there are still improvements out there that haven’t been explored to their full potential. off the top of my head, methods like using a leapfrog approach to emulation, that is emulating 2 frames to push video then going back 1 and continuing in that fashion. it’s hacky and can cause video glitches, but for most games i’d imagine it would work fine most of the time. frame_delay like in retroarch is another relatively unexplored option in practice, which reduces the latency associated with using vsync as the primary time sync method. the concept could be improved further by keeping stats that allow us to figure out how much we can safely push the frame up w/o likely going over the edge. also, bare metal programming on fixed hardware like the raspberry pi isn’t a pipe dream and lies squarely within the realm of possibility, which opens doors to new methods of reducing latency that can’t be realized on a modern os in a portable manner. etc.
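The idea of keeping stats to pick a safe frame delay could look something like the rough Python sketch below. It is hypothetical and not how RetroArch's frame_delay works (there the value is set by hand). It takes a near-worst-case (99th-percentile) emulation time from recent frames, subtracts it plus a safety margin from the frame period, and clamps the result. The function name, the 1 ms margin and the 15 ms cap are all assumptions.

```python
import math


def suggest_frame_delay_ms(frame_times_ms, refresh_hz=60.0, percentile=0.99,
                           margin_ms=1.0, max_delay_ms=15):
    """Hypothetical auto frame delay derived from recent emulation times (ms)."""
    if not frame_times_ms:
        return 0  # no data yet: don't delay at all
    ordered = sorted(frame_times_ms)
    # Nearest-rank percentile: a near-worst-case emulation time.
    rank = max(1, math.ceil(percentile * len(ordered)))
    worst_case_ms = ordered[rank - 1]
    budget_ms = 1000.0 / refresh_hz - worst_case_ms - margin_ms
    # Whole milliseconds, clamped to [0, max_delay_ms].
    return max(0, min(max_delay_ms, math.floor(budget_ms)))


# 98 frames at 5 ms plus two 8 ms spikes: the spikes decide the delay.
print(suggest_frame_delay_ms([5.0] * 98 + [8.0] * 2))
```

Using a high percentile instead of the mean is the point. A delay tuned to the average frame would overshoot the vsync deadline whenever a heavier frame comes along.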

Yep, I considered that as well. On average, the difference will be minimal, though. Probably not even measurable with my method.

You’re welcome! I have already run some tests and have comparison data for 60 ms vs 20 ms. Setting it to 0 ms made the emulation halt… I’ll see if I can get a good comparison up tonight.

Yep, I agree. A question on that: do you enable vsync by simply going into higan’s settings file and setting audio synchronization to false and video synchronization to true?

I only have v097 here, but assuming it hasn’t changed since then, open up settings.bml and in the ‘Video’ block, change ‘Synchronize:false’ to ‘Synchronize:true’.
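If you'd rather script the change, here is a small Python sketch. It assumes settings.bml is laid out as described above: a top-level `Video` block with an indented `Synchronize:` line, and likewise an `Audio` block. I haven't verified that layout against every higan version. The helper name and the sample text are made up for illustration. It also covers the audio half of the question by turning audio sync off.

```python
def set_bml_value(text: str, block: str, key: str, value: str) -> str:
    """Set `key` inside top-level `block` of a BML-style settings text.

    Assumes block names start at column 0 and their entries are indented
    lines of the form `Key:value`. Other lines are left untouched.
    """
    out = []
    in_block = False
    for line in text.splitlines():
        if line and not line[0].isspace():
            # A non-indented line starts a new top-level block.
            in_block = line.strip() == block
        elif in_block and line.strip().startswith(key + ":"):
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}{key}:{value}"
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative sample only, not a real higan settings file.
sample = "Video\n  Driver:OpenGL\n  Synchronize:false\nAudio\n  Synchronize:true\n"
sample = set_bml_value(sample, "Video", "Synchronize", "true")
sample = set_bml_value(sample, "Audio", "Synchronize", "false")
print(sample)
```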

I’ve been busy testing higan with and without the lagfix and comparing to RetroArch with the bsnes-accuracy core. A few important notes before we start:

[ul] [li]higan exhibited stuttery visual performance in all tests, no matter if I was using the default “Synchronize Audio” or the now-hidden “Synchronize Video” (vsync) setting. Looking at my recordings, it drops frames frequently, despite running on a Core i7-6700K capable of sustaining a stable 128 FPS in the test scene (if all synchronization is turned off). The behavior looks very similar to what the Vulkan backend in RetroArch produces.[/li][li]When using “Synchronize Video” instead of “Synchronize Audio”, performance seemed even less predictable, with a larger swing between minimum and maximum input lag and a few latency spikes.[/li][li]No difference was found between the unmodified higan build and the lagfix-enabled one when using “Synchronize Audio”, as expected.[/li][li]With “Synchronize Video” (vsync), the version with the lagfix tested slightly worse than the unmodified build. However, the test results of the version with the lagfix contain one nasty latency spike and three slightly less nasty ones, while the unmodified build only has one slightly nasty spike. These spikes are probably random, but more testing would be needed to conclude that. If the latency spikes (outliers) are removed from the test results, both versions (with/without lagfix) once again have the same input lag.[/li][li]With no conclusive differences between lagfix/no lagfix and no way of performing the frame advance test (since higan doesn’t have that ability) to confirm that my code even works, I’ve decided to only include the unmodified higan input lag numbers in the graph below. Perhaps there really is no difference, perhaps my code doesn’t work, perhaps the erratic performance skews the results, perhaps there’s something else that prevents this fix from working in higan even when using vsync instead of audio sync.
I consider this part of the testing inconclusive and it will probably stay that way, since I don’t intend to spend any more time on testing this.[/li][li]Finally, changing higan’s audio latency down to 20 ms (from the default 60 ms) produced 0.3 frames higher input lag. I decided to leave this result out of the graph below.[/li][/ul] While the testing of the lagfix was inconclusive, it’s still interesting to see how higan compares to RetroArch in terms of input lag:

Test setup

[ul] [li]Core i7-6700K @ stock frequencies[/li][li]Radeon R9 390 8GB (Radeon Software 16.5.2.1, default driver settings)[/li][li]HP Z24i monitor (1920x1200)[/li][li]Windows 10 64-bit (Anniversary Update)[/li][li]RetroArch nightly August 4th 2016 + bsnes-accuracy v094[/li][li]higan v101[/li][li]Super Mario World 2: Yoshi’s Island[/li][/ul] RetroArch settings:

[ul] [li]OpenGL video driver[/li][li]xaudio audio driver[/li][li]Fullscreen (with windowed fullscreen mode disabled)[/li][li]Vsync enabled[/li][li]GPU hard sync enabled[/li][li]HW bilinear filtering disabled[/li][/ul] higan settings: [ul] [li]OpenGL video driver[/li][li]XAudio2 audio driver[/li][li]Fullscreen[/li][li]Video Emulation -> Blurring disabled[/li][li]Video Shader -> None[/li][/ul] For these tests, 20 measurements were taken per test case. The test procedure was otherwise the same as described in the first post in this thread, i.e. 240 FPS camera and LED rigged controller.
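For anyone wanting to reproduce the numbers: with a 240 FPS camera each camera frame is 1000/240 ≈ 4.17 ms, a quarter of a 60 Hz frame. A lag counted in camera frames therefore converts to display frames by dividing by four. Here is a small Python sketch of that bookkeeping. The function name and the sample counts are invented for illustration and are not results from these tests.

```python
from statistics import mean

CAMERA_FPS = 240
DISPLAY_HZ = 60  # assumption: game and display running at ~60 Hz


def summarize_lag(camera_frame_counts):
    """Convert per-measurement lag (in camera frames) to 60 Hz frames.

    Each count is the number of camera frames between the controller LED
    lighting up and the first visible on-screen reaction.
    """
    lags = [count * DISPLAY_HZ / CAMERA_FPS for count in camera_frame_counts]
    return min(lags), mean(lags), max(lags)


lo, avg, hi = summarize_lag([16, 18, 20])  # made-up counts, not real results
print(f"min {lo:.2f}, avg {avg:.2f}, max {hi:.2f} frames")
```

The camera frame rate also sets the resolution of each measurement. ±1 camera frame is ±0.25 display frames, which is why averaging over many measurements per test case helps.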

Results

Comments

Whether using audio sync or vsync, higan has significantly higher input lag than RetroArch. On top of that, neither test configuration of higan performed satisfactorily in terms of smoothness, with frequent distracting frame drops/stuttering. Audio has major issues when using vsync, but that’s to be expected when using a setting that’s not even exposed in the GUI. I would also like to mention that I did not measure higan with the default Direct3D video driver. I wanted to keep things as similar as possible between RetroArch and higan to minimize the risk of external factors skewing the results. However, I did try just playing SMW2 with Direct3D and it definitely had a similar amount of frame drops.

So, to conclude, higan does not seem fully optimized in terms of input lag, at least not when running in Windows. RetroArch not only shaves off 1.4 to 2.6 frames worth of lag, it does so while producing subjectively perfect scrolling.
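For a sense of scale in milliseconds, here is a trivial Python conversion. It assumes a 60 Hz frame; the SNES actually runs slightly above 60 Hz, so the numbers are approximate.

```python
def frames_to_ms(frames: float, refresh_hz: float = 60.0) -> float:
    """Convert a lag expressed in frames to milliseconds."""
    return frames * 1000.0 / refresh_hz


for frames in (1.4, 2.6):
    print(f"{frames} frames ≈ {frames_to_ms(frames):.1f} ms")
```

So RetroArch's advantage in these tests works out to roughly 23 to 43 ms.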

So the conclusion: everyone should code for Retroarch. :slight_smile:

[QUOTE=Brunnis;44858] I would also like to mention that I did not measure higan with the default Direct3D video driver. I wanted to keep things as similar as possible between RetroArch and higan to minimize the risk of external factors skewing the results. However, I did try just playing SMW2 with Direct3D and it definitely had a similar amount of frame drops.

So, to conclude, higan does not seem fully optimized in terms of input lag, at least not when running in Windows. RetroArch not only shaves off 1.4 to 2.6 frames worth of lag, it does so while producing subjectively perfect scrolling.[/QUOTE]

Thanks for the test. Forgive me if I’m missing something, but you can’t disable desktop compositing in Win10, can you? Wouldn’t Windows 7 have been a better approach here?

Also, did you leave D3D out because higan has no frame delay feature and you’re using a Radeon card? OpenGL is said to perform worse than D3D (+ frame delay) in this regard, isn’t it?

Better as in producing a better input lag result? Possibly. However, Windows 7 is an outdated OS with no more (mainstream) support from MS, so it’s not all that interesting to test. Besides, both applications are tested on the same OS, so it can’t really be considered “unfair”.

I’m sorry, but I don’t really understand. I have not heard that OpenGL performs worse input lag wise compared to Direct3D. On the contrary, really. Also, frame delay was disabled in RetroArch during my tests.

That’s pretty similar to my results with higan v094. However, that’s very surprising that the runloop reorganization had either zero or negative impact! Oh well. Null data is still data, so good on you for doing the work :slight_smile:

But one application uses exclusive full screen whereas the other does not. It’s a given that desktop compositing adds lag, so the results were more or less decided beforehand (again, if I’m not missing something).


I’m not sure now about OpenGL. I think I read about somebody who made some tests with GroovyMAME, but I can’t find it now… Anyway, I found that OpenGL apparently behaves the same as D3D in this regard, whether it’s an ATI or Nvidia card, so forget it, sorry.

Could you test with a Voodoo2 on win98SE now please? (original and Plus! theme)

Only if he has a solid 2D card to bridge it with, like a Diamond Stealth. S3 cards have buggy drivers which could skew the results.