An input lag investigation

Okay, so I’ve run a camera test on the “Frame Delay” setting. With Nestopia, I could run Mega Man 2 with a frame delay setting of 12 ms on my Core i7-6700K. If everything works as expected, input lag should be reduced by 12/16.67 ≈ 0.72 frames. And the test results are as expected (within tolerances):

Without frame delay (results in frames):

Average: 4.3, Min: 3.25, Max: 5.25

With frame delay set to 12 ms:

Average: 3.4, Min: 2.5, Max: 4.5
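A quick sanity check on those averages: 4.3 - 3.4 = 0.9 frames ≈ 15 ms measured, versus the expected 0.72 frames = 12 ms, so the measurement lands within a few milliseconds of the prediction.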

This obviously feels great when playing. To understand exactly how good this is, and how much room there actually is for improvement, let’s make a simple calculation. We’ll start with the average result, convert it to milliseconds, and subtract all the known quantities:

3.4 * 16.666… = 56.67 ms

-4 ms (average time until USB poll)
-8.33 ms (average time until the emulator runs)
-4.67 ms (time until the emulator finishes the first loop and receives vsync; this would be 16.67 ms if the Frame Delay setting was 0, but setting it to 12 has removed 12 ms)
-16.67 ms (time until the emulator finishes the second loop and receives vsync)
-11 ms (time for scanning the display from the top left until reaching the Mega Man character)

Time left unaccounted for: 12 ms
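(Collapsing the subtractions above into one line: 56.67 - (4 + 8.33 + 4.67 + 16.67 + 11) = 56.67 - 44.67 = 12 ms.)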

Although the USB polling time could be decreased slightly by increasing the polling rate, there really isn’t much to be done about the other known quantities listed above. The remaining time could come from other small delays within the system (perhaps specifically the GPU driver/hardware). We also haven’t accounted for any delay within the HP Z24i display I’m using. Even if it’s fast, we can probably expect a couple of milliseconds between the display receiving a signal at its input and a detectable change in the corresponding pixels.

What about an actual NES on a CRT?

If we go by the hypothesis that the actual NES hardware also has 2 frames of delay in certain cases (such as during Mega Man 2 gameplay) and that it reads input at the beginning of VBLANK, we arrive at:

-8.33 ms (average time until input is actually read)
-16.67 ms (time until the NES has finished one frame)
-12 ms (time for running through vblank again and scanning out the lines until reaching the Mega Man character at the bottom of the screen)

Expected average input lag for Mega Man 2 on a real NES and CRT: 2.2 frames
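(That is: 8.33 + 16.67 + 12 = 37 ms, and 37 / 16.67 ≈ 2.2 frames.)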

If the above calculations hold true, our emulated case using an LCD monitor is only 1.2 frames behind the real NES on a CRT. 1.2 frames translates to 20 ms. That’s actually very, very good. :slight_smile:

[QUOTE=Sam33;41747]That’s great work in finding the causes of latency in emulation. Also, I compared mednafen’s version of bsnes source code to the recent Brunnis fix to bsnes-mercury-libretro. Here is the Brunnis fix first to src/system/system.cpp:

And that from mednafen:

It appears that mednafen also used overscan to determine “scheduler.exit”, but the two cpu.vcounter values should instead be decremented by 1. This would mirror the Brunnis fix (save for the condition that exit_line_counter is greater than 100). It may be worthwhile to confirm that the current mednafen changes are not adequate to fully decrease input latency, and to check the effect of exit_line_counter on the SNES demos (or a similar use of exit_line_counter to test for increased compatibility).[/QUOTE] Thanks Sam33! I’ll see if I can have a look at that during the day.

[QUOTE=Brunnis;41746] I just tried the frame-advance method on fceumm. I was interested in this, since I tested this emulator on RetroPie with my old camera test setup and seemed to get higher latency than Nestopia. Guess what? fceumm does indeed have one frame higher input lag than Nestopia! In the menus of Mega Man 2 it has 2 frames lag (compared to 1 with Nestopia) and in actual gameplay it has 3 frames lag (compared to 2 with Nestopia).

To be honest, I’m not particularly keen on digging into the fceumm source code as well. However, I have created an issue report (https://github.com/libretro/libretro-fceumm/issues/45) and I’m now hoping that someone else will pick this up and fix it.[/QUOTE] Interesting! Currently fceumm is the default in RetroPie. A bit off topic, but can you think of any reason why we shouldn’t just switch to Nestopia as the default? I presume they both work fine on the Pi, but if Nestopia has this advantage…

The only reason I can think of is that Nestopia is slower. My tests (on the i7) indicate that fceumm runs 15-20 percent faster. This is not going to be an issue on the Pi 2 & 3, but it may cause issues with the Pi 1. Would you mind asking the question on the RetroPie forum (or as a GitHub issue) to see if any of the devs would care to comment?

The advantages of fceumm are: it’s a little faster/lighter, has better support for a handful of weird Chinese pirate mappers, and has better determinism (for netplay, so not really an issue here). In short: if Nestopia runs at full speed on the RPi 1/0, it’s probably the better choice.

Brunnis keeps going with great findings! Are any devs already involved in this?

[QUOTE=xadox;41800]Brunnis keeps going with great findings! Are any devs already involved in this?[/QUOTE] Thanks! Here’s another one: I believe I just found and fixed the lag issue in fceumm. Pull request is here: https://github.com/libretro/libretro-fceumm/pull/46

Guess I can be counted as a dev now… :stuck_out_tongue:

I’ve tested the fix and it performs as expected, i.e. it removes a full frame of lag and brings fceumm up to the same level of input lag performance as Nestopia. Talk about a small fix (moving one line of code one line up…).
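For the curious, the gist of it is just call ordering in the core’s frame function. Here’s a minimal, self-contained sketch of the idea (the function names are made up for illustration; this is not the actual fceumm code, see the pull request for the real one-line change):

```c
#include <stdio.h>

static int latest_input = 0;

/* Stand-ins for the real core functions; names are illustrative only. */
static void poll_input(void)
{
    latest_input = 1; /* imagine reading the gamepad state here */
}

static void emulate_one_frame(void)
{
    /* the emulated frame can only react to input polled before this call */
    printf("frame emulated with input = %d\n", latest_input);
}

int main(void)
{
    /* Laggy order: emulate_one_frame(); poll_input();
       The freshly polled input then only affects the *next* frame,
       which adds a full frame of input lag. */

    /* Fixed order: poll first, then emulate. */
    poll_input();
    emulate_one_frame();
    return 0;
}
```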

EDIT: Repo with the fix can be found here: https://github.com/Brunnis/libretro-fceumm

EDIT: I can see that twinaphex just merged the fix into the fceumm master. Yay!

I’ve spent the better part of the day looking at bsnes-mercury and the viability of my first fix. The problem with that one was that it could break compatibility if a game were to change the overscan setting mid-frame. Apparently, no commercial software does, but still… So, I went in again and devised what I believe to be a much better solution. For the details, please see this pull request:

I’d really appreciate some feedback. The code is available in this repository:

Below are downloads for all core variants (accuracy, balanced, performance) for Win x64. I would very much appreciate it if you helped test these out. If you do, please use the frame advance method to confirm the improvement.

Accuracy Balanced Performance

Cheers!

That sounds like a much safer/smarter fix. Good work, dude :slight_smile:

Thanks a lot, Brunnis, that does seem very interesting and I’d love to test it out immediately! However, the links you posted for the Win x64 DLLs don’t seem to be functional, because they require the user to enter the corresponding decryption key.

Ouch, how noobish of me… Not at home right now, but I’ll fix it as soon as I’m back.

EDIT: Links updated! Here they are again:

Accuracy Balanced Performance

Also, from Alcaro’s comment in the pull request:

Does anyone feel up to testing this and reporting back? Perhaps with screenshots?

Do we need to test any specific game with regards to the bottom scanline and overscan features?

No, I don’t think so. Just one game with overscan and one without.

Hi! Really enthusiastic to read what’s going on here. Great work!! Input lag has made me upset for years! (which is why I tended to give up on emulation and use real hardware or FPGA machines). Just a stupid user question: what does the frame_delay option do? I thought it had to be set to 0 to reduce lag? Also, it would be great if other devs, inspired by your findings, could check other libretro cores… There are many games that lag makes difficult (or almost unplayable)… For instance, Star Soldier on the PC Engine; Gigawing (MAME)…

Thanks!

Actually, it’s the other way around. The frame delay setting delays the running of the emulator core (and the polling of input) by a specified number of milliseconds after receiving vsync. This is actually a good thing. Normally, when frame delay is set to 0 ms, the emulator runs and generates the next frame immediately after receiving the vsync event, then idles until the next vsync event. If the emulator runs in a short time, such as 1 ms, the wait period is almost a whole frame, and it’s not until the next vsync event that the generated frame is actually passed on to the graphics pipeline for display.

So, after receiving vsync, the ideal thing to do is to hold off on polling input and running the emulator for as long as possible, running it as close to the next vsync as possible. This way, the time you’d otherwise spend just waiting can be used to accept input.

The frame delay setting is specified in milliseconds, and it’s the performance of the system/emulator (i.e. the time it takes to run the emulator loop) that decides how high you can set it. The faster the emulator runs, the higher you can set it. I managed 12 ms on my Core i7-6700K when running Nestopia and snes9x-next, which corresponds to an input lag reduction of 12 ms, or ~0.7 frames.
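To make that concrete, here’s a rough sketch of the scheduling in C. All the function names are illustrative stand-ins (RetroArch’s real frame loop is more involved), but the ordering is the point:

```c
#include <time.h>

/* Stand-ins for the real work; each is a placeholder, not a real API. */
static void wait_for_vsync(void)     { /* block until the display's vsync */ }
static void poll_input(void)         { /* read gamepads */ }
static void run_core_one_frame(void) { /* emulate exactly one frame */ }
static void present_frame(void)      { /* queue the frame; shown at next vsync */ }

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

int main(void)
{
    const long frame_delay_ms = 12; /* the Frame Delay setting */

    for (;;) {
        wait_for_vsync();

        /* With a delay of 0 we would poll and emulate right away, then sit
           idle until the next vsync, so the early input sample goes stale.
           Burning that idle time *before* polling keeps the input fresh: */
        sleep_ms(frame_delay_ms);

        poll_input();          /* sampled 12 ms later than it would be */
        run_core_one_frame();  /* must still finish before the next vsync */
        present_frame();
    }
}
```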

Noobish question: do I have to change frame delay to something other than zero if I have a Gsync monitor? Or is that irrelevant?

Thanks for the quick reply! Hmm, if I understand correctly, if you wait for half a frame (about 8 ms) after the vsync, you need twice the CPU power to render the frame in half the time? So, when using a slow CPU (Raspberry Pi), one has to check the ideal value for each platform, I guess. For instance, on my Pi 3 a value of 8 seems good for the Neo Geo, but the PGM platform requires me to set it very low.

I’m not 100% sure, but I would guess that you’d want it at zero in the Gsync case.

[QUOTE=bidinou;41845]Thanks for the quick reply! Hmm, if I understand correctly, if you wait for half a frame (about 8 ms) after the vsync, you need twice the CPU power to render the frame in half the time? So, when using a slow CPU (Raspberry Pi), one has to check the ideal value for each platform, I guess.[/QUOTE] Yup, that’s correct. The frame delay basically decreases the time you have available to render the frame. On the Pi, you don’t have that much extra performance. I would guess that a Pi 3 running snes9x-next could accept 4 ms tops, and maybe 8 ms if running fceumm or Nestopia.
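In other words, the render budget is roughly 16.67 ms minus the frame delay: with a delay of 8 ms, only about 8.67 ms remain, so the core does indeed need to run roughly twice as fast as it would with a delay of 0.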

The main problem with this setting is that you have to manually tweak it for each host system and emulator.

May I ask one last question? Did you measure any input lag difference with the video_threaded option?

Thanks again ! I’m happy to know I can tweak one more setting :slight_smile:

Edit: a value of 7 seems OK for Neo Geo and SNES on the Pi 3. I’ll try to pay more attention to subtle sound distortion and try more demanding titles.

Edit: after further tests on Pi 3 / RetroPie: a frame delay of 9 seems OK for Neo Geo & PSX, 5 for the SNES, and 10 for the PC Engine & Megadrive.

Edit: final values: SNES: 4 // Neo Geo & PSX: 9 // PC Engine: 9 // Megadrive: 10

Nope, I haven’t. I may give it a shot, but don’t count on it. Getting a bit sick of testing right now, to be honest. :stuck_out_tongue:

[QUOTE=bidinou;41847]Edit: a value of 7 seems OK for Neo Geo and SNES on the Pi 3. I’ll try to pay more attention to subtle sound distortion and try more demanding titles. Edit: after further tests on Pi 3 / RetroPie: a frame delay of 9 seems OK for Neo Geo & PSX, 5 for the SNES, and 10 for the PC Engine & Megadrive.[/QUOTE] Thanks for testing. Give Yoshi’s Island a try. It’s a SuperFX game and it seems pretty demanding. It was the only game where I noticed slowdown on the Pi 2. Even if the Pi 3 is a decent amount faster, there can’t be a huge margin left for frame delay.

Indeed, there is a slight slowdown in the menu with a frame delay of 5. It’s OK with 4. I give up on setting this for MAME games :slight_smile: