An input lag investigation

Did you try it?

I’m not getting the same results.

px68k, in the “F12” menu shown in my previous post:

[down+k] = cursor moves

Mega Man 2 menu on NES (Mesen core):

[down+k] + [k] = cursor moves

One frame difference between the two. It would be great if you could take a closer look / try my example.

I haven’t had the time to test yet, but last time I tested Mega Man 2 in the menu it definitely only needed a single frame advance to show the result. Maybe the response depends on which menu view you’re in. I tested on the view with the blue background where you choose between going to the stage select screen or entering a password. Try pausing there, then pressing/holding the Start button and then ’k’. It should respond immediately.

I just did a quick test: Mega Man 2 takes two frames to respond on the stage select screen, BUT only one frame on the Start/Password select screen. Also, single frame response on the very first screen with text when you start up the game.

Thanks, I could replicate it now!

Hi, I’m a new user here. As far as I know, RetroArch and Canoe themselves should both produce a response on the third frame after receiving input, since that’s how Super Mario World behaves even on a real console. We know for a fact that snes9x2010 doesn’t add any extra lag on top of this, so I think it’s pretty safe to assume that Canoe and RetroArch perform the same here.


A new feature in RetroArch 1.7.1 is adjustable audio resampler quality. Previously this was fixed at Normal on PC; now you can set it lower than that, to Lower or Lowest.

You guys should check if it helps with further reducing latency. Also note that this quality level only applies to the sinc resampler; it has no effect on the nearest or CC resamplers.
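If you want to experiment outside the menus, the setting can presumably also be changed in retroarch.cfg. This is a sketch from memory, assuming the keys are audio_resampler and audio_resampler_quality with an integer quality level; verify both against your own config file before relying on them:

```
# Assumed keys/values -- check your retroarch.cfg to confirm.
# Assumed mapping: 1 = lowest, 2 = lower, 3 = normal, 4 = higher, 5 = highest
audio_resampler = "sinc"
audio_resampler_quality = "2"
```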

With the official release of 1.7.1 and reports of D3D11 offering significant performance improvements, I wonder if anyone would be willing to measure what kind of lag we’re looking at, since the reduced compute requirements are attractive for a simple emulation box built with latency in mind.

I loaded it up and I don’t see any setting for Hard GPU Sync. Does this make a noticeable difference on the D3D11 driver, or are we getting it “for free” like when running KMS/DRM on Linux? I don’t have a 240 fps camera/phone available, but to me D3D11, with its lack of Hard GPU Sync, feels laggier, even though it lets me set the frame delay higher.

I gotta say, even at half a frame worse than Brunnis (I can only manage a frame delay of about 8), having access to my PC with its modern CPU, a low-latency monitor, and the raphnet v2 really is impressive. It makes it hard to go back to my nice big TV.

Of course zero loop lag from an emulator would be ideal, but under 100 ms is totally adaptable and hence playable, thanks to our easily fooled brains. :slightly_smiling_face:

If you want to test how much lag you can adapt to, you can try using a shader that only displays the output of a previous frame (e.g. PREV6) to add extra frames of delay to the video output.
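If shaders aren’t your thing, the same idea can be sketched in plain C: keep a small ring buffer of frames and present the one from N frames ago. This is just a conceptual sketch of the delay (the frame size and format are made up for illustration), not how the PREV shader textures are actually implemented:

```c
#include <stdio.h>
#include <string.h>

#define DELAY_FRAMES 6               /* like PREV6: show the frame from 6 frames ago */
#define FRAME_BYTES  (256 * 240 * 2) /* illustrative 256x240 RGB565 frame */

static unsigned char ring[DELAY_FRAMES + 1][FRAME_BYTES];
static unsigned head;

/* Store the newest frame, return the one from DELAY_FRAMES frames ago. */
static const unsigned char *delay_frame(const unsigned char *newest)
{
    memcpy(ring[head], newest, FRAME_BYTES);
    head = (head + 1) % (DELAY_FRAMES + 1);
    return ring[head];               /* oldest slot = 6 frames back */
}

int main(void)
{
    unsigned char frame[FRAME_BYTES];
    for (int i = 0; i < 10; i++)
    {
        memset(frame, i, FRAME_BYTES);           /* fake frame "i" */
        const unsigned char *shown = delay_frame(frame);
        printf("frame %d in, frame %d out\n", i, shown[0]);
    }
    return 0;
}
```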

Neither swapchain control nor GPU Hard Sync has been implemented yet in the D3D11/D3D12 drivers, so take that into consideration.

We’d like to support it though.

I have been doing some experimentation today with RetroArch running on the Raspberry Pi VC4 open source driver (and GL on KMSDRM on the RetroArch side), and I found out that, after precise monitor refresh rate measurement, I can leave VSYNC OFF and still get no tearing at all! Is that expected somehow? Have you seen this with other KMSDRM implementations?

Also, I take it hard_gpu_sync makes no sense on KMSDRM, right?

@Brunnis @Twinaphex

On Blurbusters there’s a pretty interesting article on how to improve input lag in emulators and match the latency of the original device: Eliminate Input Lag on PC-Based Emulators: Matching the Latency of the Original Device.

And there’s more in-depth discussion in their forum here: Emulator Developers: Lagless VSYNC ON Algorithm.

From what I read, it seems a bit focused on using Windows features/APIs, so I’m not sure how well suited this would be for RetroArch. Interesting concept nonetheless.

The biggest issue with that algorithm for RA/libretro, I think, is that it requires running emulators in fractions of a frame, just a few scanlines at a time. Libretro is typically set up for a single frame’s worth of emulation, so I’m not sure how we would be able to break that up.
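To make the granularity problem concrete, here’s a hypothetical sketch of what slicing the emulation would look like. Nothing here is real libretro API; aside from the mention of retro_run(), every name is invented for illustration:

```c
/* Hypothetical only -- none of this is libretro API. It just illustrates
 * the granularity change beam racing would require: emulating and
 * presenting a frame in slices instead of one retro_run() per frame. */
#include <stdio.h>

#define TOTAL_LINES 240   /* emulated visible scanlines */
#define SLICE_LINES 24    /* 10 slices per frame */

/* Stand-in for a core that can emulate a partial frame. */
static void emulate_scanlines(unsigned first, unsigned count)
{
    printf("emulated lines %u..%u\n", first, first + count - 1);
}

int main(void)
{
    /* Today libretro produces the whole frame in one retro_run() call.
     * A beam-racing frontend would need something like this instead: */
    for (unsigned line = 0; line < TOTAL_LINES; line += SLICE_LINES)
    {
        emulate_scanlines(line, SLICE_LINES);
        /* ...present this slice while the real raster chases it... */
    }
    return 0;
}
```

Every core would have to support stopping and resuming mid-frame like this, which is exactly the part that’s hard to retrofit.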

I hear that the APIs that get the status of the raster (scanline position) do not always work, and fail on some devices.

It’s also mutually exclusive with using run-ahead to compensate for game internal lag, and eliminating the game internal lag is much more powerful.

That looks gorgeous!

Great to see frame time and deviation exposed so conveniently. I have a feeling I’ll be spending a lot more time playing with these stats than playing games :slight_smile:

Ah, possibly I misunderstood?

Is Frame Time supposed to display the time in ms that RetroArch spent creating the frame, or the time in ms that the frame spends on the screen?

The latest nightly on macOS seems to be showing time-on-screen: https://imgur.com/a/ngdYr

Just check the source code, I guess, to be absolutely sure. These values were previously reported at RetroArch exit if you invoked it from the command line; all I did was hook them up so they can be seen in-game.

I can clearly see that it’s not showing the time it took to create the frame.

The relevant times would be:

  • Time spent running the emulator core, excluding time spent in video callback
  • Time spent in video callback uploading the texture
  • Time running the shaders
  • Time spent waiting for Present/SwapBuffers to finish
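A rough sketch of how those phases could be timed with a monotonic clock; the phase functions are placeholders, not RetroArch’s actual internals:

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Monotonic microsecond timestamp. */
static uint64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Placeholder phases -- substitute the real core-run (minus the video
 * callback), texture-upload, shader, and buffer-swap steps. */
static void run_core(void)       {}
static void upload_texture(void) {}
static void run_shaders(void)    {}
static void swap_buffers(void)   {}

int main(void)
{
    uint64_t t0 = now_us(); run_core();
    uint64_t t1 = now_us(); upload_texture();
    uint64_t t2 = now_us(); run_shaders();
    uint64_t t3 = now_us(); swap_buffers();
    uint64_t t4 = now_us();

    printf("core %llu us, upload %llu us, shaders %llu us, swap %llu us\n",
           (unsigned long long)(t1 - t0), (unsigned long long)(t2 - t1),
           (unsigned long long)(t3 - t2), (unsigned long long)(t4 - t3));
    return 0;
}
```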

In case you’d like to improve its behavior through a pull request, let me give you some pointers -

gfx/video_driver.c (line 2399):

Here is where frame_time gets set.

Then, later on, we set video_info.frame_time here -

This is the value that gets used in the statistics.

Good news! I found a way to do beam racing in a cross-platform manner (see video at bottom of post):

Two emulators have successfully implemented the BlurBusters lagless VSYNC ON experiment (via tearingless VSYNC OFF), so the approach has actually been validated:

– Toni’s WinUAE now has real-time beam chasing: 40 ms of input lag reduced to less than 5 ms! http://eab.abime.net/showthread.php?t=88777&page=8

– Calamity’s experimental (unreleased) patch for GroovyMAME achieves same-frame lag:

Related developer-oriented forum thread (read all forum pages)

Toni made his beam racing compatible with GSYNC and FreeSync, with my help, via fast beam racing. So you can have even less lag with VRR. VRR still scans top to bottom, just faster. The emulator CPU simply runs ultra-fast (e.g. 4x faster) whenever a refresh cycle is scanning out (e.g. the 1/240 sec top-to-bottom scanout of a “60 Hz” refresh cycle). It’s like Einstein, where everything is relative: you’re still synchronizing 1:1 between the emulated raster and the real-world raster, just with faster scanouts followed by longer pauses between refresh cycles. As a result, you can have slightly less input lag than the original device being emulated if you combine VRR + fast beam racing. Beam racing can also be done on selected refresh cycles only (e.g. every other refresh cycle at 120 Hz), by surge-executing the emulator CPU in synchronization with a fast-scanning-out real-world raster.

For understanding the LCD scanout, see the GSYNC beam racing instructions (synchronizing the emulated raster scan line to the real-world raster position on a GSYNC or FreeSync scanout): Page1, Page2. For practical purposes, though, it works best ONLY on 120 Hz+ VRR displays, due to an annoying graphics driver quirk. Toni of WinUAE asked me many questions, and I successfully helped him implement beam racing with GSYNC/FreeSync to get even less lag. That said, beam racing works on slow refresh cycles, fast refresh cycles, and variable refresh cycles. As long as there’s a refresh cycle reasonably close to the emulator’s interval, beam racing can be done on specific chosen refresh cycles: basically catching the caboose of a passing train of a display scanout (or triggering your own scanout, in the case of VRR), if you will.

And it doesn’t have to be perfect synchronization between the emulator raster and the real-world raster; it can be done at the frame-slice level:

So it’s lagless VSYNC ON emulated via VSYNC OFF, with zero tearing: the tearing always occurs within duplicate frameslices, and duplicates have no visible difference, so there’s no tearline. Voilà!

Toni said it was easier than expected to add beam-racing support to WinUAE and successfully synchronize the emulator raster with the real-world raster. And apparently you can use higher Hz just fine too (just selectively beam-race the appropriate refresh cycles in an accelerated surge cycle; it’s simply faster top-to-bottom scanouts), CPU and GPU performance willing. I can do up to 7,000 frameslices per second, so my lag from emulator-pixel-render to photons hitting my eyes can be as little as 2/7000ths of a second plus the pixel response, at least on bufferless gaming LCD monitors and CRT displays. Buffered LCDs will add more lag, but won’t interfere with beam racing the video signal, so how a display buffers frames isn’t something emulator authors need to worry about; most good desktop gaming LCDs don’t have buffer latency anymore (at their highest Hz) and are capable of refreshing the panel in sync with the signal scanout, with only GtG (pixel response) lag.
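For the curious, here’s a minimal, self-contained C sketch of the frame-slice idea under VSYNC OFF. The platform-specific pieces (real VSYNC timestamps, actual presentation) are stubbed out, and real implementations add a safety margin and drift correction:

```c
/* Illustrative frame-slice beam racing loop (VSYNC OFF). */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define REFRESH_HZ  60.0
#define FRAME_US    (1000000.0 / REFRESH_HZ)
#define TOTAL_LINES 240
#define SLICES      10

static uint64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Stubs for the platform-specific parts. */
static void emulate_lines(int first, int count) { (void)first; (void)count; }
static void present_vsync_off(int slice) { printf("present slice %d\n", slice); }

int main(void)
{
    uint64_t vsync = now_us();   /* pretend a VSYNC just happened */
    int lines_per_slice = TOTAL_LINES / SLICES;

    for (int s = 0; s < SLICES; s++)
    {
        /* Emulate the next slice slightly ahead of the real raster. */
        emulate_lines(s * lines_per_slice, lines_per_slice);

        /* Wait until the real scanout reaches the top of this slice;
         * the raster position is derived purely from the VSYNC timestamp. */
        double target = (double)s / SLICES * FRAME_US;
        while ((double)(now_us() - vsync) < target)
            ;   /* spin-wait; real code would sleep most of this */

        /* Present immediately: the tearline lands inside slices that are
         * identical in both buffers, so it is invisible. */
        present_vsync_off(s);
    }
    return 0;
}
```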

Although RetroArch is fully frame-based right now, there’s no reason why it couldn’t (eventually, slowly, carefully) add support for optional raster hooks in the coming years.

But given the potential rearchitecting issues, I’d suggest waiting for other, simpler emulators to pave the way first before adding beam-racing workflows to RetroArch. Let’s finish a cross-platform beam racing implementation first.

I’m achieving precision tearline positioning with only a microsecond clock offset from VSYNC timestamps, so I don’t need access to a raster register (that’s only cake frosting):

And that’s without access to a raster scan line register! Just generic VSYNC OFF plus precision clock-counter offsets from a VSYNC timestamp. (There are many ways to get a VSYNC timestamp on many platforms, even while running in VSYNC OFF mode.) VSYNC OFF tearlines are always raster-exact; you simply become a tearline jedi once you understand displays as well as some of us do.
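To make the “clock offset from a VSYNC timestamp” point concrete, the raster estimate is just a linear mapping from elapsed time to scanline. This is a sketch with illustrative timing numbers; real signals spend a few percent of the frame in blanking, which you’d calibrate:

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Estimate which scanline the display is drawing, given only a recent
 * VSYNC timestamp and the refresh parameters. Values are illustrative
 * (60 Hz, 1080p-like 1125 total lines including blanking). */
static int estimate_scanline(uint64_t now_us, uint64_t vsync_us)
{
    const double frame_us    = 1000000.0 / 60.0;
    const int    total_lines = 1125;

    double frac = fmod((double)(now_us - vsync_us), frame_us) / frame_us;
    return (int)(frac * total_lines);
}

int main(void)
{
    /* 5 ms after VSYNC we should be roughly 30% into the scanout. */
    printf("scanline ~%d\n", estimate_scanline(5000, 0));
    return 0;
}
```

A VSYNC OFF Present() issued when the estimate says line N puts the tearline at roughly line N, which is why a precise clock alone is enough.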

Why, this is fantastic!
Don’t be surprised if you see it on Nvidia cards in a few months; unpatented good ideas get stolen swiftly.