An input lag investigation

I now tested Wayland. RA is only able to configure 3 swapchain images. If I set it to 2, RA hangs.

Hmmm interesting! Seems to be driver dependent then. I’ll try amdvlk later to see if it helps.

I’ll give it a try. Is it compatible with Slang shaders?

Here it doesn’t crash, but I would rather it crashed to signal that something is wrong than keep running with no indication that it’s using the wrong number of swapchain images.

Thank you for the help by testing on your Nvidia! :slight_smile:

Yes. In fact, it only supports Slang. Unlike the “gl” and “gl1” drivers, “glcore” requires modern OpenGL support by the GPU driver. If your GPU supports Vulkan, then modern OpenGL support should in theory be no problem for it.

Edit:
As a side note, it turns out KWin with Wayland is still not ready. If the compositor crashes, or even just resets due to a GPU driver reset, it kills all applications. This is still listed as a showstopper:

https://community.kde.org/Plasma/Wayland_Showstoppers

Even worse, KWin locks itself to 80FPS when an application uses fullscreen (I use 120Hz for the desktop.)

It seems to me it will take a while yet until KWin + Wayland is ready to actually be used.

I have multiple screens at different refresh-rates and I game with VRR, so Wayland is basically a necessity for me. And input lag under VRR is the same as X, so that is a non-issue.

But yeah, there’s still quite a few showstoppers to make it stable. That list used to be huge. It is improving at an astonishing rate. I tried it back in 2021 and it was borderline unusable on KDE. Now I use it daily for work and games and it rarely causes me any issues.

Fedora started shipping it by default too. A bit premature IMO, but in the end it helped speed things up.

Might be an Nvidia specific issue. I have a 180Hz screen and all apps and games run at 180Hz in fullscreen, RetroArch included (before loading a core, ofc).

No idea why that’s happening to you, but I’ve heard Nvidia’s drivers are terrible on Wayland. They’re taking very long to become compatible. Intel and AMD drivers are miles ahead.

Unfortunately I can’t recommend Wayland to anyone using Nvidia for the time being.

SUCCESS!! :slight_smile:

At last! I achieved the lowest possible latency on Wayland! Same as on KMS and Windows exclusive fullscreen.

It seems RADV was not playing nice: it supplied 4 swapchain images instead of the requested 2. I installed AMDVLK and now RetroArch gets the requested 2 swapchain images (I checked in the log). Direct scan-out also seems to be working, because I now get only 54ms of latency on Wayland, the theoretical minimum on my system!
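For anyone wondering how a driver can hand back more images than were asked for: in Vulkan the count an application passes when creating a swapchain is a minimum, so the driver is allowed to create more. The sketch below models that in plain Python. It is not RetroArch's code; `SurfaceCaps`, `clamp_image_count` and `check_swapchain` are hypothetical names, and the capability values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class SurfaceCaps:
    """Hypothetical stand-in for the two surface capability fields used here."""
    min_image_count: int
    max_image_count: int  # 0 means "no upper limit" in Vulkan


def clamp_image_count(requested: int, caps: SurfaceCaps) -> int:
    """Clamp the requested count into the range the surface advertises."""
    count = max(requested, caps.min_image_count)
    if caps.max_image_count > 0:
        count = min(count, caps.max_image_count)
    return count


def check_swapchain(requested: int, actual: int) -> str:
    """Report whether the driver created exactly the number of images requested."""
    if actual == requested:
        return f"OK: got the {requested} swapchain images requested"
    return f"WARNING: requested {requested} swapchain images, driver created {actual}"


caps = SurfaceCaps(min_image_count=2, max_image_count=0)  # illustrative values
requested = clamp_image_count(2, caps)
print(check_swapchain(requested, actual=4))  # the RADV case reported in this thread
print(check_swapchain(requested, actual=2))  # the AMDVLK case reported in this thread
```

Comparing the requested count with the count the driver actually reports is what makes a mismatch like this visible instead of silent.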

So here is the updated table:

  • 95ms (5.7 frames) on Composited X (RADV)
  • 77ms (4.6 frames) on Kwin Wayland (RADV)
  • 75ms (4.5 frames) on Weston (RADV)
  • 65ms (3.9 frames) on Uncomposited X (RADV)
  • 54ms (3.3 frames) on Kwin Wayland (AMDVLK)
  • 53ms (3.2 frames) on KMS (RADV)
  • 51ms (3.0 frames) on Windows exclusive-fullscreen

It is quite possible X also benefits from the AMDVLK swapchain, as it was getting 3 images in RADV instead of the requested 2, but I’ll leave those tests for another day. I’m tired of counting literally thousands of frames by hand :tired_face:

These values also mean direct scan-out is probably working correctly on Wayland, otherwise there would be an additional 16.6ms of latency (at 60fps).
EDIT: Direct scan-out has no impact here. See my next post.

Anyways, I’m happy that I can enjoy KMS levels of latency on my Wayland desktop now. Next I’ll tighten the Frame Delay setting as much as possible and I’ll call it a day.

Thanks a lot for helping me figure this out! @e-tank @RealNC

I hope this info reaches people trying to reduce latency as much as they can on their Linux PCs!

I did one last measurement with AMDVLK to verify whether X also benefits from the reduced number of swapchain images, and sure enough, it does!

Here is my (hopefully final) table:

  • 95ms (5.7 frames) on Composited X (RADV)
  • 87ms (5.3 frames) on Composited X (AMDVLK)
  • 77ms (4.6 frames) on Kwin Wayland (RADV)
  • 75ms (4.5 frames) on Weston (RADV)
  • 65ms (3.9 frames) on Uncomposited X (RADV)
  • 54ms (3.3 frames) on Uncomposited X (AMDVLK)
  • 54ms (3.3 frames) on Kwin Wayland (AMDVLK)
  • 53ms (3.2 frames) on KMS (RADV)
  • 51ms (3.0 frames) on Windows exclusive-fullscreen

So it seems Windows exclusive fullscreen, KMS, and both uncomposited X and KWin Wayland on AMDVLK all reach the theoretical best-case latency!
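The frame figures in the table are just the millisecond values divided by the frame time at 60fps (about 16.7ms). A minimal sketch of that conversion, using the numbers from the table above; `ms_to_frames` is a hypothetical helper name. Results computed from the rounded millisecond values can differ by a tenth of a frame from the figures listed, presumably because those came from unrounded averages.

```python
REFRESH_HZ = 60.0  # the measurements in this thread are at 60fps


def ms_to_frames(latency_ms: float, refresh_hz: float = REFRESH_HZ) -> float:
    """Convert a latency in milliseconds into a number of frames."""
    return latency_ms * refresh_hz / 1000.0


# Values copied from the table in the post above.
MEASUREMENTS_MS = {
    "Composited X (RADV)": 95,
    "Composited X (AMDVLK)": 87,
    "Kwin Wayland (RADV)": 77,
    "Weston (RADV)": 75,
    "Uncomposited X (RADV)": 65,
    "Uncomposited X (AMDVLK)": 54,
    "Kwin Wayland (AMDVLK)": 54,
    "KMS (RADV)": 53,
    "Windows exclusive-fullscreen": 51,
}

for label, ms in MEASUREMENTS_MS.items():
    print(f"{label:30s} {ms:3d} ms = {ms_to_frames(ms):.2f} frames")

# How much the driver switch saves on KWin Wayland, in ms and in frames.
saved = MEASUREMENTS_MS["Kwin Wayland (RADV)"] - MEASUREMENTS_MS["Kwin Wayland (AMDVLK)"]
print(f"RADV -> AMDVLK on Kwin Wayland: {saved} ms ({ms_to_frames(saved):.2f} frames)")
```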

If my numbers are correct, I would say that time has come already. Sure my system is just one example, and as we saw, this is very driver dependent. But it would be nice if other AMD users would confirm if their cards also have this swapchain discrepancy between RADV and AMDVLK.

Also, apologies for insisting so much on direct scan-out. I’ve read a bit more about it and it does not do what I thought it did. It’s just a small optimization in the compositor and does not change anything about the swapchain. It reduces the processing needed when an app is fullscreen, so it might still help latency indirectly if you use the Frame Delay feature.

these are the vulkan presentation modes. 0 is immediate, no vsync so you’ll get tearing. 1 is mailbox, vsync’d but lets the program keep submitting images and only ever uses the last one submitted (useful for fast forwarding). 2 is fifo, vsync’d and blocks when full, which is crucial for timing in RA and other emulators & retro games. EDIT: Also, 2 is the only mode the vulkan spec requires to always be available
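To make the numbering concrete, here is a small Python sketch that mirrors those three mode values and picks one. The selection policy in `pick_present_mode` is a hypothetical illustration of the description above, not RetroArch's actual logic; the only hard fact it relies on is that FIFO is always available.

```python
from enum import IntEnum


class PresentMode(IntEnum):
    """The three Vulkan presentation mode values described above."""
    IMMEDIATE = 0  # no vsync, may tear
    MAILBOX = 1    # vsync'd, only the most recently submitted image is shown
    FIFO = 2       # vsync'd, blocks when the queue is full


def pick_present_mode(supported, vsync=True, fast_forward=False):
    """Hypothetical policy: prefer the mode that fits, fall back to FIFO."""
    if not vsync and PresentMode.IMMEDIATE in supported:
        return PresentMode.IMMEDIATE
    if fast_forward and PresentMode.MAILBOX in supported:
        return PresentMode.MAILBOX
    # FIFO is the one mode the spec guarantees, so it is always a safe fallback.
    return PresentMode.FIFO


supported = {PresentMode.IMMEDIATE, PresentMode.FIFO}  # illustrative driver report
print(pick_present_mode(supported).name)
print(pick_present_mode(supported, vsync=False).name)
print(pick_present_mode(supported, fast_forward=True).name)
```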

i’m glad u got it sorted out and were able to provide all these interesting results in the process. though i do think it would be a good idea to open an issue on the radv driver over this here: https://gitlab.freedesktop.org/mesa/mesa/-/issues

if the amdvlk driver can manage to provide true double buffering in fifo mode under a compositor then there’s really no reason why radv shouldn’t be able to either. it’s a really important feature to have for running fixed rate content with low input latency

Thanks for the explanation!

Yeah, I’ll search Mesa’s issues and open one if it does not exist already.

Completely agree, double-buffering is very important and I have no idea why it’s not working in RADV. Mesa’s RADV is typically superior to AMDVLK in performance in pretty much anything else though. Maybe it’s a problem specific to the RX 6000 series cards because they’re new.

Since AMDVLK is much less performant than RADV, even on RetroArch, I went back to testing and tried with the glcore driver with Hard GPU Sync enabled, but it gave me an average of 71ms. It’s a pretty bad value compared to my results with Vulkan with Max Swapchain = 2 unfortunately.

I really have to find out why RADV is getting 4 images instead of 2.

EDIT: I also found corruption on some handheld border shaders on AMDVLK.

The RADV additional latency has been identified as a problem on Mesa.

Anyone interested can follow the discussion here:

@nfp0 Can you please build and test this for GL on Wayland?

I have no instruments to test input lag, but this should be an improvement over simply calling eglSwapBuffers() and letting Mesa decide.

Sure thing! But what test do you want me to do exactly?
What should I build?

Please build and test this RetroArch fork, and test with GL on Wayland:

Test for input lag improvements.

Alright! I’ll get back to you as soon as I can.

This might be a stupid question…

My video card has a DVI-I output (the one with the “extra” 5 pins), so I think it can natively output an analog signal. That would make it easy to connect to a CRT monitor (DVI-I -> VGA, for example). The question is: will the CRT PC monitor shave off some input delay compared to my “normal” HDMI out to my IPS monitor?

Usually the signal processing delay of monitors today is between 2 and 5ms. That’s the amount you’re looking to shave off with a CRT.

Of course your video card needs to actually output DVI-A, not DVI-D. “DVI-I” is just the connector type, and the actual signal is either DVI-A (analog) or DVI-D (digital.) DVI-I supports both, but the question is what does your video card output.

@RealNC well… 5 ms is less than a third of a frame, so at the moment I don’t think it’s worth the effort for me.

And to be fair I can’t complain much about input delay: from a (very, very rough) test DoDonPachi (with 1 frame of run-ahead) gives me about 2 frames of delay (button press to animation). But some other games are not that responsive (some dreamcast games for example) and I was hoping that maybe I could shave a couple of frames with a CRT…

Thanks for the answer!

What video driver are you using? If gl or glcore, are you using GPU Hard Sync set to 0? If Vulkan, are you using 2 max swapchain images? Are you using exclusive fullscreen? How about frame delay? Depending on your setup, you may be able to shave at least one more frame off.

Depends:

with fbneo I use GLcore with Hard GPU sync ON, Auto frame delay ON and framedelay 0.

with flycast Vulkan with Max swap chain 2, Auto frame delay OFF and framedelay 0.

Always exclusive fullscreen and v-sync ON (tearing is not for me and I don’t have a gsync/free sync monitor).
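For reference, the two setups described above could be written down as per-core overrides roughly like this. This is only a sketch: the key names are written from memory of `retroarch.cfg` and should be checked against your own config file before use, and `render_cfg` is a hypothetical helper that just formats `key = "value"` lines.

```python
def render_cfg(settings: dict) -> str:
    """Format settings as key = "value" lines, the style retroarch.cfg uses."""
    return "\n".join(f'{key} = "{value}"' for key, value in settings.items())


# Assumed key names; verify each one against your own retroarch.cfg.
FBNEO_GLCORE = {
    "video_driver": "glcore",
    "video_hard_sync": "true",
    "video_frame_delay_auto": "true",
    "video_frame_delay": "0",
    "video_vsync": "true",
    "video_fullscreen": "true",
    "video_windowed_fullscreen": "false",  # i.e. exclusive fullscreen
}

FLYCAST_VULKAN = {
    "video_driver": "vulkan",
    "video_max_swapchain_images": "2",
    "video_frame_delay_auto": "false",
    "video_frame_delay": "0",
    "video_vsync": "true",
    "video_fullscreen": "true",
    "video_windowed_fullscreen": "false",  # i.e. exclusive fullscreen
}

print("# fbneo\n" + render_cfg(FBNEO_GLCORE))
print()
print("# flycast\n" + render_cfg(FLYCAST_VULKAN))
```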

I left polling behavior at the default (late). Is there a way to figure out which is better (early, normal, late)?

As I said, fbneo with run-ahead is very responsive and I have no complaints. With flycast it varies game by game: for example, Ikaruga feels good while Mars Matrix not so much…

I assume you are not using a default run-ahead value with fbneo. Some games have zero frames of lag themselves, so I don’t know whether run-ahead would affect them negatively.

Also, i have no idea how the polling option works, i never managed to feel a difference.
