An input lag investigation

e-tank · 30 March 2022 02:25

interesting results for sure, tho i think the next thing to try out would be to compile retroarch with debugging enabled in order to verify that you’re actually getting 2 images in your vulkan swapchain on Kwin or Weston. max_swapchain_images is just what RA will request and not what it’s guaranteed to get. the following will compile the build:

./configure --enable-debug
make

with this build (you can just run it straight from the directory you built it in for our purposes) if you specify either --verbose on the command line or log_verbosity = "true" in your config it should tell you what you need to know

one caveat, i had issues with the debug build of RA crashing on me when trying to create a suitable vulkan context, and in order to fix this issue i had to resort to commenting out all the code in between the 3 #ifdef VULKAN_DEBUG sections in the following function in the following file:

gfx/common/vulkan_common.c:1840:bool vulkan_context_init(gfx_ctx_vulkan_data_t *vk, enum vulkan_wsi_type type)

other than that you may also want to look into trying the alternate amd vulkan driver from what you’re currently using. there’s 2 for linux, radv and amdvlk (i believe the former is what valve has gone with on their hardware) and you can find more info about that here: https://wiki.archlinux.org/title/Vulkan

after writing all the above i had completely forgotten until rn that one of the main RA devs wrote a blog post a while back about his experiences with vulkan on various surface types and driver stacks: https://themaister.net/blog/2018/09/09/the-state-of-window-system-integration-wsi-in-vulkan-for-retro-emulators/ see the section on mesa - wayland - linux, he mentions being provided w/4 images in the swapchain for fifo mode and yet only 3 were ever used O_o ya, looks like this stuff gets real hairy, unfortunately…

nfp0 · 30 March 2022 09:49

I checked that post from Themaister. Very interesting read! Indeed if it is using 3 swap images instead of the 2 requested, that would explain the additional frame of latency outside of KMS.

I could swear the release build of RetroArch printed the number of swapchain images with verbose logging enabled, but I can’t find it anymore. Has anything in logging changed recently? Or am I remembering the verbose output of the Windows version?

I’m using radv. But I can give amdvlk a try too.

nfp0 · 30 March 2022 11:03

Aaaaand here we are. On Wayland I got:

[INFO] [Vulkan]: Using fences for WSI acquire.
[INFO] [Vulkan]: Using GPU: "AMD RADV SIENNA_CICHLID".
[INFO] [Vulkan]: Queue family 0 supports 1 sub-queues.
[INFO] [Vulkan]: Swapchain supports present mode: 1.
[INFO] [Vulkan]: Swapchain supports present mode: 2.
[INFO] [Vulkan]: Creating swapchain with present mode: 2
[INFO] [Vulkan]: Using swapchain size 2560x1440.
[INFO] [Vulkan]: Got 4 swapchain images.

So it seems it’s using 4 images (or 3 going by what Themaister said). This is reeeally bad for latency.

Meanwhile I also tested Weston, KMS and X with the following results:

Weston:

[INFO] [Vulkan]: Swapchain supports present mode: 1.
[INFO] [Vulkan]: Swapchain supports present mode: 2.
[INFO] [Vulkan]: Creating swapchain with present mode: 2
[INFO] [Vulkan]: Using swapchain size 2560x1440.
[INFO] [Vulkan]: Got 4 swapchain images.

KMS:

[INFO] [Vulkan]: Swapchain supports present mode: 2.
[INFO] [Vulkan]: Creating swapchain with present mode: 2
[INFO] [Vulkan]: Using swapchain size 2560x1440.
[INFO] [Vulkan]: Got 2 swapchain images.

X:

[INFO] [Vulkan]: Swapchain supports present mode: 0.
[INFO] [Vulkan]: Swapchain supports present mode: 1.
[INFO] [Vulkan]: Swapchain supports present mode: 2.
[INFO] [Vulkan]: Swapchain supports present mode: 3.
[INFO] [Vulkan]: Creating swapchain with present mode: 2
[INFO] [Vulkan]: Using swapchain size 2560x1440.
[INFO] [Vulkan]: Got 3 swapchain images.

Not sure what each swapchain present mode represents, but it seems RetroArch is always requesting mode 2.

Summing it all up, the number of swapchain images is, in order:

4 on Kwin Wayland
4 on Weston
3 on X
2 on KMS

This all matches up with the latency numbers I measured on my other post. Indeed it seems RetroArch is not able to always get the desired 2 swapchain images.
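For anyone who wants to repeat this check on their own setup, here is a minimal sketch that pulls the image count out of a verbose log. It assumes the log lines keep the exact wording shown in the outputs above ("Got N swapchain images.") and that 2 images were requested. The helper names `swapchain_images_from_log` and `extra_images` are made up for illustration and are not part of RetroArch.

```python
import re
from typing import Optional

# Matches the line printed in the logs above, e.g.
# "[INFO] [Vulkan]: Got 4 swapchain images."
_GOT_IMAGES = re.compile(r"\[Vulkan\]: Got (\d+) swapchain images\.")


def swapchain_images_from_log(log_text: str) -> Optional[int]:
    """Return the image count from the last matching line, or None if absent."""
    matches = _GOT_IMAGES.findall(log_text)
    # A log may contain more than one such line, so use the most recent one.
    return int(matches[-1]) if matches else None


def extra_images(log_text: str, requested: int = 2) -> Optional[int]:
    """Number of images received beyond the requested count (assumed to be 2)."""
    got = swapchain_images_from_log(log_text)
    return None if got is None else got - requested


if __name__ == "__main__":
    sample = (
        "[INFO] [Vulkan]: Creating swapchain with present mode: 2\n"
        "[INFO] [Vulkan]: Using swapchain size 2560x1440.\n"
        "[INFO] [Vulkan]: Got 4 swapchain images.\n"
    )
    print(swapchain_images_from_log(sample), extra_images(sample))
```

A result of 0 from `extra_images` would mean the driver handed out exactly what was requested, as in the KMS log above.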

What’s the path forward from here? From Themaister’s post, I assume RetroArch is working correctly and always requesting 2 images. Then whose door should we knock? Mesa, or the AMD drivers?

RealNC · 30 March 2022 16:28

Hm. On X11 with the proprietary nvidia driver, it seems to be fine:

[INFO] [Vulkan]: Creating swapchain with present mode: 2
[INFO] [Vulkan]: Using swapchain size 2560x1440.
[INFO] [Vulkan]: Got 2 swapchain images.

I don’t have Wayland installed anymore, but I’ll reinstall it just to see what happens there.

Have you considered using the glcore retroarch driver instead? Maybe it helps.

RealNC · 30 March 2022 16:58

I now tested Wayland. RA is only able to configure 3 swapchain images. If I set it to 2, RA hangs.

nfp0 · 30 March 2022 19:29

Hmmm interesting! Seems to be driver dependent then. I’ll try amdvlk later to see if it helps.

I’ll give it a try. Is it compatible with Slang shaders?

Here it doesn’t crash, but I would rather it crashed to signal that something is wrong than keep running with no way of knowing it’s using an incorrect number of swap images.

Thank you for the help by testing on your Nvidia!

RealNC · 30 March 2022 20:13

Yes. In fact, it only supports Slang. Unlike the “gl” and “gl1” drivers, “glcore” requires modern OpenGL support by the GPU driver. If your GPU supports Vulkan, then modern OpenGL support should in theory be no problem for it.

Edit:
As a side note, it turns out KWin with Wayland is still not ready. If the compositor crashes, or even just resets due to a GPU driver reset, it kills all applications. This is still listed as a showstopper:

https://community.kde.org/Plasma/Wayland_Showstoppers

Even worse, KWin locks itself to 80FPS when an application uses fullscreen (I use 120Hz for the desktop.)

It seems to me it will take a while yet until kwin+wayland is ready to be actually used.

nfp0 · 30 March 2022 20:57

I have multiple screens at different refresh-rates and I game with VRR, so Wayland is basically a necessity for me. And input lag under VRR is the same as X, so that is a non-issue.

But yeah, there’s still quite a few showstoppers to make it stable. That list used to be huge. It is improving at an astonishing rate. I tried it back in 2021 and it was borderline unusable on KDE. Now I use it daily for work and games and it rarely causes me any issues.

Fedora started shipping it by default too. A bit premature IMO, but in the end it helped speed things up.

Might be an Nvidia specific issue. I have a 180Hz screen and all apps and games run at 180Hz in fullscreen, RetroArch included (before loading a core, ofc).

No idea why that’s happening to you, but I’ve heard Nvidia’s drivers are terrible on Wayland. They’re taking very long to become compatible. Intel and AMD drivers are miles ahead.

Unfortunately I can’t recommend Wayland to anyone using Nvidia for the time being.

nfp0 · 30 September 2022 13:46

SUCCESS!!

At last! I achieved the lowest possible latency on Wayland! Same as on KMS and Windows exclusive fullscreen.

It seems RADV was not playing nice by supplying 4 swap images instead of the requested 2. I installed AMDVLK and now RetroArch gets the requested 2 swap images (I checked in the log). And direct scan-out also seems to be working because now I get only 54ms of latency on Wayland, the theoretical minimum on my system!

So here is the updated table:

95ms (5.7 frames) on Composited X (RADV)
77ms (4.6 frames) on Kwin Wayland (RADV)
75ms (4.5 frames) on Weston (RADV)
65ms (3.9 frames) on Uncomposited X (RADV)
54ms (3.3 frames) on Kwin Wayland (AMDVLK)
53ms (3.2 frames) on KMS (RADV)
51ms (3.0 frames) on Windows exclusive-fullscreen

It is quite possible X also benefits from the AMDVLK swapchain, as it was getting 3 images in RADV instead of the requested 2, but I’ll leave those tests for another day. I’m tired of counting literally thousands of frames by hand.

~~These values also mean direct scan-out is probably working correctly on Wayland, otherwise there would be an additional 16.6ms of latency (at 60fps).~~
EDIT: Direct scan-out has no impact here. See my next post.

Anyways, I’m happy that I can enjoy KMS levels of latency on my Wayland desktop now. Next I’ll tighten the Frame Delay setting as much as possible and I’ll call it a day.

Thanks a lot for helping me figure this out! @e-tank @RealNC

I hope this info reaches people trying to reduce latency as much as they can on their Linux PCs!

nfp0 · 8 August 2024 18:23

I did one last measurement with AMDVLK to verify whether X also benefited from the reduced number of swapchain images, and sure enough, it did!

Here is my (hopefully final) table:

95ms (5.7 frames) on Composited X (RADV)
87ms (5.3 frames) on Composited X (AMDVLK)
~~77ms (4.6 frames) on Kwin Wayland (RADV)~~ EDIT: Today, RADV is now as fast as AMDVLK below.
75ms (4.5 frames) on Weston (RADV)
65ms (3.9 frames) on Uncomposited X (RADV)
54ms (3.3 frames) on Uncomposited X (AMDVLK)
54ms (3.3 frames) on Kwin Wayland (AMDVLK)
53ms (3.2 frames) on KMS (RADV)
51ms (3.0 frames) on Windows exclusive-fullscreen

So it seems Windows, KMS, and both Uncomposited X and Kwin Wayland on AMDVLK all reach the theoretical best-case latency!
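The relation between the millisecond and frame figures in the table can be sketched as below, assuming 60 fps content (one frame is about 16.7 ms, the same figure used for the 16.6ms estimate earlier in this thread). The function names are made up for illustration. The frame values in the table may have been measured directly rather than derived from the rounded millisecond values, so recomputing them this way can differ in the last digit.

```python
def frame_time_ms(refresh_hz: float = 60.0) -> float:
    """Duration of one frame in milliseconds at the given rate."""
    return 1000.0 / refresh_hz


def ms_to_frames(latency_ms: float, refresh_hz: float = 60.0) -> float:
    """Express a latency in milliseconds as a number of frames."""
    return latency_ms * refresh_hz / 1000.0


if __name__ == "__main__":
    # Values taken from the table above.
    composited_x_radv = 95.0
    kwin_wayland_radv = 77.0
    kwin_wayland_amdvlk = 54.0

    print(f"one frame at 60 fps: {frame_time_ms():.1f} ms")
    print(f"Composited X (RADV): {ms_to_frames(composited_x_radv):.1f} frames")

    # Gap between the two drivers on Kwin Wayland, in frames.
    gap = ms_to_frames(kwin_wayland_radv - kwin_wayland_amdvlk)
    print(f"RADV vs AMDVLK on Kwin Wayland: {gap:.2f} frames")
```

The gap between the two Kwin Wayland rows comes out at somewhat more than one frame, which fits the picture of extra swapchain images adding roughly a frame or more of latency.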

If my numbers are correct, I would say that time has come already. Sure, my system is just one example, and as we saw, this is very driver dependent. But it would be nice if other AMD users could confirm whether their cards also have this swapchain discrepancy between RADV and AMDVLK.

Also, apologies for me insisting so much on direct scan-out. I’ve read a bit more about what it does and it does not do what I thought it would. It’s just a small optimization in the compositor. It does not change anything about the swapchain. It reduces the processing needed when an app is fullscreen, so it might indirectly help with latency if you use the Frame Delay feature, though.

e-tank · 31 March 2022 10:15

these are the vulkan presentation modes. 0 is immediate, no vsync so you’ll get tearing. 1 is mailbox, vsync’d but lets the program keep submitting images and only ever uses the last one submitted (useful for fast forwarding). 2 is fifo, vsync’d and blocks when full, which is crucial for timing in RA and other emulators & retro games. EDIT: Also, 2 is the only mode the vulkan spec requires to always be available

i’m glad u got it sorted out and were able to provide all these interesting results in the process. though i do think it would be a good idea to open an issue on the radv driver over this here: https://gitlab.freedesktop.org/mesa/mesa/-/issues

if the amdvlk driver can manage to provide true double buffering in fifo mode under a compositor then there’s really no reason why radv shouldn’t be able to either. it’s a really important feature to have for running fixed rate content with low input latency

nfp0 · 31 March 2022 10:21

Thanks for the explanation!

Yeah, I’ll search Mesa’s issues and open one if it does not exist already.

Completely agree, double-buffering is very important and I have no idea why it’s not working in RADV. Mesa’s RADV is typically superior to AMDVLK in performance in pretty much anything else though. Maybe it’s a problem specific to the RX 6000 series cards because they’re new.

nfp0 · 1 April 2022 17:39

Since AMDVLK is much less performant than RADV, even on RetroArch, I went back to testing and tried with the glcore driver with Hard GPU Sync enabled, but it gave me an average of 71ms. It’s a pretty bad value compared to my results with Vulkan with Max Swapchain = 2 unfortunately.

I really have to find out why RADV is getting 4 images instead of 2.

EDIT: I also found corruption on some handheld border shaders on AMDVLK.

nfp0 · 5 April 2022 16:51

The RADV additional latency has been identified as a problem on Mesa.

Anyone interested can follow the discussion here:

vanfanel · 13 September 2022 18:27

@nfp0 Can you please build and test this for GL on Wayland?

I have no instruments to test input lag, but this should improve over simply doing eglSwapBuffers() and let MESA decide.

nfp0 · 15 September 2022 15:30

Sure thing! But what test do you want me to do exactly?
What should I build?

vanfanel · 16 September 2022 15:57

Please build and test this RetroArch fork, and test with GL on Wayland:

Test for input lag improvements.

nfp0 · 16 September 2022 16:23

Alright! I’ll get back to you as soon as I can.

Hari-82 · 23 September 2022 14:39

This might be a stupid question…

My video card has a DVI-I output (the one with the “extra” 5 pins) so I think it can natively output an analog signal, which would make it easy to connect to a CRT monitor (DVI-I -> VGA for example). The question is: will the CRT PC monitor shave off some input delay compared to my “normal” HDMI out to my IPS monitor?

RealNC · 23 September 2022 17:58

Usually the signal processing delay of monitors today is between 2 and 5ms. That’s the amount you’re looking to shave off with a CRT.

Of course your video card needs to actually output DVI-A, not DVI-D. “DVI-I” is just the connector type, and the actual signal is either DVI-A (analog) or DVI-D (digital). DVI-I supports both, but the question is what your video card actually outputs.