1 is the fastest, but some drivers just straight-up won’t give it to you, and even if you get it, you might have stuttery scrolling.
Which is the best of both worlds, the one that gives acceptable input lag and no stuttering? Is it 2?
Try it with 1 (your driver may give you 2 anyway). If it’s too stuttery, change it to 2.
What’s the max_swapchain option? Where’s it located, and what does it do?
It’s in settings > latency, but it’s only visible with drivers that expose it (Vulkan and D3D11/12, IIRC). It controls the maximum number of images that can be buffered, which, as with most buffering, is a tradeoff between latency and quality (in this case, smooth scrolling).
I have absolutely no idea of the technical aspects behind this great feature, but would it be possible to implement something that makes it increase the frame delay value as long as the game runs without sound crackling or lagging? For example, if frame delay was set to 0 and automatic frame delay to ON, the starting point would be 8, but as long as the game runs fine, it would go up to 9, then 10, etc., until it reaches the point where it starts lagging?
I talked to sonninnos about exactly that and he didn’t seem too interested in it vs just setting it high to begin with and then letting it self-correct downward.
Yeah, starting high and going down is best; even the current auto frame delay drops some fps every once in a while. I tried it in some shmups (ddp, ketsui) and it doesn’t feel stable. Personally I keep it off; if I feel I need some “boost” I set it manually. Besides, frame delay doesn’t have a good ratio of ms saved to resources used.
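For what it’s worth, the "start high and self-correct downward" idea can be sketched in a few lines of Python. This is only an illustration of the concept, not RetroArch’s actual auto frame delay code: the function name, the 1 ms step and the "slowest recent frame" check are all assumptions made up for the example.

```python
FRAME_BUDGET_MS = 1000.0 / 60.0  # time available per frame on a 60Hz display


def adjust_frame_delay(delay_ms, recent_frame_times_ms, step_ms=1, floor_ms=0):
    """Lower the frame delay by one step if the slowest recent frame no longer fits.

    delay_ms              -- current frame delay (hypothetical starting value set high)
    recent_frame_times_ms -- how long the core took to produce each recent frame
    """
    slowest = max(recent_frame_times_ms)
    # The core only starts after the delay, so delay + work must fit in one frame.
    if delay_ms + slowest > FRAME_BUDGET_MS:
        return max(floor_ms, delay_ms - step_ms)
    # Otherwise keep the current value; this sketch never raises it again.
    return delay_ms


delay = 12  # start high
for frame_times in ([3.0, 4.0, 6.0], [3.0, 4.0, 6.0], [3.0, 4.0, 5.0]):
    delay = adjust_frame_delay(delay, frame_times)
    print(delay)
```

The upward-stepping variant asked about above would need the opposite branch as well, which is where the occasional dropped frame comes from: you only find the limit by crossing it.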
I’m trying to compare the latency of Vulkan Wayland VS KMS on RetroArch, but I need to set a specific frequency on KMS for consistency purposes.
I have a 4K@120Hz display, and I can boot it in 4K@60Hz mode, but RetroArch changes it to 4K@120Hz when I open it in KMS. I really would like to run it at 60Hz though, for consistent tests among other 60Hz-only setups.
Can anyone help me to force 60Hz? Sorry if this was already requested on this thread.
EDIT: Also, it seems my USB keyboard doesn’t work when I open RetroArch in KMS mode. I tried with the X and UDEV input drivers.
Even though I would still love to know how to force a certain frequency on RetroArch in KMS, I have found another way to conduct the Wayland VS KMS tests on 60Hz, but what I found out seems strange.
I’m using Wayland KDE Plasma (5.24) on Manjaro with RetroArch in fullscreen mode, and assuming direct scan-out is working correctly on RetroArch, I believe the latency should be the exact same as KMS. But that’s not what I measured.
I used my phone’s 480fps slow-motion mode to capture 10 samples each of the latency between a button press on my wired keyboard and my display. I used the exact same RetroArch configuration and hardware for the test. I’m using a Radeon RX 6800 with the AMDGPU driver. Here are my averages:
- KMS: 53ms (3.2 frames at 60fps)
- Wayland: 77ms (4.6 frames at 60fps)
That’s 1.4 frames of additional delay on Wayland. Where am I losing that much? Assuming direct scan-out, shouldn’t it be technically the same, or very close? Could it be that RetroArch is not engaging Kwin’s direct scan-out code, or is it an internal RetroArch issue?
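For reference, the conversions behind those numbers, as a small Python sketch. The helper names are made up for illustration; the only inputs are the 60Hz refresh rate, the 480fps camera and the two averages above.

```python
REFRESH_HZ = 60.0   # display refresh rate used for the tests
CAMERA_FPS = 480.0  # phone slow-motion capture rate


def ms_to_frames(latency_ms, refresh_hz=REFRESH_HZ):
    """Express a latency in milliseconds as display frames."""
    return latency_ms * refresh_hz / 1000.0


def camera_frames_to_ms(camera_frames, camera_fps=CAMERA_FPS):
    """Convert a count of slow-motion video frames into milliseconds."""
    return camera_frames * 1000.0 / camera_fps


kms = ms_to_frames(53)
wayland = ms_to_frames(77)
print(f"KMS: {kms:.1f} frames, Wayland: {wayland:.1f} frames, gap: {wayland - kms:.1f} frames")
print(f"one camera frame = {camera_frames_to_ms(1):.2f} ms")
```

At 480fps each video frame covers a little over 2 ms, so a single sample can’t be more precise than that; averaging 10 samples helps, but differences of a couple of ms are within the noise.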
I’m sorry if this was already covered in the thread sometime before, but a search turned up nothing specific to my case. Please link me up if that is the case.
Wayland has lag. It’s just not made for gaming. It’s made for watching video and browsing the web. The people who develop it are not gamers, don’t understand gaming nor really care about it much. Gaming is a second class citizen there.
If you want low latency, use X11 and disable compositing. This will basically give you the equivalent of Windows 10 with DWM disabled (like when using fullscreen, where Windows turns off DWM.) The DWM equivalent of Wayland cannot be disabled. It’s always there, adding lag.
With the exception of allowing tearing (which is in development for Wayland), both X11 and Wayland have the exact same input lag, assuming direct scan-out is working correctly.
Xaver himself wrote an excellent article explaining and testing the input lag of X11 vs Wayland, even going as far as adding XWayland to the mix. I recommend you check it out:
As you can see, Wayland, and even XWayland have the exact same latency as X11.
You’ll probably notice that FIFO latency seems higher on Wayland, but that’s exactly what I was talking about previously. As Xaver points out in the notes below the latency tables, at the time of testing, dmabuf was not implemented yet:
due to increased buffer bloat (the queue for presentation being one frame bigger) the latency with fifo is higher by one frame than on uncomposited X. This should disappear once all the necessary parts for dmabuf feedback are implemented in Mesa
The thing is, I believe dmabuf is now already implemented in Mesa and Kwin, so direct scan-out should be working with RetroArch and should have the exact same latency as uncomposited X11, or even better. Or am I wrong?
I conducted X11 measurements to corroborate what I’m talking about. Here are the averages together with my previous results:
- X composited: 95ms (5.7 frames at 60fps)
- X uncomposited: 65ms (3.9 frames at 60fps)
- KMS: 53ms (3.2 frames at 60fps)
- Wayland: 77ms (4.6 frames at 60fps)
So there’s a difference of 12ms between uncomposited X and Wayland on RetroArch, which might be that missing frame from direct scan-out. I don’t know why uncomposited X can’t match the low latency of KMS though (53ms). Might be an error on my part, or some driver shenanigans.
Does anyone here versed in how RetroArch works internally know if anything is missing to trigger direct scan-out? Also, does RetroArch use FIFO or Mailbox?
Or does anyone know how to verify that an app is triggering direct scan-out on Wayland?
If the applications need to “trigger” direct scanout, then that’s bad.
Also, your reply is kinda funny. It begins by saying Wayland is just as low latency as X11, and then shows it’s not.
From my understanding, the application doesn’t have to do anything other than being fullscreen. It’s a great feature.
It seems to have been merged by Xaver in time for KDE Plasma 5.22, so I should have it (I’m on 5.24) and RetroArch should be benefiting from it.
You missed the point. My latency under Wayland being slower than X11 is precisely what I’m trying to figure out. It should have the same values under direct scan-out, as demonstrated by Xaver himself, but clearly something is not working correctly on my end. I was hoping another Wayland user on this thread could help me figure it out.
I’m also trying to understand how RetroArch’s Vsync works in more detail.
that seemed to be the case at one point but the devs are taking it seriously now thanks in part to valve and the steam deck
with vsync enabled in RA it’ll use fifo, which is usually what you want for applications designed to run at a fixed frame rate, like emulators and retro/retro-like games, which is the kind of stuff LR was primarily designed for. not saying there aren’t any, but I’m not aware of any cores rn that are designed to run in and take advantage of the benefits mailbox mode can offer. anyway…
as a reference for your tests, with these settings:
video_vsync = "true"
audio_sync = "false"
vrr_runloop_enable = "false"
run_ahead_enabled = "false"
video_threaded = "false"
video_frame_delay = "0"
video_frame_delay_auto = "false"
video_max_swapchain_images = "2"
video_hard_sync = "true"
video_hard_sync_frames = "0"
and with any of these drivers:
video_driver = "vulkan"
video_driver = "glcore"
video_driver = "gl"
and on a 60 hz display, if you were to load up super mario bros 1 (in fceumm, nestopia, or mesen) the theoretical average time to see a response on mario from any given input when he’s near the bottom of the screen should be around 58.33 ms (3.5 frames * 1000ms / 60fps). EDIT: assuming a crt display, so most likely this + some (hopefully small) ms display lag
it would be bad, but i’m pretty sure that’s not the case and i just assume they meant via their compositor, which brings me to…
at this time, i do not. but might i suggest you try running RA in weston instead of Kwin/KDE to compare, jic. (just install weston via your package manager and you should see an option for it somewhere on the login manager screen)
Got it. Makes sense.
Why disable audio sync? Would this cause any kind of video delay?
I believe you mean video_frame_delay = "0" and video_frame_delay_auto = "false"
Alright, I’ll give it a try! This is the best case scenario, right? With no frames lost in any kind of compositing (except for Mario’s own known 1 frame of delay).
I’ve been conducting my tests on the 240p Test Suite on the Horiz/Vert Stripes test pattern emulated on the 2014 bsnes core. The picture reacts with zero frames of delay, so it’s good to count input lag.
I assume that ~58.33ms best-case scenario is currently only possible on Windows exclusive fullscreen or Linux KMS, correct?
Yeah, I believe this is all the compositor’s responsibility, as seen in the merge request I linked in my previous post here.
Alright, I’ll give that a try too. I’ve never used Weston though. How do I start it up after I start the session?
Thanks a lot for the help!
EDIT: Don’t mind me. I’ve figured Weston out.
To quote from https://invent.kde.org/plasma/kwin/-/merge_requests/502:
I wouldn’t worry too much about additional input lag in the low single digit range.
So don’t worry. I on the other hand do worry, so I use X11. If some day Wayland becomes capable of zero overhead output, I’ll consider it and recommend it. For now, I can’t.
it shouldn’t under normal circumstances and if configured properly (which it typically is) but there’s a lot of variables to that. on the other hand disabling it definitely won’t add any lag, so it just removes something completely from the equation for testing purposes
yep ty, will fix
correct, it’s the theoretical best case for those particular settings given above. it’s technically possible for a compositing window manager to keep up and match but i haven’t seen any results to reflect this and am not surprised i haven’t yet. though the numbers in the article u linked to look promising.
i did forget to add that the estimate doesn’t take into account monitor/display lag, so unless you’re using a crt or some other stupidly low response time display it should be that + some (hopefully small) amount of ms. we’re talking + something in the low single digits for a decent gaming monitor.
if you were to measure smb1 on native nes hw hooked up to a crt it comes out to (on average) around 41.66 ms (2.5 frames * 1000ms / 60fps; i’ve come across test results of the game that reflect this a few times now but i can’t recall where off the top of my head…). anyway, the 1 extra frame over native hw is due to a fundamental difference in how frames are generated and pushed out in modern applications on frame buffered display hw compared to old frame buffer-less raster on the fly hw like the nes. that’s where runahead and frame delay come into play, as a way of chipping away at or in some cases exceeding that limit. there are other ways too, such as beam chasing that the blur buster ppl came up with, but that’s not something applicable to LR/RA.
pretty much, i’ve seen results that back up the theoretical numbers on both a few times now, though as mentioned above ideally uncomposited x and wayland would match given enough performance overhead. hopefully they will in time
I’ve done additional tests using your setup (which is pretty much what I was already doing). The only difference is that I kept using the 240p Test Suite methodology as before, for zero frames of in-game latency. I always take my measurements on the second half of the screen.
I measured averages for Windows on exclusive full-screen and Weston. So, putting it all together now we have, in order (with frames at 60fps):
- 95ms (5.7 frames) on Composited X
- 77ms (4.6 frames) on Kwin Wayland
- 75ms (4.5 frames) on Weston
- 65ms (3.9 frames) on Uncomposited X
- 53ms (3.2 frames) on KMS
- 51ms (3.0 frames) on Windows exclusive-fullscreen
Note: All of this was measured on Vulkan with max swapchain = 2.
Noise aside, we can conclude then that KMS and Windows have the exact same latency, which is not surprising.
We can also conclude that Wayland on Kwin has the exact same latency as Weston. Not sure what to make of this, as I don’t know how Weston’s presentation queue works.
I also don’t understand why Uncomposited X has more input lag than KMS, as the queue should be the same size. Same for Wayland. All 3 should have the same latency.
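To make the gaps easier to compare, here is a short Python sketch that expresses each result as extra latency over KMS. The dictionary just restates the averages listed above; the helper name is made up for illustration.

```python
FRAME_MS = 1000.0 / 60.0  # one frame at 60fps

# Averages from the list above, in milliseconds.
RESULTS_MS = {
    "Composited X": 95,
    "Kwin Wayland": 77,
    "Weston": 75,
    "Uncomposited X": 65,
    "KMS": 53,
    "Windows exclusive-fullscreen": 51,
}


def extra_vs(name, baseline="KMS"):
    """Return (extra ms, extra frames at 60fps) relative to the baseline."""
    extra_ms = RESULTS_MS[name] - RESULTS_MS[baseline]
    return extra_ms, extra_ms / FRAME_MS


for name in RESULTS_MS:
    extra_ms, extra_frames = extra_vs(name)
    print(f"{name:<30} {extra_ms:+4d} ms  {extra_frames:+.2f} frames")
```

Against KMS this works out to about +0.7 frames for Uncomposited X, +1.3 for Weston, +1.4 for Kwin Wayland and +2.5 for Composited X.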
Sure thing. I’ve made all the measurements on the same monitor, so that’s a fixed variable. My only purpose was direct comparison. This is a pretty old monitor, so expect some 5~10ms of delay from it, but that’s irrelevant here.
Yeah, I’m aware of the other additional methods, and the additional delay on our modern buffered hardware. But thank you for the thorough explanation!
And it seems my numbers also back that up.
If we remove the 5~10ms latency of my monitor and add the known 1 frame delay in Mario (16.6ms) to the KMS and Windows latency, we kinda reach the ~58.33ms best case number you talked about.
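Spelling that arithmetic out as a quick Python sketch (the function name is made up, and the 5~10ms monitor figure is only the rough estimate given earlier, not a measurement):

```python
FRAME_MS = 1000.0 / 60.0       # one frame at 60fps, about 16.67 ms
BEST_CASE_MS = 3.5 * FRAME_MS  # the ~58.33 ms theoretical figure


def mario_equivalent_ms(measured_ms, monitor_lag_ms):
    """Measured latency minus monitor lag, plus Mario's one frame of delay."""
    return measured_ms - monitor_lag_ms + FRAME_MS


for name, measured in (("KMS", 53), ("Windows", 51)):
    # The monitor is estimated at 5~10 ms, so show both ends of the range.
    low = mario_equivalent_ms(measured, 10)
    high = mario_equivalent_ms(measured, 5)
    print(f"{name}: {low:.1f} to {high:.1f} ms (best case {BEST_CASE_MS:.2f} ms)")
```

Both ranges land at roughly 58 to 65 ms, so at or slightly above the theoretical number, which is about what you’d expect given the measurement noise.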
Now the real question is, assuming all is working as expected, where are Uncomposited X and Wayland losing that extra frame? I believe performance is not the issue here. There must be an additional frame in a queue somewhere. I think I’ll take this up to the Kwin developers to try and figure out if direct scan-out is working or not.
There’s one additional test I can conduct though. On Wayland, a windowed RetroArch instance should, in theory, have one additional frame of latency VS fullscreen. If the value turns out the same, then it’s a guarantee direct scan-out is not working. But I’ll do it later. Today I’m tired of all this measuring.
Agreed, I wouldn’t either. But these are double-digit differences. There’s a 24ms difference between Wayland and KMS on my system, when they should be the same. It’s a pretty big disparity.
Either something is wrong, or maybe I misinterpreted what direct scan-out is supposed to do.