I see. Does this happen only with RetroArch? (FPS cap)
I’ve recently put together a low-latency NES/SNES build based on a Raspberry Pi 4. I wanted to see what I could achieve with this hardware together with Lakka. I’m now running Lakka 3.6 and have finished the setup and run some tests. The setup is:
- Raspberry Pi 4 4GB (CPU overclocked to 1.9 GHz, otherwise stock)
- Flirc case
- Lakka 3.6
- Raphnet Wii controller to USB adapter + Nintendo SNES Classic controller (with 1 ms USB polling and 1 ms adapter polling)
The system/Lakka settings are (collected into a single config sketch after the list):
- /boot/config.txt: arm_freq=1900
- Forced global 1000 Hz polling for USB game controllers via kernel command line (/boot/cmdline.txt): usbhid.jspoll=1
- Core set to Nestopia for NES
- Core set to snes9x2010 for SNES
- RetroArch: Audio driver = alsa
- RetroArch: Threaded video = off
- RetroArch: Max swapchain images = 2
- RetroArch: Frame delay = 2
- RetroArch: Run-ahead 1 frame (and use second instance enabled)
- RetroArch: Enabled zfast-crt shader
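For reference, here’s the whole thing collected into the relevant files — a minimal sketch; the retroarch.cfg key names are quoted from memory, so double-check them against your own install:

```
# /boot/config.txt — CPU overclock
arm_freq=1900

# /boot/cmdline.txt — appended to the existing single line: 1000 Hz USB gamepad polling
usbhid.jspoll=1
# (verify it took effect with: cat /sys/module/usbhid/parameters/jspoll)

# retroarch.cfg — key names from memory, verify against your install
audio_driver = "alsa"
video_threaded = "false"
video_max_swapchain_images = "2"
video_frame_delay = "2"
run_ahead_enabled = "true"
run_ahead_frames = "1"
run_ahead_secondary_instance = "true"
```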
With these settings, I ran input lag tests on Super Mario World and Mega Man 2. I used my iPhone 12 to record the screen and my finger pressing the jump button at 240 FPS. I then used the app “Is It Snappy?” to carefully analyze the results, with 20 samples for each of the two tested games. Before testing, I calculated what the input lag (in 60 FPS frames) would have been on an original console with a CRT, taking the character’s placement on the screen into account. The expected input lag on a real console + CRT:
Super Mario World:
- Avg: 3.2 (i.e. ~53 ms)
- Min: 2.7
- Max: 3.7
Mega Man 2:
- Avg: 2.1 (i.e. ~35 ms)
- Min: 1.6
- Max: 2.6
For this testing, I used my trusty old 22-inch Samsung LCD TV. From previous tests and comparisons I’ve made with other displays, I’ve determined that this display’s input lag when running at its native 1080p resolution is approximately 1.05 frames.
The results of my Lakka tests, after subtracting the known input lag of the display (1.05 frames), are:
Super Mario World:
- Avg: 3.2
- Min: 2.7
- Max: 3.7
Mega Man 2:
- Avg: 2.1
- Min: 1.7
- Max: 2.7
As you can see, these results are within measurement tolerance of the “real deal”. This setup, with a lowly Pi 4, performs like the original console and even manages to keep the zfast-crt shader active. The one compromise I had to make was to use snes9x2010 instead of snes9x, but I believe snes9x2010 is actually quite a fine core. I can’t say I’ve tried that many games, but two of the heaviest SNES games, SMW2 (Yoshi’s Island) and Star Fox, seem to work fine, with no stuttering, audio issues, etc.
Using this with my LG OLED65CX, the average input lag will be approximately 5 ms (0.3 frames) worse than the original NES/SNES running on a CRT. That’s quite okay, right?
Big thanks to the RetroArch and Lakka developers for making this possible. I bow in respect.
Wow, great results, and solid methodology as always. Thanks for reporting!
If I compare with my lag chain on my PC, just using GL with hard sync 0 or Vulkan, I’d say it could be faster by ~10 ms (even 20 ms, but that’s with VRR or a frame delay of ~10).
What I mean is simply that I estimated what the button-to-pixel input lag would be on the original consoles with a CRT. For example, for Super Mario World:
- Average time from button press until game sampling the input: 0.5 frames
- “Internal lag”, i.e. number of frames where the game does not respond to input: 2
- Scan-out to display (top to bottom, 16.67 ms in total) until reaching the character’s position: 0.7
So, the total button-to-pixel lag on a real SNES with a CRT, given the exact same game scene, would be ~3.2 frames or ~53 ms, as written out below.
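Using the ~16.67 ms frame time of a 60 Hz display (the SNES actually runs at ~60.09 Hz, so treat this as approximate):

```latex
t_{\text{CRT}} = \underbrace{0.5}_{\text{input sampling}}
             + \underbrace{2.0}_{\text{internal lag}}
             + \underbrace{0.7}_{\text{scan-out}}
             = 3.2~\text{frames} \approx 3.2 \times 16.67~\text{ms} \approx 53~\text{ms}
```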
Well, it only works that way if you’re at parity with the original console before applying run-ahead. One problem with modern frame-buffered hardware, at least if you don’t have a VRR display, is that the frame is rendered first and then scanned out to the display. This creates an inherent 1-frame input lag deficit.
In my case, with the Raspberry Pi 4, I start by getting rid of excessive buffering and threading related latency (max swapchain images = 2 and threaded video off). I minimize the USB polling latency by maximizing the polling frequency. I then further reduce the input lag by 2 ms using frame delay. This compensates roughly for the remaining 1-2 ms input lag caused by my controller + adapter. At this point, the system performs 1 frame slower than “the real deal”. I then add 1 frame of run-ahead to reach my final goal of parity with real hardware.
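As a rough budget (these are my estimates, not exact measurements; t_console is the button-to-pixel lag of the real console in the same scene):

```latex
t_{\text{Pi}} \approx t_{\text{console}}
 + \underbrace{1~\text{frame}}_{\text{render, then scan out}}
 + \underbrace{\sim\!2~\text{ms}}_{\text{controller + adapter}}
 - \underbrace{2~\text{ms}}_{\text{frame delay}}
 - \underbrace{1~\text{frame}}_{\text{run-ahead}}
 \approx t_{\text{console}}
```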
Yep, when I talk about frames I mean native display frames, so ~60.09 Hz on the SNES and ~60.0 Hz on the Raspberry Pi. I’ve clarified slightly by editing my previous post.
Yes, 53 ms including the Samsung LCD TV. If you’re at 14 ms, you’re significantly faster than the real console ever was. That’s certainly possible nowadays when using run-ahead, but I personally aim to stay as close to the original consoles as possible.
It’s worth mentioning that where and what you test may influence the results. For example, Mega Man 2 responds on the next frame when you’re in the menus, while it responds on the second frame when you’re in game. My tests were both in-game, jumping with the characters.
No issue here, so this is definitely an issue with your OS, drivers, or setup. Maybe you should avoid the smug act.
I have the same issue, but with an Nvidia card (I think @burner is on AMD); I’ve read about other people having the same behaviour. I’ll also say that for me it’s not a “real” problem, because I usually keep vsync on…
Yes, that’s not the first time I’ve heard about people having those 30 FPS issues. However, that’s obviously an OS/drivers/setup problem, since other people aren’t affected.
Fuck off. I posted a video made with OBS. That’s not good enough for you? Seriously, fuck off.
Yeah, I’ve tested a PAL SNES with an LED-rigged original controller, and I also asked the author of Is It Snappy? to test his NTSC SNES. This was years ago. The input lag is still dependent on where on the screen you’re measuring. Once you know the behavior of the real console and the internal latency of the specific game, the resulting lag is easy enough to calculate.
I didn’t have to adjust my method to fit my results. I already knew how the real console performs, but I wanted to show how and why I had set up the Pi 4 to perform the same. I did it the proper way and made a hypothesis of where the Pi 4 “should be” given my settings. I then went on to test my setup and arrived at these results without having to adjust anything.
I’ve probably run thousands of input lag tests through the years (this is ”my thread”, as you may have noticed). I have a pretty good handle on it at this point.
As I said, that’s not true; my setup runs Vulkan at 60 FPS without enabling vsync. Your 30 FPS issue might be a Windows thing (I’m using Linux, @RealNC too), I don’t know. Please reflect on your behavior and how you make fun of people who don’t experience your issues. Toxicity isn’t welcome here; that’ll be my only warning.
On a side note, if there really is an issue preventing Vulkan from running above 30 FPS on Windows without vsync, that’s something that should be reported at https://github.com/libretro/RetroArch/issues
In the video he posted, the game was running at 60 FPS. There was no 30 FPS lock.
Yeah, it’s definitely a thing on Windows. Besides myself (with an Nvidia GPU), I’ve seen a few others report choppy framerates with Vulkan and vsync off on Windows, mainly on Discord. I don’t experience any downside to having it enabled with G-Sync, so it doesn’t bother me.
If that’s an RA bug and not something common to all Vulkan applications running on Windows, someone should definitely report it at https://github.com/libretro/RetroArch/issues
Hello again, Brunnis
It’s a pleasure reading you again. Your reports are most welcome here, and they are like fresh air now that certain toxic individuals are trying to pollute this much-needed thread with strange Windows/Nvidia findings (pure nonsense if you ask me, because a closed-source driver on a closed-source OS makes it impossible to diagnose anything; who cares about a fully closed ecosystem? Who knows what absurd software it is running?).
That said, since we both used to make the most of our Pis in the past, have you tested the Vulkan driver? It’s getting mature and it’s indeed faster by now. Using RetroArch with Vulkan, you can safely disable VSYNC (no special G-Sync monitor needed or anything, your old trusty one will do, and you won’t get any tearing) and set max_swapchain to 1. Remember that: Vulkan, VSYNC OFF, max_swapchain=1. Magic, really. Use the latest MESA for this (2.3.0 as of this writing). Keep using ALSA for audio, and you can also take the audio latency down to 32 ms; use 44100 Hz audio and adjust the internal audio rate in the emulators accordingly (in FBNeo, for example) so no internal audio resampling has to be done.
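This is roughly what that looks like in retroarch.cfg — a minimal sketch, with option names quoted from memory, so verify them against your own config:

```
video_driver = "vulkan"
video_vsync = "false"                # VSYNC OFF
video_max_swapchain_images = "1"     # single buffering
audio_driver = "alsa"
audio_latency = "32"                 # in ms
audio_out_rate = "44100"             # match the core's internal rate to avoid resampling
```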
You will be surprised by the results with regard to performance and input lag on our beloved working-class microcomputer!
If you need any help with building MESA etc., ask and I will humbly try to help as much as I can. You are the original master on the input lag testing front.
@vanfanel: Hi! I’m sorry to hijack the topic, but I’ve been following it for years.
I’m currently using RA on a common Linux distro / x86. I’m wondering if it’s possible to use the KMS version of RA without disabling the X server (or Wayland)?
Also, on this very machine, I’ve just tried Lakka (regular x86 PC, amdgpu Vega 8, open-source drivers) and I do get smooth scrolling with no vsync / swapchain 1 / Vulkan, and it automagically switches to the right frequency, which is great (50 Hz on the Amiga, for instance), while on X I have to add core overrides and cannot disable vsync, etc. But there are tons of graphical artifacts, and the display falls apart within a couple of minutes (flickering lines / green screen / black screen). Probably related to the amdgpu driver.
So much progress lately! (run-ahead, auto frame delay, KMS, Vulkan, improved cores…) Not sure I need my FPGA machine anymore.
Cheers
Thank you for pointing that out. I do hope that people learn not to rely on closed-source software, for reasons such as that. We should make open-source hardware and software a priority; why keep bending over and troubling ourselves with closed-source BS?
Also, what exactly is everyone gauging their results on? It seems like some are going about it in a flawed way. I wish this issue were properly addressed, but now there’s a toxic air here instead of a community working together to solve a problem. I can’t even follow this thread anymore because it has devolved into misinformation and bickering.
We should be providing as much technical detail and info as possible for our findings. You can’t properly troubleshoot an issue with a lack of info.
Hi! I don’t consider your comment a hijack at all, since KMS/DRM is supposed to be the low-latency environment by default, and this thread is about low-latency environments (which of course can’t be guaranteed or replicated in a sane way on a closed system; so even if someone gets low-latency results on Windows with an Nvidia driver, it has no importance or meaning, because it’s an empty achievement no one can technically replicate, since we lack any meaningful information, as @Joystick2600 pointed out).
That said, KMS/DRM and X11 are mutually exclusive; you can, however, run a KMS/DRM program on a different TTY without killing the X11 server, as sketched below.
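Something like this (a rough sketch; the exact VT keys and binary name depend on your distro):

```
# From your X11 session, switch to a free virtual terminal, e.g. Ctrl+Alt+F3,
# and log in there. A KMS/DRM-enabled RetroArch build will then drive the
# display directly on that VT, while the X server keeps running on its own VT.
retroarch --verbose

# Switch back to the X session with Ctrl+Alt+F1 (or F2/F7, depending on distro).
```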
Wayland uses the same interfaces to access the hardware that KMS/DRM uses, so you should have the same latency on Wayland that you would get on KMS/DRM, I guess.
Wayland is great for desktop systems and KMS/DRM is for embedded-like GNU/Linux scenarios, where a machine runs RetroArch or any SDL2 games (SDL2 also supports KMS/DRM but libretro/RetroArch is WAY better for emulation) and nothing more.
So, if you want to have a desktop on the same system installation where you use your emulators, use Wayland.
If you are like me and don’t need a desktop system to play, use KMS/DRM.
That’s what I can recommend.
About FPGAs: well, I love RetroArch on KMS/DRM on a system that boots in 1 second without ANY services (only GNU/Linux makes that possible!), but I also love FPGA solutions. Hardware replication is beyond awesome, even if, for playing Mega CD, all you need is a Pi 4 with RetroArch on KMS/DRM and a single-buffer configuration without vsync.
@Joystick2600 Let’s steer this important thread back on track, then! It seems the Windows/Nvidia person polluting it has finally tired. If we ignore toxic people, it can be a great source of information again, now that sir @Brunnis is at it again!
The lowest-latency environment tested is on a Windows PC; it’s as simple as that. This just depends on your settings and your hardware configuration, so the notion that this “can’t be guaranteed or replicated in a sane way on a closed system” is absurd.
The elitism in here is honestly quite embarrassing, especially when Linux is not yet a realistic or viable option for general gaming and only accounts for about 1% of the gaming market (according to Steam surveys).
This isn’t about elitism, this is about you trolling someone for no reason, aside from being narrow-minded.
And who cares what Steam says? Who talked about general gaming? Linux is a very popular platform for emulation, especially with all the Linux-based projects born over the last decade (Lakka, RetroPie, Recalbox, …) and SBC devices (Raspberry Pi, ODROID, …). RetroArch is also available on lots of other platforms (macOS, old and new consoles, smartphones, tablets, TV boxes, …). I wouldn’t be surprised if Windows users were actually a minority as far as RetroArch usage is concerned.