Ahhh, great info! That’s pretty exciting that we’re only 1 frame behind hardware already.
Thanks for going through the trouble to run these tests.
Amazing! How about RetroArch on a CRT? You can do it using the composite out on the Raspberry Pi. Maybe that will even shave off the last frame.
Why choose SNES9x2010? Does the SNES core have much to do with latency?
Interesting. Did you use the PAL Yoshi's Island ROM with RetroArch? Wouldn't it have been better to use the BSNES core, since it has less latency?
Ideally, I’d like to test my J4205 system using a CRT monitor, but I don’t have access to one anymore. But a Raspberry Pi test on the TV could also be interesting.
I use it on my comparatively weak J4205 system because it offers enough compatibility and allows me to use some frame delay. There’s no difference in latency between snes9x2010, snes9x and any of the bsnes cores.
The results for the J4205 system were done previously (late last year) and I used the NTSC ROM. There’s no difference in latency between the bsnes cores and snes9x2010. Where did you hear that?
I believe composite out on a Pi adds some processing, but using the GPIO is better. Something like this: http://www.rgb-pi.com/ (there are a few different third-party solutions and maybe one is better than the others).
Some discussion on this here: https://github.com/raspberrypi/firmware/issues/683
I'm not entirely sure where I got the idea that composite adds processing vs. GPIO; that is likely wrong. I suppose with the latency test setup you could find out.
Thanks again, Brunnis, for this test. I think I have to try a real PC instead of the Pi 3. On the Pi 3 it never felt really smooth for me. But maybe my TV has too much lag.
Do you recall what the situation was with Raspberry Pi input lag re: Dispmanx vs. GL(ES) vs. DRM?
I recall Dispmanx was previously a frame or so better than GL(ES), but then there was a DRM driver that may or may not be viable (I guess it needs the VC4 KMS driver to be the default in Raspbian?). Plus, I see that the Dispmanx driver had some recent changes regarding buffering that could affect things.
If you still have a Pi setup, it might be worth retesting. It would be nice to have some sort of definitive answer.
BTW, thanks for all of this! Still the best input lag investigation around; there are so many myths around input lag and emulation.
The Dispmanx and DRM drivers are one frame faster than the default GLES driver on the Raspberry Pi. Those two drivers match the fastest drivers I’ve tested on the PC side. However, the PC can improve input lag by 1-2 additional frames, since it’s fast enough to use video_max_swapchain_images=2 (improves input lag by one frame) and video_frame_delay (improves input lag by 0-0.9 frames). In the case of my Pentium J4205 system, it’s fast enough to improve the input lag by ~1.5 frames over the RPi.
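For reference, these are the retroarch.cfg entries involved (a sketch with illustrative values; what a given machine can sustain at full speed has to be tested per core):

```
# 2 = strict double buffering: saves one frame of lag, but the machine must
# finish every emulated frame well within 16.7 ms. 3 is the safer default.
video_max_swapchain_images = "2"

# Milliseconds to wait after vsync before running the core, so input is
# polled later. 0-15 at 60 Hz; saves up to ~0.9 frame if the CPU keeps up.
video_frame_delay = "8"
```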
The Dispmanx driver is slightly faster than the DRM driver (processing-wise), which means that in some situations the DRM driver will run at less than 60 FPS while the Dispmanx driver runs at full framerate.
The recent changes that were made shouldn't affect input lag. I can see that the Dispmanx driver was changed in February to disable triple buffering completely, due to stability reasons. That was not very good performance-wise, since it's the same as using video_max_swapchain_images=2 and the RPi is too slow to handle it. I can see this was reverted on April 29, though, so the Dispmanx driver should now be back to how it worked previously (when I did my tests).
Here are approximate input lag results (based on previous testing), using my HP Z24i monitor, snes9x2010 and Yoshi’s Island:
- Raspberry Pi with default BCM (GLES) video driver: 7.2 frames
- Raspberry Pi with Dispmanx video driver: 6.2 frames
- Raspberry Pi with DRM video driver: 6.2 frames
The figures above can be improved in the case of the Dispmanx and DRM drivers, using video_max_swapchain_images=2 and/or video_frame_delay. Both are likely to cause performance issues, though. This is where more powerful x86 hardware comes in handy and it’s the reason I built my dedicated Pentium J4205 system for RetroArch. That’s still not a super-fast system, but it’s much faster than the RPi and it’s completely passive.
I was basing this on your first input lag investigation, where you say "However, testing on Windows suggests that bsnes-mercury-balanced is quicker than snes9x-next". However, I didn't look closely enough at your updated input lag investigation, where you improved your test methodology, and I can see that in the new tests, input lag for Yoshi's Island with Snes9x-next and Bsnes-Mercury-balanced is basically the same.
Note that with the way the dispmanx driver works now, this no longer holds true. "Triple" buffering is really just double buffering; the 3rd buffer is never used, so video_max_swapchain_images=2 is equivalent to video_max_swapchain_images=3 performance-wise.
I'm not sure why it allocates the useless 3rd buffer. I just went in and fixed the race conditions so I could turn the "triple" (actually double) buffering support back on, since after the revert dispmanx was unusably slow.
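To illustrate the general idea (a hypothetical sketch in C, not the actual dispmanx driver code): the more rendered frames a swapchain is allowed to queue ahead of scanout, the older the displayed frame is relative to the last input poll, which is exactly the frame of lag that video_max_swapchain_images=2 removes.

```c
#include <stdio.h>

/* Hypothetical model of a swapchain's steady-state queue depth.
 * Compare MAX_SWAPCHAIN_IMAGES = 2 (strict double buffering) with 3. */
#define MAX_SWAPCHAIN_IMAGES 3

int main(void)
{
    int queued = 0; /* rendered frames waiting to be scanned out */

    for (int frame = 0; frame < 6; frame++) {
        /* All buffers full: block until vsync displays the oldest frame. */
        while (queued >= MAX_SWAPCHAIN_IMAGES - 1)
            queued--;

        /* Poll input, emulate one frame, render into the free buffer. */
        queued++;

        /* Every queued frame is lag: what's on screen trails the most
         * recent input poll by roughly this many frames. */
        printf("frame %d: %d frame(s) queued ahead of scanout\n", frame, queued);
    }
    return 0;
}
```

With 2 images the queue settles at 1 frame; with 3 it settles at 2, i.e. one extra frame between input and display.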
In the end, the dispmanx driver is quite wonky, but it works well enough for low-latency 8-bit-era console emulation, which is really all I personally use my RPI3 for. I was tempted to do a full rewrite of the driver, including text rendering support for RetroArch messages, but I'm not sure it's really worth the time.
For anything 16/32-bit, folks should really just get a low-end Intel. The RPI3 is certainly cheap, but it provides a compromised experience for anything other than 8-bit console emulation.
EDIT: To be clear, when I say compromised, I’m referring to input lag. If you don’t care/notice input lag, the RPI3 can perform quite well.
EDIT 2: I went ahead and fixed the useless 3rd buffer thing and restored “max_swapchain_images” to the way it (correctly) worked originally: https://github.com/andrewlxer/RetroArch/commit/d0b54f97aa8ca30e0fbf0b2f7f71eff89f100d83
I’ll submit upstream after some stability testing.
Ahh, I see, that explains it.
I was just going to write a post saying that if you rewrote the dispmanx driver to have the same performance with max_swapchain_images set to 2 and 3, input lag would have to have increased in the case of max_swapchain_images=2. I see that you’ve fixed that now, which is great (I haven’t tested it, though).
Have you confirmed that the driver is back to its previous state of lower performance when using max_swapchain_images=2? If so, that’s a good sign that it’s working correctly and the input lag has decreased, since the lower performance is a necessary side effect.
I have an RPi2 and it isn't fast enough to handle SNES with low-latency settings. Threaded video especially causes huge lag. So an RPi3 doesn't significantly improve on that? That's disappointing.
The RPi3 is fast enough to not have to use threaded video. However, it’s not fast enough to use video_max_swapchain_images=2 in all games (though some may work okay). Frame delay is not worth bothering with on any RPi.
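In retroarch.cfg terms, a reasonable RPi3 baseline following the above might look like this (a sketch; whether a given game holds full speed still has to be checked case by case):

```
video_driver = "dispmanx"         # or "drm"; both are ~1 frame faster than GLES
video_threaded = "false"          # threaded video adds lag; the RPi3 can run without it
video_max_swapchain_images = "3"  # "2" saves a frame but is too heavy for many games here
video_frame_delay = "0"           # per the above, not worth bothering with on any RPi
```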
Hmm, well, it's a similar situation then. I can turn off threaded video in some 16-bit games but not others. It's not much fun optimizing for every single game, though. Dispmanx helps too, but not having CRT shaders is a big trade-off.
I remember reading about some experimental new GL driver for RPi, any idea if anything ever came of that?
Unfortunately, the VC4 driver still seems to be under development and I don’t know when it will actually become the standard driver in Raspbian.
Could you test this with RetroArch in KMS mode, please?
The test was done in KMS mode. I’ll add that to the post.
How does Windows 10 (with the proper settings: windowed fullscreen, hard GPU sync, etc.) fare against Linux KMS mode these days?
I've measured exactly the same results on Windows 10 as on Linux in KMS mode. However, it depends on the GPU drivers, so each driver needs to be tested to know for sure, really. I have run into GPU drivers that would perform worse, input lag wise, on both Windows and Linux. In the Windows case, it was a new AMD driver that suddenly introduced 1-2 frames of extra input lag. I reported this to AMD, but don't know if they ever fixed it. In the Linux case, it was a new Intel GPU driver that seemed to require so many more system resources that I had to turn down the settings in RetroArch (swapchain_images, frame delay). I think it was when I upgraded from kernel 4.8 to 4.10 that I saw this regression. I ended up rolling back to 4.8.
So, unfortunately, things are a bit volatile when it comes to input lag. In my case, I ended up just building a dedicated box, for which I confirmed low input lag through measurements, and which I intend to keep static for years (i.e. no OS/driver updates). That way I will at least know that input performance is guaranteed.
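For reference, the Windows settings mentioned above map roughly to these retroarch.cfg entries (a sketch of the commonly recommended values, not something measured here):

```
video_fullscreen = "true"
video_windowed_fullscreen = "true"  # borderless ("windowless") fullscreen
video_hard_sync = "true"            # hard GPU sync (GL driver)
video_hard_sync_frames = "0"        # sync CPU and GPU on the same frame
```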