An input lag investigation

Brunnis · 15 January 2017 07:28

You know what I’m doing now (well, in 10 secs)? Actually gaming, not performance testing, on my RetroPie setup.

Heffer · 15 January 2017 07:28

What settings are you using on your Raspberry Pi to get minimal input lag in addition to the emulator lag fixes?

These are my settings:

Raspberry Pi 3
arm_freq=1200
core_freq=500
sdram_freq=500
gpu_freq=400

runcommand.sh cpu setting = performance

Input driver = udev
Joypad driver = udev
Video driver = dispmanx
Audio driver = sdl2

Audio Latency = 32
Audio Output Rate = 44100

Vsync = On (games tear and stutter far too badly for me with this disabled)
Frame Delay = Depends on game (I set it as high as I can before audio starts to crackle)
HW Bilinear Filtering = Off

All shaders and overlays disabled

I’m able to get most NES games up to about frame delay 10 but SNES games can vary from 1 to 7.

Brunnis · 15 January 2017 07:28

[QUOTE=Heffer;41871]What settings are you using on your Raspberry Pi to get minimal input lag in addition to the emulator lag fixes?

These are my settings:

Raspberry Pi 3
arm_freq=1200
core_freq=500
sdram_freq=500
gpu_freq=400

runcommand.sh cpu setting = performance

Input driver = udev
Joypad driver = udev
Video driver = dispmanx
Audio driver = sdl2

Audio Latency = 32
Audio Output Rate = 44100

Vsync = On (games tear and stutter far too badly for me with this disabled)
Frame Delay = Depends on game (I set it as high as I can before audio starts to crackle)
HW Bilinear Filtering = Off

All shaders and overlays disabled

I’m able to get most NES games up to about frame delay 10 but SNES games can vary from 1 to 7.[/QUOTE]

Only things I’ve changed on RetroPie are:

Use dispmanx video driver for 1 frame less input lag.
Set NES frame delay to 6 (could perhaps extend a little further). I don’t use frame delay on SNES, since even a setting of 2 caused audio issues in Yoshi’s Island.
video_threaded = "false". Probably no effect on input lag.
video_smooth = "false". Shouldn’t affect input lag.

The only settings I know for sure affect input lag on the Raspberry Pi are the video driver and the frame delay setting.
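To make the frame delay tradeoff concrete, here is a minimal sketch of the time budget involved. It assumes an exact 60 Hz display and assumes that frame delay is a wait, in milliseconds, after vsync before the core runs its frame, so whatever is left of the frame time is all the emulator gets. The function name emulation_budget_ms is made up for illustration and is not part of RetroArch.

```python
def emulation_budget_ms(frame_delay_ms: float, refresh_hz: float = 60.0) -> float:
    """Time left for the core to emulate one frame after the frame delay wait.

    Assumption: the frame delay is spent idle at the start of each frame,
    so the core must finish within the remainder of the frame time.
    """
    frame_time_ms = 1000.0 / refresh_hz
    if not 0 <= frame_delay_ms < frame_time_ms:
        raise ValueError("frame delay must be >= 0 and shorter than one frame")
    return frame_time_ms - frame_delay_ms


# Print the remaining budget for a few frame delay values from this thread.
for delay in (0, 6, 10):
    print(f"frame delay {delay:2d} ms leaves {emulation_budget_ms(delay):.2f} ms per frame")
```

If the core (plus audio and video output) cannot finish inside that remaining window, frames get missed, which would fit the audio crackle people report when they push the setting too far.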

Heffer · 15 January 2017 07:28

[QUOTE=Brunnis;41875]Only things I’ve changed on RetroPie are:

Use dispmanx video driver for 1 frame less input lag.
Set NES frame delay to 6 (could perhaps extend a little further). I don’t use frame delay on SNES, since even a setting of 2 caused audio issues in Yoshi’s Island.
video_threaded = "false". Probably no effect on input lag.
video_smooth = "false". Shouldn’t affect input lag.

The only settings I know for sure affect input lag on the Raspberry Pi are the video driver and the frame delay setting.[/QUOTE]

I haven’t done any testing but it “feels” as though there is some input lag tied to the audio latency. With audio set to 32 instead of 48 or 64 it seems more responsive since the audio and video are synced.

I can’t figure out how to do the pause test method on the RPi as I have to use the hotkey enable button to increment the frames and it won’t allow me to input any button presses while the hotkey button is held.

hunterk · 15 January 2017 07:28

Audio latency does matter but it’s a perceptual issue rather than a visual/response issue in this case. If you see and hear 2 things that are supposed to happen at the same-ish time (within an approx. 80 ms window), your brain says “these two things happened at the same time,” matched to the slowest response. So, if you’re feeling like it’s more latent with higher audio latency, you should be able to mute it and feel the latency melt away, as if by magic.

So, you’ll want to get your audio latency down as low as possible, as well, to get the best feel. The JACK audio driver should allow for very low audio latency in Linux-based systems.

bidinou · 15 January 2017 07:28

About overclocking… As the RPi 3 tends to heat up a lot, it might have the opposite effect after some minutes or hours, since it’ll throttle if it overheats. Then again, only one core is used, and the frame delay means even that core isn’t used at 100%.

I couldn’t live without the shaders, though (crt-pi !). Maybe I could get used to a scaler with no bilinear and with good scanlines.

Thanks for sharing all this :) I was frustrated for months, even years, because so many people said they noticed no lag.

Heffer · 15 January 2017 07:28

[QUOTE=hunterk;41884]Audio latency does matter but it’s a perceptual issue rather than a visual/response issue in this case. If you see and hear 2 things that are supposed to happen at the same-ish time (within an approx. 80 ms window), your brain says “these two things happened at the same time,” matched to the slowest response. So, if you’re feeling like it’s more latent with higher audio latency, you should be able to mute it and feel the latency melt away, as if by magic.

So, you’ll want to get your audio latency down as low as possible, as well, to get the best feel. The JACK audio driver should allow for very low audio latency in Linux-based systems.[/QUOTE]

Anything special needed to get the JACK driver to work with RetroArch on an RPi 3? I’ve set the driver to JACK and set the audio_device to hw:0,0 for the analog jack which is what I’m using and it just seems to crash when launching an emulator. I’m able to successfully start jackd manually from the shell.

EDIT: Never mind, I got it working. I had to use audio_device = "system:playback_1,system:playback_2" instead of hw:0,0.

bidinou · 15 January 2017 07:28

Can you set the audio latency significantly lower with JACK enabled (it can already be lowered with ALSA)? Is the tradeoff in terms of CPU time worth it?

Edit: here, with ALSA + a QAudio DAC+ sound add-on, lowering the audio latency makes it necessary to reduce the video_frame_delay setting. So the right tradeoff between input and sound delays has to be figured out. No overclocking here. Methinks input latency prevails.

tekn0 · 15 January 2017 07:28

[QUOTE=bidinou;41887]About overclocking… As the RPi 3 tends to heat up a lot, it might have the opposite effect after some minutes or hours, since it’ll throttle if it overheats. Then again, only one core is used, and the frame delay means even that core isn’t used at 100%.

I couldn’t live without the shaders, though (crt-pi !). Maybe I could get used to a scaler with no bilinear and with good scanlines.

Thanks for sharing all this :) I was frustrated for months, even years, because so many people said they noticed no lag.[/QUOTE]

I agree, crt-pi is great on the Pi 3. Try editing the crt-pi.glslp file and changing filter_linear0 = "true" to filter_linear0 = "false".

This sets nearest neighbor and really sharpens the image nicely.
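If you would rather script that edit than do it by hand, something like the following sketch could work. It only manipulates text; the helper name set_preset_value and the sample preset contents are hypothetical, and the real crt-pi.glslp on your system may contain different lines.

```python
import re


def set_preset_value(text: str, key: str, value: str) -> str:
    """Return `text` with the line `key = "value"` set.

    Replaces the first existing `key = ...` line, or appends a new line
    if the key is not present. Only spaces/tabs are allowed before the key,
    so blank lines above it are left untouched.
    """
    pattern = re.compile(rf'^[ \t]*{re.escape(key)}[ \t]*=.*$', re.MULTILINE)
    new_line = f'{key} = "{value}"'
    if pattern.search(text):
        # A function replacement avoids any backslash-escape surprises.
        return pattern.sub(lambda _match: new_line, text, count=1)
    separator = "" if (not text or text.endswith("\n")) else "\n"
    return text + separator + new_line + "\n"


# Hypothetical preset contents, for illustration only.
sample = 'shaders = "1"\nshader0 = "crt-pi.glsl"\nfilter_linear0 = "true"\n'
print(set_preset_value(sample, "filter_linear0", "false"))
```

To apply it for real you would read the preset file into a string, pass it through the function and write the result back, ideally after making a backup copy.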

bidinou · 15 January 2017 07:28

You can already set #define sharper, is that similar ? (I know, I can figure it out by myself :D)

What is great is that the shader developer got the most out of the Pi while maintaining 60 fps.

Tatsuya79 · 15 January 2017 07:28

I’m back from several days away. I tested the new bsnes code and I see no problem in games with overscan.

I only found a few of those:

- The Blues Brothers
- Dragon Quest I & II
- Dragon Quest V

(I checked the 1st and last scanlines for any problem, took some pics: no difference with standard bsnes.)

Brunnis · 15 January 2017 07:28

[QUOTE=Tatsuya79;41937]I’m back from several days away. I tested the new bsnes code and I see no problem in games with overscan.

I only found a few of those:

- The Blues Brothers
- Dragon Quest I & II
- Dragon Quest V

(I checked the 1st and last scanlines for any problem, took some pics: no difference with standard bsnes.)[/QUOTE]

Great, thanks! I’ll reference your post in the pull requests for both bsnes and bsnes-mercury. Still waiting to hear back from Alcaro after having implemented his suggestions.

Heffer · 15 January 2017 07:28

[QUOTE=bidinou;41913]Can you set the audio latency significantly lower with JACK enabled (it can already be lowered with ALSA)? Is the tradeoff in terms of CPU time worth it?

Edit: here, with ALSA + a QAudio DAC+ sound add-on, lowering the audio latency makes it necessary to reduce the video_frame_delay setting. So the right tradeoff between input and sound delays has to be figured out. No overclocking here. Methinks input latency prevails.[/QUOTE]

JACK seems to bog the emulator down too much even at the default 64ms audio latency. I’ll stick with ALSA for now. How low can you get the latency with the DAC? I’m currently running ALSA at 32ms.

Brunnis · 15 January 2017 07:28

[QUOTE=vanfanel;41198]On the RA/dispmanx side of things, while you build your own RA executable from github sources, you can change the dispmanx_surface_setup() call that starts on line 447 into this:



dispmanx_surface_setup(_dispvars,
            width,
            height,
            pitch,
            _dispvars->rgb32 ? 32 : 16,
            _dispvars->rgb32 ? VC_IMAGE_XRGB8888 : VC_IMAGE_RGB565,
            255,
            _dispvars->aspect_ratio,
            2, /* 9th parameter, number of buffers: 2 = double buffering (was 3) */
            0,
            &_dispvars->main_surface);

As you can see, I simply changed the 9th parameter from 3 to 2. That will make it use a double buffer instead of a triple buffer, which could make a difference to the input lag. The main thread will be blocked more often by the lack of free buffers to draw into, but emulation will be kept from running an additional loop in advance, which should reduce the time between new info being fed to the core and the results being visible on screen. Again, this is how I see it in my head: I may be wrong. I just designed this threaded, triple-buffered approach to get the most out of the Pi 1’s weak CPU, which had no time to waste waiting for vsync on the main thread, YET I wanted smooth scrolling with no tearing. I succeeded, but it’s not academic, it’s all my guess and I can be wrong. I can change the sources for you on github if you want and build a version for you to test, just ask me and I will help you as much as I can.[/QUOTE]

It took me a while, but I finally got around to doing this. Long story short: I compiled with double buffering and ran a camera test. No change at all.

So, it’s back to the drawing board. The Linux graphics sub-system is definitely uncharted territory for me, so I think someone else will have to look into this some more (I simply don’t think I have the time to go deep into this…). My hypothesis is that it’s the graphics sub-system that’s causing additional lag compared to Win 10. The difference between the two operating systems is a pretty clean one frame (16.67 ms), so I don’t think it can be explained by differences in, for example, the input handling. There’s probably some buffering happening somewhere, but is it in user-space, drivers or firmware?

vanfanel · 15 January 2017 07:28

@brunnis: thanks a lot for trying. So, if there’s no difference between double or triple buffering, I don’t know what could be the problem on the dispmanx video driver side…

I understand there is another extra latency frame with the GLES driver on the Pi, compared to dispmanx, right?

I have just made a pull request for a new graphics driver, called “plaindrm”. You have to boot the Pi in the new, experimental KMS mode (add “dtoverlay=vc4-kms-v3d” to config.txt) and rebuild RA with --enable-plaindrm. It should be pretty good for latency, and it’s not based on dispmanx but on the “standard” low-level graphics stack. Can you try it? I can provide a Pi3 binary if you don’t want to go through the building process: you do very good work with testing and I don’t want to take your time away.

Brunnis · 15 January 2017 07:28

Yup.

[QUOTE=vanfanel;41974]I have just made a pull request for a new graphics driver, called “plaindrm”. You have to boot the Pi in the new, experimental KMS mode (add “dtoverlay=vc4-kms-v3d” to config.txt) and rebuild RA with --enable-plaindrm. It should be pretty good for latency, and it’s not based on dispmanx but on the “standard” low-level graphics stack. Can you try it? I can provide a Pi3 binary if you don’t want to go through the building process: you do very good work with testing and I don’t want to take your time away.[/QUOTE]

Sounds great! I’ll test, but I’m having a bit of an issue with booting the system after adding the vc4 dtoverlay… With the default RetroPie 3.8.1 image, it stops during boot, alternating between three messages:

A start job is running for LSB: Switch to ondemand cpu governor (unless shift key is pressed)
A start job is running for LSB: Raise network interfaces.
A start job is running for Load/Save RF Kill Switch Status of rfkill0

After a while, the second message disappears and it keeps alternating between the remaining two. Finally, after 10-15 minutes, the system hangs on the “Switch to ondemand cpu governor” message.

So, I disabled the dtoverlay line in config.txt and proceeded to run rpi-update to update the firmware, followed by apt-get update and apt-get dist-upgrade. I once again tried to activate the vc4 driver and got a different behavior. The system now hangs with this:

The marker at the bottom is not blinking. Before getting to this screen, the line “map: vt02 => fb0” was showing.

When I pulled the plug and rebooted the system again, it just hung at the “map: vt02 => fb0” line instead.

Any ideas?

EDIT: Ohh, and I haven’t recompiled RA yet. Just wanted to see if I could at least boot with the new driver. Maybe that’s the cause of the issue I’m having right now. I’ll see if I can compile and test again.

EDIT: So, I rebuilt RetroArch from the master (I saw that your PR was just merged). I compiled with --enable-kms and set the video_driver in retroarch.cfg to “drm” (is this correct?). I then activated the vc4 driver again and rebooted. Unfortunately, I ran into almost the same issue as before. In addition to the lines from the photo above, I also got:

[ OK ] Started File System Check on /dev/mmcblk0p1.
Mounting /boot…

And then nothing. :-/

vanfanel · 15 January 2017 07:28

@Brunnis: I don’t know RetroPie, nor do I care about it… Maybe it’s not ready for KMS/DRM, but I could pass you a ready-to-use Raspbian image which will work out of the box with the experimental KMS/DRM driver. What medium could we use to pass you the image? The configure option is --enable-plaindrm (or --enable-drm after backporting, I think?)

Brunnis · 15 January 2017 07:28

[QUOTE=vanfanel;41989]@Brunnis: I don’t know RetroPie, nor do I care about it… Maybe it’s not ready for KMS/DRM, but I could pass you a ready-to-use Raspbian image which will work out of the box with the experimental KMS/DRM driver. What medium could we use to pass you the image?[/QUOTE]

Yep, I don’t know exactly what they’ve changed in the RetroPie setup compared to default Raspbian, so it’s probably better to just test with your image. Easiest would be if you could just put it on a service like Google Drive, Dropbox, mega.nz, etc. and give me a link to it. Or if you have an FTP server you’d be willing to put it on.

When I looked at the commit, it seemed like they changed it to --enable-kms. Can’t find any reference to “drm” in config.params.sh.

bidinou · 15 January 2017 07:28

The problem is that lower audio latency needs more CPU power, so you have to reduce the video_frame_delay setting.

With a simple game, I managed to get 24 ms audio latency with the QAudio DAC+; 16 ms didn’t work. The 24 ms setting is almost OK with the Neo Geo, but only with video_frame_delay at 0 instead of 9. So one has to choose between reducing the audio latency by 40 ms or reducing the input latency by 9 ms (RPi 3 with no overclocking).
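The arithmetic behind that choice can be sketched as below. This assumes the comparison is against the 64 ms default audio latency mentioned earlier in the thread, and that each millisecond of frame delay removes roughly one millisecond of input latency. The LatencySetup class and compare function are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencySetup:
    audio_latency_ms: int  # RetroArch audio latency setting
    frame_delay_ms: int    # video_frame_delay setting


def compare(baseline: LatencySetup, candidate: LatencySetup) -> tuple[int, int]:
    """Return (audio latency saved, input latency added) when moving
    from `baseline` to `candidate`, both in milliseconds."""
    audio_saved = baseline.audio_latency_ms - candidate.audio_latency_ms
    # A lower frame delay means input is polled earlier in the frame,
    # so the lost frame delay shows up as added input latency.
    input_added = baseline.frame_delay_ms - candidate.frame_delay_ms
    return audio_saved, input_added


# Assumed baseline: default 64 ms audio latency with frame delay 9.
high_frame_delay = LatencySetup(audio_latency_ms=64, frame_delay_ms=9)
low_audio_latency = LatencySetup(audio_latency_ms=24, frame_delay_ms=0)
print(compare(high_frame_delay, low_audio_latency))
```

Which side wins is a matter of taste, since the two numbers measure different things: one is how late the sound arrives, the other how late the picture reacts.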

dave_j · 15 January 2017 07:28

[QUOTE=bidinou;41932]You can already set #define sharper, is that similar ? (I know, I can figure it out by myself :D)

What is great is that the shader developer got the most out of the Pi while maintaining 60 fps.[/QUOTE]

Sharper will be some way between the default and tekn0’s suggestion. You could think of them as vaguely resembling:

Default = Small CRT TV fed through SCART
Sharper = CRT computer monitor
Tekn0’s suggestion = LCD with scanlines

Sharper will cause Pi 0/1/2s to drop frames but Pi 3s might be OK since they clock the GPU faster. (You could also overclock the GPU on earlier Pis.)

If you prefer the look of Tekn0’s suggestion you’re probably better off using the dispmanx driver and an overlay for scanlines since you’ll avoid the extra frame of input lag.

On the subject of input lag…

Whilst immediate mode GPUs can start both vertex and fragment shading as soon as they get a drawing command, tile-based ones need to have all the geometry passed to them (and vertex shaded) before they can begin fragment shading. This means they’ll only begin fragment shading once they have been told there’s no more geometry coming, usually by calling swap buffers. It might be possible to trigger fragment shading earlier by calling glFlush as soon as you’ve sent the last bit of geometry, but I don’t know how that would fit with RetroArch’s displaying of menus/text overlays/etc.