An input lag investigation

Hmm well it’s a similar situation then. I can turn off threaded video in some 16-bit games but not others. It’s not much fun optimizing for every single game though. Dispmanx helps too - but not having CRT shaders is a big trade-off.

I remember reading about some experimental new GL driver for RPi, any idea if anything ever came of that?

Unfortunately, the VC4 driver still seems to be under development and I don’t know when it will actually become the standard driver in Raspbian.

Could you test this with RetroArch in KMS mode, please?

The test was done in KMS mode. I’ll add that to the post.

How does Windows 10 (with the proper settings, windowless fullscreen, hard gpu sync, etc) fare against Linux KMS mode these days?

I've measured exactly the same results on Windows 10 as on Linux in KMS mode. However, it depends on the GPU drivers, so each driver really needs to be tested to know for sure. I have run into GPU drivers that performed worse, input lag wise, on both Windows and Linux. In the Windows case, it was a new AMD driver that suddenly introduced 1-2 frames of extra input lag. I reported this to AMD, but I don't know if they ever fixed it. In the Linux case, it was a new Intel GPU driver that seemed to require so much more system resources that I had to turn down the settings in RetroArch (swapchain_images, frame delay). I think it was when I upgraded from kernel 4.8 to 4.10 that I saw this regression. I ended up rolling back to 4.8.

So, unfortunately, things are a bit volatile when it comes to input lag. In my case, I ended up just building a dedicated box, for which I confirmed low input lag through measurements, and which I intend to keep static for years (i.e. no OS/driver updates). That way I will at least know that input performance is guaranteed.

Thanks very much. I would like to see it.

I’m sorry, but you misunderstood me! The original results (i.e. 4.6 frames input lag) are from running in KMS mode. :slight_smile:

Have you already tried performing the test with a kernel and RetroArch you compiled yourself?

It's a good idea to compile the kernel with the timer set to 1000 Hz and to tune your CPU settings. Another thing: I always compile RetroArch and its cores from GitHub with the following flags, and the same goes for the kernel.
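For the timer part, the relevant bit of the kernel configuration looks roughly like this (just a sketch; the exact menu location varies by architecture, and the rest of my build flags aren't shown here):

```
# Excerpt from a kernel .config with the tick rate set to 1000 Hz
# ("Timer frequency" under "Kernel Features" on ARM, or
#  "Processor type and features" on x86, in menuconfig)
CONFIG_HZ_1000=y
CONFIG_HZ=1000
# CONFIG_HZ_250 is not set
# CONFIG_HZ_100 is not set
```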

So far, I don't have any complaints about it.

PS: please excuse any errors; English is not my first language.

Yes, I can confirm that performance drops and that input lag ends up being reduced by 1 frame (as expected), both in measurements and when playing NES Bucky O'Hare (which is essentially impossible to play with any lag). :slight_smile:

Note that I didn't rewrite the swapchain code that resulted in the 2/3 confusion; that was the original author of the driver (vanfanel). I'm not sure of the intent of those changes.

Note, the max_swapchain_images fix has been merged:

The next RetroArch release (1.5.1) should have all of the stability/performance fixes. If you see further stability issues, open a bug with details.

@brunnis, andrewlxer and all: Hello again, guys! It's been a while since I last stopped by this thread! :slight_smile:

Andrewlxer: thanks for your fixes to the dispmanx driver! I thought it was better left as a low-latency-only driver, so I didn't bother fixing the stability problems and kept it "simpler" for that reason. For other needs that I don't share (prioritizing performance over latency has no place in retro gaming, for me), I thought the GLES driver was enough.

Please note I always use max_swapchain=2, the dispmanx driver, plain ALSA audio, and linuxraw for joystick input. I don't have the means to measure input lag, so I chose these fixed configuration options (where we know what input lag we get, thanks to Brunnis and his VERY interesting experiments) and measure performance.

Someone mentioned that setting the kernel timer to 1000 Hz could give some results: well, at least on a system with no CPU usage apart from RetroArch, it does not make any difference in performance. I have been experimenting with CPU isolation, leaving one or two cores of the Pi3 just for RA, but again, since no other processes are using the CPU, that's not making any difference either. Realtime RR scheduling does not make any difference either. So, on a system dedicated to RA with no other non-kernel threads running, experiments that involve rebuilding the kernel with custom configuration options are not worth it.
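For anyone who wants to try the same experiments anyway, this is roughly what they look like in practice (the core numbers and RR priority below are illustrative assumptions, and as said above, none of it made a measurable difference on a dedicated system):

```
# /boot/cmdline.txt (appended to the existing single line):
# keep the kernel from scheduling ordinary tasks on cores 2 and 3 of the Pi3
isolcpus=2,3

# Launch RetroArch pinned to the isolated cores with realtime RR scheduling
# (the realtime priority usually requires root or CAP_SYS_NICE)
taskset -c 2,3 chrt --rr 50 retroarch
```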

However, I recently got an interesting patch merged into the Raspberry Pi kernel by the Pi kernel guys (the 4.9 branch, which is the branch you get when you do rpi-update):

This patch enables custom polling frequencies for joysticks: passing jspoll=2 in cmdline.txt gives a 500 Hz polling rate for the joystick. Not bad. It can be verified using evhz, a very simple program found here:
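As a rough illustration of the setup itself (assuming evhz has simply been compiled locally from that source):

```
# /boot/cmdline.txt (appended to the existing single line):
# poll USB joysticks every 2 ms -> 500 Hz
jspoll=2

# After rebooting, verify the actual event rate while moving the stick
# and pressing buttons (needs root to read /dev/input/event*):
sudo ./evhz
```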

So, Brunnis, could you do a dispmanx input lag test with this, please?

Hey Brunnis, awesome work! Input lag has long been a concern of mine, but it looks like the issue of input lag in emulators has essentially been solved! I was excited enough by the results of your experiments that I had to come out of lurking. A couple of questions:

I don’t think you mentioned if you set hard GPU sync frames to 0 or 1 in your experiments. I assume 0?

You mentioned the Apollo Lake NUCs as maybe being good for a low end, low latency machine. However, I’ve been unable to set hard gpu sync to 0 frames without it causing unbearable slowdown on my NUC6CAYS running Windows 10.

What is the difference in input lag, if any, between a setting of 0 and a setting of 1? In your opinion, is it worth investing in more powerful hardware to run hard gpu sync at 0 instead of 1? Also, would x-less Linux on the same machine have less input lag? Thanks again for your work :smiley:

@vanfanel Sorry for the slow response, vanfanel. That’s great work getting this into the RPi kernel! I’ll try to test this, but it might be a while due to other things going on in my life at the moment. 500 Hz vs default 125 Hz should shave off another 3 ms (0.18 frames) from the average input lag and 6 ms (0.36 frames) at most. Every little bit counts.
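For reference, the arithmetic behind those numbers, assuming the extra delay from polling is uniformly distributed over one polling interval:

$$
\text{avg delay} = \frac{1}{2f}, \qquad \text{max delay} = \frac{1}{f}
$$

$$
\Delta_{\text{avg}} = \frac{1}{2 \cdot 125\,\text{Hz}} - \frac{1}{2 \cdot 500\,\text{Hz}} = 4\,\text{ms} - 1\,\text{ms} = 3\,\text{ms} \approx 0.18\ \text{frames at 60 FPS}
$$

$$
\Delta_{\text{max}} = \frac{1}{125\,\text{Hz}} - \frac{1}{500\,\text{Hz}} = 8\,\text{ms} - 2\,\text{ms} = 6\,\text{ms} \approx 0.36\ \text{frames}
$$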

@Nesguy Thanks! In my Windows tests (which is where Hard GPU Sync applies), I've always used a Hard GPU Sync Frames setting of 0, as you assumed. I haven't tested with a setting of 1, but it "should" add a full frame period worth of input lag, i.e. 16.7 ms. I would personally invest in a machine that can handle Hard GPU Sync Frames = 0, but if the system is otherwise set up correctly and you have a very low input lag screen, it may not be worth it.

Regarding Apollo Lake, I made a quick test with Windows 10 on my Asrock J4205-ITX based system a long time ago and I believe I got good performance with Hard GPU Sync Frames = 0. On Linux, I use Max Swapchain Images = 2 (similar to using Hard GPU Sync), together with Frame Delay = 8 and things work great. However, I only use Nestopia and Snes9x2010.
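For reference, a sketch of how those settings look in retroarch.cfg (the key names are the standard ones to the best of my knowledge; the values are the ones mentioned above):

```
# retroarch.cfg - Linux/KMS setup described above
video_max_swapchain_images = "2"   # behaves similarly to Hard GPU Sync
video_frame_delay = "8"            # in ms; very demanding, tune per machine and core

# Windows/OpenGL equivalent
video_hard_sync = "true"
video_hard_sync_frames = "0"
```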

There are several things that could cause your performance issue, for example:

  • Display driver differences (compared to when I tested)
  • Power saving settings (try setting “High Performance” power plan in Windows)
  • Maybe you’re using more demanding emulators?
  • Have you changed the Frame Delay setting to anything other than its default value (0)? This is a very demanding setting.
  • The NUC6CAYS uses a slightly slower Apollo Lake variant than the J4205-ITX. It’s worth mentioning, but I don’t think that’s what causes your issues.

What about the new RawInput input driver introduced with RA 1.6? Is that based on the work you have done, Brunnis? Or does your method produce less lag?

Not based on anything I’ve done. It’s a different driver to handle input and doesn’t “conflict” with any of the stuff I’ve been doing/testing. If it does give lower input lag it will do so in addition to the stuff I’ve found. Nobody has tested the effect of this yet, though.

Hey Brunnis, it was the frame delay setting: I had it set to 5 instead of 0. Switched it back to 0 and now SNES emulators run fine at hard GPU sync 0 frames. Everything is awesome now, thanks! :smiley:

I also set the Intel built-in graphics settings to "high performance" under the power settings, just in case that makes a difference.

Has any input lag testing been done with overlays? Do you think using overlays will increase input lag?

I’d also be interested in knowing if the CRT-Pi shader introduces any input latency, since it’s the only CRT shader that will run full speed on the NUC6CAYS. I tested it using Fudoh’s 240p Test Suite manual lag test, and found no difference in latency between using the crt-pi shader vs. no shader, with an average latency of less than one frame (16ms) in both cases. Sooo… that’s pretty awesome. It’d be nice to confirm this with a more scientific test, though.

Nope, not getting any work done today!

Hi, I'm curious about two things:

1. What is the difference between the input lag of the RetroArch/libretro SNES cores and that of the original standalone Snes9x/ZSNES emulators?

2. What about input lag on the Pi3 when using GPIO? Is it lower than with a USB controller?

Time for another small update! I’ve just tested the impact on input lag from:

a) Shaders
b) The "raw" input driver

I used the same test procedure as always (see original post in this thread), using a Core i7-6700K @ 4.4 GHz and a GTX 1080. RetroArch 1.6.0 was used and testing was performed using Windows 10 and OpenGL. 25 samples were taken for each test case.

Shaders

[Input lag below reported as number of frames at 60 FPS]

No shaders: 5.21 avg / 4.25 min / 6.00 max
crt-royale-kurozumi (Cg): 5.13 avg / 4.25 min / 6.00 max
crt-geom (Cg): 5.22 avg / 4.00 min / 6.25 max
crt-geom (GLSL): 5.08 avg / 4.00 min / 6.00 max

There was no difference at all in the amount of input lag between no shader and using shaders. The average, minimum and maximum measured input lag was the same (within measuring tolerances). This means you can use shaders without worrying about introducing extra input lag.

For another data point, I also tested the crt-aperture GLSL shader on my Pentium J4205 system running Ubuntu 16.10 in DRM/KMS mode, using the built-in Intel graphics. I measured input lag with my usual test routine and just like my tests of the other shaders on the GTX 1080 in Windows, input lag performance remained unchanged after activating the shader.

One thing to remember, though, is that running the shader passes takes additional time. In other words, the time required to generate each frame will increase. If you're using the Frame Delay setting to reduce input lag, you will likely have to decrease the value in order for your computer/device to still be able to finish rendering the frame on time. With my i7-6700K @ 4.4 GHz and GTX 1080, I had to turn frame delay down from 12 to 8 when using the crt-royale-kurozumi shader, therefore increasing input lag by 4 ms.

So, while shaders themselves don’t add any extra input lag, the increased processing time might force you to reduce the frame delay which will have a small impact on input lag. The good news is that you’ll know exactly by how much your input lag increases since it corresponds to the amount of milliseconds frame delay you have to remove in order to retain 60 FPS.
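Put as a formula (under the assumption above that the shader itself adds no lag), the extra latency is simply the frame delay you had to give up:

$$
\Delta_{\text{input lag}} = \text{frame delay}_{\text{without shader}} - \text{frame delay}_{\text{with shader}} = 12\,\text{ms} - 8\,\text{ms} = 4\,\text{ms}
$$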

EDIT: There might be more to this than I initially thought. See post below by hunterk where he shows results from his own testing, clearly showing a negative impact on input lag with some shaders. I have not been able to reproduce this, despite running additional tests (this post has been updated with those additional results).

The “raw” input driver

The raw input driver was introduced in RetroArch 1.6.0 and the hope was that this driver would reduce input lag. Until today, however, no tests had been run comparing it to the default dinput driver.

Unfortunately, my tests show that the raw input driver makes no difference in input lag. At least, no difference is measurable with this test method and equipment.


By the way, on a completely unrelated note, why does the menu shader get deactivated whenever you load a shader preset? Seems strange that such a basic thing as using a shader disables the beautiful shader used for the menu background…
