An input lag investigation

@Brunnis Nice work, I would have loved to see the results for the SNES classic itself though! :slight_smile:

I bought one for the controllers as well (to use with Retroarch Wii U and PC), but I’ve found that I really like having a dedicated SNES box that turns on almost instantly. So I find I’m using it more these days (strictly with Nintendo’s Canoe emulator) but I’ve been wondering about the input lag, although it does “feel” ok to me.

It’s not scientific at all, but I get similar results from the 240p test suite’s manual lag test on my SNES Mini and Nvidia Shield ATV.

I also tried passing a bunch of canoe options that specifically mention affecting latency and saw no improvement in performance vs the default options.

Great work Brunnis, and I really appreciate all the work you’ve compiled here. So in the end, do you feel it’s better to have lower input lag or higher accuracy? I know it’s a subjective question, but since you’ve dedicated so much time to this, I figured you’d have an informed opinion.

Also, which cores would you recommend sacrificing accuracy for speed on? Perhaps we should make a list so everyone can make that decision themselves. That list, along with all the relevant lag settings, should be stickied in this forum to help newer people understand exactly what these settings do and what combination would work best for their setup.

@Brunnis: Did you test the current RetroArch on the Raspberry Pi 3 using the GL driver with the new working max_swapchain=2 setting? If I am right, we should have the same low-latency input that we have on the dispmanx driver with max_swapchain=2… but now on the GL driver!

Also, did you do input latency tests on the Snes Mini itself? How good is it?

Yeah, I’ve looked at it and did consider getting one, but decided I don’t want to spend that money right now. It would also be nice if the device didn’t have a hard-coded resolution, but could instead default to the native resolution of the connected display.

Thanks for the link, but I believe I read that article back in 2016. :slight_smile:

I only emulate NES and SNES right now. NES is pretty trivial to emulate and I just use Nestopia and have been happy with that. The hardware I use for my main emulation needs isn’t very powerful, so for SNES I’m best off using snes9x/snes9x2010. That means I give up some accuracy, but as long as I don’t notice any obvious glitches, I tend not to worry too much about that. I definitely understand those that strive to go the other way, though. When you do have the hardware for it, such as a high-end i7, the difference (input lag-wise) between using bsnes-balanced and snes9x/snes9x2010 is just 4-8 ms of frame delay, which isn’t exactly huge.

I’m probably not the best person to give such recommendations, given my pretty limited emulation needs. I also think it might be hard to give general recommendations. I’d personally use the most accurate core my system can handle, unless it’s disproportionately demanding. For example, if one core provides good accuracy while allowing a frame delay of 12, and another is ever so slightly more accurate but allows no frame delay at all, I’d choose the former.

Not yet! I’ll try to get that done as well. I’ve not forgotten about it. :slight_smile:

I didn’t. I’m guessing that the SNES Mini won’t output video over an HDMI to DVI cable, so that excludes using my HP Z24i. I’ll need to use a different display, and I’ll also need to test RetroArch at 720p on that same display. I have a small Samsung 1080p LCD TV that I can use for this. If I test RetroArch with the same system and settings as in my previous post, we can get the input lag difference between the HP Z24i and the Samsung. Then we can subtract that from the SNES Mini figure to get a feeling for how it would perform on a really fast display.

Don’t know when I’ll have time for such a test, though…

I should mention that I have read the small amount of info there is regarding input lag on the SNES Mini. My current guess, based on what I’ve seen so far, is that they may have been able to get it down to around 4.2 - 4.4 frames for the test case I’m using (i.e. Yoshi’s Island). They’ve probably optimized their emulator to be able to generate the frame and push it to the GPU within a single frame period, i.e. similar to hard GPU sync or max_swapchain_images = 2. Coupled with fast input handling, that’s where I believe they would end up. It’s still not as good as can be accomplished with RetroArch (the difference being mainly the possibility of also using frame delay), but it’s really good. All speculation for now, of course, but would be interesting to run some tests to see if it’s correct or not.
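As a rough sketch of where that guess comes from (using the same kind of breakdown as my earlier measurements, and assuming their input handling adds next to nothing):

  ~1 ms      (controller and USB polling)
+ 8.33 ms  (average until the start of the next frame)
+ 50 ms    (3 frames of built-in delay in Yoshi's Island)
+ 11.67 ms (0.7 frames of scanout down to the character's position)

= ~71 ms, or roughly 4.3 frames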

1 Like

I’m happy to say that with tips in this thread, a CRT, and a heap of stubbornness, I’ve achieved real provable next-frame response time — sub-16ms lag!

First, my testing procedure:

RetroArch on Linux in KMS mode

CRT (Sony Trinitron)

iPhone with 240fps slow-mo video capture

Here’s a simple video showing my methodology and results: https://www.youtube.com/watch?v=lBwLSPbHWoc

What’s great about using a CRT for these slow-mo tests is you can see the scanline move down the tube and know exactly when a frame begins and ends. No need for fractional frame analysis and averaging. At 240fps, you get four captured frames per TV frame. Obviously, the faster your slow-mo camera, the better, but iPhones and other smartphones do just fine.
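(For the arithmetic: 1000 ms / 240 fps ≈ 4.2 ms per captured frame, and one 60 Hz TV frame of 16.7 ms / 4.2 ms ≈ 4 captured frames per TV frame.)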

Like @Brunnis said (at some point), it’s unnecessary to hack wires and LEDs onto controllers for personal testing. Deliberately/flamboyantly pressing a button on the controller while simply holding it in front of the TV is enough. At 240fps, there’s only about a single slow-mo video frame (or 4ms) of ambiguity as to whether a button has been hit or not.

I’ve always used Super Mario Bros. on the NES as my casual lag litmus test, and let my muscle memory be the judge. It seemed logical to use for my slow-mo tests too. I have a real NES (well, an AV-modded Famicom) for comparison using the same CRT.

Early in the process, I discovered two things:

1: On real hardware, Mario has a one-frame lag! I feel completely betrayed all these years having never known this! The soonest you’ll see an input response (jumping, etc) is greater than 16ms after your input.

2: RetroArch’s Nestopia core seems to have an additional frame of lag unaccounted for, bringing the total minimum lag for a Mario jump to over 32ms. QuickNES does not have this additional frame of lag.

With these two findings, I decided to find another game to test. I chose Pitfall! for the Atari 2600, using the Stella core.

With Pitfall, I witnessed a response on the very next frame. In the video linked above, you can clearly see me hit the button near the end of one frame, and on the next, Harry jumps! Essentially no way to improve compared to original hardware. Pack it up. We’re done here :slight_smile:

This method of testing has worked really well for me, and was not very difficult at all. With a video app that allows for frame-by-frame movement (on macOS, I just use QuickTime Player), counting frames is easy. A CRT is of course not required, but watching the beam race down the tube lets you know exactly when a new frame is coming. And hopefully, armed with the knowledge that Stella can react on the very next frame, others can test their own equipment much more easily.

Eliminating lag has been a crusade of mine for years. It had been distracting enough that merely enjoying old games was nearly impossible. Maybe now I can play my old games without “worry”.

Extra stuff:

There’s been talk about digital-to-analog converters (HDMI to VGA, or DP to VGA) being a source of display lag. Yeah sure, if you want to be pedantic (we all do), every signal transformation TECHNICALLY produces additional lag, but in most cases, the lag is beyond minuscule. Less than a single millisecond. Think about it: there’s not much hardware in most cheap converters, and each and every frame of 1080p video contains over two million pixels (roughly 6 MB of data at 24-bit color). If the adapter had sizable lag, where would it PUT those frames it’s supposedly holding onto? :slight_smile:
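To put some back-of-the-envelope numbers on that (my rough estimate, not a measurement): a simple converter that only buffers a scanline or two is holding maybe 10-15 KB of pixel data, and at 1080p60 a single scanline lasts about 15 microseconds (1 / (60 × 1125 total lines) ≈ 14.8 µs), so the delay such a device adds is measured in microseconds, not milliseconds, let alone frames.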

Full specs:

CPU: i5-5675C

Graphics: Intel Iris 6200

OS: Ubuntu 17.04

Latest RetroArch Nightly

Display: Random DisplayPort to VGA adapter

Extron VSC 500 (not required, but makes my VGA 240p signal look nicer)

VGA to circuit feeding CRT jungle chip

Sony Trinitron TV

Linux Settings:

Custom 1920x240p EDID via DisplayPort

RetroArch launched in KMS mode through basic command-line (not GUI)

Linux RetroArch settings (a rough retroarch.cfg sketch follows the list):

VSync On

Maximum Swapchain Images: 2

Frame Delay: as high as you can go without stuttering. 15 works for me for 8-bit systems

Integer Scaling

No filters

Input device: Super Famicom controller

Dual SNES controller to USB adapter - V2, configured to 1ms of lag. Website specs say 2ms minimum, but hey, 1ms was an option in the controls, and Linux reports 1ms poll time…
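For anyone who prefers editing the config file directly, here’s roughly what those options look like as retroarch.cfg entries. Key names are from memory, so double-check them against your own config, and set the frame delay to whatever your hardware can sustain:

  video_vsync = "true"
  video_max_swapchain_images = "2"
  video_frame_delay = "15"
  video_scale_integer = "true"
  video_smooth = "false"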

6 Likes

Beautiful! Thanks for sharing your video verification!

That’s interesting about Nestopia. We’ll have to poke around and see if we can figure out what’s up there.

2 Likes

All updated NES cores (Nestopia, Fceumm, Mesen, QuickNES) react in 2 frames internally here.

Mario checks input at the end of the frame, near scanline 257 (or was it 247?). Even if you press jump in time for that, he won’t jump the next frame, but will jump the one after that. So, on a CRT, if Mario is at the very top of the screen, you might see him jump ~20ms later, but at the bottom of the screen like he typically is, it’ll probably be at least ~30ms until the CRT updates Mario’s sprite. On an emulator (or using anything that doesn’t update the picture in real time like a CRT), this means that the minimum lag will always be at least 2 whole frames, as far as I know.
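Rough math for the CRT case (assuming the input read happens around scanline 257 of the NES’s 262-line frame, the reaction is drawn two frames later, and Mario sits at roughly scanline 50 near the top or 200 near the bottom; one NES scanline is about 63.6 µs):

  ~5 lines    (rest of the current frame after the input read)
+ 262 lines  (the following frame, which still shows the old sprite)
+ ~50 lines  (top of screen) or ~200 lines (bottom of screen) into the frame after that

= ~317 lines ≈ 20 ms, or ~467 lines ≈ 30 ms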

1 Like

I’m uploading my raw footage of CRT tests for Super Mario Bros, Real NES vs. QuickNES vs. Nestopia, if anyone’s interested in studying further. When this post is an hour old, it should be ready for viewing here: https://www.youtube.com/watch?v=x2Y9orESg8A

I’m going to sleep now, so hopefully the upload doesn’t fail :slight_smile:

FWIW, I noticed the additional lag in Nestopia on earlier LCD screen tests as well, using a different computer running Ubuntu 17.10.

1 Like

The Advanced Test core should also show next-frame response with the latency test, if you want to take any possible latency from games out of the equation and test the latency of RetroArch itself.

1 Like

This is great work, guys. I’ll be switching over to QuickNES from now on. A nice breakdown of each core would be great to have. MAME vs. FBA would be a great one to try out.

Ever since I tried RetroArch for the first time, there was no doubt in my mind that it was the future. It overcame the crippling input lag that plagues many standalone emulators, and the future looks bright: as computer hardware gets faster and cheaper, someday we’ll be able to use a frame delay of 15 on the bsnes accuracy core. Although, with the exception of only a handful of games, I’m pretty hard pressed to find inaccuracies between bsnes-balanced and Snes9x in the games that I play.

QuickNES is amazing! However it is very bare bones. It doesn’t run any Japanese titles or even have a turbo button function lol. Nestopia may have more input lag, but it has the ability to do so much more. The input latency on Nestopia is already so incredibly low that I don’t feel held back at all, and I’m on an LCD TV. QuickNES will be my go to for American releases for sure. Nestopia will be for everything else.

As always amazing work Libretro!

I tested SMB a bit with Hard GPU Sync at 0 frames + the frame delay setting.
It runs fine with a frame delay of 13 for QuickNES and FCEUmm, 12 for Nestopia, and 10 for Mesen.

I have to go into the Nvidia control panel and set power management to “prefer maximum performance”, or the GPU will regularly throttle, making the game slow down.
With that set, I could keep a shader on without problems (crt-geom).

(i5-3570k@4GHz, win7 x64, Geforce 770.)

1 Like

Hi! Really exciting news :slight_smile:

But does that mean you cannot have double buffering on, and thus that you get tearing? Probably not a problem for the Atari 2600! Does it also mean work has to be put into the cores themselves?

Cheers !

@Tatsuya79: I see SMB in Nestopia reacts in 2 frames internally here, too. Doesn’t that mean there’s an additional frame of delay that shouldn’t be there?

On the go, so I’ll keep it short (and sorry if I’ve missed anything):

Great tests, @TylerL! Quick comment regarding the additional lag in Nestopia: if you pause RetroArch with P, press and hold the jump button, and then press K to single-step frames, how many frames are required before Mario jumps? If it’s 2 frames, everything is as it should be. You can also test the menu in Mega Man 2. It has next-frame latency on Nestopia.

EDIT: Just made a quick test with Nestopia and QuickNES, single stepping the emulators. Both perform the same in SMB, i.e. they respond on the second frame after applying input. This is expected, given that SMB behaves the same on a real NES. But then the question remains: why did you get different results? Are you absolutely sure you used the exact same RetroArch settings?

@vanfanel I believe two frames is what the real SMB has in terms of lag, so I’m not sure how Nestopia could be quicker than that…

Okay, some more comments now that I have slightly more time:

It’s great to see someone else testing, and it’s also nice to see you using Raphnet’s adapter as well. Regarding the next-frame latency, I think it’s good that someone specifically demonstrates this. If you noticed, my earlier post testing with the Raphnet adapter indicates the same thing. Many probably didn’t realize it, due to the fact that I was testing with a game that has built-in latency (and I also had a fairly low frame delay setting). If you remember, I posted this calculation:

65 ms
- 0.5 ms (average until next controller poll)
- 0.5 ms (average until next USB poll)
- 8.33 ms (average until start of next frame)
- 50 ms (3 frames. Yoshi's Island has a built in delay which means the result of an action is visible in the third frame.)
+ 6 ms (since we're using a frame delay of 6 in RetroArch)
- 11.67 ms (0.7 frames. This is how long it takes to scan out the image on the screen until it reaches the character's position in the bottom half of the screen.)

You can easily modify this to remove the built-in latency of Yoshi’s Island (assuming instead a game with next-frame response) and to use a higher frame delay:

+ 0.5 ms (average until next controller poll)
+ 0.5 ms (average until next USB poll)
+ 8.33 ms (average until start of next frame)
+ 16.67 ms (1 frame. Let's assume we're using a game with next frame latency.)
- 15 ms (Let's assume we're able to use a frame delay of 15, like @TylerL )
+ 8.33 ms (0.5 frames. This is how long it takes to scan out the image on the screen until it reaches the middle.)

= 19.33 ms or 1.16 frames on average

On an original console running a game with next frame latency on a CRT and using a controller which adds no input lag at all, average latency would be 1 frame. The reason we get 0.16 frames (2.67 ms) extra is mainly because of a very small delay in controller input handling and the fact that we can’t use a frame delay that completely removes one whole frame of input lag (that would require an infinitely fast execution of the emulator).
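(For reference, that 1-frame baseline breaks down roughly as: 8.33 ms average wait until the game’s once-per-frame input poll + 8.33 ms of scanout until the beam reaches the middle of the screen, with the reaction drawn in the frame immediately following the poll = 16.67 ms, or 1 frame.)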

The reasons I mention all this are:

  1. My results corroborate yours, even if many people probably didn’t realize this when reading my post.
  2. I used Windows for my tests, which many believe is much slower in terms of input lag than Linux with KMS. My experience is that Linux + KMS + max_swapchain_images=2 equals Windows + GPU Hard Sync.

Regarding QuickNES vs Nestopia:

I should mention that I’ve measured input lag using Nestopia on Linux with KMS previously and I have not seen this extra input lag you experienced. I’m pretty sure there’s something else at play here…

1 Like

Very interesting thread.

Seeing that earlier calculation is what inspired me to try for (and test) demonstrable next-frame latency. It looked achievable, and I had the equipment to directly compare performance to real hardware.

I worked on some more tests last night with various systems. I did some preliminary tests with Windows on my CRT, but I’m not seeing the performance I’d expect. I’ll come back to it and tweak settings at some point.

As for QuickNES vs Nestopia, I can confirm that stepping through the emulator for a SMB jump gives second-frame response as expected. Still, I’ve experienced that same additional frame of latency across multiple platforms when playing at full speed.

Observed on Ubuntu 17.10 on a MacBook Air, Ubuntu 17.04 on a homebuilt PC, and Windows 10 on the same homebuilt PC.

My Ubuntu 17.10 MacBook Air tests were the first time I noticed the difference in latency between the cores. Unfortunately, I didn’t keep my test videos around for this trial.

Here are my Windows 10 core vs. core results, based on a SMB Mario jump (converted to game frames below the raw numbers). Based on Mario’s position and input timing, real hardware would be between ~1.5 and ~2.5 frames:

RetroArch Windows 10 LCD 120fps Black Frame Insertion Nestopia:

17 13 16 17 14 15

Avg 15.3

3.8 frames

RetroArch Windows 10 LCD 120fps Black Frame Insertion QuickNES:

10 12 13 11 12

Avg 11.6

2.9 frames
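(Those frame figures are the averages divided by 4, since with the same 240 fps capture as my earlier tests there are 4 captured frames per 60 Hz game frame: 15.3 / 4 ≈ 3.8 and 11.6 / 4 = 2.9, i.e. roughly 64 ms vs. 48 ms.)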

I hope to revisit my Ubuntu KMS tests in the future, making sure all cores are completely up-to-date (in case something has changed in Nestopia recently), as well as test other NES cores for comparison.

1 Like