Input lag w/Nestopia - SOLVED

PatrickM · 15 January 2017 03:36

Well, I thought everything was running smoothly, so I did some tests of the nestopia core and found that there was significantly more input lag vs. just using nestopia by itself. With nestopia on retroarch I get around 50ms more of input lag compared to the latest build of nestopia for windows.

How does this make any sense? Aren’t they essentially the same program? I just have no clue what could be causing this lag. My cpu isn’t good enough for hard sync, which is the only suggestion I could find online. I’m really bummed now because RA is the only program that I know of that will do full screen mode with evenly spaced scanlines. Christ, it’s. 2014. All I want is to be able to play NES in full screen with no screen tearing, minimal input lag, smooth scrolling, the correct aspect ratio, full controller support and evenly spaced scanlines. As it stands nestopia by itself is the closest thing, but the scanlines look whack because the internal aspect ratio is not correct, so you have to use a dumb billinear filter to get the scanlines looking halfway decent.

What could be the cause of my input lag with RA? I’m running windows 7 64 bit, if that helps.

hunterk · 15 January 2017 05:23

How are you measuring latency? RetroArch shouldn’t be introducing excessive latency anywhere, much less almost 4 frames worth. Is this with all cores or just Nestopia?

PatrickM · 15 January 2017 05:23

Well, I admit that my test is pretty “subjective.” I use Blargg’s reflex timer and do 20 tests and find the average for nestopia, then I do the same for nestopia in retroarch, then I go back and repeat this process 3 times- so I have 3 averages for nestopia and nestopia in retroarch, then I find the average of those averages. My response time with Nestopia is around 250 ms, with retroarch it is closer to 300. I have no idea where lag could be introduced, I don’t even have vsync turned on in RA. I have it forced on through my graphics card with triple buffering so this should result in NO lag compared to simple double buffering.

I don’t want to give up on RA because I like having all the emulators in one place and the shaders are awesome.

PatrickM · 15 January 2017 05:23

Well, I admit that my test is pretty “subjective.” I use Blargg’s reflex timer and do 20 tests and find the average for nestopia, then I do the same for nestopia in retroarch, then I go back and repeat this process 3 times- so I have 3 averages for nestopia and nestopia in retroarch, then I find the average of those averages. My response time with Nestopia is around 250 ms, with retroarch it is closer to 300. I have no idea where lag could be introduced, I don’t even have vsync turned on in RA. I have it forced on through my graphics card with triple buffering so this should result in NO lag compared to simple double buffering. Does RA somehow prevent triple buffering?

I don’t want to give up on RA because I like having all the emulators in one place and the shaders are awesome.

EDIT: okay, I think I just need a new testing method

I redid the test and the difference was a lot smaller, like 20ms. I think there is a difference between retroarch + nestopia and just nestopia by itself, but it’s less than a frame. I’m guessing RA takes just slightly longer to render a frame since the shaders are slightly more complicated - I’m using crt-easymode.cg on RA + nestopia and NTSC filter + scanlines @ 20% + bilinear filter in nestopia by itself.

hunterk · 15 January 2017 05:23

If it’s less than a frame, it could be that RetroArch is rendering the frame longer before vblank, so it feels like there’s a longer gap. I believe that’s precisely what GPU hard sync fixes. It makes sure that you’re using a very recent frame/input poll.

PatrickM · 15 January 2017 05:23

Does triple buffering not work in RA? That’s what it feels like. I’ve been replaying world 1 of SMB for more than 2 hours, going back and forth between nestopia by itself and RA + nestopia. Nestopia by itself feels just more responsive, just enough to make everything slightly easier. It feels like I’m dealing with ~2-3 additional frames of lag, or about how much I get with vsync without triple buffer.

I’m sort of unclear on what the real tangible benefit of hard gpu sync is over true triple buffering plus vsync. With triple buffer and vsync you get no more input lag than you do with double buffering without vsync. It’s really complicated but there are some good explanations here

http://www.anandtech.com/show/2794 http://gamedev.stackexchange.com/questi … -input-lag

Does hard gpu sync result in even less lag than double buffer with no vsync?

Also, what is the difference between Nvidia’s Gsync and hard gpu sync? it sounds like they almost do the same thing but in reverse of each other, with Gsync it sounds like the monitor’s refresh rate is automatically synced to the framerate, with hard gpu sync it sound like you’re syncing the framerate (gpu) to the refresh rate of the monitor. Is this about right?

hunterk · 15 January 2017 05:23

Yeah, I don’t think triple buffering helps with RetroArch, but I don’t think it will generally help with any emulator because of how they work.

Emulators tell your CPU to work as hard as it can to emulate one frame of video and one frame’s worth of audio samples. This usually takes significantly less than 1/60th of a second. The rest of the time, your CPU just sits there until it’s time to push out the frame/samples and start working on the next one (this idle time can cause odd behavior with aggressive power-savings settings, but that’s another issue). For triple buffering to work, the emulator would need to keep cranking out frames to fill those buffers, which can’t really happen without running the game ahead.

Hard GPU sync makes the core wait and just keep checking for input until the last possible moment before cranking out its frame, that way it’s nice and fresh. This reduces input latency by as much time as your CPU would be idle, typically.

Gsync only refreshes your monitor when it has a full frame from the GPU, which should alleviate the audio/video sync issue that plagues emulators, but it won’t have much/any effect on latency unless the emulator does something like hard GPU sync, I think.

PatrickM · 15 January 2017 05:23

hunterk:

Yeah, I don’t think triple buffering helps with RetroArch, but I don’t think it will generally help with any emulator because of how they work.

Emulators tell your CPU to work as hard as it can to emulate one frame of video and one frame’s worth of audio samples. This usually takes significantly less than 1/60th of a second. The rest of the time, your CPU just sits there until it’s time to push out the frame/samples and start working on the next one (this idle time can cause odd behavior with aggressive power-savings settings, but that’s another issue). For triple buffering to work, the emulator would need to keep cranking out frames to fill those buffers, which can’t really happen without running the game ahead.

Hard GPU sync makes the core wait and just keep checking for input until the last possible moment before cranking out its frame, that way it’s nice and fresh. This reduces input latency by as much time as your CPU would be idle, typically.

Gsync only refreshes your monitor when it has a full frame from the GPU, which should alleviate the audio/video sync issue that plagues emulators, but it won’t have much/any effect on latency unless the emulator does something like hard GPU sync, I think.

I’m just completely stumped as to why nestopia by itself feels faster than nestopia through retroarch. But after extensive “testing” by playing numerous games for hours, I am more than 95% sure than I’m getting around 30-40ms of additional lag using RA.

I don’t know how this can possibly be - it’s the same program, right? The only difference is the shader that is being applied. I’m only using the single pass shader crt-easymode.cg.Can the shader really add 30+ms of lag?

Also I think you are referring to the “render ahead” method of triple buffering which is different from “page flip.” With page flip the image you see is as old as your gpu is capable of cranking out frames - if it’s 2 ms to render a frame, the image seen is 2 ms old. The back buffer that gets displayed is always the first one that is rendered after input, the other back buffer is just discarded when they are swapped and a new frame is immediately drawn to it. Vsync never “stalls” because there are two back buffers to swap from, so one can remain locked to the front buffer while the other one is being drawn to, which also eliminates screen tearing. With triple buffering, as soon as a frame is drawn, it gets displayed. There is no waiting for vsync other than the amount of time it takes to render the frame, is my understanding.

My main question is: what does RA+nestopia do differently from nestopia by itself? That might point to a source of the input lag I’m experiencing.

hunterk · 15 January 2017 05:23

Shouldn’t be the case, but I guess you could test it by turning off the shader and playing some more.

EasyMode · 15 January 2017 05:23

I hope not

If you want an idea of how fast a shader is running, you could try the following:

install/run FRAPS to display your framerate
disable vsync (in control panel as well) & hard GPU sync
load this Win64 image viewer core: http://www45.zippyshare.com/v/82163770/file.html
load a screenshot such as this one:

On my GTX 580, that shader runs at 1050 fps (or under 1ms).

Edit: Actually in Retroarch 1.0.0.2 it runs at 1850 fps with default settings.

PatrickM · 15 January 2017 05:23

Shouldn’t be the case, but I guess you could test it by turning off the shader and playing some more.[/quote]

Okay, turning the shader off didn’t seem to make any difference. Here are the actual test results I got using Blargg’s reflex timer. I tested retroarch second so that the test would be biased in favor of RA (since I would only get better at the test with repetition).

Standalone Nestopia. -bilinear filter -NTSC filter -scanlines 20% -vsync ON -triple buffer ON -sync to refresh rate ON

Test results, standalone Nestopia (total response time in ms)

258 291 258 275 275 291 291 258 291 291 291 324 275 291 275 291 275 275 258 275

average: 280.45

Nestopia, Retroarch. -vsync OFF in RA -triple buffer and vsync ON in control panel -hard gpu sync OFF

Nestopia Retroarch, test results

296 312 296 296 279 296 296 296 296 345 312 329 296 296 312 312 312 279 296 296

average: 302.4

The results are pretty unequivocal. Nestopia in RA has an average additional input delay of around 20ms compared to Nestopia standalone - or around 1.25 frames of input lag. What’s more, the lowest response time achieved with Nestopia RA is 279, compared to 258 for Nestopia standalone. The worst time for standalone Nestopia is 324, while the worst time for RA Nestopia is 345. The difference in both cases is 20ms, same as the difference between the averages.

Any ideas?? It sure would be nice to solve this since RA has so many good features.

hunterk · 15 January 2017 05:23

Have you tried it with triple buffering and vsync turned off in your control panel and vsync turned on in retroarch (with or without hard gpu sync, which should only remove <16 ms at best)?

anon24419061 · 15 January 2017 05:23

I wonder why even use triple buffering. RetroArch doesn’t even have frameskip. Triple buffering is only useful on performance limited scenarios, allowing games to run at sub 60fps without dropping all the way to 45/30/15 fps. Honestly you should only mess with CP settings in cases you can’t run at steady 60fps and the game doesn’t offer a triple buffering option.

Your measurements might be accurate but I wonder, are you running in full screen? Windowed mode with aero can cause issues with vsync.

Anyway I tried the reflex test, I was amazed at my consistency!

246 246 246 279 246 279 246 246 263 246 279 263 263 263 246 246 230 230 230

That was with Retroarch in Windows 8.1 using OpenGL, full screen, vsync and hard sync to 1

anon24419061 · 15 January 2017 05:23

With hard sync 0 it was even more consistent and got even better results

213 197 213 213 230 246 213 197 197 197 230 279 230 213 213

Went ahead and tried with hardsync off and it worsened, 279 accross the board with an oddball 312 here and there but pretty consistent again.

There are many factors affecting this by the way. The controller you use, the screen you use (since it’s an HTPC I guess it might be a TV, try gaming mode if available), the driver combination, and to that you’re adding your custom CP settings.

PatrickM · 15 January 2017 05:23

turning vsync off in my CP overrides RA and forces it off, resulting in screen tearing and making the game unplayable for that reason.

PatrickM · 15 January 2017 05:23

I have confirmed that I’m not running anything in windowed mode. I’m using an original NES controller connected via a USB adapter. I am using game mode on my TV and it is rated as having 25ms of input lag by displaylag.com, although since I am looking for the source of the difference in input lag between Nestopia in RA and Nestopia by itself running on the same system, lag added by the controller and the display is the same for each and can be ignored.

I did some more test and there were some interesting findings.

Disabling vsync in CP and enabling it in RA showed no improvement and resulted in screen tearing, making it unplayable anyway.

The first interesting result is that disabling triple buffering in my CP - much to my surprise - resulted in a reduction in input lag. I thought I had the lag-reducing kind of triple buffering since I have an nvidia card, but apparently even nvidia will make the mistake of referring to “render ahead” as “triple buffering.” This confuses the hell out of everyone- ONLY “page flip” triple buffering should be called triple buffering, but I guess it’s way too late for that now.

The second interesting result is that Nestopia Retroarch STILL SHOWS a significant input delay compared to Nestopia Standalone. Here are the results:

Nestopia RA -same settings as first test except: -triple buffer OFF in CP -vsync ON in CP -vsync OFF in RA -GPU sync off RA

results in MS: 263 296 279 296 263 329 279 263 312 279 263 279 296 279 279 263 296 312 263 263

average: 282.6

Nestopia, standalone same settings as first test, except: -triple buffer OFF in CP -vsync ON in CP -vsync ON Nestopia -triple buffer OFF Nestopia

results in MS: 24 2 275 258 275 258 234 258 275 258 275 258 258 291 275 275 275 258 275 258 242

average: 268.15

The difference in the average is 14.45 ms. The lowest result from Nestopia RA is 263 compared to 242 for Nestopia SA. The highest result is 329 for Nestopia RA and 291 for Nestopia SA.

We can conclude from the above that triple buffering is not the source of the input delay difference between Retroarch running Nestopia and standalone Nestopia. We can also conclude that triple buffering as implemented by my graphics card adds to the input delay.

So, the question remains open: what is causing the additional input delay in RA?

anon24419061 · 15 January 2017 05:23

Can’t tell… I can’t reproduce the issue and I certainly don’t notice any input lag but my numbers are markedly better whan yours. And yeah told ya triple buffering adds one frame of input lag. It’s not important on current gen games but it’s really noticeable in twitch based games and platformers.

PatrickM · 15 January 2017 05:23

Ok So I tested Nestopia in Retroarch again, this time with the following settings: -everything the same as the previous test except: -in control panel: vsync set to “use application setting.” -in retroarch: vsync set to ON

results in ms: 263 279 263 312 263 279 263 296 279 279 263 246 296 296 279 263 279 263 263 279

average: 275.15

These are the best results attained in RA thus far, and less than 1/2 a frame additional delay compared to the best results attained with nestopia standalone (see previous test). This is so close to the results attained in the previous test of nestopia standalone that I will have to do 1-2 more tests.

I also will have to play for at least an hour with these settings to confirm that no performance issues result.

Andres: Triple buffering per se does not add input lag. “Render ahead” triple buffering adds input lag; “page flip” triple buffering reduces it. The problem is that both methods are referred to simply as “triple buffering” by both graphics card manufacturers and developers. That’s my understanding based on the anandtech article “Triple Buffering: Why We Love It.”

I think maybe the “render ahead” method might be more common, leading to the widespread perception that “triple buffering” per se adds to input lag, but that’s just a theory.

anon24419061 · 15 January 2017 05:23

It doesn’t really reduce it either, it’s pretty much game dependent. What it does is allow the game to run with vsync but without the video pipeline stalling the process. (ie, decoupling input latency from video rendering).

There is no way triple buffering would lower latency compared to a game running without vsync.

Awakened · 15 January 2017 05:23

Skimming this thread, I was wondering if using those settings would help any. I’ve read that in some cases using the application’s vsync method instead of the Nvidia driver’s forced vsync can be better.

You might find some insight into this issue from an old thread I made about input lag where maister first added the hard sync option. Reading through that I remembered that one difference between stand alone emulators and RetroArch is that RA uses dynamic rate control to sync audio and video. I remember that being a problem back when I used stand alone Nestopia; after a certain amount of time the audio would lag behind the video. RA’s dynamic rate control fixes that. I don’t know enough about that feature to know if it could add display lag on certain setups though.