Hi! Really exciting news!
But doesn’t that mean you cannot have double buffering on, and thus that one gets tearing? Probably not a problem for the Atari 2600! It also means work has to be put into the cores themselves?
Cheers!
@Tatsuya79: I see SMB in Nestopia reacts in 2 frames internally here, too. Doesn’t that mean there’s an additional frame of delay that shouldn’t be there?
On the go, so I’ll keep it short (and sorry if I’ve missed anything):
Great tests, @TylerL! Quick comment regarding additional lag in Nestopia: if you pause RetroArch with P, press and hold the jump button and then press K to single-step frames, how many frames are required before Mario jumps? If it’s 2 frames, everything is as it should be. You can also test the menu in Mega Man 2. It has next-frame latency on Nestopia.
EDIT: Just made a quick test with Nestopia and QuickNES, single stepping the emulators. Both perform the same in SMB, i.e. they respond on the second frame after applying input. This is expected, given that SMB behaves the same on a real NES. But then the question remains: why did you get different results? Are you absolutely sure you used the exact same RetroArch settings?
@vanfanel I believe two frames is what the real SMB has in terms of lag, so I’m not sure how Nestopia could be quicker than that…
Okay, some more comments now that I have slightly more time:
It’s great to see someone else testing and it’s also nice to see you using Raphnet’s adapter as well. Regarding the next-frame latency, I think it’s good that someone specifically demonstrates this. If you look at my earlier post testing with the Raphnet adapter, it indicates the same thing. Many probably didn’t realize, due to the fact that I’m testing with a game that has built-in latency (and I also had a fairly low frame delay setting). If you remember, I posted this calculation:
65 ms
- 0.5 ms (average until next controller poll)
- 0.5 ms (average until next USB poll)
- 8.33 ms (average until start of next frame)
- 50 ms (3 frames. Yoshi's Island has a built in delay which means the result of an action is visible in the third frame.)
+ 6 ms (since we're using a frame delay of 6 in RetroArch)
- 11.67 ms (0.7 frames. This is how long it takes to scan out the image on the screen until it reaches the character's position in the bottom half of the screen.)
You can easily modify this to remove Yoshi’s Island’s built-in latency from the equation and also add additional frame delay:
+ 0.5 ms (average until next controller poll)
+ 0.5 ms (average until next USB poll)
+ 8.33 ms (average until start of next frame)
+ 16.67 ms (1 frame. Let's assume we're using a game with next frame latency.)
- 15 ms (Let's assume we're able to use a frame delay of 15, like @TylerL )
+ 8.33 ms (0.5 frames. This is how long it takes to scan out the image on the screen until it reaches the middle.)
= 19.33 ms or 1.16 frames on average
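For anyone who wants to play around with these numbers, here’s a rough Python sketch of the same budget (the values are just the assumptions from the list above, nothing is newly measured):

# Rough latency budget sketch, mirroring the list above. All values in ms.
FRAME_MS = 1000 / 60  # ~16.67 ms per 60 Hz frame

terms = {
    "controller poll (avg)": 0.5,
    "USB poll (avg)": 0.5,
    "wait for start of next frame (avg)": FRAME_MS / 2,   # 8.33 ms
    "game with next-frame latency": 1 * FRAME_MS,         # 16.67 ms
    "frame delay": -15.0,                                  # assuming frame delay 15
    "scan-out to middle of screen": 0.5 * FRAME_MS,        # 8.33 ms
}

total = sum(terms.values())
print(f"{total:.2f} ms = {total / FRAME_MS:.2f} frames")   # ~19.33 ms, ~1.16 frames

Swap the terms around (e.g. 2 or 3 frames of built-in game latency, or a different frame delay) and it reproduces the other sums in this thread as well.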
On an original console running a game with next frame latency on a CRT and using a controller which adds no input lag at all, average latency would be 1 frame. The reason we get 0.16 frames (2.67 ms) extra is mainly because of a very small delay in controller input handling and the fact that we can’t use a frame delay that completely removes one whole frame of input lag (that would require an infinitely fast execution of the emulator).
The reasons I mention all this are:
Regarding QuickNES vs Nestopia:
I should mention that I’ve measured input lag using Nestopia on Linux with KMS previously and I have not seen this extra input lag you experienced. I’m pretty sure there’s something else at play here…
Very interesting thread.
Seeing that earlier calculation is what inspired me to try for (and test) demonstrable next-frame latency. It looked achievable, and I had the equipment to directly compare performance to real hardware.
I worked on some more tests last night with various systems. I did some preliminary tests with Windows on my CRT but I’m not seeing the performance I’d expect. I’ll come back to it and tweak settings at some point.
As for QuickNES vs Nestopia, I can confirm that stepping through the emulator for an SMB jump gives second-frame response time as expected. Still, I’ve experienced that same additional frame of latency across multiple platforms when playing at full speed.
Observed on Ubuntu 17.10 on a MacBook Air, Ubuntu 17.04 on a homebuilt PC, and Windows 10 on the same homebuilt PC.
My Ubuntu 17.10 MacBook Air tests were the first time I noticed the difference in latency between the cores. Unfortunately, I didn’t keep my test videos around for this trial.
Here are my Windows 10 core vs. core results, based on an SMB Mario jump. Based on Mario’s position and input timing, real hardware would be between ~1.5 and ~2.5 frames:
RetroArch Windows 10 LCD 120fps Black Frame Insertion Nestopia:
17 13 16 17 14 15
Avg 15.3
3.8 frames
RetroArch Windows 10 LCD 120fps Black Frame Insertion QuickNES:
10 12 13 11 12
Avg 11.6
2.9 frames
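For reference, assuming those counts are raw 240 fps camera frames (which is what the 3.8 and 2.9 figures work out to), converting to 60 Hz frames is just averaging and dividing by four. A quick Python sketch:

# Converting high-speed camera frame counts to 60 Hz frames.
CAMERA_FPS = 240   # assumption: the counts above were captured at 240 fps
DISPLAY_FPS = 60

results = {
    "Nestopia": [17, 13, 16, 17, 14, 15],
    "QuickNES": [10, 12, 13, 11, 12],
}

for core, counts in results.items():
    avg = sum(counts) / len(counts)
    print(f"{core}: avg {avg:.1f} camera frames = {avg * DISPLAY_FPS / CAMERA_FPS:.1f} frames @ 60 Hz")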
I hope to revisit my Ubuntu KMS tests in the future, making sure all cores are completely up-to-date (in case something has changed in Nestopia recently), as well as test other NES cores for comparison.
That’s great to hear!
It’s really strange. I’ve been giving it some thought the past couple of days, but I can’t find any theoretical explanation. I’ve previously measured Nestopia using Mega Man 2 on Heat Man’s stage (which also responds on the second frame) and it performs exactly one frame faster than Yoshi’s Island does. Counting backwards as per my previous calculation, there’s no extra lag left unaccounted for. Granted, these tests were made quite a while ago. Do you think you could give Mega Man 2 a go and see if it behaves the same? I’ll try to do my own tests, but I really can’t fit it into my schedule right now.
As usual, curiosity got the best of me and I made a quick test of Nestopia vs QuickNES. The test setup was the same as in my earlier post, but I’ll repost it here:
Dell E5450
RetroArch settings
I used my LED-rigged RetroLink controller for these tests. Results:
Average for both is exactly the same: 15.6 frames at 240 FPS, or 3.9 frames at 60 FPS
[Please note that this test was not done to get the fastest absolute input lag. It was only meant to do a comparison between the two emulators.]
So, I still cannot find any difference between these two emulators in terms of input lag. Counting backwards, input lag is where I expect it to be. I did notice, after the test, that my Nestopia build was a bit old (June 30th 2017). Other than testing the latest build, I don’t think I can do much more on my end, but please let me know if you want me to test anything else.
EDIT: Long shot, but are you sure you don’t have any core specific configuration that affects the emulators differently?
EDIT: Updated with current Nestopia build (downloaded just now):
The difference is within the margin of error.
I recall at one point doing a core-level remap of RetroPad B to NES A. When I get the time, I’ll put together a “clean” setup to test a few additional variables (or lack thereof!).
Also, all my tests have been with high Frame Delay. Depending on the device, I don’t believe I’ve gone under 12ms. Possibly something that only surfaces with an extremely small execution window?
@vanfanel Just a heads-up: I’ll do my best to test input lag on the Pi tomorrow!
Okay, sounds good.
Given what I know about frame delay, I don’t think it can have that kind of effect.
Now that I’ve made an NVIDIA profile to force performance, I get really good results and can raise the frame delay setting a lot more.
8/16-bit systems allow me to use 10 (Mesen, Snes9x for Mode 7/SFX) to 13 (even 14 for Genesis+GX with Game Gear games, the only core that fast among the ones I’m using).
I can’t wish for more with those; it really feels pretty responsive.
It seems I can use heavy shaders like crt-royale-kurozumi without changing the setting too.
Cores where I had to put Hard GPU sync to 1 are now working with 0 + frame delay at 5 or 6 (DesMume, Np2Kai at max emulated cpu speed, Mednafen PSX software with Tobal2 + deinterlacing crt shader).
Hard to go back to Mednafen Saturn after that, which allows none of this and has a lot of lag.
Snes9x and Snes9x2010 react the same here with slower games (I just do 1 profile per core for the worst case).
Hi Brunnis and TylerL,
Thank you for the detailed tests on the NES cores, very interesting.
I’ve been using the latest QuickNES core and one from about July 11th 2017 (it was the one I still had installed before these tests), and I’m noticing a more fluid response with the QuickNES core from July 2017. This is on a low-end computer and without your solid testing methodology, I must admit. However, the difference feels real, to the point that playing SMB or Akumajou with the QuickNES core from July 2017 is actually more fun than with the latest. It’s just that bit snappier, giving it just the edge to play more like the real hardware. So I was about to post to confirm that Tyler was onto something. But your tests make me wonder.
@TylerL: could you post the version of QuickNES you used in your tests?
@Brunnis: if you could find some time, could you possibly test the QuickNES core from about July 2017? The version I’m using is “QuickNES 1.0-WIP 05a742e”. This test could exclude any timeline differences in the QuickNES core.
Out of curiosity: some posts ago someone mentioned SMB reads input at around raster line 257, i.e. just before vblank. I suppose in SMB this input is used in the next game loop (in the next frame), after which it waits for vsync to scan out the image. Doesn’t this then add up to the following latency on real hardware?
Adding this up means that on average SMB on real hardware would show a response after the second vsync after input, or 2 frames of lag (two vsyncs passed after input). Adding it up more precisely: 8.33 + 16.67 + 11.67 = 36.67 ms, or on average 36.67/16.67 ≈ 2.2 frames of delay when Mario stands at the bottom of the screen.
TylerL’s results were done with frame delay 12, which equals (12/16.7)*262 ≈ raster line 188. So compared to real hardware, in theory it’s (257-188)/262 ≈ 0.26 frames short of real hardware. So the “correct” expectation for an SMB test at frame delay 12 would be the previous 2.2 frames plus the 0.26 ≈ 2.5 frames of delay.
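To make that arithmetic explicit, here’s a small Python sketch (raster line 257 for SMB’s input read and 262 lines per NTSC frame are taken from the posts above; nothing else is new):

# Rough check of the frame delay vs. input-read raster line reasoning above.
FRAME_MS = 16.67
LINES_PER_FRAME = 262     # NTSC NES frame
SMB_INPUT_LINE = 257      # raster line where SMB reads input (per the earlier post)

def line_at_frame_delay(frame_delay_ms):
    # Raster line a real console would be at, frame_delay_ms after vsync,
    # i.e. the point where RetroArch polls input and runs the core.
    return frame_delay_ms / FRAME_MS * LINES_PER_FRAME

line = line_at_frame_delay(12)                              # ~188.6
shortfall = (SMB_INPUT_LINE - line) / LINES_PER_FRAME       # ~0.26 frames
print(f"frame delay 12 ~ line {line:.1f}, {shortfall:.2f} frames short of real hardware")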
TylerL’s result for the QuickNES core with frame delay 12 showed 2.9 frames of delay, which is much closer to the theoretical expectation of 2.5 frames of delay when using frame delay 12 than the 3.8 frames he got for Nestopia.
With regard to your result of 3.9 frames at frame delay 6: given that it differs by 6 ms from the frame delay 12 case, it should add 6/16.67 ≈ 0.36 frames to the frame delay 12 expectation, in other words 2.46 + 0.36 ≈ 2.8 frames of expected average delay at frame delay 6. Given your result of 3.9 frames, doesn’t this mean the emulator is adding a full frame of latency?
Given this, I guess I don’t understand how at frame delay 6 the average delay would deviate much from the expected average of 2.8 frames (as per the above calculation), let alone be 3.9 frames.
So my personal impression of the QuickNES core from around July 2017 feeling more fluid, and the above calculation which suggests that the Nestopia core and the 2018 QuickNES core have a full frame of latency added, make me think TylerL may still be on to something.
Anyway, just my 2 cents.
Thanks again for all the detailed tests and getting the topic out of the twilight zone!
This is almost definitely a placebo effect, unfortunately, as all of the commits since then have been minor buildfixes:
I’m not sure I can be bothered, to be honest. Too little time available and too much time already spent…
Hehe, that’s why I said not to use my results for anything other than comparing the difference between the emulators. I’m using the Retro-Link controller, which I’ve measured to be 0.78 frames slower than using the SNES Mini controller plus Raphnet adapter. 0.78 frames is 13 ms, and we also need to add the 1 ms that it still takes to handle input via the Raphnet adapter. Doing the whole calculation:
+ 14 ms (average delay from controller, including USB polling)
+ 8.33 ms (average until start of next frame)
+ 33.33 ms (2 frames. SMB responds on the second frame.)
- 6 ms (the frame delay used in my tests)
+ 12.50 ms (0.75 frames. This is how long it takes to scan out the image on the screen until it reaches Mario.)
= 62.17 ms (3.73 frames)
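As a quick sanity check of that sum in Python, with the values straight from the list above:

# Re-adding the terms above; no new measurements, just the same numbers.
FRAME_MS = 16.67
total = 14 + 8.33 + 2 * FRAME_MS - 6 + 0.75 * FRAME_MS
print(f"{total:.2f} ms = {total / FRAME_MS:.2f} frames")   # ~62.17 ms, ~3.73 frames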
This matches up well (within tolerances) with my tests and means all lag is accounted for. It also means there’s no extra delay added by Nestopia itself.
@vanfanel Below are the results you’ve been waiting for:
I used RetroPie for these tests, for convenience. I updated RetroArch using the RetroPie script and it downloaded version 1.6.9, which, as far as I can see, includes your fix. The actual game tested was Mega Man 2 (Heat Man’s stage). Please don’t pay any attention to the absolute input lag, i.e. only compare the difference between each test case. I was using a small Samsung LCD TV for the test and I believe it introduces one frame of input lag (as opposed to my HP Z24i, which is more or less devoid of input lag).
Not sure if the results are what you were hoping for. I believe they mirror my suspicions, i.e. that dispmanx + max_swapchain = 2 is still faster and there’s a noticeable performance regression with max_swapchain = 2 no matter which driver you use.
Great to see the GL driver is finally usable.
I think you’re double counting. The 0.75 frames scan out is already implicitly incorporated in the
+33.33 ms (2 frames. SMB responds on the second frame).
The correct summation, leaving out the double count, is 49.67 ms (2.98 frames).
It means 1 frame in your results is unaccounted for (i.e. extra lag).
It can be understood better by examining your results for the real SNES on CRT (LED) from a few posts back.
Your measurements for the real SNES on CRT (LED) show an average lag of 3.58 frames. With your counting method we would explain it as follows:
3.58 frames = 60 ms
+ 8.33 ms (average until start of next frame)
+ 50 ms (3 frames. Yoshi's Island has a built in delay which means the result of an action is visible in the third frame.)
+ 12.50 ms (0.75 frames. This is how long it takes to scan out the image on the screen until it reaches Mario.)
The total is 71 ms or 4.25 frames. You see, the result is overstated by the double counting of the scan-out.
Without it the summation is 58.33 ms or 3.5 frames, very near your measurement for the real SNES of 3.58 frames. (If you still don’t believe it, try to explain your lowest measured response for the real SNES on CRT: 2.71 frames.)
Nope, the scan-out is not already included in the 2 frame response of SMB mentioned in that calculation. Those two frame periods are what’s needed for the emulator to generate the frame showing the response. At the end of the second frame period, the final frame is fully rendered by the emulator. That final frame is then pushed to the GPU, which proceeds to scan it out to the display. This is how it has to be, since modern computers are frame buffer based and can’t control the display directly.
Regarding the calculation for the real SNES, that’s not how that calculation would look. It would look like this:
+ 8.33 ms (average until start of next frame)
+ 33.33 ms (2 frames. Yoshi's Island has a built in delay which means the result of an action isn't seen in the first two frames.)
+ 12.50 ms (0.75 frames. This is how long it takes to scan out the image on the screen until it reaches Mario.)
= 54.16 ms (3.25 frames)
Exactly why my measured result is slightly slower than this is something I’ve yet to work out. Anyway, as I said further up, the reason there’s a difference is that the emulator needs to finish the frame before sending it to the GPU, which then starts scanning it out, whereas an actual NES/SNES scans out the frame while it’s being generated.
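To put that in numbers, here’s a rough Python sketch of the two pipelines as I understand them (the 2-frame game response and the 0.75-frame scan-out are the figures used above; polling delays and frame delay are left out for clarity):

# Sketch: frame-buffer based emulator vs. real console scan-out.
FRAME_MS = 16.67

def real_console(frames_to_respond, scanout_fraction):
    # The response frame is scanned out *while* it is being generated,
    # so the last frame period and the scan-out overlap.
    return ((frames_to_respond - 1) + scanout_fraction) * FRAME_MS

def emulator(frames_to_respond, scanout_fraction):
    # The emulator must finish the whole frame before the GPU can start
    # scanning it out, so the frame periods and the scan-out add up.
    return (frames_to_respond + scanout_fraction) * FRAME_MS

for name, fn in (("real console", real_console), ("emulator", emulator)):
    ms = fn(2, 0.75)   # responds on the 2nd frame, character ~75% down the screen
    print(f"{name}: {ms:.2f} ms ({ms / FRAME_MS:.2f} frames)")
# The emulator pipeline comes out exactly one frame slower, which is the
# difference discussed above (before frame delay claws part of it back).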
@Brunnis: Thanks a lot for the tests! If we concentrate on the max_swapchain=2 cases (that’s where the low latency is), I really can’t understand where the extra frame of lag on GL is coming from: with max_swapchain=2, both drivers wait for vsync immediately after sending the frame to the GPU… Why, oh why is dispmanx faster? That shouldn’t be happening.
What if this extra lag is actually built into the BCM driver? I know that it is not present when using the VC4 driver (although that is not really proof that the BCM driver is to blame). Just speculating here…