Yeah, my mGBA results do seem consistent with yours, Tatsuya. It’s actually one of my favourite emulators out there: lightweight and responsive, with extremely low audio latency and great accuracy without being too taxing on the CPU.
I think Brunnis’ fix has brought the SNES core back in line with the others as far as processing time is concerned. Now I wonder if it’s possible to further reduce emulator latency and get to the point of seeing a reaction time of just one frame.
This is pure speculation on my part, and I don’t have any programming competency, but the fact that the lowest latency we’re seeing on all cores (even the fastest ones) is 2 frames makes me wonder if there is something at a RetroArch-wide level we could look at to decrease it further.
The reason I’m making this assumption is that there is another libretro frontend that actually appears to be even more reactive than what we have measured so far. I’m talking about Alcaro’s ZMZ, which couples the old ZSNES interface with the SNES libretro cores.
You can find it here: http://www.smwcentral.net/?p=section&a=details&id=5681
Here is its GitHub page: https://github.com/Alcaro/ZMZ
Back when I made the other thread on SNES latency, hunterk advised me to try it and I immediately noticed a huge improvement, although the timings were not as buttery-smooth as in RetroArch. Hunterk documents this difference with actual measurements on his blog: http://filthypants.blogspot.it/2015/06/latency-testing.html
I ran a quick test using the latest Brunnis-fixed cores within ZMZ and it seems even more responsive, to the point of starting to really resemble actual SNES hardware.
Honestly, I wouldn’t know where to start, but maybe Brunnis might find something relevant by comparing the two programs.
Another interesting resource, also mentioned by hunterk, is Calamity’s GroovyMAME, available here: http://forum.arcadecontrols.com/index.php/board,52.0.html
It’s a special distribution of MAME that diverges from baseline through several features aimed at CRT usage (though it also works on common LCD screens) and at minimizing latency. As far as I can understand, they have implemented two things that make it great from an input-lag perspective: a d3d9ex backend, which supposedly skips all sorts of driver overhead, and a “frame_delay” function. They claim to achieve next-frame latency on Windows x64, so once again I wonder if we can look at it as a reference.
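To illustrate why a frame-delay option could help, here is a rough Python sketch of the idea as I understand it: instead of polling input and running the emulation right after vsync, the frontend waits for part of the frame period first, so the input is "fresher" when the frame is shown. The function name, the 60 Hz refresh rate and the scale in tenths of a frame are my own assumptions for the example, not GroovyMAME's actual code.

```python
FRAME_MS = 1000.0 / 60.0  # assumption: 60 Hz display, about 16.67 ms per frame


def input_age_at_vsync(frame_delay, steps=10, frame_ms=FRAME_MS):
    """Return how old (in ms) the polled input is when the next vsync arrives.

    frame_delay is a hypothetical setting: the number of `steps`-ths of a
    frame period to wait after vsync before polling input and emulating.
    """
    if not 0 <= frame_delay < steps:
        raise ValueError("frame_delay must be in the range 0..steps-1")
    wait_ms = frame_ms * frame_delay / steps  # time spent idle after vsync
    return frame_ms - wait_ms                 # time left until the next vsync


if __name__ == "__main__":
    for delay in (0, 5, 8):
        age = input_age_at_vsync(delay)
        print(f"frame_delay={delay}: input polled {age:.2f} ms before vsync")
```

The catch is that the emulator then has to finish the whole frame in whatever time is left, so a higher delay only works if the core is fast enough on that machine.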