Essentially, yes. The rest of the screen remains unchanged.
There are many ways to do this:
- Full-screen blitting (Present()ing the full screen of the incompletely-rendered emulator framebuffer)
- Partial blitting (Present()ing only a rectangle)
- Front buffer (writing new emulator scanlines directly to the front buffer)
The same beam-racing jitter-margin principles apply to all of the above. At the end of the day, all of them can be made to behave identically (with different GPU bandwidth-waste trade-offs).
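For illustration, here is a minimal sketch of the first approach (full-screen blitting with VSYNC OFF, surge-executing one frameslice at a time). The helper names (emulate_scanlines, get_real_raster_scanline, present_framebuffer, wait_for_vblank) are hypothetical placeholders standing in for your emulator core and platform layer, not any particular emulator's or GPU vendor's API:

```cpp
// Minimal sketch of frameslice beam racing via full-screen Present() with
// VSYNC OFF. All helper functions are hypothetical placeholders, not a real
// emulator or GPU API. Real implementations get the raster position from the
// GPU (e.g. D3DKMTGetScanLine on Windows) or estimate it from vblank timestamps.
constexpr int SCREEN_HEIGHT   = 480;                        // emulated visible scanlines
constexpr int NUM_FRAMESLICES = 10;                         // frameslices per refresh cycle
constexpr int SLICE_HEIGHT    = SCREEN_HEIGHT / NUM_FRAMESLICES;
constexpr int CHASE_MARGIN    = SLICE_HEIGHT;               // keep emuraster ~1 slice ahead

void emulate_scanlines(int count);      // placeholder: run emulator, rasterplot rows into its framebuffer
int  get_real_raster_scanline();        // placeholder: current GPU scanout position
void present_framebuffer();             // placeholder: full-screen Present() (VSYNC OFF)
void wait_for_vblank();                 // placeholder: block until start of a refresh cycle

void run_one_refresh_cycle()
{
    wait_for_vblank();
    for (int slice = 0; slice < NUM_FRAMESLICES; ++slice) {
        // Pace the beam race: don't get further ahead than the chase margin.
        // Steady state: present slice N while the real raster scans slice N-1.
        while (get_real_raster_scanline() < slice * SLICE_HEIGHT - CHASE_MARGIN) {
            /* sleep or busy-wait a few microseconds */
        }
        // Surge-execute one frameslice's worth of emulation; the emulator keeps
        // rasterplotting into the SAME offscreen framebuffer it always used.
        emulate_scanlines(SLICE_HEIGHT);

        // Blit the whole (still incomplete) framebuffer. The already-scanned
        // top portion is unchanged, so no tearline can appear; the stale rows
        // below get replaced before the real raster reaches them.
        present_framebuffer();
    }
}
```

Partial blitting and front-buffer writes differ only in how the new rows get delivered to the onscreen buffer; the pacing logic is the same.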
In all situations, the already-rendered top portion of the emulator framebuffer is left unchanged and kept in the onscreen buffer. That, in itself, prevents tearing. Duplicate frames don't show tearing. Likewise, duplicate frameslices don't show tearing either. Voilà!
The incompleteness never reaches the graphics output, as long as new emulator scanlines are appended to the framebuffer before the GPU reads those pixel rows out to the graphics output.
It's a working & proven technique, already implemented in 2 other emulators including WinUAE. Please read the WinUAE thread from Page 8 through to the end.
No tearing. Duplicate frames = no tearing. Duplicate frameslices = no tearing. If the screen is unchanged where the real-raster is, there’s no tearing.
And the jitter margin is very healthy. Even with a lot of random variance in the beam-race distance between emulator-raster and real-raster, there is no tearing, as long as it all stays within the jitter margin (which can be almost a full refresh cycle long, with a few additional tricks!). In practice, variance and latency can easily be brought down to sub-millisecond on modern systems (>1000 frameslices per second).
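To put rough numbers on that (a back-of-envelope calculation, not figures from any specific implementation):

```cpp
// Back-of-envelope frameslice granularity math (illustrative numbers only).
#include <cstdio>

int main()
{
    const double refresh_hz = 60.0;

    // Modern PC, 20 frameslices per refresh:
    double slices_per_sec = refresh_hz * 20;            // 1200 frameslices/sec
    double slice_ms       = 1000.0 / slices_per_sec;    // ~0.83 ms granularity
    std::printf("20 slices: %.0f slices/sec, %.2f ms per slice\n", slices_per_sec, slice_ms);

    // Coarse 4-frameslice beamracing (e.g. weaker hardware):
    slices_per_sec = refresh_hz * 4;                     // 240 frameslices/sec
    slice_ms       = 1000.0 / slices_per_sec;            // ~4.2 ms granularity
    std::printf("4 slices:  %.0f slices/sec, %.2f ms per slice (still well under a 16.7 ms refresh)\n",
                slices_per_sec, slice_ms);
    return 0;
}
```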
Although I risk shameless self-promotion when I say this – as the founder of Blur Busters, inventor of TestUFO, and co-author of peer-reviewed conference papers – I have a lot of practical skill when it comes to rasters. I programmed split screens and sprite multiplication in 6502 assembly language in SuperMon64, so I'm familiar with raster programming myself, too.
There’s no tearing because the screen is unchanged where the realraster is.
It’s already implemented & proven.
Also, there is no "perceptually" about it – the tearing electronically doesn't happen, because the bits & bytes are unchanged at the real-world raster. It's as if Present() never happened in that area of the screen.
Download WinUAE and see for yourself that there is no tearing.
Also, I posted an older Blur Busters article to hype it up a bit (ha!). Since then I've collaborated with quite a few emu authors. Shoutout to Toni Wilen, Calamity, Tommy and Unwinder (the new RTSS raster-based framerate-capping feature that "races the VBI to do a tearingless VSYNC OFF" – not an emulator, but inspired by our research).
It depends on the VSYNC ON implementation – you can save 2 frames of latency – and it also depends on parameters like NVInspector's "Max Prerendered Frames".
But not only that: the randomness of the input lag savings disappears (e.g. granularity errors of input reads caused by the enforced conversion of midscreen input reads into beginning-of-screen input reads, especially with surge-execution techniques). Say you've got a flight simulator game that does input reads at random positions throughout the emulator's screen. Many emulator lag-reducing techniques will, out of necessity, round those reads off to refresh-cycle granularity, etc. So you get linearity distortions, too.
Emulators already do input reads on the fly, so the number of input reads does not increase or decrease. They already rasterplot into an offscreen framebuffer, because many emulators need to support retro "racing the beam" tricks. What we're simply doing is beam racing the visibility of that already-existing offscreen framebuffer. In the past, you waited to build up the whole offscreen emulator framebuffer (beamracing in the dark) before displaying it all at once. With these tricks, you can now beamrace the visibility of that existing, progressively-more-complete offscreen emulator framebuffer. Basically, the scanout becomes more real-time.
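As a rough before/after sketch (reusing the hypothetical placeholder helpers from the earlier sketch; real-raster pacing is omitted here for brevity):

```cpp
// Old way: "beamracing in the dark" – the emulator rasterplots scanline by
// scanline into its offscreen framebuffer, but nothing becomes visible until
// the whole frame is built, then it is presented all at once.
void run_refresh_old_way()
{
    for (int line = 0; line < SCREEN_HEIGHT; ++line)
        emulate_scanlines(1);          // the same rasterplotting as always
    present_framebuffer();             // one Present() per refresh cycle
}

// New way: the exact same rasterplotting and the exact same input reads, but
// the visibility of the progressively-more-complete framebuffer is beam raced
// – one Present() per frameslice instead of one per refresh.
void run_refresh_beamraced()
{
    for (int line = 0; line < SCREEN_HEIGHT; ++line) {
        emulate_scanlines(1);
        if ((line + 1) % SLICE_HEIGHT == 0)
            present_framebuffer();     // show what already exists, earlier
    }
}
```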
In reality, with good frameslice beamracing and execution synchronization (pacing the emulator cycle-exact or scanline-exact), the latency between input read and photon visibility can become practically constant and consistent (not granularized by the frameslices), with only a fixed latency equal to the chosen beamrace margin. (In reality, your input device is limited by its poll rate, e.g. a 1000Hz mouse, so that becomes the practical new limiting factor, but that is another topic altogether.)
The granularity is hidden by the jitter margin: the realraster and emuraster chase each other at a roughly constant distance apart (e.g. 1ms) despite the frameslicing occurring. It's like laying new bricks onto a path while a person behind walks at constant speed, or adding new sections to a road in chunks while the cars behind move at a constant speed – the same "relativity" basis. As long as you don't go too fast, you never run into the incomplete section of the road (or, on the screen, tearing), while the realraster and emuraster keep moving at a constant speed despite the "intervalness" of the frameslicing.
You simply adjust the beamracing margin (i.e. the chase margin) to make sure new frameslices are added on time, before the constantly-moving realraster hits the top edge of a not-yet-delivered frameslice (that's when you get artifacts or tearing problems). Even when presenting frameslices, the GPU still outputs one scanline at a time out of the graphics output. So the frameslices are essentially being "appended" to the GPU's internal graphics output buffer ahead of the constant-rate (horizontal scanrate) readout of pixels by the RAMDAC (analog) or digital transceiver (digital). Sometimes scanlines are bunched into groups of 2 or 4 (DisplayPort micropackets). But other than that, there's no relationship between frameslice height and graphics output – the display doesn't know what a frameslice is. The frameslice is only relevant to GPU memory. The realraster and emuraster continue at constant speed, and the frameslice boundaries jump forward in granular steps but always stay in between the two (realraster and emuraster, screen-height-wise), so tearlines never become visible. As long as everything stays within the jitter margin, there can safely be variances – it doesn't have to be perfectly synchronous. The margin between emuraster and realraster can vary.
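For completeness, here is one common way to locate that constant-rate real raster in software – a hypothetical approximation that extrapolates from the last vblank timestamp (real implementations can instead query the GPU directly, e.g. D3DKMTGetScanLine on Windows):

```cpp
// Estimate the real raster's current scanline by extrapolating from the last
// vblank timestamp, relying on the fact that the pixel-row readout proceeds at
// a constant rate. This is an approximation: it assumes a fixed refresh rate
// and ignores the exact placement of the VBI porches.
#include <chrono>

struct RasterEstimator {
    std::chrono::steady_clock::time_point last_vblank;  // timestamp of most recent vblank
    double refresh_period_sec;                           // e.g. 1.0 / 60.0
    int    total_scanlines;                              // visible + VBI, e.g. 1125 for 1080p60

    int current_scanline() const {
        using namespace std::chrono;
        double elapsed = duration<double>(steady_clock::now() - last_vblank).count();
        double frac = elapsed / refresh_period_sec;      // fraction of the refresh elapsed
        frac -= static_cast<long long>(frac);            // wrap into [0,1) if a vblank was missed
        return static_cast<int>(frac * total_scanlines);
    }
};
```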
You simply have to keep appending newly rendered framebuffer data (emulator scanlines) to the "scanning-out onscreen framebuffer" before the pixel-row readout (real raster) begins transmitting those pixel rows. Metaphorically, this "onscreen framebuffer" is not actually onscreen yet. The front buffer is simply a buffer sitting directly at the graphics output, which a transceiver is still reading one scanline at a time (at an exact interval, without caring about frameslice sizes or frameslice boundaries), serializing the data to the graphics output. All of the beamracing techniques discussed above are simply various methods of appending new data to the front buffer before the pixel row begins to be transmitted out of the graphics output. It's just the nature of serializing a 2D image into a 1D graphics cable, and it works the same way from a 1930s analog TV broadcast through to a 2020s DisplayPort cable. That's why frameslice beam racing works across that entire ~100-year span without tearing.
Properly implemented frameslice beam racing is actually more forgiving than things like Adaptive VSYNC (i.e. trying to Present() in the VBI between refresh cycles to achieve a tearingless VSYNC OFF), since the jitter margin is much bigger than the height of the VBI. So tearing is less likely to happen! And if tearing does happen, it only briefly appears and then disappears as the beamracing catches back up into the proper jitter margin.
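To put illustrative numbers on that comparison (assuming standard 1080p60 CEA timing, where the VBI is 45 of 1125 total scanlines):

```cpp
// Rough comparison of the timing window for a VBI-timed Present() (Adaptive
// VSYNC style) versus a frameslice beam-racing jitter margin, at 1080p60.
#include <cstdio>

int main()
{
    const double refresh_ms  = 1000.0 / 60.0;   // ~16.7 ms per refresh cycle
    const int    total_lines = 1125;            // 1080 visible + 45 VBI (CEA timing)
    const int    vbi_lines   = 45;

    double vbi_window_ms    = refresh_ms * vbi_lines / total_lines;  // ~0.67 ms
    double jitter_margin_ms = refresh_ms * 0.9;  // can approach a full refresh with the extra tricks

    std::printf("VBI Present() window: ~%.2f ms\n", vbi_window_ms);
    std::printf("Beam-race jitter margin: up to ~%.1f ms\n", jitter_margin_ms);
    return 0;
}
```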
The synchronousness of input reads is unchanged, but the synchronousness of input-reads-to-photons becomes much closer to the original machine. Beam racing replicates the original machine's latency. That's what makes it so preservationist-friendly and more fair (especially when putting an emulator head-to-head against a real machine). So before a real-machine arcade contest, one can train on an emulator configured to replicate original-machine latencies via frameslice beam racing.
Some emulators will surge-execute a framesliceful at a time, but that's not essential. If you want, you can simply execute emulator code in cycle-exact realtime (like an original machine), and the raster-line callback will "decide" when it's time to deliver a new frameslice on a leading "emu-raster percentage ahead of real-raster" basis (or simply use a 2-frameslice beam race margin). The emuraster and realraster always move at a constant scan rate in this case (emuraster at the emu scanrate, realraster at the real-world scanrate), with the frameslice boundaries jumping forward in granular steps but always confined between the emuraster and realraster. Tearing never electronically reaches the graphics output, so tearing doesn't exist (because the previous frameslice – the one the realraster is currently single-scanline-stepping through – is still onscreen, unchanged).
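Here is a sketch of that callback-driven variant, with the same hypothetical placeholder names as before:

```cpp
// Hypothetical per-scanline callback for the cycle-exact (non-surge) variant,
// reusing SLICE_HEIGHT and present_framebuffer() from the earlier sketch. The
// emulator core calls this after rasterplotting each scanline, running at its
// own constant real-time pace; emulation of each frame simply starts a chosen
// beam-race margin ahead of the real raster (a phase offset established
// elsewhere, e.g. by when wait_for_vblank() returns). The callback never
// alters emulation timing – it only makes already-emulated scanlines visible,
// one frameslice at a time.
void on_emulator_scanline_completed(int emu_scanline)
{
    // Only act when the last row of a frameslice has just been rasterplotted.
    if ((emu_scanline + 1) % SLICE_HEIGHT != 0)
        return;

    // The real raster is still a margin behind, single-scanline-stepping
    // through the previous (unchanged, already-presented) frameslice, so this
    // blit of the larger-but-still-incomplete framebuffer never puts a
    // tearline on the cable.
    present_framebuffer();   // or a partial blit / front-buffer write of the new slice
}
```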
RunAhead is superior at reducing lag in many ways, but beam racing (syncing the emuraster & realraster) is the only reliable way to duplicate an original machine's exact latency mechanics. As a bonus, it is less demanding on the machine (at low frameslice counts). All the original timings between a real-world/emulator input read and the pixels being turned into photons are preserved – even for mid-screen input reads on the original machine – and even for certain games whose input reads are jittery on the original machine, sometimes reading right before/after the original machine's VBI. That jitter causes a 1/60sec (16.7ms) randomly varying latency jump on emulators doing surge-execution techniques, since some input reads miss the VBI and get delayed – beamracing avoids that situation.
Whatever distortions to input reads are introduced by all kinds of emulator lag-reducing techniques, all of that disappears with beam racing, and you only get a fixed latency offset representing your beamracing margin (which can become really tight, sub-millisecond relative to the original machine when both the original machine & emulator are connected to the same display) with a high frameslice rate or front-buffer rendering. Yet it still scales all the way down to simpler machines – still remaining sub-frame latency even with coarse 4-frameslice beamracing on Android/Raspberry Pi, by using a low frameslice count and a wider (but still sub-frame) beam racing margin.
So while RunAhead is superior in many ways, and can reliably achieve less latency than the original machine, frameslice beamracing is much more preservationist-friendly: it replicates original-machine latency for all possible input-read timings (including mid-screen and jittery input reads).
P.S. If you download WinUAE, turn on the WinUAE tinted-frameslice feature (for debugging/analysis) and you'll see the tearing (a jitter-margin debugging technique). It's fun to watch and a very educational learning experience. Turn it off, and the tearing is not visible – horizontally scrolling platformers look like perfect VSYNC ON.
EDIT: Expanded my post