It’s possible. The runahead algorithm just needs to run on a frameslice basis instead of once per frame:
f = number of scanlines per frameslice
r = runahead value (in scanlines)
for each frameslice:
    run the emulator for f scanlines; save state
    run the emulator for r scanlines; beam-sync the last f scanlines
    load state
All else equal, this multiplies the workload by the number of frameslices per frame, plus some overhead for (de)serialization. So if the frame is divided into four frameslices, you have to do roughly four times the work. This isn’t as bad as it sounds if we’re using sub-frame runahead values. For example, if r = f, it’s approximately equivalent to a runahead of 1 today; r = 2f is roughly 2 frames of runahead, and so on.
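To make the loop concrete, here is a minimal sketch in Python. The Emulator class and its run_scanlines/save_state/load_state methods, along with present_slice, are hypothetical stand-ins for whatever the core and frontend actually expose; the constants are assumptions chosen for illustration. The structure just mirrors the pseudocode above.

from dataclasses import dataclass, field

SCANLINES_PER_FRAME = 262      # assumption: NTSC-like timing
FRAMESLICES_PER_FRAME = 4      # assumption: 4 slices per frame
F = SCANLINES_PER_FRAME // FRAMESLICES_PER_FRAME  # f: scanlines per frameslice
R = F                          # r: runahead in scanlines (a sub-frame value)


@dataclass
class Emulator:
    """Toy stand-in for a core that can be stepped per scanline."""
    scanline: int = 0
    framebuffer: list = field(default_factory=list)

    def run_scanlines(self, n: int) -> None:
        # A real core would emulate n scanlines and emit their pixels.
        for _ in range(n):
            self.framebuffer.append(f"scanline {self.scanline}")
            self.scanline += 1

    def save_state(self) -> tuple:
        return (self.scanline, list(self.framebuffer))

    def load_state(self, state: tuple) -> None:
        self.scanline, saved_fb = state
        self.framebuffer = list(saved_fb)


def present_slice(lines: list) -> None:
    # Stand-in for beam-synced presentation of one frameslice.
    pass


def run_one_frame(emu: Emulator) -> None:
    for _ in range(FRAMESLICES_PER_FRAME):
        emu.run_scanlines(F)                  # advance the real timeline by one slice
        state = emu.save_state()              # checkpoint before running ahead
        emu.run_scanlines(R)                  # run ahead by r scanlines
        present_slice(emu.framebuffer[-F:])   # beam-sync the last f scanlines
        emu.load_state(state)                 # rewind so real timing is preserved


emu = Emulator()
run_one_frame(emu)  # one frame’s worth of frameslices, each with runahead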
One caveat: since we’re no longer presenting discrete frames, setting the runahead value too high (above the game’s “internal lag”) has an extra side effect. On top of the usual “frame skipping” artifact you get today when runahead is too high, you might also see occasional screen tearing, not unlike running with vsync off. The difference is that it only appears when the state changes across a frameslice boundary in response to input, which happens less often than you might expect.
This is fundamentally unavoidable. You can’t have intra-frame responses without tearing unless the game was designed with it in mind (which obviously won’t be the case if we are using runahead to achieve it).
However, as long as you keep the runahead value at or below the internal lag, you can still reduce input lag without introducing visual disturbances.
This makes the approach most useful for removing small amounts of host lag (e.g. driver, display, and polling lag), up to the limit of the internal lag, while still maintaining faithful system latency.
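As a rough illustration of where that limit sits (the numbers below are assumptions picked for the arithmetic, not measurements of any particular game or display):

internal_lag_frames = 2            # assumption: the game’s built-in lag, varies per game
scanlines_per_frame = 262          # assumption: NTSC-like timing
max_safe_r = internal_lag_frames * scanlines_per_frame
print(max_safe_r)                  # 524 scanlines; beyond this, expect the tearing caveat above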