You can raise scanline saturation beyond 1.5, but it also interacts with the scanline shapes. If the image gets too saturated, then some saturated color beams start to clip.
Thank you so much again to all of you 
Awesome to see you still at work making this shader better by the day. Even though, imo, you pretty much already offered the best shader available from the start of your project.
It’s been a year or two since I tweaked my last config. I’m now going to spend my day off catching up here and setting up a new config. 4K this time around. I’ve pretty much moved away from my PC monitor completely.
Your dedication is very much appreciated @guest.r 
Hey @guest.r, @anikom15, I just modified my Turbo Duo preset to Float Framebuffer for all passes and I can’t really say that I noticed a huge difference without an A-B comparison but it just looks and feels perfect. It’s definitely not pixelly for me.
I’m also able to use merge fields or leave it on Auto without seeing the same interlacey artifacts I had mentioned before. This could have something to do with the added precision of the float frame buffer for all passes.
I use the Grain shader from the img mod shader but I’m sure I set it to zero in my last tests when I was getting the anomaly. The easy test is to just use my older preset before the float framebuffer mod but that might have to wait until I get a chance plus I like moving forward.
The overall text and image presentation seems improved. Even the “m” looks better. Again some of this could be placebo. Pics will follow at a later time.
@nesguy, you should try this with your presets and see if you can see a difference.
Another thing I noticed was the accurate-looking CRT-like interlacing flicker when I was trying to take my photos. I’ve previously had BFI enabled on my TV and didn’t really notice anything like that, but it’s awesome!
I did enable the new RF Noise feature, which is going to completely replace my use of the img mod grain shader. Just have to figure out or get some help to cull it from the pipeline without breaking anything.
I have been experimenting with D3D11, Waitable Swapchain and Max Frame Latency 1 though. @HunterK, is there an equivalent Hard GPU Sync feature available on Vulkan?
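For reference, these are the retroarch.cfg entries I’ve been flipping (key names as I understand them from the docs; the last one is, as far as I know, the closest Vulkan analogue to Hard GPU Sync, since that toggle only applies to GL):

```
video_hard_sync = "true"
video_hard_sync_frames = "0"
video_max_swapchain_images = "2"
```

Setting max swapchain images to 2 forces the Vulkan driver to sync CPU and GPU in a similar way, at the cost of some performance.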
Anyway thanks again! Everything seems to be working and working well. You mentioned trying to improve certain things like Resolution scaling, well I don’t know how much better they can get than this.
Again thank you and thanks to everyone else who contributes and shares their time, knowledge and expertise towards making these things better.
All of these photos were taken at ISO 100. Some at Speed 1/60 and others at 1/30.
I need a special Guest to help me out tonight, I’m having some trouble getting this to match my XBR960.
The subject in question is the accuracy of this bottom bar, blended here to match my Dreamcast connected to the XBR960 over Composite video. This requires Interlace Mode 1, 2, or 4
I’d also like to avoid the interlacing bobble, which also makes this bar flicker like crazy (Taxi) for some reason so I’m going with 4.
But this (of course) makes me lose out on the hi-res / VGA / Super Fine Pitch scanlines present on the XBR960. When I enable them here, I lose the blending effect:
This ultimately means I can’t recreate the look of my CRT, despite being so close. Mask 0/5/7 (can’t decide) + VGA mode pretty much nails how the XBR960 displays the Dreamcast, so. Not sure if I’ve missed something, but ideally I’d like something like a “2 height horizontal mask” over Interlace Mode 5.
If you are still going after interlacing modes 1-3 or 6, here is some good advice:
- merge fields
- set rainbowing to 3.0
- set ntsc resolution scaling to 0.5
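Expressed as a preset override, it could look like the sketch below. Note that the parameter names here are my guesses at the relevant entries, not verified; check the actual names in the shader’s parameter list (or the .slangp you saved) before using this:

```
#reference "crt-guest-advanced-ntsc.slangp"
merge_fields = "1.0"
rainbow_strength = "3.0"
ntsc_scale = "0.5"
```

Saving the preset from the RetroArch shader menu after setting the values is the safest way to get the exact names right.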
There can be problems if odd/even field timing isn’t a perfect 1:1, which can happen, as also discussed in another thread.
If going interlace mode 4, then you can play a bit with internal resolution settings to find the best scanline density. If you have regular scanlines, either by using VGA mode or setting interlaced mode to 0.0, then you are probably OK with 2160p resolution.
With masks you really have plenty of options with your display. I guess you want a somewhat higher TVL, for which mask 10 is pretty close.
I don’t perfectly understand a “2 height horizontal mask”; the only option that fits the description is using a shadow mask with the 0.5 value. These masks are “2 height horizontal masks”.
What shader parameters are more taxing on the gpu/cpu currently, with the recent updates and additions?
I was just kinda inventing something that isn’t there, because there aren’t visible scanlines in Interlace Mode 4.
I’d like to have the VGA style scanlines display over the top of that, like with Interlace Mode 0, but scanlines (most likely by design) disappear with Interlace Mode 4, so I can’t seem to go about faithfully recreating the way my TV looks with this kind of screen door pattern, plus the blending.

The XBR960 displays a progressive image, so Interlace Mode 1/2 won’t be accurate.
And with whatever settings let me get that scanline pattern there, we then lose the ever-important blending (Interlace Mode 4 in screenshot; the scanlines just disappear completely):

I’m addressing this, but with other “solutions”. I will add a parameter that lets you adjust scanline density. It will only work with scanline modes and not interlacing (specifically, interlace modes 0 and 6 will cooperate), but it will be compatible with “high res scanlines”, which is basically interlace mode 4 with scanlines.
I guess this will also solve some other situations with 1080p and scanlined high-res content.
Well there is also an internal format command the shader authors can use to request the framebuffer that way. I don’t know how that works with float_framebuffer.
Modern video cards are optimized to work with floats now even more than unorms, so it doesn’t surprise me that you see a performance benefit.
I think format conversions shouldn’t matter too much in these situations. Benefits of unorm8 vs. sfloat16 are more or less bandwidth related.
So cherry-picked texture formats can make a notable difference with “integrated” GPUs, which use system memory.
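To put rough numbers on the bandwidth argument (my own back-of-the-envelope math, not measured from the shader):

```python
# Bytes written per second for a full-screen pass at 4K/60, comparing
# RGBA unorm8 (4 bytes/pixel) against RGBA sfloat16 (8 bytes/pixel).
width, height, fps = 3840, 2160, 60

def gib_per_second(bytes_per_pixel, passes=1):
    # Counts one write per pixel per pass; texture reads roughly
    # double the real traffic, so treat these as lower bounds.
    return width * height * bytes_per_pixel * passes * fps / 2**30

print(round(gib_per_second(4), 2))            # 1.85  (one unorm8 pass)
print(round(gib_per_second(8), 2))            # 3.71  (one sfloat16 pass)
print(round(gib_per_second(8, passes=10), 2)) # 37.08 (a 10-pass float chain)
```

Tens of GiB/s is nothing for a discrete card, but it eats a real slice of the ~50-160 GB/s that system memory gives an integrated GPU.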
It’s also worth considering that shader passes can force texture formats via pragmas, which flagging a framebuffer as float can override.
Can this result in any negative effects?
Maybe. For example, Megatron’s last pass is “R10G10B10A2_unorm”, which is fast and matches HDR buffers later on, i.e. the GPU’s backbuffer. Setting the last pass of a CRT shader chain to float generally doesn’t make too much sense, because there is an immediate conversion to R8G8B8A8 to fit SDR backbuffers.
Setting the last shader chain pass to sfloat16 makes the most sense if the preset is to be prepended etc., but visual benefits aren’t guaranteed.
A float framebuffer, a.k.a. the sfloat16 format, is a must when colors reside in linear color space, when negative values or values above 1.0 are expected, and where precision is very important, e.g. when you want to pack 8 variables into an RGBA output.
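A quick way to see why, sketched in NumPy rather than shader code:

```python
import numpy as np

# Values a linear-light shader pass might hand to the next pass:
# a negative lobe from a sharpening filter and a highlight above 1.0.
signal = np.array([-0.25, 0.0, 0.5, 1.0, 1.5])

# unorm8 storage: clamp to [0, 1], then quantize to 8 bits.
unorm8 = np.round(np.clip(signal, 0.0, 1.0) * 255) / 255

# sfloat16 storage: sign and range survive intact.
sfloat16 = signal.astype(np.float16)

print(unorm8)    # negative lobe flattened to 0, highlight clamped to 1
print(sfloat16)  # [-0.25  0.    0.5   1.    1.5 ]
```

With unorm8 the out-of-range information is gone for good, so every later pass works with the mangled values.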
sfloat32 can also be forced via texture format pragmas, but I noticed performance degradation in my testing. In an ideal world we could just use sfloat32 all day long…
So in other words, there’s a risk of adding unnecessary inefficiencies as well as possible incompatibility if not used properly.
I find this topic very interesting. I know this is not a university, but it would have been cool if someone like yourself, who knows better, could have analyzed my shader chain and pointed out where each texture format made the most sense, with the goal of prioritising image quality without sacrificing performance where there would be no image quality benefit.
If that’s not possible, where can I learn more about stuff like this?
I really would like to have the optimal settings regardless of whether I had the additional headroom to not notice any performance dips.
This is a good read for a start:
Otherwise current RA presets are more or less fine, maybe the missing link is assigning texture format types from presets directly - other than choosing between unorm8, srgb and sfloat16…
float_framebuffer is safe because all 8-bit and 10-bit fixed point color can fit into its 16-bit float format. I think even 12-bit color will fit (not relevant now; maybe in the future).
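This is easy to check numerically (my own sanity check, counting how many normalized color codes survive a round trip through IEEE half precision):

```python
import numpy as np

def distinct_after_fp16(bits):
    # Color codes 0 .. 2^bits - 1, normalized to [0, 1], stored as fp16.
    codes = np.arange(2**bits) / (2**bits - 1)
    return len(np.unique(codes.astype(np.float16)))

print(distinct_after_fp16(8))   # 256   -> 8-bit fits exactly
print(distinct_after_fp16(10))  # 1024  -> 10-bit fits exactly
print(distinct_after_fp16(12))  # under 4096: 12-bit codes collide near 1.0
```

The fp16 spacing in [0.5, 1.0) is 2^-11, which is coarser than a 12-bit step of 1/4095, so 12-bit only fits approximately; 8-bit and 10-bit are fully safe.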
For pragma format, the shader author needs to know what she’s doing, ensuring whatever possible output values can fit into the chosen format. Since many intermediate shaders pass linear data and data outside of [0-1], and since there is a cost to format conversions, it’s best to use 16-bit float for all intermediate shaders unless you are certain you only need to work in [0-1] gamma-corrected space. Final shaders can be in 8-bit or 10-bit (for SDR or HDR respectively).
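Concretely, the per-pass request is the `#pragma format` directive from the libretro slang shader spec; a minimal sketch (format names follow the Vulkan naming the spec uses):

```glsl
#version 450

// Ask the frontend for an fp16 output framebuffer for this pass.
// As noted above, a preset-level float flag can override a pass's pragma.
#pragma format R16G16B16A16_SFLOAT
```
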
If an author is expecting a shader to be used as final pass, but someone wants to add a shader on top of it, that’s a perfect case to use float_framebuffer.
You forgot to mention the bandwidth cost.
Otherwise well written.
A fine example is also the Mega Bezel smooth preset at 4K: some systems just run out of memory. If someone knows what he’s doing, then it’s usually fine.
Do you know if it’s more expensive to switch formats or keep things as a higher memory format?
Bandwidth can be far more expensive with GPUs on system memory. I also tried some sfloat32 passes; the performance hit was notable even with a small-size pass (with like 160 GB/s of bandwidth), and there is basically no conversion needed. The most standard conversion is unorm8 to sfloat32 (or whatever the “phone” uses) and is hard-wired into all relevant modern GPUs with shader processors. I wouldn’t worry about this.
Thanks. It makes sense.