New CRT shader from Guest + CRT Guest Advanced updates

guest.r · 4 November 2025 13:24

You can raise scanline saturation like beyond 1.5, but it’s also connected with scanline shapes. If the image gets too saturated, then throw in some clip saturated color beams.

ROBMARK85 · 4 November 2025 13:55

Thank you so much again to all of you

brad86 · 5 November 2025 08:58

Awesome to see you still at work making this shader better by the day. Even though, imo, you pretty much already offered the best shader available from the start of your project.

It’s been a year-or-two since I tweaked my last config. I’m now going to spend my day off catching up here and setting up a new config. 4K this time around. I’ve pretty much moved away from my PC monitor completely.

Your dedication is very much appreciated @guest.r

Cyber · 6 November 2025 15:52

Hey @guest.r, @anikom15, I just modified my Turbo Duo preset to Float Framebuffer for all passes and I can’t really say that I noticed a huge difference without an A-B comparison but it just looks and feels perfect. It’s definitely not pixelly for me.

I’m also able to use merge fields or leave it on Auto without seeing the same interlacey artifacts I had mentioned before. This could have something to do with the added precision of the float frame buffer for all passes.

I use the Grain shader from the img mod shader but I’m sure I set it to zero in my last tests when I was getting the anomaly. The easy test is to just use my older preset before the float framebuffer mod but that might have to wait until I get a chance plus I like moving forward.

The overall text and image presentation seems improved. Even the “m” looks better. Again some of this could be placebo. Pics will follow at a later time.

@nesguy, you should try this with you presets and see if you can see a difference.

Another thing I noticed was the accurate looking CRT like interlacing flicker when I was trying to take my photos. I’ve previously had BFI on on my TV and didn’t really notice anything like that but it’s awesome!

I did enable the new RF Noise feature, which is going to completely replace my use of the img mod grain shader. Just have to figure out or get some help to cull it from the pipeline without breaking anything.

I have been experimenting with D3D11, Waitable Swapchain and Max Frame Latency 1 though. @HunterK, is there an equivalent Hard GPU Sync feature available on Vulkan?

Anyway thanks again! Everything seems to be working and working well. You mentioned trying to improve certain things like Resolution scaling, well I don’t know how much better they can get than this.

Again thank you and thanks to everyone else who contributes and shares their time, knowledge and expertise towards making these things better.

All of these photos were taken at ISO 100. Some at Speed 1/60 and others at 1/30.

PU786 · 6 November 2025 08:16

I need a special Guest to help me out tonight, I’m having some trouble getting this to match my XBR960.

The subject in question is the accuracy of this bottom bar, blended here to match my Dreamcast connected to the XBR960 over Composite video. This requires Interlace Mode 1, 2, or 4

I’d also like to avoid the interlacing bobble, which also makes this bar flicker like crazy (Taxi) for some reason so I’m going with 4.

But this (of course) makes me lose out on the hi-res / VGA / Super Fine Pitch scanlines present on the XBR960. When I enable them here, I lose the blending effect:

This ultimately means I can’t recreate the look of my CRT, despite being so close. Mask 0/5/7 (can’t decide) + VGA mode pretty much nails how the XBR960 displays the Dreamcast, so. Not sure if I’ve missed something, but ideally I’d like something like a “2 height horizontal mask” over Interlace Mode 5.

guest.r · 6 November 2025 12:01

If you are still going after interlacing modes 1-3 or 6, this are some good advices:

merge fields
set rainbowing to 3.0
set ntsc resolution scaling to 0.5

There can be problems if odd/even filed timing isn’t a perfect 1:1, which can happen, as also discussed in an other thread.

If going interlace mode 4, then you can play a bit with internal resolution settings to find the best scanline density. If you have regular scanlines, either by using VGA mode or setting interlaced mode to 0.0, then you are probably OK with 2160p resolution.

With masks you really have plenty of options with your display. I guess you want a somewhat higher TVL, for which mask 10 is pretty close.

I don’t perfecty understand a “2 height horizontal mask”, the only option that fits the description is using a shadowmask with the 0.5 value. These mask are a "“2 height horizontal masks”.

petran791 · 6 November 2025 12:14

What shader parameters are more taxing on the gpu/cpu currently, with the recent updates and additions?

PU786 · 6 November 2025 12:40

I was just kinda inventing something that isn’t there, because there aren’t visible scanlines in Interlace Mode 4.

I’d like to have the VGA style scanlines display over the top of that, like with Interlace Mode 0, but scanlines (most likely by design) disappear with Interlace Mode 4, so I can’t seem to go about faithfully recreating the way my TV looks with this kind of screen door pattern, plus the blending.

The XBR960 displays a progressive image, so Interlace Mode 1/2 won’t be accurate.

And whatever settings I use so I’m able to get that scanlines pattern there, we then lose the ever important blending (Interlace Mode 4 in screenshot; the scanlines just disappear completely):

guest.r · 6 November 2025 15:12

I’m addressing this, but with other “solutions”. I will add a parameter which allows to adjust scanline density. Will only work with scanline mode and not interlacing (precisely interlace mode 0 and 6 will cooperate), but will be compatible with “high res scanlines”, which is basically interlace mode 4 with scanlines.

I guess this will also solve some other situations with 1080p and scanlined high-res content.

anikom15 · 6 November 2025 15:48

Well there is also an internal format command the shader authors can use to request the framebuffer that way. I don’t know how that works with float_framebuffer.

Modern video cards are optimized to work with floats now even more than unorms, so it doesn’t surprise me that you see a performance benefit.

guest.r · 6 November 2025 15:56

I think format conversions shouldn’t matter too much in these situations. Benefits of unorm8 vs. sfloat16 are more or less bandwidth related.

So cherry picked texture formats can make a notable difference with “integrated” gpus, which use system memory.

It’s also to be considered that shader passes can force texure format pragmas, which flagging a framebuffer as float can override.

Cyber · 6 November 2025 16:19

Can this result in any negative effects?

guest.r · 6 November 2025 16:37

Maybe, for example megatron’s last pass is “R10G10B10A2_unorm”, which is fast and matchess HDR buffers later on, i.e. gpu’s backbuffer. Setting last pass of a crt shader chain to float in general doesn’t make to much sense, because there is immediate confersion to R8G8B8A8 to fit SDR backbuffers.

Setting a last shader chain pass to sfloat16 makes the most sense, if the preset is to be prepended etc. But visual benefits aren’t guaranteed.

Float framebuffer a.k.a. sfloat16 format is a must when colors reside in linear conversion, when negative values or values above 1.0 are expected and where precision is very important, like you want to pack 8 variables into RGBA output.

Via texture pragma formats sfloat32 can also be forced, but i noticed performance degradations in my testing. In an ideal world we could just use sfloat32 all day long…

Cyber · 6 November 2025 16:58

So in other words, there’s a risk of adding unnecessary inefficiencies as well as possible incompatibility if not used properly.

I find this topic very interesting. I know this is not a university but it would have been cool if someone like yourself who knew better, could have analyzed my shader chain to point out where each texture format made the most sense with a goal of prioritising image quality without sacrificing performance where there would be no image quality benefit.

If that’s not possible, where can I learn more about stuff like this?

I really would like to have the optimal settings regardless of whether I had the additional headroom to not notice any performance dips.

guest.r · 6 November 2025 16:59

It’s a good reading for start:

github.com

libretro/slang-shaders/blob/master/README.md

# Vulkan GLSL RetroArch shader system

This document is a draft of RetroArch's new GPU shader system.
It will outline the features in the new shader subsystem and describe details for how it will work in practice.

In addition this document will contain various musings on why certain design choices are made and which compromised have been made to arrive at the conclusion. This is mostly for discussing and deliberation while the new system is under development.

## Introduction

### Target shader languages
 - Vulkan
 - GL 2.x (legacy desktop)
 - GL 3.x+ (modern desktop)
 - GLES2 (legacy mobile)
 - GLES3 (modern mobile)
 - (HLSL, potentially)
 - (Metal, potentially)

RetroArch is still expected to run on GLES2 and GL2 systems.
GL2 is mostly not relevant any longer, but GLES2 is certainly a very relevant platform still and having GLES2 compatibility makes GL2 very easy.

This file has been truncated. show original

Otherwise current RA presets are more or less fine, maybe the missing link is assigning texture format types from presets directly - other than choosing between unorm8, srgb and sfloat16…

anikom15 · 6 November 2025 17:36

float_framebuffer is safe because all 8-bit and 10-bit fixed point color can fit into its 16-bit float format. I think even 12-bit color will fit (not relevant now; maybe in the future).

For pragma format, the shader author needs to know what she’s doing, ensuring whatever possible output values can fit into the chosen format. Since many intermediate shaders pass linear data and data outside of [0-1], and since there is a cost to format conversions, it’s best to use 16-bit float for all intermediate shaders unless you are certain you only need to work in [0-1] gamma-corrected space. Final shaders can be in 8-bit or 10-bit (for SDR or HDR respectively).

If an author is expecting a shader to be used as final pass, but someone wants to add a shader on top of it, that’s a perfect case to use float_framebuffer.

guest.r · 6 November 2025 17:47

You forgot to mention the bandwith cost. Otherwise well written. A fine example is also MegaBezel smooth preset with 4k, some systems just run out of memory. If someone knows what he’s doing then it’s usually fine.

anikom15 · 6 November 2025 17:48

Do you know if it’s more expensive to switch formats or keep things as a higher memory format?

guest.r · 6 November 2025 18:01

Bandwidth can be far more expensive with gpu’s on system memory. I also trided some sfloat32 passes, the performance hit was notable, even with a small size pass (with like 160GB/s bandwidth) and there is basically no conversion needed. The most standard conversion is unorm8 to sfloat32 (or whatever the “phone” uses) and is hard-wired into all relevant modern gpus with shader processors. I woudn’t worry about this.

anikom15 · 6 November 2025 18:05

Thanks. It makes sense.