New CRT shader - Any interest?

kokoko3k · 3 February 2025 05:14

Very nice looking!

I did something like that by pushing the input signal gain to 3x to fully exploit the RGB mask, and then helping it by adding the original color.

However, I found that the display itself has to be very well calibrated for the theory behind this to hold; otherwise you get an evident discontinuity (visible on a simple black-and-white gradient) at the point where the “helper” color starts being added to the mask.

But I think your implementation differs, or maybe you didn’t aim for full brightness, as I see that the white shirt retains some of the mask (?).

beans · 3 February 2025 14:51

@kokoko3k, Thanks! I remember you talking about that on the Discord (and I sent an image of my own experiment).

Here’s what I ended up doing:

vec3 mask(vec3 pixel_value) {
    // I removed some overscan stuff here to make it simpler to read...
    vec3 mask = vec3(1.0);
    float mask_coverage = 1.0;
    if (params.SubpixelMaskPattern == 2.0) {  // <=1080p
        vec3[2] mask_tile = { vec3(1, 0, 1), vec3(0, 1, 0) };
        mask = mask_tile[int(mod(floor(viewport_x_coord * params.OutputSize.x), 2.0))];
        mask_coverage = 2.0;
    } else if (params.SubpixelMaskPattern == 3.0) {  // 1080p/1440p
        vec3[3] mask_tile = { vec3(0, 0, 1), vec3(0, 1, 0), vec3(1, 0, 0) };
        mask = mask_tile[int(mod(floor(viewport_x_coord * params.OutputSize.x), 3.0))];
        mask_coverage = 3.0;
    } else if (params.SubpixelMaskPattern == 4.0) {  // 1440p/4k
        vec3[4] mask_tile = { vec3(1, 0, 0), vec3(1, 1, 0), vec3(0, 1, 1), vec3(0, 0, 1) };
        mask = mask_tile[int(mod(floor(viewport_x_coord * params.OutputSize.x), 4.0))];
        mask_coverage = 2.0;
    } else if (params.SubpixelMaskPattern == 5.0) {  // 4k, lower TVL
        vec3[5] mask_tile = { vec3(1, 0, 0), vec3(1, 0, 1), vec3(0, 0, 1), vec3(0, 1, 0), vec3(0, 1, 0) };
        mask = mask_tile[int(mod(floor(viewport_x_coord * params.OutputSize.x), 5.0))];
        mask_coverage = 15.0 / 6.0;
    }
    // For BGR subpixel arrangement, just transpose red and blue.

    // Piecewise phase-in. The original image starts phasing in only when we've
    // maxed the brightness we can get from our masked image.
    float s = mask_coverage / (mask_coverage - 1.0);
    vec3 weight = clamp(-s * pixel_value + s, 0.0, 1.0);
    return pixel_value * ((1 - weight) + mask_coverage * mask * weight);
}
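As a quick numeric check of the blend above (a Python sketch, not part of the shader), the snippet below ports the weight and the final `return` expression and averages one color channel over one repeat of the mask tile. The names `piecewise_weight`, `blend`, and `tile_average` are made up for this illustration; the tile values are the red channel of the posted 3-pixel and 5-pixel tiles. It only shows that the tile average equals the input value; it says nothing about whether individual subpixels exceed 1.0.

```python
def piecewise_weight(x, coverage):
    """Mask weight from the posted GLSL: clamp(-s * x + s, 0, 1)."""
    s = coverage / (coverage - 1.0)
    return min(max(-s * x + s, 0.0), 1.0)


def blend(x, mask, coverage, weight):
    """Posted blend: x * ((1 - weight) + coverage * mask * weight)."""
    return x * ((1.0 - weight) + coverage * mask * weight)


def tile_average(x, channel_tile, coverage):
    """Average of one color channel over one repeat of the mask tile."""
    w = piecewise_weight(x, coverage)
    return sum(blend(x, m, coverage, w) for m in channel_tile) / len(channel_tile)


# Red channel of the posted tiles (SubpixelMaskPattern 3.0 and 5.0).
RED_3PX = [0.0, 0.0, 1.0]
RED_5PX = [1.0, 1.0, 0.0, 0.0, 0.0]

for x in (0.1, 0.5, 0.9):
    print(x, tile_average(x, RED_3PX, 3.0), tile_average(x, RED_5PX, 15.0 / 6.0))
```

Because `mask_coverage` is the reciprocal of each channel's average mask value, the mask term averages to 1 and the weight cancels out of the tile average.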

In practice, I don’t really notice the discontinuity. If the area where the original image starts to phase in is noticeable, I had a backup plan:

// Cubic phase-in. Ramps up the mask to higher strength faster than linear 
// but has no discontinuity like piecewise.
float s = mask_coverage / (mask_coverage - 1.0);
float a = -s + 2.0;
float b = s - 3.0;
// GLSL's pow() needs both arguments to be the same type, so the exponents are vec3.
vec3 weight = a * pow(pixel_value, vec3(3.0)) + b * pow(pixel_value, vec3(2.0)) + 1.0;
return pixel_value * ((1.0 - weight) + mask_coverage * mask * weight);

This looks pretty similar but has no discontinuity. Here’s the graph of the mask strength (on the y-axis) and the pixel value (on the x-axis) for mask_coverage = 3. Red is the piecewise phase-in and blue is the cubic.
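For readers who cannot see the graph, the two weight curves can be tabulated with a few lines of Python. This is only a port of the two formulas posted above, evaluated for mask_coverage = 3; the function names are invented for the example.

```python
def piecewise_weight(x, coverage):
    """Piecewise phase-in: full mask until the clamp releases, then linear."""
    s = coverage / (coverage - 1.0)
    return min(max(-s * x + s, 0.0), 1.0)


def cubic_weight(x, coverage):
    """Cubic phase-in: a * x^3 + b * x^2 + 1 with a = 2 - s and b = s - 3."""
    s = coverage / (coverage - 1.0)
    a = -s + 2.0
    b = s - 3.0
    return a * x ** 3 + b * x ** 2 + 1.0


for i in range(11):
    x = i / 10.0
    print(f"{x:.1f}  piecewise={piecewise_weight(x, 3.0):.4f}  cubic={cubic_weight(x, 3.0):.4f}")
```

With mask_coverage = 3 the piecewise curve stays at 1 until x = 1/3 and then drops linearly, which is where the kink sits. The cubic starts at 1 with zero slope and reaches 0 at x = 1 with slope -s, the same slope as the linear segment.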

And here’s a gradient with the piecewise phase-in on the top and the cubic on the bottom. It’s 4k and I recommend viewing at native resolution (when it is rescaled you can definitely notice the discontinuity).

But I think your implementation differs, or maybe you didn’t aim for full brightness, as I see that the white shirt retains some of the mask (?).

There are two reasons the white shirt retains some mask. One is that Genesis Plus GX doesn’t actually output full-scale white values. The other is that this is done per-pixel, and the scanline is brightest in the middle, so that’s the part that gets blown out first. The mask is retained at the edges. I was using a max spot width of 0.9, so the scanlines don’t completely touch. That does reduce the brightness slightly, but it’s the scanlines reducing the brightness, not the mask. At a max spot width of 1.0, they’d merge if the values on those lines were full-scale white and the mask would not be visible. Hopefully that makes sense.

kokoko3k · 3 February 2025 16:36

Yup, I remember (now).

By manually altering the gamma of my display, the upper gradient slowly starts to show the trick, while, as expected, the lower one is much more robust.

I’d be curious to see how a simpler RGB mask performs at 1080p, to compare it to mine.

By the way, great achievement, congrats!

beans · 4 February 2025 01:15

Here are the same gradients with a BGR, 3-pixel mask at 1080p. I prefer BGR for RGB panels because it leaves more of a gap between each phosphor triad, even if it does switch blue and red. This is generated for true sRGB monitors (with the linear portion in the darker region, not simple 2.2 gamma).
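For reference, the difference between the “true sRGB” curve and a plain 2.2 gamma can be sketched in Python as below. The constants are the standard sRGB ones; the function names are just for this example and are not taken from the shader.

```python
def srgb_encode(linear):
    """Linear light -> sRGB signal, with the linear segment near black."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1.0 / 2.4) - 0.055


def srgb_decode(encoded):
    """sRGB signal -> linear light (inverse of srgb_encode)."""
    if encoded <= 0.04045:
        return encoded / 12.92
    return ((encoded + 0.055) / 1.055) ** 2.4


def gamma22_encode(linear):
    """Simple power-law approximation with exponent 1/2.2."""
    return linear ** (1.0 / 2.2)


for linear in (0.001, 0.01, 0.1, 0.5, 1.0):
    print(f"{linear:.3f}  sRGB={srgb_encode(linear):.4f}  gamma2.2={gamma22_encode(linear):.4f}")
```

The two curves are close in the mid-tones and differ most in the darkest values, which is why a gradient generated for one can look slightly off on a display calibrated to the other.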

The 1080p Streets of Rage screenshot above is the same 3-pixel mask.

beans · 4 February 2025 05:05

I have new shader files here (direct link to zip).

I’d say this is a pretty experimental release. It contains a few things.

Mask support

The shaders now have basic mask support. Currently only aperture grille is supported, and only for monitors with RGB or BGR subpixel arrangements. There is no “mask strength” parameter yet. Masks fade out as the area gets brighter so that they do not reduce the total brightness at all. (@kokoko3k, I ended up using the cubic blending as a “safer” default option.)

New presets: crt-beans-rgb and crt-beans-svideo

The original shader is now used in two preset files: crt-beans-rgb.slangp and crt-beans-svideo.slangp. They are the same set of shaders, but the S-Video preset has the low pass filter tuned to simulate S-Video signal bandwidth. Different consoles may have output signals with different bandwidths, so I just sort of picked some values that seemed like a reasonable default.

New preset: crt-beans-gaussian

There is a new crt-beans-gaussian.slangp preset. This is similar to the previous RGB preset, but uses a gaussian shape for the CRT spot. I would guess that this is more accurate to real CRTs. The image is slightly blurrier than the RGB preset because the gaussian has fatter tails.

The major advantage of this is that the spot width (and thus the scanline width) can be pushed to 1.2. You can basically get fatter, more overlapping lines. This allows simulating smaller, lower quality CRTs. The normal RGB preset doesn’t work properly with widths above 1.0 (that’s just a mathematical consequence of the spot function it uses).

The disadvantage is that the gaussian preset is much slower than the RGB preset. Because the gaussian’s tails are fatter, each area affects more pixels around it, which means a lot more texture samples are necessary. The scanline pass itself takes about 2x the time of the original, and I’d guess the whole shader takes about 1.6x the time of the original (I don’t have the numbers in front of me, currently).

New preset: crt-beans-monitor (no low pass filter)

There is a new crt-beans-monitor.slangp preset. The goal of this preset is to allow inputs other than 240p/480i or 288p/576i (basically, other than 15 kHz television). Hence the “monitor” name: it can be used for content at VGA/480p and beyond.

This is a little complicated to explain. The original shader can’t be used without the low pass filter. It solves the scanline integral numerically, which requires the low pass filter to resample the input. This new preset solves the scanline integral analytically, which, conversely, requires that there is no resampling of the input. The input is interpreted as a piecewise-constant function and the integral is solved in pieces.
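To illustrate what “solved in pieces” can mean, here is a deliberately simplified one-dimensional Python sketch. It assumes a fixed-width, unit-area Gaussian spot, which is an assumption made for the example; the shader's real spot varies in width and brightness with the signal and is integrated in two dimensions. Each input sample is a constant segment, so its contribution is its value times a difference of two Gaussian CDF values, and no numerical sampling is needed.

```python
import math


def gaussian_cdf(x, sigma):
    """CDF of a zero-mean Gaussian, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def analytic_response(samples, position, sigma):
    """Integral of a piecewise-constant line against a Gaussian spot.

    Sample i is assumed to cover the interval [i, i + 1); `position` is the
    spot center in the same units. Both conventions are illustrative.
    """
    total = 0.0
    for i, value in enumerate(samples):
        left = gaussian_cdf(i - position, sigma)
        right = gaussian_cdf(i + 1 - position, sigma)
        total += value * (right - left)
    return total


flat = [1.0] * 40
edge = [0.0] * 20 + [1.0] * 20
print(analytic_response(flat, 20.0, 1.0))  # close to 1: a flat line stays flat
print(analytic_response(edge, 20.0, 1.0))  # close to 0.5: spot centered on the edge
```

The cost profile matches the description above: one term per input sample inside the spot's reach, but each term needs `erf` evaluations instead of a single texture fetch.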

When resampling, I need to pick a number of samples to output for each line (a new image width, basically). The number of samples required changes based on the number of lines there are (the input image height). Unfortunately there is no way to do this within the limits of the slangp preset format. This, and another subtle issue with the presentation of the low pass frequencies parameters, led me to remove the low pass filter as a way to accommodate higher input resolutions.

There are two downsides to this. First is that there is no low pass filter, so you lose that capability. The output is a bit sharper horizontally than it probably should be. Second is that it is slightly slower. The analytical solution generally requires fewer texture samples but significantly more math. The worst performance is for very elongated pixel aspect ratios. For example, a 640x240 input will lead to worse performance than a 640x480 input.

Final thoughts

I may be able to upload some sample images tomorrow (the Streets of Rage screenshots above are pretty representative of the RGB preset).

I don’t know if there is really any interest in the gaussian or monitor presets. They might end up being a bit more confusing. They were interesting to make, though!

Cyber · 4 February 2025 06:25

No need to reinvent the wheel as we already have Mask Layouts which happen to work with W-OLED/RWGB subpixel structures in CRT-Guest-Advanced and Sony Megatron Color Video Monitor.

You can probably use those as a starting point but of course better, novel implementations are welcomed as well.

So far 4K W-OLED users seem to be covered but this doesn’t seem to carry over to 1440p displays for whatever reason.

I’ll share the posts which led to Sony Megatron Color Video Monitor gaining the RWGB Display Subpixel Layout.

kokoko3k · 4 February 2025 06:31

Otoh, sometimes reinventing the wheel may lead to a rounder wheel

Cyber · 4 February 2025 07:34

I agree, that’s why I included this part:

beans · 4 February 2025 14:24

Thanks, I will take a look at those posts. There won’t be any reinventing the wheel here, since I don’t have an OLED display and can’t even test any of these. And I assume nobody has any subpixel masks figured out for QD-OLEDs or small pentile OLEDs. It doesn’t really seem possible.

beans · 15 April 2025 03:53

There is a new version available here (direct link here).

The main addition is an option for a dynamically-generated aperture grille. This mask is not dependent on particular subpixel arrangements and can be scaled to arbitrary densities. I have 400-800 phosphor triads per output width as options, which seems reasonable to me. This should hopefully be a good option for weird subpixel arrangements (e.g. QD-OLED or pentile) and TATE mode (where the subpixels will be oriented incorrectly for subpixel masks), but it works well even on normal RGB LCD panels. It really shines at 4k but works reasonably well down to 1080p if you don’t go nuts with the density.
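One way such a resolution-independent grille could be generated is sketched below in Python. This is a guess at the general idea, not the code in the linked include file: it assumes each triad is split into three equal R, G, B stripes and that each output pixel takes the fraction of its width covered by each stripe (a box filter). The names `stripe_coverage` and `grille_mask` are hypothetical.

```python
def stripe_coverage(x0, x1, period, start, end):
    """Fraction of [x0, x1) covered by stripes that occupy the
    [start, end) portion (as fractions) of every period."""

    def covered_up_to(x):
        # Total stripe length inside [0, x).
        full, rem = divmod(x / period, 1.0)
        partial = min(max(rem - start, 0.0), end - start)
        return (full * (end - start) + partial) * period

    return (covered_up_to(x1) - covered_up_to(x0)) / (x1 - x0)


def grille_mask(pixel_x, output_width, triads):
    """(r, g, b) mask for the output pixel spanning [pixel_x, pixel_x + 1)."""
    period = output_width / triads  # width of one triad in output pixels
    thirds = [(0.0, 1.0 / 3.0), (1.0 / 3.0, 2.0 / 3.0), (2.0 / 3.0, 1.0)]
    return tuple(
        stripe_coverage(pixel_x, pixel_x + 1.0, period, a, b) for a, b in thirds
    )


# 550 triads across a 1920-pixel-wide output: about 3.49 pixels per triad.
for x in range(4):
    print(x, [round(c, 3) for c in grille_mask(float(x), 1920.0, 550.0)])
```

With this construction every channel averages 1/3 across the screen, so it would pair with a mask_coverage of 3 in the blend function regardless of the chosen triad density.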

I have also reworked the mask blending function that I was using, to fix a bug that caused clipping with certain inputs.
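For context, the two phase-in functions posted in February can be checked numerically for clipping. The Python sketch below evaluates them exactly as posted, for a fully lit subpixel (mask = 1) and mask_coverage = 3, and also shows one weight that keeps the lit subpixel at or below 1.0. That last function is derived here purely for illustration; it is not the reworked function from the new release, and the clipping shown may or may not be the bug mentioned above.

```python
def piecewise_weight(x, coverage):
    s = coverage / (coverage - 1.0)
    return min(max(-s * x + s, 0.0), 1.0)


def cubic_weight(x, coverage):
    s = coverage / (coverage - 1.0)
    return (2.0 - s) * x ** 3 + (s - 3.0) * x ** 2 + 1.0


def clip_free_weight(x, coverage):
    """Illustrative only: largest weight for which a lit subpixel stays <= 1."""
    if x <= 1.0 / coverage:
        return 1.0
    return (1.0 - x) / (x * (coverage - 1.0))


def lit_output(x, coverage, weight):
    """Posted blend evaluated where the mask value is 1."""
    return x * ((1.0 - weight) + coverage * weight)


def peak(weight_fn, coverage, steps=1000):
    """Largest lit-subpixel output over input values 0..1."""
    xs = [i / steps for i in range(steps + 1)]
    return max(lit_output(x, coverage, weight_fn(x, coverage)) for x in xs)


for name, fn in (("piecewise", piecewise_weight),
                 ("cubic", cubic_weight),
                 ("clip-free", clip_free_weight)):
    print(name, round(peak(fn, 3.0), 4))
```

The first two peaks come out above 1.0 for mid-range inputs, so those subpixels would be clamped by the display, while the third stays at 1.0. All three keep the average over a mask tile equal to the input value before clamping.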

Finally, there are also some performance improvements, most notably for the crt-beans-monitor preset (the analytical scanline method).

@hunterk, the include file for the mask handling is here if you want to take a look. If you are interested in putting this in the repo, I can open a pull request. You may not want the subpixel stuff since you already have many of those masks in your own functions. I may add a mask strength parameter to the blend function, so it might make sense to wait for that (and maybe wait until this is tested a bit).

Here’s an example image at 1080p, with the dynamic aperture grille at 550 phosphor triads:

Even oxipng couldn’t compress the 4k image small enough for this forum, and webp isn’t supported, so you’ll have to see the 4k image at imgbb:

I only have a 4k monitor, so I’ve been testing these at lower resolutions by nearest neighbor scaling them to 200%. It would be great if anyone could let me know how they look on other native resolutions!

md2mcb · 15 April 2025 22:48

Testing on native 1080p. I think it gives pretty good results; congratulations on your work. However, it’s still a pretty heavy shader compared to others that aim for similar results. The problem with 480p content also persists: the shader zooms in, so it can’t be used with that content.

hunterk · 15 April 2025 23:04

I don’t mind having another mask function in the mix. Probably a good idea to wait until everything settles with testing and whatnot, but yeah, throw us a PR and we’ll get it in there

beans · 16 April 2025 03:47

Thanks for testing!

What core, operating system, and graphics card are you using when you are seeing the zoomed in 480p problem? It works fine here. I’m using Linux, AMD graphics, and I tested on Flycast because Dolphin is broken on my install. Other cores (bsnes, mupen64, etc) also output something that looks like 480p when they are in interlacing modes or when you use upscaling, and that also works fine. I suspect there might be a Dolphin core or RetroArch bug that is triggering this behavior, but it is hard for me to fix because I can’t replicate it.

Note that I recommend the crt-beans-monitor preset for 480p if you don’t want interlacing. The rgb and svideo presets are tuned for 240p/480i (or 288p/576i, basically 15kHz TVs).

As for the performance, it has become a heavier shader than most, but it is still lighter weight than popular shaders like crt-guest-advanced or crt-royale. Currently the performance breakdown (on my hardware in 4k) is roughly 60% of time doing scanlines, 30% of time doing the glow, and 10% doing other stuff. The glow is already pretty fast for requiring such a wide blur (which is a fundamentally expensive operation). Most (but not all) other shaders have smaller blur radiuses.

The main performance hog is the scanline simulation. From what I can tell, most shaders basically blur the line horizontally, then spread the line vertically (maybe based on the brightness at that location). I actually calculate what a pixel would look like if a round (or round-ish, depending on the preset) spot were to be scanned over the line, varying in brightness and width based on the intensity of the input. I think this approach is in some ways more flexible and is more faithful to the way CRTs actually worked. It is more expensive, though!

A few of the advantages of this approach are:

Dark areas of the image have more detail, even horizontally, because the spot is smaller. Brighter areas get blown out in comparison.
It doesn’t matter if the input has the pixels doubled horizontally (or tripled, quadrupled, etc). The output will look the same. Some emulators actually do this for various reasons.
More generally, the sharpness of the output doesn’t change based on the horizontal resolution of the input. You could actually achieve this without fully simulating the spot, but it can be more expensive and most shaders don’t do it.
The scanlines don’t alter the tonality of the image. The dark and bright areas don’t get darker or brighter relative to each other. The average pixel value of the image is only scaled linearly by the maximum spot size parameter. I haven’t worked out the math, but it isn’t clear to me that this is the case with some of the simpler scanline implementations.
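The pixel-doubling point in the list above can be demonstrated with a small Python sketch. It treats the input line as a piecewise-constant function over a normalized width of 1 and integrates it against a fixed-width Gaussian spot; the fixed Gaussian and the function names are assumptions for the example, not the shader's actual spot model.

```python
import math


def gaussian_cdf(x, sigma):
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def spot_response(samples, position, sigma):
    """Response at `position` (0..1) to a line of equal-width samples.

    Sample i of n covers [i / n, (i + 1) / n), so the underlying function
    does not change when every sample is repeated.
    """
    n = len(samples)
    return sum(
        value * (gaussian_cdf((i + 1) / n - position, sigma)
                 - gaussian_cdf(i / n - position, sigma))
        for i, value in enumerate(samples)
    )


line = [0.1, 0.9, 0.4, 0.7]
doubled = [v for v in line for _ in range(2)]     # horizontally doubled pixels
tripled = [v for v in line for _ in range(3)]     # horizontally tripled pixels

for samples in (line, doubled, tripled):
    print(len(samples), spot_response(samples, 0.4, 0.05))
```

All three calls print the same value up to floating-point rounding, because the doubled and tripled inputs describe the same piecewise-constant function. A shader that filters in units of input texels would instead get sharper as the input gets wider.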

I think it’s valid to ask whether it’s worth it, though. If you favor performance, that’s a reasonable preference. I could make a faster version that works more like other shaders, I’m just not sure that I want to give up what I think makes this shader unique.

Hari-82 · 16 April 2025 15:37

Just quickly tried your shader and I like it a lot!

I’m on a 1080p monitor so I had to set MaskType = "2.000000"; other than that, it is good to go “out of the box”, without changing much else!

Well done!

hunterk · 17 April 2025 02:05

It sounds like it will be especially useful for Ares, which also uses our shaders but has problems with some of them due to its doubling/tripling/etc of the horizontal res on some of its cores.

Make a fast version if you like, but I will warn you that it will never be fast enough lol. Someone will always have a weaker device that they want to use it on, and if you just make it a little bit faster, it’ll be perfect, I promise

md2mcb · 17 April 2025 18:32

Sorry it took me long to reply, I was a little out of time.

Don’t worry much about what I said. Your shader is already good, and produces gorgeous results, definitely among the accurate ones. I’m far from being an expert, so I won’t argue, but I want to explain myself further:

I have three devices (two of them with integrated GPUs) and your shader runs at full speed on all of them. Far from being an “omg so slow” issue, it’s just that there are already other works which emulate the beam dynamics of scanlines and are a bit lighter (like crt-hyllian and crt-royale-fast). I imagine their approach isn’t the same as yours, and the results are different, but perhaps there’s still something that can be improved about your shader’s performance. If there’s not, it’s still not the end of the world and it won’t change the already beautiful picture quality it provides. It’s just that a slower shader may tip people into thinking “why not use guest’s instead”.
I really forgot to clarify on my second report, but, yes, I was talking about the Dolphin core: your shader does not work on it, whereas many others do (be they simple or complex ones). The issue is still the same: seems like it divides the image into a 2x2 grid and zooms into the top left one. I tested on native resolution only. I also have a Linux + AMD device and Dolphin works fine there, using the AppImage version of RetroArch. Other 480i/p cores are fine indeed (like Flycast or LRPS2).

Those are the only two issues I really found with your shader. Don’t think I’m dissing your work, because I really like it. I’m fully aware the situation is not favorable either: there are already plenty of good CRT shaders around, so what more can be done? I recognize you’re fighting an uphill battle. However, I still believe you bring something interesting to the table: your shader builds upon the (classic, but outdated) GTU and it provides an excellent middle ground for accuracy without a swarm of options.

Anyway, since I have a 1080p screen, feel free to ask for some tests, and I’ll do my best to run them when I have the time. From a little under two hours of testing, I can already say it’s pleasantly usable on 1080p, although I had to switch to the dynamic mask for better results.

beans · 19 April 2025 03:34

I was able to get Dolphin working from the Flatpak and I didn’t encounter the zooming behavior. I’ll try with some more settings and see if I can figure out what’s going on. Is it a Windows/Nvidia device where you see this problem?

This is basically what I’m aiming for. I didn’t necessarily want it to do everything, but I want what it does do to be top-notch with straightforward configuration options, with a focus on faithfulness to the way a CRT works.

I will think some more on the performance aspect and see if I can come up with anything. I think the biggest gains would come by applying the horizontal and vertical calculations for the scanlines into different passes, but that doesn’t work with the math as-is. It might be worth comparing to see if the quality difference is noticeable, though.

Thanks for testing, I appreciate it!

md2mcb · 19 April 2025 04:13

All my devices are Linux ones, although with different distributions. I will do further testing with Dolphin soon and see if the issue happens on every one of them. With luck, I will be able to pinpoint it or blame it on some goofiness of mine.

By the way, here’s a single comparison of crt-beans against crt-hyllian and crt-royale-fast, all shaders which emulate the beam dynamics of a CRT. The three pictures look great to me, with minor differences that boil down to personal preference. Definitely usable already; I haven’t detected any glaring issue yet. As usual with proper scanline emulation, moire patterns happen, but they can be mitigated by adjusting the phosphor triads (the default values are fine).

Note: all shaders are using a composite LUT.

crt-beans-monitor (sRGB gamma and dynamic mask)

crt-hyllian

crt-royale-fast (using a slot mask)

beans · 26 April 2025 12:47

Thanks, @md2mcb. I agree that those images look pretty similar. Especially at 1080p with a mask, the scanline probably doesn’t need to be simulated really accurately.

I’ve been thinking about a potential faster version, and I came up with something by splitting the scanline calculations into two passes. It’s a bit of a hack and probably still needs some tuning. The scanline dynamics are not as accurate, but I think it looks reasonable. I removed the glow as well, so it’s somewhere between crt-pi and crt-easymode in terms of speed.

The nice thing is that it keeps the resolution independence (the horizontal resolution still doesn’t affect how sharp the pixel transitions are) and also preserves the tonality of the original image.

I’m away from my computer but might have the code up in a couple days.

md2mcb · 26 April 2025 14:13

It’s good. Having two options, one of them faster, greatly enlarges your audience. Most of the lighter shaders are stuff from many years ago, and there’s a shortage of lighter shaders that implement modern solutions. However, your default, heavier shader is also another great option for accuracy. Yes, it’s a shame it’s a bit heavier than intended, but the results are satisfactory and that shouldn’t be overlooked. The way it is now, I think crt-beans can easily fit into the official repository. I hope you can upload it there after you feel content with some adjustments.