New CRT shader - Any interest?

@NonWonderDog I’ve had some time to look at your fork, and I originally misunderstood what you were doing with the 1/width^2 factor. Your 1/width^2 is correct, and I had made a mistake projecting the spot function onto 2 dimensions. There was actually another mistake that somewhat compensated for this, but it clipped and crushed the highlights when the maximum spot size was reduced. I’ve fixed both and the result looks better, and usually brighter.

I also wrote a test to make sure the tonality of the image is preserved. I probably should have done this before! I generated full-screen, solid-color images of varying brightness levels, ran them through the simulator, then measured the average brightness of each output. If the maximum spot size is 1.0, the outputs should have the same brightness as the input images (compensating for CRT gamma). This is what it looked like before, with the input brightness on the x-axis and the output brightness on the y-axis. Each image is a dot:

incorrect

This is what it looks like with the fix:

fixed
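A rough Python sketch of that kind of tonality test, for anyone who wants to try it offline. This is not the test from the repository: `ideal_simulator` and `clipping_simulator` are hypothetical stand-ins for the shader, and any gamma compensation is assumed to be folded into the `expected` function.

```python
def mean_brightness(image):
    """Average of all pixel values in a 2D list of floats."""
    pixels = [value for row in image for value in row]
    return sum(pixels) / len(pixels)


def tonality_curve(simulate, levels, width=16, height=16):
    """Return (input_level, mean_output) pairs for solid-color test images."""
    curve = []
    for level in levels:
        solid = [[level] * width for _ in range(height)]
        curve.append((level, mean_brightness(simulate(solid))))
    return curve


def max_tonality_error(curve, expected=lambda level: level):
    """Largest deviation of the measured mean from the expected mean."""
    return max(abs(out - expected(level)) for level, out in curve)


# Hypothetical stand-ins for the real simulator (assumptions, not shader code).
def ideal_simulator(image):
    return [row[:] for row in image]


def clipping_simulator(image):
    # Overshoots by 25% and clips, which crushes the highlights.
    return [[min(1.0, 1.25 * value) for value in row] for row in image]


levels = [i / 10 for i in range(11)]
for name, sim in (("ideal", ideal_simulator), ("clipping", clipping_simulator)):
    print(name, max_tonality_error(tonality_curve(sim, levels)))
```

Plotting each `(input_level, mean_output)` pair as a dot gives the same kind of graph as the two images above: a straight diagonal when tonality is preserved, a bent or flattened curve when it is not.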

So thank you for pointing out my mistake! I’ll clean up, commit, and push some time soon, as well as generate another archive to download. I’m a little short on time currently.

Oh, and I can’t test the Dolphin core because it crashes on my installation (something about not being linked properly to libbz2—I haven’t really spent any time looking into it).

2 Likes

For one, online consensus seems to be that the NES (at least) just looks better on TVs without one. Secondly, a comb filter ultimately sacrifices temporal accuracy in favor of spatial accuracy. Since we’re adding the imperfections, that just seems unnecessary when we can just turn down the strength instead. Worst case is you spend a month writing the world’s worst motion-blur shader.

A comb filter doesn’t necessarily have a temporal, inter-frame component. My understanding is that most comb filters were simple 2-line or 3-line filters, and later 2D adaptive filters. Only the 3D adaptive filters in the very late model TVs used anything outside the current frame (or field or whatever).

There are times when a comb filter won’t work well, though. Lots of old systems (like the Mega Drive/Genesis) didn’t shift the chroma phase between lines at all, which defeats the comb filter. Maybe an argument could be made that there isn’t much point in adding one.
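To make the point about line-to-line chroma phase concrete, here is a toy 2-line comb filter in Python. It is a sketch under simplifying assumptions: flat luma, a single chroma sine, and identical picture content on both lines. The 227.5 subcarrier cycles per line and 910 samples per line (4x subcarrier sampling) are the standard NTSC figures; the 228-cycle case is only a stand-in for a console that does not shift the chroma phase between lines.

```python
import math


def composite_line(luma, chroma_amp, line_index, cycles_per_line, samples=910):
    """One scanline of a toy composite signal: flat luma plus a chroma sine."""
    line = []
    for n in range(samples):
        t = line_index + n / samples  # time measured in scanlines
        phase = 2.0 * math.pi * cycles_per_line * t
        line.append(luma + chroma_amp * math.sin(phase))
    return line


def two_line_comb(current, previous):
    """Sum of adjacent lines estimates luma, difference estimates chroma."""
    luma = [(a + b) / 2.0 for a, b in zip(current, previous)]
    chroma = [(a - b) / 2.0 for a, b in zip(current, previous)]
    return luma, chroma


def luma_ripple(cycles_per_line):
    """Worst-case chroma leakage left in the comb filter's luma output."""
    previous = composite_line(0.5, 0.3, 0, cycles_per_line)
    current = composite_line(0.5, 0.3, 1, cycles_per_line)
    luma, _ = two_line_comb(current, previous)
    return max(abs(value - 0.5) for value in luma)


print("phase flips each line (227.5 cycles):", luma_ripple(227.5))
print("no phase shift (228 cycles):        ", luma_ripple(228.0))
```

With the half-cycle offset the chroma cancels out of the luma estimate; with a whole number of cycles per line it does not cancel at all, which is the failure case described above.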

I’ve updated the repository with the fixes and test. The glow is also less strong by default. I haven’t split out a basic version yet.

I’ve created a new zip file here (direct link).

Here are some quick screenshots with the default settings.

5 Likes

I’ve been experimenting with masks. I have some subpixel masks implemented in the shader (not updated on GitHub yet).

One of the main issues with masks is keeping the brightness high enough. What I settled on doing was pretty simple: let each pixel get as bright as it could with the mask at full strength, and then blend in the original, un-masked image if it needs to get brighter than the mask would allow. This allows masking with no brightness penalty, although brighter sections of the image will have a weaker mask pattern. It also preserves the tonality of the image, so I don’t have to resort to things like messing with the gamma or raising the midtones.

Here are some images with an aperture grille at three different resolutions. I recommend viewing them at their original size or the mask will look messed up.

EDIT: I should mention that these images are for RGB subpixels. BGR is also easy to support. OLED subpixel structures are… complicated.

8 Likes

Very nice looking!

I did something like that by pushing the input signal gain to 3x to fully exploit the RGB mask, and then helping it by adding the original color.

However, I found that the display itself has to be very well calibrated for the theory behind this to hold; otherwise you get an evident discontinuity (visible with a simple black-and-white gradient) at the point where the “helper” color starts being added to the mask.

But I think your implementation differs, or maybe you didn’t aim for full brightness, as I see that the white shirt retains some of the mask (?).

2 Likes

@kokoko3k, Thanks! I remember you talking about that on the Discord (and I sent an image of my own experiment).

Here’s what I ended up doing:

vec3 mask(vec3 pixel_value) {
    // I removed some overscan stuff here to make it simpler to read...
    vec3 mask = vec3(1.0);
    float mask_coverage = 1.0;
    if (params.SubpixelMaskPattern == 2.0) {  // <=1080p
        vec3[2] mask_tile = { vec3(1, 0, 1), vec3(0, 1, 0) };
        mask = mask_tile[int(mod(floor(viewport_x_coord * params.OutputSize.x), 2.0))];
        mask_coverage = 2.0;
    } else if (params.SubpixelMaskPattern == 3.0) {  // 1080p/1440p
        vec3[3] mask_tile = { vec3(0, 0, 1), vec3(0, 1, 0), vec3(1, 0, 0) };
        mask = mask_tile[int(mod(floor(viewport_x_coord * params.OutputSize.x), 3.0))];
        mask_coverage = 3.0;
    } else if (params.SubpixelMaskPattern == 4.0) {  // 1440p/4k
        vec3[4] mask_tile = { vec3(1, 0, 0), vec3(1, 1, 0), vec3(0, 1, 1), vec3(0, 0, 1) };
        mask = mask_tile[int(mod(floor(viewport_x_coord * params.OutputSize.x), 4.0))];
        mask_coverage = 2.0;
    } else if (params.SubpixelMaskPattern == 5.0) {  // 4k, lower TVL
        vec3[5] mask_tile = { vec3(1, 0, 0), vec3(1, 0, 1), vec3(0, 0, 1), vec3(0, 1, 0), vec3(0, 1, 0) };
        mask = mask_tile[int(mod(floor(viewport_x_coord * params.OutputSize.x), 5.0))];
        mask_coverage = 15.0 / 6.0;
    }
    // For BGR subpixel arrangement, just transpose red and blue.

    // Piecewise phase-in. The original image starts phasing in only when we've
    // maxed the brightness we can get from our masked image.
    float s = mask_coverage / (mask_coverage - 1.0);
    vec3 weight = clamp(-s * pixel_value + s, 0.0, 1.0);
    return pixel_value * ((1.0 - weight) + mask_coverage * mask * weight);
}
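
As a sanity check on the tonality claim, the same arithmetic can be run offline. The Python below is a scalar, per-channel transcription of the piecewise blend for the 3-pixel tile; the helper names (`piecewise_weight`, `masked_value`, `tile_average`) are made up for this sketch and do not come from the shader.

```python
# The 3-pixel tile and coverage from the SubpixelMaskPattern == 3.0 branch.
MASK_TILE = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
MASK_COVERAGE = 3.0


def piecewise_weight(pixel_value, mask_coverage):
    """Mask weight: 1.0 until the masked image maxes out, then a linear ramp to 0."""
    s = mask_coverage / (mask_coverage - 1.0)
    return min(max(-s * pixel_value + s, 0.0), 1.0)


def masked_value(pixel_value, mask, mask_coverage):
    """One channel of one output pixel, before any clamping by the display."""
    weight = piecewise_weight(pixel_value, mask_coverage)
    return pixel_value * ((1.0 - weight) + mask_coverage * mask * weight)


def tile_average(pixel_value, channel):
    """Average of one channel over a full mask tile for a flat input."""
    outputs = [masked_value(pixel_value, texel[channel], MASK_COVERAGE)
               for texel in MASK_TILE]
    return sum(outputs) / len(outputs)


for level in (0.1, 0.25, 0.5, 0.9):
    print(level, tile_average(level, 0), masked_value(level, 1.0, MASK_COVERAGE))
```

The tile average equals the input for any weight, because the mask averages to 1/mask_coverage per channel. Note that this holds for the unclamped values: with this arithmetic a lit subpixel can go above 1.0 for mid-bright inputs (0.5 gives 1.25 here), so anything that clamps the output afterwards will lose some of that brightness.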

In practice, I don’t really notice the discontinuity. If the area where the original image starts to phase in is noticeable, I had a backup plan:

// Cubic phase-in. Ramps up the mask to higher strength faster than linear
// but has no discontinuity like piecewise.
float s = mask_coverage / (mask_coverage - 1.0);
float a = -s + 2.0;
float b = s - 3.0;
// GLSL's pow() needs both operands to have the same type, so pow(vec3, float)
// does not compile; square and cube by multiplying instead.
vec3 p2 = pixel_value * pixel_value;
vec3 weight = a * p2 * pixel_value + b * p2 + 1.0;
return pixel_value * ((1.0 - weight) + mask_coverage * mask * weight);

This looks pretty similar but has no discontinuity. Here’s the graph of the mask strength (on the y-axis) and the pixel value (on the x-axis) for mask_coverage = 3. Red is the piecewise phase-in and blue is the cubic.
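For readers without the graph, the two curves are easy to tabulate. This Python sketch evaluates both weight functions from the snippets above for mask_coverage = 3; the function names are only for this example.

```python
def piecewise_weight(pixel_value, mask_coverage):
    """Piecewise phase-in: full mask, then a straight ramp down to zero."""
    s = mask_coverage / (mask_coverage - 1.0)
    return min(max(-s * pixel_value + s, 0.0), 1.0)


def cubic_weight(pixel_value, mask_coverage):
    """Cubic phase-in: a single smooth polynomial over the whole range."""
    s = mask_coverage / (mask_coverage - 1.0)
    a = -s + 2.0
    b = s - 3.0
    return a * pixel_value ** 3 + b * pixel_value ** 2 + 1.0


def slope(fn, x, coverage, h=1e-6):
    """Central-difference estimate of the derivative of fn at x."""
    return (fn(x + h, coverage) - fn(x - h, coverage)) / (2.0 * h)


coverage = 3.0
for i in range(11):
    p = i / 10
    print(f"{p:.1f}  piecewise={piecewise_weight(p, coverage):.4f}  "
          f"cubic={cubic_weight(p, coverage):.4f}")
```

Both curves start at 1 and end at 0. The cubic has zero slope at pixel value 0 and the same end slope as the piecewise ramp at pixel value 1 (the coefficients a and b are exactly what those two conditions require), while the piecewise version has a kink at 1/mask_coverage where the ramp begins.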

And here’s a gradient with the piecewise phase-in on the top and the cubic on the bottom. It’s 4k and I recommend viewing at native resolution (when it is rescaled you can definitely notice the discontinuity).

But I think your implementation differs, or maybe you didn’t aim for full brightness, as I see that the white shirt retains some of the mask (?).

There are two reasons the white shirt retains some mask. One is that Genesis Plus GX doesn’t actually output full-scale white values. The other is that this is done per-pixel, and the scanline is brightest in the middle, so that’s the part that gets blown out first. The mask is retained at the edges. I was using a max spot width of 0.9, so the scanlines don’t completely touch. That does reduce the brightness slightly, but it’s the scanlines reducing the brightness, not the mask. At a max spot width of 1.0, they’d merge if the values on those lines were full-scale white and the mask would not be visible. Hopefully that makes sense.

2 Likes

Yup, I remember (now :smile: )

By manually altering the gamma of my display, the upper gradient slowly starts to show the trick, while, as expected, the lower one is much more robust.

I’d be curious to see how a simpler RGB mask performs at 1080p, to compare with mine.

Btw, great achievement, congrats!

2 Likes

Here are the same gradients with a BGR, 3-pixel mask at 1080p. I prefer BGR for RGB panels because it leaves more of a gap between each phosphor triad, even if it does switch blue and red. This is generated for true sRGB monitors (with the linear portion in the darker region, not simple 2.2 gamma).
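For reference, the difference between the piecewise sRGB curve and a simple 2.2 power law can be written out in a few lines of Python. The constants are the standard sRGB ones; nothing here is taken from the shader source.

```python
def srgb_to_linear(value):
    """Decode an sRGB-encoded value in [0, 1] to linear light."""
    if value <= 0.04045:
        return value / 12.92  # linear segment near black
    return ((value + 0.055) / 1.055) ** 2.4


def linear_to_srgb(linear):
    """Encode a linear-light value in [0, 1] to sRGB."""
    if linear <= 0.0031308:
        return linear * 12.92  # linear segment near black
    return 1.055 * linear ** (1.0 / 2.4) - 0.055


for encoded in (0.02, 0.1, 0.5, 0.9):
    print(encoded, srgb_to_linear(encoded), encoded ** 2.2)
```

The two curves are close in the midtones but diverge in the darkest values, where the sRGB linear segment gives noticeably more light than a pure 2.2 power curve.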

The 1080p Streets of Rage screenshot above is the same 3-pixel mask.

1 Like

I have new shader files here (direct link to zip).

I’d say this is a pretty experimental release. It contains a few things.

Mask support

The shaders now have basic mask support. Currently only aperture grille is supported, and only for monitors with RGB or BGR subpixel arrangements. There is no “mask strength” parameter yet. Masks fade out as the area gets brighter so that they do not reduce the total brightness at all. (@kokoko3k, I ended up using the cubic blending as a “safer” default option.)

New presets: crt-beans-rgb and crt-beans-svideo

The original shader is now used in two preset files: crt-beans-rgb.slangp and crt-beans-svideo.slangp. They are the same set of shaders, but the S-Video preset has the low pass filter tuned to simulate S-Video signal bandwidth. Different consoles may have output signals with different bandwidths, so I just sort of picked some values that seemed like a reasonable default.

New preset: crt-beans-gaussian

There is a new crt-beans-gaussian.slangp preset. This is similar to the previous RGB preset, but uses a gaussian shape for the CRT spot. I would guess that this is more accurate to real CRTs. The image is slightly blurrier than the RGB preset because the gaussian has fatter tails.

The major advantage of this is that the spot width (and thus the scanline width) can be pushed to 1.2. You can basically get fatter, more overlapping lines. This allows simulating smaller, lower quality CRTs. The normal RGB preset doesn’t work properly with widths above 1.0 (that’s just a mathematical consequence of the spot function it uses).

The disadvantage is that the gaussian preset is much slower than the RGB preset. Because the gaussian’s tails are fatter, each area affects more pixels around it, which means a lot more texture samples are necessary. The scanline pass itself takes about 2x the time of the original, and I’d guess the whole shader takes about 1.6x the time of the original (I don’t have the numbers in front of me, currently).

New preset: crt-beans-monitor (no low pass filter)

There is a new crt-beans-monitor.slangp preset. The goal of this preset is to allow inputs other than 240p/480i or 288p/576i (basically, other than 15 kHz television). Hence the “monitor” name: it can be used for content at VGA/480p and beyond.

This is a little complicated to explain. The original shader can’t be used without the low pass filter. It solves the scanline integral numerically, which requires the low pass filter to resample the input. This new preset solves the scanline integral analytically, which, conversely, requires that there is no resampling of the input. The input is interpreted as a piecewise-constant function and the integral is solved in pieces.
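To illustrate what "solved in pieces" can mean, here is a small Python sketch. It assumes a fixed-width, unit-area Gaussian kernel purely for illustration; the shader's actual spot function and its intensity-dependent width are not shown in this thread, so this is not the shader's math. For a piecewise-constant row, the integral under the kernel reduces to a sum of error-function differences, one per input pixel, with no resampling involved.

```python
import math


def gaussian_cdf(x, sigma):
    """Cumulative distribution of a zero-mean Gaussian."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def analytic_response(row, center, sigma):
    """Exact integral of kernel * input, where pixel i covers [i, i + 1)."""
    total = 0.0
    for i, value in enumerate(row):
        weight = gaussian_cdf(i + 1 - center, sigma) - gaussian_cdf(i - center, sigma)
        total += value * weight
    return total


def numeric_response(row, center, sigma, steps_per_pixel=200):
    """Midpoint-rule reference for the same integral (many more evaluations)."""
    dx = 1.0 / steps_per_pixel
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i, value in enumerate(row):
        for k in range(steps_per_pixel):
            x = i + (k + 0.5) * dx
            total += value * norm * math.exp(-0.5 * ((x - center) / sigma) ** 2) * dx
    return total


row = [0.0, 0.2, 1.0, 1.0, 0.4, 0.0, 0.0, 0.8]
print(analytic_response(row, 3.3, 0.6), numeric_response(row, 3.3, 0.6))
```

The analytic version needs one pair of `erf` evaluations per input pixel under the kernel, which fits the trade-off described here: fewer samples, more math per sample.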

When resampling, I need to pick a number of samples to output for each line (a new image width, basically). The number of samples required changes based on the number of lines there are (the input image height). Unfortunately there is no way to do this within the limits of the slangp preset format. This, and another subtle issue with the presentation of the low pass frequencies parameters, led me to remove the low pass filter as a way to accommodate higher input resolutions.

There are two downsides to this. First is that there is no low pass filter, so you lose that capability. The output is a bit sharper horizontally than it probably should be. Second is that it is slightly slower. The analytical solution generally requires fewer texture samples but significantly more math. The worst performance is for very elongated pixel aspect ratios. For example, a 640x240 input will lead to worse performance than a 640x480 input.

Final thoughts

I may be able to upload some sample images tomorrow (the Streets of Rage screenshots above are pretty representative of the RGB preset).

I don’t know if there is really any interest in the gaussian or monitor presets. They might end up being a bit more confusing. They were interesting to make, though!

2 Likes

No need to reinvent the wheel as we already have Mask Layouts which happen to work with W-OLED/RWGB subpixel structures in CRT-Guest-Advanced and Sony Megatron Color Video Monitor.

You can probably use those as a starting point but of course better, novel implementations are welcomed as well.

So far 4K W-OLED users seem to be covered but this doesn’t seem to carry over to 1440p displays for whatever reason.

I’ll share the posts which led to Sony Megatron Color Video Monitor gaining the RWGB Display Subpixel Layout.

1 Like

Otoh, sometimes reinventing the wheel may lead to a rounder wheel :wink:

1 Like

I agree, that’s why I included this part:

2 Likes

Thanks, I will take a look at those posts. There won’t be any reinventing the wheel here: I don’t have an OLED display, so I can’t even test any of these. And I assume nobody has any subpixel masks figured out for QD-OLEDs or small pentile OLEDs. It doesn’t really seem possible.

1 Like

There is a new version available here (direct link here).

The main addition is an option for a dynamically-generated aperture grille. This mask is not dependent on particular subpixel arrangements and can be scaled to arbitrary densities. I have 400-800 phosphor triads per output width as options, which seems reasonable to me. This should hopefully be a good option for weird subpixel arrangements (e.g. QD-OLED or pentile) and TATE mode (where the subpixels will be oriented incorrectly for subpixel masks), but it works well even on normal RGB LCD panels. It really shines at 4k but works reasonably well down to 1080p if you don’t go nuts with the density.
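One way such a resolution-independent grille could be generated is sketched below in Python. This is a guess at the general idea, not the code from the include file: it assumes equal-width R, G, B stripes with no gaps, in RGB order, and box-filters them over each output pixel's footprint. The names `stripe_length` and `grille_mask` are hypothetical.

```python
import math


def stripe_length(t, channel):
    """Total length of one channel's stripes inside [0, t], in stripe units."""
    full_triads = math.floor(t / 3.0)
    remainder = t - 3.0 * full_triads
    return full_triads + min(max(remainder - channel, 0.0), 1.0)


def grille_mask(x, output_width, triads):
    """(r, g, b) coverage of output pixel x, which spans [x, x + 1)."""
    stripes_per_pixel = 3.0 * triads / output_width
    t0 = x * stripes_per_pixel
    t1 = (x + 1) * stripes_per_pixel
    return tuple(
        (stripe_length(t1, channel) - stripe_length(t0, channel)) / (t1 - t0)
        for channel in range(3)
    )


# 550 triads across a 1920-pixel-wide output, as in the 1080p example.
for x in range(6):
    print(x, [round(v, 3) for v in grille_mask(x, 1920, 550)])
```

Because each pixel's value is an area average rather than a lookup into a fixed tile, the triad count does not need to divide the output width evenly, and the result does not depend on the panel's subpixel order.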

I have also reworked the mask blending function that I was using, to fix a bug that caused clipping with certain inputs.

Finally, there are also some performance improvements, most notably for the crt-beans-monitor preset (the analytical scanline method).

@hunterk, the include file for the mask handling is here if you want to take a look. If you are interested in putting this in the repo, I can open a pull request. You may not want the subpixel stuff since you already have many of those masks in your own functions. I may add a mask strength parameter to the blend function, so it might make sense to wait for that (and maybe wait until this is tested a bit).

Here’s an example image at 1080p, with the dynamic aperture grille at 550 phosphor triads:

Even oxipng couldn’t compress the 4k image small enough for this forum, and webp isn’t supported, so you’ll have to see the 4k image at imgbb:

I only have a 4k monitor, so I’ve been testing these at lower resolutions by nearest neighbor scaling them to 200%. It would be great if anyone could let me know how they look on other native resolutions!

8 Likes

Testing on native 1080p. I think it gives pretty good results; congratulations on your work. However, it’s still a pretty heavy shader compared to others that aim for similar results. The problem with 480p content also persists: the shader zooms in and can’t be used with it.

1 Like

I don’t mind having another mask function in the mix. Probably a good idea to wait until everything settles with testing and whatnot, but yeah, throw us a PR and we’ll get it in there :slight_smile:

1 Like

Thanks for testing!

What core, operating system, and graphics card are you using when you are seeing the zoomed in 480p problem? It works fine here. I’m using Linux, AMD graphics, and I tested on Flycast because Dolphin is broken on my install. Other cores (bsnes, mupen64, etc) also output something that looks like 480p when they are in interlacing modes or when you use upscaling, and that also works fine. I suspect there might be a Dolphin core or Retroarch bug that is triggering this behavior, but it is hard for me to fix because I can’t replicate it.

Note that I recommend the crt-beans-monitor preset for 480p if you don’t want interlacing. The rgb and svideo presets are tuned for 240p/480i (or 288p/576i, basically 15kHz TVs).

As for the performance, it has become a heavier shader than most, but it is still lighter weight than popular shaders like crt-guest-advanced or crt-royale. Currently the performance breakdown (on my hardware in 4k) is roughly 60% of time doing scanlines, 30% of time doing the glow, and 10% doing other stuff. The glow is already pretty fast for requiring such a wide blur (which is a fundamentally expensive operation). Most (but not all) other shaders have smaller blur radiuses.

The main performance hog is the scanline simulation. From what I can tell, most shaders basically blur the line horizontally, then spread the line vertically (maybe based on the brightness at that location). I actually calculate what a pixel would look like if a round (or round-ish, depending on the preset) spot were to be scanned over the line, varying in brightness and width based on the intensity of the input. I think this approach is in some ways more flexible and is more faithful to the way CRTs actually worked. It is more expensive, though!

A few of the advantages of this approach are:

  • Dark areas of the image have more detail, even horizontally, because the spot is smaller. Brighter areas get blown out in comparison.
  • It doesn’t matter if the input has the pixels doubled horizontally (or tripled, quadrupled, etc). The output will look the same. Some emulators actually do this for various reasons.
  • More generally, the sharpness of the output doesn’t change based on the horizontal resolution of the input. You could actually achieve this without fully simulating the spot, but it can be more expensive and most shaders don’t do it.
  • The scanlines don’t alter the tonality of the image. The dark and bright areas don’t get darker or brighter relative to each other. The average pixel value of the image is only scaled linearly by the maximum spot size parameter. I haven’t worked out the math, but it isn’t clear to me that this is the case with some of the simpler scanline implementations.
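
The point about horizontally doubled pixels follows from treating the input as a function over normalized coordinates rather than as a list of texels. A minimal Python sketch of that idea, using a plain box filter in place of the shader's spot (an assumption made only to keep the example short):

```python
def integrate_row(row, u0, u1):
    """Integral over [u0, u1] (normalized 0..1) of a piecewise-constant row."""
    count = len(row)
    total = 0.0
    for i, value in enumerate(row):
        left = max(u0, i / count)
        right = min(u1, (i + 1) / count)
        if right > left:
            total += value * (right - left)
    return total


def resample(row, output_width):
    """Average of the input function over each output pixel's footprint."""
    return [
        integrate_row(row, x / output_width, (x + 1) / output_width) * output_width
        for x in range(output_width)
    ]


row = [0.1, 0.9, 0.4, 0.7]
doubled = [value for value in row for _ in range(2)]  # each pixel repeated twice
print(resample(row, 7))
print(resample(doubled, 7))
```

Both calls describe the same function of position, so they produce the same output regardless of how many texels the emulator used to represent each pixel.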

I think it’s valid to ask whether it’s worth it, though. If you favor performance, that’s a reasonable preference. I could make a faster version that works more like other shaders, I’m just not sure that I want to give up what I think makes this shader unique.

3 Likes

Just quickly tried your shader and I like it a lot!

I’m on a 1080p monitor so I had to set MaskType = "2.000000"; other than that, it is good to go “out of the box” without changing much else!

Well done! :+1:

1 Like

It sounds like it will be especially useful for Ares, which also uses our shaders but has problems with some of them due to its doubling/tripling/etc of the horizontal res on some of its cores.

Make a fast version if you like, but I will warn you that it will never be fast enough lol. Someone will always have a weaker device that they want to use it on, and if you just make it a little bit faster, it’ll be perfect, I promise :stuck_out_tongue:

1 Like

Sorry it took me so long to reply; I was a little short on time.

Don’t worry much about what I said. Your shader is already good, and produces gorgeous results, definitely among the accurate ones. I’m far from being an expert, so I won’t argue, but I want to explain myself further:

  1. I have three devices (two of them with integrated GPUs) and your shader runs at full speed on all of them. Far from being an “omg so slow” issue, it’s just that there are already other works which emulate the beam dynamics of scanlines and are a bit lighter (like crt-hyllian and crt-royale-fast). I imagine their approach isn’t the same as yours, and the results are different, but perhaps there’s still something that can be improved about your shader’s performance. If there’s not, it’s still not the end of the world, and it won’t change the already beautiful picture quality it provides. It’s just that having a slower shader may (possibly) tip people into thinking “why not use guest’s instead”.

  2. I really forgot to clarify on my second report, but, yes, I was talking about the Dolphin core: your shader does not work on it, whereas many others do (be they simple or complex ones). The issue is still the same: seems like it divides the image into a 2x2 grid and zooms into the top left one. I tested on native resolution only. I also have a Linux + AMD device and Dolphin works fine there, using the AppImage version of RetroArch. Other 480i/p cores are fine indeed (like Flycast or LRPS2).

Those are the only two issues I really found with your shader. Don’t think I’m dissing your work, because I really like it. I’m fully aware the situation is not favorable either: there are already plenty of good CRT shaders around, so what more can be done? I recognize you’re fighting an uphill battle. However, I still believe you bring something interesting to the table: your shader builds upon the (classic, but outdated) GTU and provides an excellent middle ground for accuracy without a swarm of options.

Anyway, since I have a 1080p screen, feel free to ask for some tests, and I’ll do my best to run them when I have the time. From a little less than two hours of testing, I can already say it’s pleasantly usable at 1080p, although I had to switch to the dynamic mask for better results.

1 Like