New CRT shader - Any interest?

I have new shader files here (direct link to zip).

I’d say this is a pretty experimental release. It contains a few things.

Mask support

The shaders now have basic mask support. Currently only aperture grille is supported, and only for monitors with RGB or BGR subpixel arrangements. There is no “mask strength” parameter yet. Masks fade out as the area gets brighter so that they do not reduce the total brightness at all. (@kokoko3k, I ended up using the cubic blending as a “safer” default option.)
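For illustration, here is a minimal sketch of a mask blend that fades out as the input gets brighter without changing the average brightness. It is not the blend function used in the shader (which is described above as cubic); it is a simpler piecewise-linear stand-in. The names `blend_mask` and `triad_average`, the three-stripe layout, and the assumption that values are in linear light are all hypothetical.

```python
def blend_mask(value, lit, columns=3):
    """Output level of one colour channel in one mask column.

    value:   input channel intensity in [0, 1] (assumed linear light)
    lit:     True for the column where this channel's stripe sits
    columns: stripes per triad (3 for a plain RGB aperture grille)
    """
    # Total light this channel has to emit across the whole triad.
    energy = value * columns
    if lit:
        # The channel's own stripe takes as much as it can hold.
        return min(energy, 1.0)
    # Whatever does not fit spills evenly into the other stripes, so the
    # mask weakens as the input approaches full brightness.
    return max(energy - 1.0, 0.0) / (columns - 1)


def triad_average(value, columns=3):
    """Average output of one channel over a full triad."""
    total = blend_mask(value, True, columns)
    total += (columns - 1) * blend_mask(value, False, columns)
    return total / columns
```

With this blend, dark inputs light only their own stripe (full mask contrast), while an input of 1.0 lights every stripe fully (no visible mask), and `triad_average(v)` stays equal to `v` in between.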

New presets: crt-beans-rgb and crt-beans-svideo

The original shader is now used in two preset files: crt-beans-rgb.slangp and crt-beans-svideo.slangp. They are the same set of shaders, but the S-Video preset has the low pass filter tuned to simulate S-Video signal bandwidth. Different consoles may have output signals with different bandwidths, so I just sort of picked some values that seemed like a reasonable default.

New preset: crt-beans-gaussian

There is a new crt-beans-gaussian.slangp preset. This is similar to the previous RGB preset, but uses a gaussian shape for the CRT spot. I would guess that this is more accurate to real CRTs. The image is slightly blurrier than the RGB preset because the gaussian has fatter tails.

The major advantage of this is that the spot width (and thus the scanline width) can be pushed to 1.2. You can basically get fatter, more overlapping lines. This allows simulating smaller, lower quality CRTs. The normal RGB preset doesn’t work properly with widths above 1.0 (that’s just a mathematical consequence of the spot function it uses).

The disadvantage is that the gaussian preset is much slower than the RGB preset. Because the gaussian’s tails are fatter, each area affects more pixels around it, which means a lot more texture samples are necessary. The scanline pass itself takes about 2x the time of the original, and I’d guess the whole shader takes about 1.6x the time of the original (I don’t have the numbers in front of me, currently).
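A rough way to see why fatter tails cost more texture samples: the further out a spot function stays above some negligible level, the more texels fall inside its reach. The sketch below is only an illustration. The cutoff level and the sigma values are hypothetical, and the shader's real spot functions and sampling pattern are not shown in this thread.

```python
import math


def gaussian_radius(sigma, cutoff=1e-3):
    """Distance at which exp(-r^2 / (2 sigma^2)) falls to `cutoff`."""
    return sigma * math.sqrt(2.0 * math.log(1.0 / cutoff))


def max_taps(radius):
    """Most texel centres (spaced 1 apart) that fit in a window 2*radius wide."""
    return math.floor(2.0 * radius) + 1
```

For example, `max_taps(gaussian_radius(0.6))` is larger than `max_taps(gaussian_radius(0.3))`. For a two-dimensional spot the per-axis counts multiply, so a modest increase in reach can add up quickly.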

New preset: crt-beans-monitor (no low pass filter)

There is a new crt-beans-monitor.slangp preset. Its goal is to allow inputs other than 240p/480i or 288p/576i (basically, anything other than 15 kHz television). Hence the “monitor” name: it can be used for content at VGA/480p and beyond.

This is a little complicated to explain. The original shader can’t be used without the low pass filter. It solves the scanline integral numerically, which requires the low pass filter to resample the input. This new preset solves the scanline integral analytically, which, conversely, requires that there is no resampling of the input. The input is interpreted as a piecewise-constant function and the integral is solved in pieces.
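To make the "solved in pieces" idea concrete, here is a one-dimensional sketch. It assumes a fixed-width Gaussian spot, which is a simplification: the real shader works in two dimensions with a spot that varies with intensity, and its exact spot function is not given here. Under that assumption, the integral of the spot over one constant pixel is a difference of two error functions, so no resampling is needed. `analytic_sample` and `numeric_sample` are hypothetical names.

```python
import math


def gaussian_cdf(x, sigma):
    """Cumulative distribution of a zero-mean Gaussian."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def analytic_sample(pixels, x, sigma):
    """Convolve a piecewise-constant row with a unit-area Gaussian at x.

    Pixel i is assumed to cover the interval [i, i + 1).
    """
    total = 0.0
    for i, value in enumerate(pixels):
        # Exact integral of the Gaussian over this pixel's interval.
        weight = gaussian_cdf(i + 1 - x, sigma) - gaussian_cdf(i - x, sigma)
        total += value * weight
    return total


def numeric_sample(pixels, x, sigma, steps_per_pixel=200):
    """The same integral by midpoint-rule quadrature, for comparison only."""
    dx = 1.0 / steps_per_pixel
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i, value in enumerate(pixels):
        for k in range(steps_per_pixel):
            t = i + (k + 0.5) * dx
            total += value * norm * math.exp(-0.5 * ((t - x) / sigma) ** 2) * dx
    return total
```

The analytic version needs one pair of `erf` evaluations per contributing pixel (more math, few samples), while the numeric version needs many evaluations of the input per pixel, which mirrors the trade-off described above.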

When resampling, I need to pick a number of samples to output for each line (a new image width, basically). The number of samples required changes based on the number of lines there are (the input image height). Unfortunately there is no way to do this within the limits of the slangp preset format. This, and another subtle issue with the presentation of the low pass frequencies parameters, led me to remove the low pass filter as a way to accommodate higher input resolutions.
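One possible way to see the dependence on line count, offered as a sketch rather than as the shader's actual logic: if the low pass cutoff is expressed as a signal bandwidth in MHz (an assumption), then the number of samples needed per line follows from how long one line lasts, and that depends on how many lines are scanned per second. The function name, the `active_fraction` default, and the example timings are all hypothetical.

```python
import math


def samples_per_line(cutoff_mhz, total_lines, refresh_hz, active_fraction=0.8):
    """Minimum samples per active line needed to carry `cutoff_mhz` (Nyquist).

    total_lines:     scan lines per frame, including blanking
    active_fraction: assumed share of each line period that carries picture
    """
    line_period_s = 1.0 / (total_lines * refresh_hz)
    active_time_s = line_period_s * active_fraction
    cycles_per_line = cutoff_mhz * 1e6 * active_time_s
    # Two samples per cycle at the cutoff frequency.
    return math.ceil(2.0 * cycles_per_line)
```

With the same cutoff, a 262-line, 60 Hz signal needs roughly twice as many samples per line as a 525-line, 60 Hz signal, because each line lasts about twice as long. A fixed output width in a preset file cannot follow that.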

There are two downsides to this. First is that there is no low pass filter, so you lose that capability. The output is a bit sharper horizontally than it probably should be. Second is that it is slightly slower. The analytical solution generally requires fewer texture samples but significantly more math. The worst performance is for very elongated pixel aspect ratios. For example, a 640x240 input will lead to worse performance than a 640x480 input.

Final thoughts

I may be able to upload some sample images tomorrow (the Streets of Rage screenshots above are pretty representative of the RGB preset).

I don’t know if there is really any interest in the gaussian or monitor presets. They might end up being a bit more confusing. They were interesting to make, though!


No need to reinvent the wheel as we already have Mask Layouts which happen to work with W-OLED/RWGB subpixel structures in CRT-Guest-Advanced and Sony Megatron Color Video Monitor.

You can probably use those as a starting point but of course better, novel implementations are welcomed as well.

So far 4K W-OLED users seem to be covered but this doesn’t seem to carry over to 1440p displays for whatever reason.

I’ll share the posts which led to Sony Megatron Color Video Monitor gaining the RWGB Display Subpixel Layout.


Otoh, sometimes reinventing the wheel may lead to a rounder wheel :wink:


I agree, that’s why I included this part:


Thanks, I will take a look at those posts. There won’t be any reinventing the wheel here: I don’t have an OLED display, so I can’t even test any of these. And I assume nobody has any subpixel masks figured out for QD-OLEDs or small pentile OLEDs. It doesn’t really seem possible.


There is a new version available here (direct link here).

The main addition is an option for a dynamically-generated aperture grille. This mask is not dependent on particular subpixel arrangements and can be scaled to arbitrary densities. I have 400-800 phosphor triads per output width as options, which seems reasonable to me. This should hopefully be a good option for weird subpixel arrangements (e.g. QD-OLED or pentile) and TATE mode (where the subpixels will be oriented incorrectly for subpixel masks), but it works well even on normal RGB LCD panels. It really shines at 4k but works reasonably well down to 1080p if you don’t go nuts with the density.
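As an illustration of how a grille can be generated at any density without relying on the display's subpixel layout, the sketch below computes, for each output pixel, how much of its width is covered by red, green and blue stripes (a box-filter area coverage). This is one plausible approach and an assumption; the thread does not describe how the shader actually generates its mask, and the function names are hypothetical.

```python
import math


def stripe_coverage(x0, x1, triads_per_width, output_width):
    """Fraction of the output span [x0, x1) covered by R, G and B stripes.

    Each triad is split into three equal vertical stripes in R, G, B order.
    """
    stripes_per_pixel = 3.0 * triads_per_width / output_width
    a = x0 * stripes_per_pixel  # span start, measured in stripes
    b = x1 * stripes_per_pixel  # span end, measured in stripes
    coverage = [0.0, 0.0, 0.0]
    s = math.floor(a)
    while s < b:
        # Length of the overlap between stripe s and the pixel span.
        overlap = min(b, s + 1) - max(a, s)
        coverage[s % 3] += overlap
        s += 1
    span = b - a
    return [c / span for c in coverage]


def grille_mask(output_width, triads_per_width):
    """Per-pixel (R, G, B) mask weights for one output row."""
    return [
        stripe_coverage(x, x + 1, triads_per_width, output_width)
        for x in range(output_width)
    ]
```

For example, `grille_mask(1920, 550)` gives a 1080p-wide row at 550 triads. Each pixel's three weights sum to 1, and each channel averages one third over the full width, so the pattern stays balanced even when stripes do not line up with pixel boundaries.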

I have also reworked the mask blending function that I was using, to fix a bug that caused clipping with certain inputs.

Finally, there are also some performance improvements, most notably for the crt-beans-monitor preset (the analytical scanline method).

@hunterk, the include file for the mask handling is here if you want to take a look. If you are interested in putting this in the repo, I can open a pull request. You may not want the subpixel stuff since you already have many of those masks in your own functions. I may add a mask strength parameter to the blend function, so it might make sense to wait for that (and maybe wait until this is tested a bit).

Here’s an example image at 1080p, with the dynamic aperture grille at 550 phosphor triads:

Even oxipng couldn’t compress the 4k image small enough for this forum, and webp isn’t supported, so you’ll have to see the 4k image at imgbb:

I only have a 4k monitor, so I’ve been testing these at lower resolutions by nearest neighbor scaling them to 200%. It would be great if anyone could let me know how they look on other native resolutions!


Testing on native 1080p. I think it gives pretty good results; congratulations on your work. However, it’s still a pretty heavy shader compared to others that aim for similar results. The problem with 480p content also persists: the shader zooms in and can’t be used with it.


I don’t mind having another mask function in the mix. Probably a good idea to wait until everything settles with testing and whatnot, but yeah, throw us a PR and we’ll get it in there :slight_smile:


Thanks for testing!

What core, operating system, and graphics card are you using when you see the zoomed-in 480p problem? It works fine here. I’m using Linux with AMD graphics, and I tested on Flycast because Dolphin is broken on my install. Other cores (bsnes, mupen64, etc.) also output something that looks like 480p when they are in interlacing modes or when you use upscaling, and that also works fine. I suspect there might be a Dolphin core or RetroArch bug that is triggering this behavior, but it is hard for me to fix because I can’t replicate it.

Note that I recommend the crt-beans-monitor preset for 480p if you don’t want interlacing. The rgb and svideo presets are tuned for 240p/480i (or 288p/576i, basically 15kHz TVs).

As for the performance, it has become a heavier shader than most, but it is still lighter than popular shaders like crt-guest-advanced or crt-royale. Currently the performance breakdown (on my hardware at 4k) is roughly 60% of the time spent on scanlines, 30% on the glow, and 10% on other stuff. The glow is already pretty fast for requiring such a wide blur (which is a fundamentally expensive operation). Most (but not all) other shaders have smaller blur radii.

The main performance hog is the scanline simulation. From what I can tell, most shaders basically blur the line horizontally, then spread the line vertically (maybe based on the brightness at that location). I actually calculate what a pixel would look like if a round (or round-ish, depending on the preset) spot were to be scanned over the line, varying in brightness and width based on the intensity of the input. I think this approach is in some ways more flexible and is more faithful to the way CRTs actually worked. It is more expensive, though!

A few of the advantages of this approach are:

  • Dark areas of the image have more detail, even horizontally, because the spot is smaller. Brighter areas get blown out in comparison.
  • It doesn’t matter if the input has the pixels doubled horizontally (or tripled, quadrupled, etc). The output will look the same. Some emulators actually do this for various reasons.
  • More generally, the sharpness of the output doesn’t change based on the horizontal resolution of the input. You could actually achieve this without fully simulating the spot, but it can be more expensive and most shaders don’t do it.
  • The scanlines don’t alter the tonality of the image. The dark and bright areas don’t get darker or brighter relative to each other. The average pixel value of the image is only scaled linearly by the maximum spot size parameter. I haven’t worked out the math, but it isn’t clear to me that this is the case with some of the simpler scanline implementations.
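The tonality point in the last bullet can be illustrated with a small sketch. It assumes a unit-area Gaussian spot whose width grows with intensity; the width mapping and its parameters are hypothetical, not the shader's. Because the spot always integrates to the same area, the average light over a scanline period stays proportional to the input, however wide or narrow the spot is.

```python
import math


def spot_width(intensity, min_width=0.3, max_width=1.0):
    """Hypothetical mapping: brighter input gives a wider spot."""
    return min_width + (max_width - min_width) * intensity


def column_light(intensity, y, reach=8):
    """Light at vertical position y from scanlines centred on the integers,
    all carrying the same intensity."""
    sigma = 0.5 * spot_width(intensity)
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for k in range(-reach, reach + 1):
        total += intensity * norm * math.exp(-0.5 * ((y - k) / sigma) ** 2)
    return total


def mean_light(intensity, samples=256):
    """Average light over one scanline period [0, 1)."""
    values = (column_light(intensity, (i + 0.5) / samples) for i in range(samples))
    return sum(values) / samples
```

Here `mean_light(v)` comes out equal to `v` (to within numerical error) for any `v`, even though dim inputs produce thin, well-separated lines and bright inputs produce wide, overlapping ones. A real shader would also need a global scale to keep peaks in range, which corresponds to the linear scaling by the maximum spot size mentioned above.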

I think it’s valid to ask whether it’s worth it, though. If you favor performance, that’s a reasonable preference. I could make a faster version that works more like other shaders, but I’m not sure that I want to give up what I think makes this shader unique.


Just quickly tried your shader and I like it a lot!

I’m on a 1080p monitor so I had to set MaskType = "2.000000"; other than that, it is good to go “out of the box” without changing much else!

Well done! :+1:


It sounds like it will be especially useful for Ares, which also uses our shaders but has problems with some of them due to its doubling/tripling/etc of the horizontal res on some of its cores.

Make a fast version if you like, but I will warn you that it will never be fast enough lol. Someone will always have a weaker device that they want to use it on, and if you just make it a little bit faster, it’ll be perfect, I promise :stuck_out_tongue:


Sorry it took me so long to reply; I was a little short on time.

Don’t worry much about what I said. Your shader is already good, and produces gorgeous results, definitely among the accurate ones. I’m far from being an expert, so I won’t argue, but I want to explain myself further:

  1. I have three devices (two of them with integrated GPUs) and your shader runs at full speed on all of them. Far from being an “omg so slow” issue, it’s just that there are already other shaders which emulate the beam dynamics of scanlines and are a bit lighter (like crt-hyllian and crt-royale-fast). I imagine their approach isn’t the same as yours, and the results are different, but perhaps there’s still something that can be improved about your shader’s performance. If there’s not, it’s not the end of the world and it won’t change the already beautiful picture quality it provides. It’s just that a slower shader may tip people into thinking “why not use guest’s instead”.

  2. I forgot to clarify my second report, but yes, I was talking about the Dolphin core: your shader does not work on it, whereas many others do (simple and complex ones alike). The issue is still the same: it seems to divide the image into a 2x2 grid and zoom into the top-left cell. I tested at native resolution only. I also have a Linux + AMD device and Dolphin works fine there, using the AppImage version of RetroArch. Other 480i/p cores are indeed fine (like Flycast or LRPS2).

Those are the only two issues I really found with your shader. Don’t think I’m dissing your work, because I really like it. I’m fully aware the situation is not favorable either: there are already plenty of good CRT shaders around, so what more can be done? I recognize you’re fighting an uphill battle. However, I still believe you bring something interesting to the table: your shader builds upon the (classic, but outdated) GTU and provides an excellent middle ground for accuracy without a swarm of options.

Anyway, since I have a 1080p screen, feel free to ask for some tests, and I’ll do my best to run them when I have the time. From a little under two hours of testing, I can already say it’s pleasantly usable at 1080p, although I had to switch to the dynamic mask for better results.


I was able to get Dolphin working from the Flatpak and I didn’t encounter the zooming behavior. I’ll try with some more settings and see if I can figure out what’s going on. Is it a Windows/Nvidia device where you see this problem?

This is basically what I’m aiming for. I didn’t necessarily want it to do everything, but I want what it does do to be top-notch with straightforward configuration options, with a focus on faithfulness to the way a CRT works.

I will think some more about the performance aspect and see if I can come up with anything. I think the biggest gains would come from splitting the horizontal and vertical calculations for the scanlines into separate passes, but that doesn’t work with the math as-is. It might be worth comparing to see if the quality difference is noticeable, though.

Thanks for testing, I appreciate it!


All my devices run Linux, although with different distributions. I will do further testing with Dolphin soon and see if the issue happens on every one of them. With luck, I will be able to pinpoint it or blame it on some goofiness of mine.

By the way, here’s a single comparison of crt-beans against crt-hyllian and crt-royale-fast, all of which emulate the beam dynamics of a CRT. The three pictures look great to me, with minor differences that boil down to personal preference. Definitely usable already; I haven’t detected any glaring issues yet. As usual with proper scanline emulation, moire patterns happen, but they can be mitigated by adjusting the phosphor triads (the default values are fine).

Note: all shaders are using a composite LUT.

crt-beans-monitor (sRGB gamma and dynamic mask)

crt-hyllian

crt-royale-fast (using a slot mask)