New CRT shader - Any interest?

beans · 15 March 2024 21:29

Hi!

I’ve generally used crt-lottes, crt-lottes-fast, or occasionally crt-royale. I have also been interested in the ideas behind GTU, though I thought it never quite looked right (more on that later). With that in mind, I thought it would be interesting to write my own shader.

This shader has a similar format to GTU.

The first pass transforms the color space to CRT gamma (and optionally to YIQ for simulating composite). The input is assumed to be sRGB. This may or may not actually be the case. I assume some cores output in simple 2.2 gamma and some might output in the original system gamma, but I haven’t looked at the code to check.

The second pass low-pass filters the signal to simulate the bandwidth limitations of analog video. This uses a continuous-time FIR filter with a raised cosine kernel. After I sat down and did the math, I realized this was the same way GTU does its filtering. I think the only real difference here is that the filtering happens in the CRT gamma space (in either RGB or YIQ), and then the result is transformed to linear RGB. The output is a series of samples from the band-limited signal.

I should note that, like in GTU, this is not a full NTSC simulation with the “Composite” parameter turned on. The bandwidth limitations are simulated without the effects of trying to separate the chroma from the luma. In this way it is really more like S-Video (or a very good comb filter) and won’t result in dot crawl or color fringing artifacts. With a little work, it is probably possible to replace the first 2 passes with the ntsc-adaptive shader for truer NTSC composite simulation.

The third pass draws scanlines. One easily visible difference here is that the scanlines are drawn and blended in linear RGB space. Many other shaders seem to do this (e.g. the lottes shaders), but GTU handles scanlines in gamma space. I believe this is part of why I find that GTU looks a little “off.”

The scanline handling is also completely different. The way scanlines are drawn is by simulating the scanning of a roughly circular spot (or really three, one for each color) across the screen, varying in brightness and width according to the band-limited input signal. In other words, the same way CRTs work! There are some simplifications for mathematical convenience and in pursuit of my other goals. I think this is one of the most interesting parts of the shader, and I will try to explain the math later.

The system requirements are higher than something like crt-easymode, and probably a bit lower than something like crt-royale or crt-guest. There is a lot more math in the scanline simulation than some of the faster shaders. It runs easily in 4k on my RX 5500 XT. This is the first shader I have ever written, and there is probably some space for performance improvements. To be honest, I’m not sure how to go about profiling a shader, which is how I would normally start performance optimization.

I had a few goals in developing this shader:

Approximate brightness preservation As long as the maximum spot size is 1.0, the image overall should not be significantly darker than the input image. Thinner scanlines in darker areas are brighter at their peak to make up for the brightness lost in the dark areas between them (as I believe they would be on a real CRT). There are no “tricks” to boost the brightness like raising the midtones, which would alter the tonality of the image.

Lower values for the maximum spot size will result in darkening proportional to the value chosen.
Consistent sharpness regardless of horizontal input resolution Some shaders get much sharper when the horizontal resolution of the input increases. This shader retains a consistent sharpness. For example, imagine stretching a 256 pixel wide input image to 512 pixels by duplicating every pixel horizontally. The output should look exactly the same.

CRTs did not have pixels (as I’m sure everyone in this forum is aware), and sharpness was dictated only by the bandwidth of the signal, the size of the spot, and the pitch of the mask.

This is especially important for systems that have multiple output resolutions, or games that switch resolutions for different screens (e.g. menus and cut scenes). This way the settings can be set once and a consistent look can be achieved.
No clipping There should be no clipping of pixel values beyond minor floating point errors or possibly out-of-gamut colors from the YIQ to RGB transform. The filter has no negative lobes and the scanline simulation should not result in values above 1.0.

This should help preserve the tonality of the image. Highlights and shadows should not be lost.
Simple, limited parameters I wanted something that was easily customizable but not overwhelming. Hopefully that is what I ended up with!

Parameters

Composite Uses the NTSC YIQ color space for filtering when set to 1.0, and RGB otherwise.
RGB/Y bandwidth (MHz), I bandwidth (MHz), and Q bandwidth (MHz) The -6dB cutoff frequencies of each signal component. Between 4 to 6 MHz seems about right for RGB bandwidth. Y bandwidth for S-Video would be similar. Y bandwidth for composite would vary depending on the quality of the filter in the television and the quality of the source (the Mega Drive/Genesis had a notoriously poor composite output). I and Q bandwidth should probably be around 0.6 MHz. Technically, I bandwidth should be limited to 1.6 MHz, but in practice it seems that most televisions disregarded the extra bandwidth because it would require an extra filter and delay lines to implement proper filtering. Composite connections might theoretically be allowed to use 1.6 MHz for each of I and Q (see SMPTE 170M), but I’m not sure if any systems or televisions used that outside of studio applications.
Maximum spot size The maximum size of the CRT spot, relative to the scanline height. A setting of 1.0 will completely merge scanlines at full brightness.
Minimum spot size The minimum size of the CRT spot, relative to the maximum size. A setting of 1.0 will result in scanlines that do not vary with brightness of the signal. If the maximum and minimum are both set to 1.0, all scanlines will be merged.
Samples per line This isn’t an actual exposed parameter, but you can see the value in the scale_x1 field in the slangp file. This is the number of samples generated by the output of the filter pass. Values that are too low (i.e. below Nyquist plus some headroom for filter rolloff) will result in aliasing and degrade the output. This also dictates the number of samples that can be used to estimate the integral in the scanline pass, so higher values will result in a better estimation. Values that are very high will impact performance by requiring more executions in the filter pass and more iterations in the scanline pass.

I ran a quick experiment and found that values between 900 and 950 performed well (generally requiring only 7 iterations in the scanline pass) and also looked good. When comparing against outputs created with 2880 samples, PSNR was over 50 dB and the SSIMULACRA2 score was around 100. I don’t think anybody will notice a difference in normal use. The image really starts to degrade below 800 or so.

The odd value of 907 is my (probably misguided) attempt to avoid something like interference patterns by picking a prime number.

Future improvements

If there is interest, here are some things I would like to add, roughly in the order that I would probably implement them:

Interlacing support This is especially important for games that switch between progressive and interlaced for menus or cut scenes. Currently interlaced content won’t look correct. When the vertical resolution is higher, it will just look like there are more, smaller scanlines.
Overscan Some games and systems rendered garbage in the overscan area and it would be nice to be able to cut that off.
Curvature I need to think through the math on this.
Glow The glow around bright areas of the screen as the light passes through the CRT glass. I believe this can be done fairly efficiently as a separable gaussian blur using mipmaps. Someone has probably done this already in another shader.
Mask simulation I have this last because I have not been happy with most approaches. In other shaders, I often just turn mask simulation off when I can.

The subpixel masks are difficult to configure, get vastly different results on different display resolutions, won’t work on some displays, and greatly affect brightness. I would like a particular shader configuration to look consistent across all display types.

Tiling a mask texture across the screen and resizing can achieve consistency, but doesn’t work well at low resolutions and still requires tricks to boost the brightness back up.

Still, I think mask simulation is important. I’ve been curious how it is done in the koko-aio shader but have not dug into that yet.

Things I’m not really interested in adding (unless convinced otherwise):

Simulating flaws or misconfiguration, such as deconvergence, raised black levels, poor focus at the screen edges, etc. I know CRTs often had these issues, but I don’t personally feel like they are important for replicating the experience. I would rather simulate an ideal, properly configured CRT. Besides, these sorts of things can really blow up the parameter count.
Simulating the color space of the CRT phosphors Mostly, I think this is too inaccurate in the sRGB color space (which I am targeting) because you will have to deal with clipping or mapping the colors into the gamut. It would probably be more practical on an HDR display with DCI-P3 or Rec 2020 coverage. In any case, the sRGB primaries are relatively close to the SMPTE C and EBU primaries. Considering the disregard most CRT televisions seem to have had for standardized color, I consider it close enough.
Temporal effects like ghosting due to phosphor persistence I don’t think that phosphor decay time was long enough to bleed into the next frame, at least on later CRTs. See the Slow Mo Guys’ videos of CRTs.

Screenshots

Anyway, sorry for writing so much above. Here are some screenshots!

Simulating a decent, late model CRT connected with component or RGB:

Composite = 0.0
RGB/Y bandwidth (MHz) = 5.0
I bandwidth (MHz) = 0.6
Q Bandwidth (MHz) = 0.6
Maximum spot size = 1.0
Minimum spot size = 0.5

Simulating a similar CRT, but through terrible Mega Drive/Genesis composite (notice the blending of the drop shadows):

Composite = 1.0
RGB/Y bandwidth (MHz) = 1.6
I bandwidth (MHz) = 0.6
Q Bandwidth (MHz) = 0.6
Maximum spot size = 1.0
Minimum spot size = 0.5

Simulating something with more pronounced scanlines, like a PVM or broadcast monitor, connected through RGB:

Composite = 0.0
RGB/Y bandwidth (MHz) = 5.0
I bandwidth (MHz) = 0.6
Q Bandwidth (MHz) = 0.6
Maximum spot size = 0.75
Minimum spot size = 0.3

And for fun, here are some high resolution renderings using my Python proof of concept code, with masks overlaid:

hunterk · 15 March 2024 21:32

Of course! There’s always room for more CRT shaders! Those masked shots at the bottom look great. All of them do, actually. Good stuff

I, too, am a big fan of GTU, so i’m glad to see someone else exploring those same ideas.

beans · 27 March 2024 22:10

Thanks!

I have added basic interlacing support. Interlacing seems pretty fussy to get right. It requires the cores to output weave-deinterlaced video and then it will reinterlace it. There is a parameter to flip whether the even or odd field should be shown on even or odd FrameCounts. I believe this is how some of the other shaders do it.

It would be convenient if there was a uniform to indicate whether even or odd fields were supposed to be displayed, but I suppose that would require support in all of the cores. Hopefully this way works well enough. Is there a better way that I’m missing?

I’ve also written up a description of how the scanlines are drawn. It doesn’t seem like the math plugin is enabled on this forum so I’ve used a gist: https://gist.github.com/aduffey/d7468b8068d6124641ff0762c2b373e8

hunterk · 27 March 2024 22:13

That’s how the others handle interlacing, as well, yeah.

EDIT: I just skimmed the scanline writeup and it looks great. I’ll try to really dig into it soon. At first glance, this looks very similar to aliaspider’s thought process when he was designing GTU, as well

ComfyTsu · 4 April 2024 19:48

Just gave it a try and it works quite well with lower resolutions and looks really nice. Nice work

beans · 6 April 2024 03:05

Thanks for trying it! Is that on a Steam Deck? I generated a few test images at 800 pixels tall and thought they looked pretty good as long as I didn’t zoom in too far. I don’t have a Steam Deck to test with personally, though.

Very thin scanlines suffer from not having enough pixels to work with, so it can make sense to be conservative with the minimum scanline width. Supersampling antialiasing would help, but I’m not sure the performance costs justify the quality gains.

ComfyTsu · 6 April 2024 05:33

Yes it’s on a Deck. At a normal viewing distance it looks really nice with the standard and composite settings. Lowering the max/min spot size will break the image though but that’s expected with that low resolution

beans · 19 April 2024 01:48

I’ve cleaned up the code a bit and made a few additional changes. There’s a zip file here.

Configurable input gamma I looked through the source of a few emulators and it seemed like most output in the original system gamma (i.e. they did not correct for the different gamma of modern displays compared to CRT TVs). I added a configuration option for different input gammas and defaulted it to the original system gamma.
Fix undefined behavior I was previously relying on undefined behavior when using texelFetch with out-of-bounds coordinates.
Performance improvements Some code was restructured to reduce register usage. I also switched to using a cubic function as the spot model. The raised cosine was more expensive to compute. I could just barely get the shader running in 4K at 60 FPS on my Ryzen 7700X’s integrated graphics, but the margin is too close for comfort and I had to reduce the maximum spot size (smaller spot sizes are slightly faster). Lower resolutions should be fine on integrated graphics and 4K should work on decent APUs or discrete GPUs.

Additionally, I wanted to highlight one advantage of this way of doing scanlines. I said in a previous post:

Consistent sharpness regardless of horizontal input resolution Some shaders get much sharper when the horizontal resolution of the input increases. This shader retains a consistent sharpness. For example, imagine stretching a 256 pixel wide input image to 512 pixels by duplicating every pixel horizontally. The output should look exactly the same.

While I was testing, I made a visual demonstration of this. I generated two checkerboard patterns: a 320x240 single pixel checkerboard, and a 640x240 checkerboard with pixels doubled horizontally. Like the following two images (cropped and scaled up 20x so you can see the details):

checkerboard_double-crop checkerboard_pixel-crop

I ran both checkerboards through this shader and another popular shader with the aspect ratio fixed to 4:3. Crops from each shader are below, scaled up 16x. The top is the single pixel checkerboard, and the bottom is the double pixel checkerboard.

With this shader they both look the same, but with the other shader the pixel boundaries are sharper on the double pixel checkerboard. I think the resolution-invariant behavior is “correct” from the standpoint of what a CRT would look like. The CRT sees only the level changes in the input signal, not actual pixels.

I think resolution-invariant behavior is useful for a few reasons:

Consistency of configuration You can configure the shader once and use it on systems having different horizontal resolutions, and it will look like the same CRT.
Games that change resolution Some games change resolution for menu screens, cut scenes, etc. This was especially common in the fifth generation consoles. If pixel transitions suddenly get sharper, it would look like the console was quickly plugged into a different CRT in the middle of the game.
Some systems change resolution during a frame To deal with this, emulators usually scale each portion of the frame to the least common multiple. This is possible on the SNES, I believe, though maybe only used in A.S.P. Air Stike Patrol. It was more commonly done on the Amiga, where you might have text at a higher resolution in the same frame as graphics at a lower resolution. At these higher resolutions many shaders look too sharp.

hunterk · 19 April 2024 02:00

oh yeah, keeping a consistent look regardless of resolution is a great advantage. A common game where it matters is Secret of Mana on SNES, which doubles the horizontal res when text boxes are on-screen.

DariusG · 21 April 2024 06:25

Looks good, i would suggest to check the code and sharpen vertically to better match an actual CRT. It’s almost there horizontally. But still looks really good anyway.

crt, notice there are no edges on pixels, they are curved, shader is pretty close. Shot from Retroarch to CRT 20" with crt-emudriver on a Radeon card on a PC i have, 2560x240

beans · 21 April 2024 22:41

Thank you for your input, DariusG! Could you explain what you mean about vertical sharpening? As far as I know, CRTs didn’t do any vertical sharpening at least until the later digital era.

beans · 22 April 2024 02:51

For reference, here’s a quick attempt at the upscale-test image with a little bit of glow and a mask applied over the top. This is essentially the same scanline-drawing method as the shader, just in Python/Taichi so that I can prototype more easily and generate images at arbitrary resolutions.

The lines aren’t as thick (the lines on your CRT are quite thick!) and there is obviously no deconvergence. Since it isn’t a photograph, there is also no default tone curve applied, so it’s a bit darker and the highlights aren’t rolled off. The white balance is also different. I probably should have reduced the RGB bandwidth a bit as well.

DariusG · 22 April 2024 16:50

That looks close enough, yeah that CRT has a bit of deconvergence too.