I think it’s a good idea with ‘simple’ and ‘deluxe’. I would save the glow for the deluxe version to keep the ‘simple’ preset simple. Will your mask work with non-integer scaling too? I’m interested to see it.
I had the same thoughts about GTU but not the skill or knowledge to improve it like this, so this instantly became my favorite scanline implementation.
I was never able to get crt-royale or crt-guest-advanced to give me convincing performance with PC-98 games no matter what I did, but yours (after some code changes and additions on my end) I was able to get basically perfect.
I hadn’t kept up with this thread since May, but I had been adding the features I wanted to see on my own since then and kept forgetting to share.
Anyway, the version I’m happy with has:
- A simple subpixel mask implementation inspired by the Filthy Pants blog post (ran out of links I can post, sorry).
- VGA line-doubling mode for DOS games.
- Constant brightness regardless of scanline width (just a scale factor of 1/(scanline width)^2; I don’t know how physically-justifiable it is).
- A semi-fixed scanline width mode that uses a reference resolution instead of matching the input resolution, so games that switch resolution don’t also switch monitors. (It scales with the square root of vertical lines, and seems to work fine even if scanlines overlap.)
- Wider resolution range (I’m sure there’s a good reason for scaling to 907 horizontal pixels, but by using half the display resolution instead I can use this shader for Darius or even SVGA.)
- Integration with the NTSC color fringing simulation from crt-guest-advanced-ntsc.
- A new shader I’ve written to combine a Famicom palette generator with crt-guest-advanced-ntsc for proper artifact colors in NES games (I should probably figure out how to contribute it over there…).
- Presets for various systems, including one that looks really good for PC-98 games.
Currently at https://github.com/NonWonderDog/crt-beans/tree/nwd but I branched it in May and it’s not up-to-date with the latest yet.
Caveat is that I’m not as performance-focused, so it’s probably much slower. It works full speed with SEGA Naomi games in Flycast at 1080p on my Nvidia Shield, though, which was my performance target.
Also I had broken my dev setup on my desktop a while ago and was lax about git commits. Sorry.
Dunno what’s wrong with my Imgur account, but here’s an album on vgy:
And when I said that I couldn’t get any other shader to work on the PC-98, this is what I meant:
The text is just flat-out illegible on anything newer than an early-90’s 0.41 mm dot-pitch 31 kHz PC monitor. But with this shader (with a mask pass added), I can emulate one.
What is the resolution that you are playing at?
@md2mcb In a way this is sort of an evolution of the GTU shader. GTU was one of my inspirations for it. The lowpass filtering does basically the same thing, but the scanline simulation is completely new.
@ComfyTsu Thanks! I have to admit, the mask is frustrating me. Subpixel masks look the best (and @hunterk already defined a ton of them!), but they rely on knowing the resolution and subpixel layout of the screen, and that subpixel arrangement being appropriate for simulating a mask. They don’t work as intended on OLEDs because they have all different subpixel arrangements, sometimes pretty weird ones! Screens seem to be increasingly going OLED. Subpixel masks also don’t work as intended in TATE mode.
The other things I’ve tried so far really struggle if I want an accurate mask, especially at low resolutions. Masks are difficult because the detail is so fine that we are sometimes near (or past) the limit of what the screen can display.
I really wanted something that could be configured easily without having to know specific details about the user’s screen.
yeah, if you want to get away from subpixel-aware masks, you’re stuck with either unpredictable subpixel behavior as the shader mask gets finer than the monitor’s physical pixels can resolve, or just requiring really high resolution so you can flat-out draw the mask (megatron’s mask is like this).
@NonWonderDog, I’m glad you’ve found it useful! I still have to look through your repo more thoroughly, but I have some comments and explanations. I apologize in advance for the wall of text here.
Wider resolution range (I’m sure there’s a good reason for scaling to 907 horizontal pixels, but by using half the display resolution instead I can use this shader for Darius or even SVGA.)
Basically, there is a tradeoff here. If you go too low, there can be 2 problems:
- Not having enough samples to properly represent the bandwidth of the input (i.e. dropping below or too close to the Nyquist frequency). This can cause aliasing artifacts. I’ll skip over this for now, because it requires some math for each different output type. 907 is more than enough for 15kHz TV/monitor content.
- Not having enough samples to properly estimate the integral for the scanlines. This is the bigger problem most of the time for 15kHz TV/monitor content (where the bandwidth is quite limited) and is why the default value isn’t lower. The smaller the spot/scanline is on the screen, the more samples will be needed. Basically, some number of samples must be in the borders of the spot, and so a higher sample density is required for smaller spots.
On the other hand, if you push the number of samples too high, the performance gets worse. For each pixel, the scanline code basically looks at any nearby samples that are close enough to affect the value of this pixel. If there are more samples overall, it looks at more samples for a proper estimation. This results in more lookups and more computation for each pixel. The number of lines and the spot size affect the performance as well. More lines and smaller spot sizes mean looking at fewer samples as a proportion of the total. So performance doesn’t actually suffer much if we only scale up the sample count with the line count.
The proper value for this parameter would be something like max(1.5 * 2 * cutoff_frequency * active_line_time, 3.5 * line_count). The first value avoids problem 1, and the second value avoids problem 2. The 1.5 and 3.5 factors are kind of pulled out of my hat, but seem reasonable to me. Unfortunately, we can’t do this sort of math in .slangp files, so I set it to a reasonable number for content up to 6 MHz and 240-288 lines while aiming for good performance. If we could set it dynamically, it should work for any resolution from CGA to SVGA and beyond while maintaining good performance.
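To make the rule of thumb concrete, here’s a quick numeric sketch. The 6 MHz cutoff and ~47 µs active line time are my assumed NTSC-ish values for illustration, not constants taken from the shader:

```python
# Sketch of the sample-count rule of thumb above. The 6 MHz cutoff and
# 47 us active line time are assumed NTSC-ish values, not shader constants.

def sample_count(cutoff_hz, active_line_time_s, line_count):
    # 1.5 * the Nyquist rate stays safely above the input bandwidth
    # (problem 1); 3.5 samples per line keeps enough samples inside
    # each spot (problem 2).
    bandwidth_limit = 1.5 * 2 * cutoff_hz * active_line_time_s
    spot_limit = 3.5 * line_count
    return max(bandwidth_limit, spot_limit)

# 240-line content with a 6 MHz cutoff lands around 846 samples,
# comfortably under the default of 907:
n = sample_count(6e6, 47e-6, 240)
```

With these assumptions, 240-line content needs roughly 846 samples, which is why 907 is more than enough for 15 kHz content but not for SVGA.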
The downside of using half the viewport width is that for low resolution devices (I’m thinking of the Steam Deck, handhelds, and mobile phones), the sample count may be too low. For slow, high resolution devices (like a laptop or NUC connected to a 4k TV), the performance may suffer for no visible quality gain.
VGA line-doubling mode for DOS games.
This is a great idea, and I may be able to come up with a VGA preset that does line doubling like this, sets the sample count appropriately, and adjusts the filter for the VGA timings being different from NTSC/PAL.
Constant brightness regardless of scanline width (just a scale factor of 1/(scanline width)^2; I don’t know how physically-justifiable it is).
This may result in some tonality changes to the image. Basically, it may crush the highlights as pixel values get clipped. Currently, the very center of a full brightness scanline will be at full brightness (e.g., 1.0 or rgb(255)) even with smaller scanline widths. So the default is basically the brightest the image can be without changing the shape of a full brightness scanline.
I think if I wanted the image to be brighter, I would try applying some gamma function like pow(rgb, MaxSpotWidth)
. I’d have to do some more thinking to figure out what exactly would be appropriate. This will still taper off the highlights and change the tonality of the image, but it should at least avoid clipping because it still maps 1.0 to 1.0.
A semi-fixed scanline width mode that uses a reference resolution instead of matching the input resolution, so games that switch resolution don’t also switch monitors. (It scales with the square root of vertical lines, and seems to work fine even if scanlines overlap.)
I think this is an interesting idea and should make systems that can change the line count work properly.
The scanlines overlapping will cause some issues with scanline shape and image tonality. There are two issues, really:
- Currently, only the 2 nearest lines are used for finding a pixel’s value. If the scanlines overlap more, it should really consider the nearest 3 lines. If they overlap even more, it should consider the nearest 4 lines, and so on.
- When scanlines overlap more, the pixel values will increase above 1.0. Some sort of compensation needs to be done to bring the values back into range to avoid clipping.
Both of these are solvable problems if there is enough desire for this sort of configurability. There could be a substantial performance impact to using 3 lines instead of 2, though. It would basically make that part of the shader 50% slower, and that’s already the slow part.
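A toy model of why the 2-line limit matters, using a Gaussian as a stand-in for the shader’s real spot function:

```python
import math

# Toy model of the point above: each scanline is a Gaussian spot centered
# on an integer y position, and a pixel sums the nearest N lines. The
# Gaussian is a stand-in for the shader's actual spot function. With a
# narrow spot the 2 nearest lines capture essentially everything; with a
# wide, overlapping spot the 3rd and 4th lines still contribute.

def pixel_value(y, spot_sigma, n_lines):
    lines = sorted(range(-5, 6), key=lambda line: abs(y - line))[:n_lines]
    return sum(math.exp(-0.5 * ((y - line) / spot_sigma) ** 2)
               for line in lines)

# Error from using only 2 lines, evaluated halfway between two scanlines:
narrow_error = pixel_value(0.5, 0.3, 4) - pixel_value(0.5, 0.3, 2)
wide_error = pixel_value(0.5, 1.0, 4) - pixel_value(0.5, 1.0, 2)
```

With a narrow spot the error from stopping at 2 lines is negligible, but once the spots overlap it becomes a sizable fraction of the pixel value.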
Integration with the NTSC color fringing simulation from crt-guest-advanced-ntsc. A new shader I’ve written to combine a Famicom palette generator with crt-guest-advanced-ntsc for proper artifact colors in NES games (I should probably figure out how to contribute it over there…).
I have been thinking about NTSC simulation. There are some things that Themaister’s shaders (and guest’s, which are based on them) don’t do, if I understand the code correctly (and I may not!).
- The chroma filter is a simple notch filter instead of a comb filter, so the artifacts are often more severe and of a different character than on nicer TVs from the 90’s.
- Those shaders are also sensitive to the input resolution. The FIR filter coefficients are hard-coded based on the input resolution and I think that can be avoided, which would allow more resolutions to be used.
- Some subtleties of certain systems aren’t simulated. For example, the extra pixel that the SNES outputs every other frame, which offsets the chroma phase.
@PlainOldPants has an interesting NTSC shader, although I will admit that I don’t understand the code yet. I think a fairly simple and accurate NTSC shader (with a comb filter but without color correction) could be done in 3 passes.
Presets for various systems, including one that looks really good for PC-98 games
I noticed that you are using the NTSC shader for an S-Video preset. Do you find that to look better than just flipping the composite parameter in crt-beans and adjusting the I and Q bandwidths?
I’ll respond with my own wall of text, but basically I was just going for quick-and-dirty and artistic effect.
I figured 907 was chosen for performance/some calculation of the lowest you could get away with for a 15 kHz TV, and I didn’t test on anything less than 1080p. Just setting it to 0.5x works great even for triple monitor Darius at 5120x1440 (at least on a 2080 Super), though. It’s too bad it has to be baked into the preset, but it seems worth two presets just for that.
The VGA line-doubling is really only needed because DOSBox doesn’t do it (the PC-88/PC-98 emulators have it built-in). My code for it isn’t very clean, and it could probably be done almost entirely in the vertex shader for basically no performance cost, but it really does need to be dynamic, if just to avoid funny-looking text mode while the game loads.
The original 1/width factor causes wide gamma swings when you adjust the “minimum spot size” parameter. Setting it to 1/width^2 causes no noticeable gamma change whatsoever when you change that parameter, which seems like correct behavior and is more ergonomic even if it isn’t. It definitely crushes the highlights when you reduce the “maximum spot size”, though; there might be a way to make it more selective.
What you’re saying on overlapping scanlines makes sense, and I’m sure that doing it correctly would look different. Doing it quick and dirty already looks pretty good, though. The clipping is also fine. It’s soft-edged, so it just reads as bloom. Adding a mask also gives you more dynamic range to work with; in this screenshot the through-mask brightness is set to scale from 65% to 85% (variable mask strength was my addition to the ubiquitous subpixel mask sim), but the flare is pure white because the scanlines overlap and blow out the image. I think it looks quite nice, even if the mask isn’t terribly convincing in close-up:
(I’m actually not completely sure where in the code it lets the mask blow through like this… basically a happy accident.)
I tried and mostly failed to understand the NTSC shader, but I did confirm the lack of a comb filter. I’m not sure that’s a downside, though. For one, online consensus seems to be that the NES (at least) just looks better on TVs without one. Secondly, a comb filter ultimately sacrifices temporal accuracy in favor of spatial accuracy. Since we’re adding the imperfections, that just seems unnecessary when we can just turn down the strength instead. Worst case is you spend a month writing the world’s worst motion-blur shader.
For S-Video, I didn’t actually compare back to back. Just went and looked at it, though, and sure enough setting it to YIQ mode in crt-beans does actually look a bit better (more pleasing gamma), and it’s obviously less arcane to tweak.
That’s great. GTU is a classic. I gave your shader a test; it’s shaping up to be good, although the defaults are too glowy and blurry compared to GTU. Performance is around 20% slower, but not too shabby, considering you plan to add new features. I’ve found an issue, however: on Dolphin, your shader zooms in the picture. Here:
I noticed this too, and I couldn’t figure out where to fix it. As far as I can tell, Dolphin presents the image to the shader after scaling it to screen size. The shader is only intended to work with an image at the source resolution, so even if whatever’s going wrong was fixed it still wouldn’t do anything useful.
All it takes is adding a shader pass that forces 640x480 absolute. I’ve added a preset to my fork (current mainline doesn’t do 480p).
@NonWonderDog I’ve had some time to look at your fork, and I originally misunderstood what you were doing with the 1/width^2 factor. Your 1/width^2 is correct, and I had made a mistake projecting the spot function onto 2 dimensions. There is actually another mistake that somewhat compensated for this but clips and crushes the highlights when the maximum spot size is reduced. I’ve fixed both and the result looks better, and usually brighter.
I also wrote a test to make sure the tonality of the image is preserved. I probably should have done this before! I generated full-screen, solid-color images of varying brightness levels, ran them through the simulator, then got the average brightness. If the maximum spot size is 1.0, they should have equal brightness as the input images (compensating for CRT gamma). This is what it looked like before, with the input brightness on the x-axis and the output brightness on the y-axis. Each image is a dot:
This is what it looks like with the fix:
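For anyone curious, the test can be sketched in a few lines. This is a simplified stand-in for the actual test code, assuming a Gaussian spot profile; the key property being checked is that a profile which integrates to 1 per line returns a flat input at the same average level:

```python
import math

# Rough sketch of the tonality test described above: run a solid "image"
# of a given level through a toy scanline simulator and check that the
# average output matches the input. The Gaussian profile is an assumed
# stand-in for the shader's spot function.

def average_brightness(level, sigma, samples=1000):
    total = 0.0
    for i in range(samples):
        y = i / samples
        # Sum contributions from scanlines at integer y positions.
        total += sum(level * math.exp(-0.5 * ((y - line) / sigma) ** 2)
                     for line in range(-4, 5))
    return total / samples

# Normalize so each line's profile integrates to 1 (sigma*sqrt(2*pi) == 1):
sigma = 1.0 / math.sqrt(2.0 * math.pi)
out = average_brightness(0.5, sigma)  # should come back close to 0.5
```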
So thank you for pointing out my mistake! I’ll clean up, commit, and push some time soon, as well as generate another archive to download. I’m a little short on time currently.
Oh, and I can’t test the Dolphin core because it crashes on my installation (something about not being linked properly to libbz2—I haven’t really spent any time looking into it).
For one, online consensus seems to be that the NES (at least) just looks better on TVs without one. Secondly, a comb filter ultimately sacrifices temporal accuracy in favor of spatial accuracy. Since we’re adding the imperfections, that just seems unnecessary when we can just turn down the strength instead. Worst case is you spend a month writing the world’s worst motion-blur shader.
A comb filter doesn’t necessarily have a temporal, inter-frame component. My understanding is that most comb filters were simple 2-line or 3-line filters, and later 2D adaptive filters. Only the 3D adaptive filters in the very late model TVs used anything outside the current frame (or field or whatever).
There are times when a comb filter won’t work well, though. Lots of old systems (like the Mega Drive/Genesis) didn’t shift the chroma phase between lines at all, which defeats the comb filter. Maybe an argument could be made that there isn’t much point in adding one.
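The 2-line idea is simple enough to sketch in one dimension. On NTSC the chroma subcarrier phase flips 180° between adjacent lines, so summing two lines cancels chroma and differencing them cancels luma; the signal values below are made up for illustration:

```python
# Toy 1D sketch of a 2-line comb filter. line_b carries the same luma as
# line_a but chroma with inverted phase (the NTSC line-alternation
# property), so averaging recovers luma and differencing recovers chroma.
# All values here are illustrative, not real video.

def comb_separate(line_a, line_b):
    luma = [(a + b) / 2.0 for a, b in zip(line_a, line_b)]
    chroma = [(a - b) / 2.0 for a, b in zip(line_a, line_b)]
    return luma, chroma

luma_in = [0.5, 0.7, 0.6]
chroma_in = [0.1, -0.1, 0.05]
line_a = [y + c for y, c in zip(luma_in, chroma_in)]
line_b = [y - c for y, c in zip(luma_in, chroma_in)]  # inverted chroma phase
luma_out, chroma_out = comb_separate(line_a, line_b)
```

If the chroma phase doesn’t alternate between lines (as on the Mega Drive), the difference cancels to zero and the chroma stays mixed into the luma, which is exactly the failure mode described above.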
I’ve updated the repository with the fixes and test. The glow is also less strong by default. I haven’t split out a basic version yet.
I’ve created a new zip file here (direct link).
Here are some quick screenshots with the default settings.