I think it’s a good idea with ‘simple’ and ‘deluxe’. I would save the glow for the deluxe version to keep the ‘simple’ preset simple. Will your mask work with non-integer scaling too? I’m interested to see it.
I had the same thoughts about GTU but not the skill or knowledge to improve it like this, so this instantly became my favorite scanline implementation.
I was never able to get crt-royale or crt-guest-advanced to give me convincing performance with PC-98 games no matter what I did, but yours (after some code changes and additions on my end) I was able to get basically perfect.
I hadn’t kept up with this thread since May, but I had been adding the features I wanted to see on my own since then and kept forgetting to share.
Anyway, the version I’m happy with has:
- A simple subpixel mask implementation inspired by the Filthy Pants blog post (ran out of links I can post, sorry).
- VGA line-doubling mode for DOS games.
- Constant brightness regardless of scanline width (just a scale factor of 1/(scanline width)^2; I don’t know how physically-justifiable it is).
- A semi-fixed scanline width mode that uses a reference resolution instead of matching the input resolution, so games that switch resolution don’t also switch monitors. (It scales with the square root of vertical lines, and seems to work fine even if scanlines overlap.)
- Wider resolution range (I’m sure there’s a good reason for scaling to 907 horizontal pixels, but by using half the display resolution instead I can use this shader for Darius or even SVGA.)
- Integration with the NTSC color fringing simulation from crt-guest-advanced-ntsc.
- A new shader I’ve written to combine a Famicom palette generator with crt-guest-advanced-ntsc for proper artifact colors in NES games (I should probably figure out how to contribute it over there…).
- Presets for various systems, including one that looks really good for PC-98 games.
Currently at https://github.com/NonWonderDog/crt-beans/tree/nwd but I branched it in May and it’s not up-to-date with the latest yet.
Caveat is that I’m not as performance-focused, so it’s probably much slower. It works full speed with SEGA Naomi games in Flycast at 1080p on my Nvidia Shield, though, which was my performance target.
Also I had broken my dev setup on my desktop a while ago and was lax about git commits. Sorry.
Dunno what’s wrong with my Imgur account, but here’s an album on vgy:
And when I said that I couldn’t get any other shader to work on the PC-98, this is what I meant:
The text is just flat-out illegible on anything newer than an early-90’s 0.41 mm dot-pitch 31 kHz PC monitor. But with this shader (with a mask pass added), I can emulate one.
What is the resolution that you are playing at?
@md2mcb In a way this is sort of an evolution of the GTU shader. GTU was one of my inspirations for it. The lowpass filtering does basically the same thing, but the scanline simulation is completely new.
@ComfyTsu Thanks! I have to admit, the mask is frustrating me. Subpixel masks look the best (and @hunterk already defined a ton of them!), but they rely on knowing the resolution and subpixel layout of the screen, and that subpixel arrangement being appropriate for simulating a mask. They don’t work as intended on OLEDs because they have all different subpixel arrangements, sometimes pretty weird ones! Screens seem to be increasingly going OLED. Subpixel masks also don’t work as intended in TATE mode.
The other things I’ve tried so far really struggle if I want an accurate mask, especially at low resolutions. Masks are difficult because the detail is so fine that we are sometimes near (or past) the limit of what the screen can display.
I really wanted something that could be configured easily without having to know specific details about the user’s screen.
yeah, if you want to get away from subpixel-aware masks, you’re stuck with either unpredictable subpixel behavior as the shader mask gets finer than the monitor’s physical pixels can resolve, or just requiring really high resolution so you can flat-out draw the mask (megatron’s mask is like this).
@NonWonderDog, I’m glad you’ve found it useful! I still have to look through your repo more thoroughly, but I have some comments and explanations. I apologize in advance for the wall of text here.
Wider resolution range (I’m sure there’s a good reason for scaling to 907 horizontal pixels, but by using half the display resolution instead I can use this shader for Darius or even SVGA.)
Basically, there is a tradeoff here. If you go too low, there can be 2 problems:
- Not having enough samples to properly represent the bandwidth of the input (i.e. dropping below or too close to the Nyquist frequency). This can cause aliasing artifacts. I’ll skip over this for now, because it requires some math for each different output type. 907 is more than enough for 15kHz TV/monitor content.
- Not having enough samples to properly estimate the integral for the scanlines. This is the bigger problem most of the time for 15kHz TV/monitor content (where the bandwidth is quite limited) and is why the default value isn’t lower. The smaller the spot/scanline is on the screen, the more samples will be needed. Basically, some number of samples must be in the borders of the spot, and so a higher sample density is required for smaller spots.
On the other hand, if you push the number of samples too high, the performance gets worse. For each pixel, the scanline code basically looks at any nearby samples that are close enough to affect the value of this pixel. If there are more samples overall, it looks at more samples for a proper estimation. This results in more lookups and more computation for each pixel. The number of lines and the spot size affect the performance as well. More lines and smaller spot sizes mean looking at fewer samples as a proportion of the total. So performance doesn’t actually suffer much if we only scale up the sample count with the line count.
The proper value for this parameter would be something like max(1.5 * 2 * cutoff_frequency * active_line_time, 3.5 * line_count). The first value avoids problem 1, and the second value avoids problem 2. The 1.5 and 3.5 factors are kind of pulled out of my hat, but seem reasonable to me. Unfortunately, we can’t do this sort of math in .slangp files, so I set it to a reasonable number for content up to 6 MHz and 240-288 lines while aiming for good performance. If we could set it dynamically, it should work for any resolution from CGA to SVGA and beyond while maintaining good performance.
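To make the rule of thumb concrete, here’s a quick numeric sketch. The 6 MHz cutoff and ~47 µs active line time are my assumed NTSC-ish values for illustration, not constants taken from the shader:

```python
# Sketch of the sample-count rule of thumb above. The 6 MHz cutoff and
# 47 us active line time are assumed NTSC-ish values, not shader constants.

def sample_count(cutoff_hz, active_line_time_s, line_count):
    # 1.5 * the Nyquist rate stays safely above the input bandwidth
    # (problem 1); 3.5 samples per line keeps enough samples inside
    # each spot (problem 2).
    bandwidth_limit = 1.5 * 2 * cutoff_hz * active_line_time_s
    spot_limit = 3.5 * line_count
    return max(bandwidth_limit, spot_limit)

# 240-line content with a 6 MHz cutoff lands around 846 samples,
# comfortably under the default of 907:
n = sample_count(6e6, 47e-6, 240)
```

With these assumptions, 240-line content needs roughly 846 samples, which is why 907 is more than enough for 15 kHz content but not for SVGA.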
The downside of using half the viewport width is that for low resolution devices (I’m thinking of the Steam Deck, handhelds, and mobile phones), the sample count may be too low. For slow, high resolution devices (like a laptop or NUC connected to a 4k TV), the performance may suffer for no visible quality gain.
VGA line-doubling mode for DOS games.
This is a great idea, and I may be able to come up with a VGA preset that does line doubling like this, sets the sample count appropriately, and adjusts the filter for the VGA timings being different from NTSC/PAL.
Constant brightness regardless of scanline width (just a scale factor of 1/(scanline width)^2; I don’t know how physically-justifiable it is).
This may result in some tonality changes to the image. Basically, it may crush the highlights as pixel values get clipped. Currently, the very center of a full brightness scanline will be at full brightness (e.g., 1.0 or rgb(255)) even with smaller scanline widths. So the default is basically the brightest the image can be without changing the shape of a full brightness scanline.
I think if I wanted the image to be brighter, I would try applying some gamma function like pow(rgb, MaxSpotWidth)
. I’d have to do some more thinking to figure out what exactly would be appropriate. This will still taper off the highlights and change the tonality of the image, but it should at least avoid clipping because it still maps 1.0 to 1.0.
A semi-fixed scanline width mode that uses a reference resolution instead of matching the input resolution, so games that switch resolution don’t also switch monitors. (It scales with the square root of vertical lines, and seems to work fine even if scanlines overlap.)
I think this is an interesting idea and should make systems that can change the line count work properly.
The scanlines overlapping will cause some issues with scanline shape and image tonality. There are two issues, really:
- Currently, only the 2 nearest lines are used for finding a pixel’s value. If the scanlines overlap more, it should really consider the nearest 3 lines. If they overlap even more, it should consider the nearest 4 lines, and so on.
- When scanlines overlap more, the pixel values will increase above 1.0. Some sort of compensation needs to be done to bring the values back into range to avoid clipping.
Both of these are solvable problems if there is enough desire for this sort of configurability. There could be a substantial performance impact to using 3 lines instead of 2, though. It would basically make that part of the shader 50% slower, and that’s already the slow part.
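A toy model of why the 2-line limit matters, using a Gaussian as a stand-in for the shader’s real spot function:

```python
import math

# Toy model of the point above: each scanline is a Gaussian spot centered
# on an integer y position, and a pixel sums the nearest N lines. The
# Gaussian is a stand-in for the shader's actual spot function. With a
# narrow spot the 2 nearest lines capture essentially everything; with a
# wide, overlapping spot the 3rd and 4th lines still contribute.

def pixel_value(y, spot_sigma, n_lines):
    lines = sorted(range(-5, 6), key=lambda line: abs(y - line))[:n_lines]
    return sum(math.exp(-0.5 * ((y - line) / spot_sigma) ** 2)
               for line in lines)

# Error from using only 2 lines, evaluated halfway between two scanlines:
narrow_error = pixel_value(0.5, 0.3, 4) - pixel_value(0.5, 0.3, 2)
wide_error = pixel_value(0.5, 1.0, 4) - pixel_value(0.5, 1.0, 2)
```

With a narrow spot the error from stopping at 2 lines is negligible, but once the spots overlap it becomes a sizable fraction of the pixel value.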
Integration with the NTSC color fringing simulation from crt-guest-advanced-ntsc. A new shader I’ve written to combine a Famicom palette generator with crt-guest-advanced-ntsc for proper artifact colors in NES games (I should probably figure out how to contribute it over there…).
I have been thinking about NTSC simulation. There are some things that Themaister’s shaders (and guest’s, which are based on them) don’t do, if I understand the code correctly (and I may not!).
- The chroma filter is a simple notch filter instead of a comb filter, so the artifacts are often more severe and of a different character than on nicer TVs from the 90’s.
- Those shaders are also sensitive to the input resolution. The FIR filter coefficients are hard-coded based on the input resolution and I think that can be avoided, which would allow more resolutions to be used.
- Some subtleties of certain systems aren’t simulated. For example, the extra pixel that the SNES outputs every other frame, which offsets the chroma phase.
@PlainOldPants has an interesting NTSC shader, although I will admit that I don’t understand the code yet. I think a fairly simple and accurate NTSC shader (with a comb filter but without color correction) could be done in 3 passes.
Presets for various systems, including one that looks really good for PC-98 games
I noticed that you are using the NTSC shader for an S-Video preset. Do you find that to look better than just flipping the composite parameter in crt-beans and adjusting the I and Q bandwidths?
I’ll respond with my own wall of text, but basically I was just going for quick-and-dirty and artistic effect.
I figured 907 was chosen for performance/some calculation of the lowest you could get away with for a 15 kHz TV, and I didn’t test on anything less than 1080p. Just setting it to 0.5x works great even for triple monitor Darius at 5120x1440 (at least on a 2080 Super), though. It’s too bad it has to be baked into the preset, but it seems worth two presets just for that.
The VGA line-doubling is really only needed because DOSBox doesn’t do it (the PC-88/PC-98 emulators have it built-in). My code for it isn’t very clean, and it could probably be done almost entirely in the vertex shader for basically no performance cost, but it really does need to be dynamic, if just to avoid funny-looking text mode while the game loads.
The original 1/width factor causes wide gamma swings when you adjust the “minimum spot size” parameter. Setting it to 1/width^2 causes no noticeable gamma change whatsoever when you change that parameter, which seems like correct behavior and is more ergonomic even if it isn’t. It definitely crushes the highlights when you reduce the “maximum spot size”, though; there might be a way to make it more selective.
What you’re saying on overlapping scanlines makes sense, and I’m sure that doing it correctly would look different. Doing it quick and dirty already looks pretty good, though. The clipping is also fine. It’s soft-edged, so it just reads as bloom. Adding a mask also gives you more dynamic range to work with; in this screenshot the through-mask brightness is set to scale from 65% to 85% (variable mask strength was my addition to the ubiquitous subpixel mask sim), but the flare is pure white because the scanlines overlap and blow out the image. I think it looks quite nice, even if the mask isn’t terribly convincing in close-up:
(I’m actually not completely sure where in the code it lets the mask blow through like this… basically a happy accident.)
I tried and mostly failed to understand the NTSC shader, but I did confirm the lack of a comb filter. I’m not sure that’s a downside, though. For one, online consensus seems to be that the NES (at least) just looks better on TVs without one. Secondly, a comb filter ultimately sacrifices temporal accuracy in favor of spatial accuracy. Since we’re adding the imperfections, that just seems unnecessary when we can just turn down the strength instead. Worst case is you spend a month writing the world’s worst motion-blur shader.
For S-Video, I didn’t actually compare back to back. Just went and looked at it, though, and sure enough setting it to YIQ mode in crt-beans does actually look a bit better (more pleasing gamma), and it’s obviously less arcane to tweak.
That’s great. GTU is a classic. I gave your shader a test; it’s shaping up to be good, although the defaults are too glowy and blurry compared to GTU. Performance is around 20% slower, but not too shabby, considering you plan to add new features. I’ve found an issue, however: on Dolphin, your shader zooms in the picture. Here:
I noticed this too, and I couldn’t figure out where to fix it. As far as I can tell, Dolphin presents the image to the shader after scaling it to screen size. The shader is only intended to work with an image at the source resolution, so even if whatever’s going wrong was fixed it still wouldn’t do anything useful.
All it takes is adding a shader pass that forces 640x480 absolute. I’ve added a preset to my fork (current mainline doesn’t do 480p).
@NonWonderDog I’ve had some time to look at your fork, and I originally misunderstood what you were doing with the 1/width^2 factor. Your 1/width^2 is correct, and I had made a mistake projecting the spot function onto 2 dimensions. There is actually another mistake that somewhat compensated for this but clips and crushes the highlights when the maximum spot size is reduced. I’ve fixed both and the result looks better, and usually brighter.
I also wrote a test to make sure the tonality of the image is preserved. I probably should have done this before! I generated full-screen, solid-color images of varying brightness levels, ran them through the simulator, then got the average brightness. If the maximum spot size is 1.0, they should have equal brightness as the input images (compensating for CRT gamma). This is what it looked like before, with the input brightness on the x-axis and the output brightness on the y-axis. Each image is a dot:
This is what it looks like with the fix:
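For anyone curious, the test can be sketched in a few lines. This is a simplified stand-in for the actual test code, assuming a Gaussian spot profile; the key property being checked is that a profile which integrates to 1 per line returns a flat input at the same average level:

```python
import math

# Rough sketch of the tonality test described above: run a solid "image"
# of a given level through a toy scanline simulator and check that the
# average output matches the input. The Gaussian profile is an assumed
# stand-in for the shader's spot function.

def average_brightness(level, sigma, samples=1000):
    total = 0.0
    for i in range(samples):
        y = i / samples
        # Sum contributions from scanlines at integer y positions.
        total += sum(level * math.exp(-0.5 * ((y - line) / sigma) ** 2)
                     for line in range(-4, 5))
    return total / samples

# Normalize so each line's profile integrates to 1 (sigma*sqrt(2*pi) == 1):
sigma = 1.0 / math.sqrt(2.0 * math.pi)
out = average_brightness(0.5, sigma)  # should come back close to 0.5
```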
So thank you for pointing out my mistake! I’ll clean up, commit, and push some time soon, as well as generate another archive to download. I’m a little short on time currently.
Oh, and I can’t test the Dolphin core because it crashes on my installation (something about not being linked properly to libbz2—I haven’t really spent any time looking into it).
For one, online consensus seems to be that the NES (at least) just looks better on TVs without one. Secondly, a comb filter ultimately sacrifices temporal accuracy in favor of spatial accuracy. Since we’re adding the imperfections, that just seems unnecessary when we can just turn down the strength instead. Worst case is you spend a month writing the world’s worst motion-blur shader.
A comb filter doesn’t necessarily have a temporal, inter-frame component. My understanding is that most comb filters were simple 2-line or 3-line filters, and later 2D adaptive filters. Only the 3D adaptive filters in the very late model TVs used anything outside the current frame (or field or whatever).
There are times when a comb filter won’t work well, though. Lots of old systems (like the Mega Drive/Genesis) didn’t shift the chroma phase between lines at all, which defeats the comb filter. Maybe an argument could be made that there isn’t much point in adding one.
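The 2-line idea is simple enough to sketch in one dimension. On NTSC the chroma subcarrier phase flips 180° between adjacent lines, so summing two lines cancels chroma and differencing them cancels luma; the signal values below are made up for illustration:

```python
# Toy 1D sketch of a 2-line comb filter. line_b carries the same luma as
# line_a but chroma with inverted phase (the NTSC line-alternation
# property), so averaging recovers luma and differencing recovers chroma.
# All values here are illustrative, not real video.

def comb_separate(line_a, line_b):
    luma = [(a + b) / 2.0 for a, b in zip(line_a, line_b)]
    chroma = [(a - b) / 2.0 for a, b in zip(line_a, line_b)]
    return luma, chroma

luma_in = [0.5, 0.7, 0.6]
chroma_in = [0.1, -0.1, 0.05]
line_a = [y + c for y, c in zip(luma_in, chroma_in)]
line_b = [y - c for y, c in zip(luma_in, chroma_in)]  # inverted chroma phase
luma_out, chroma_out = comb_separate(line_a, line_b)
```

If the chroma phase doesn’t alternate between lines (as on the Mega Drive), the difference cancels to zero and the chroma stays mixed into the luma, which is exactly the failure mode described above.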
I’ve updated the repository with the fixes and test. The glow is also less strong by default. I haven’t split out a basic version yet.
I’ve created a new zip file here (direct link).
Here are some quick screenshots with the default settings.