A new little shader I did (GLSL)

Cyber · 22 August 2025 23:15

So now we have computers writing programs for computers. What could go wrong?

DariusG · 23 August 2025 05:44

It could probably write an emulator by itself 10 years from now. It can’t write a program properly yet, not even a shader. Perhaps the next step is that it writes an emulator, and the emulator has AI and writes its own games lol.

Lots of things could go wrong

Cyber · 23 August 2025 06:26

Wow! What a positive take on the possible impending rise of the machines.

We’re not too far from 2029, you know.

DariusG · 23 August 2025 07:23

Probably in 10 or 20 years, you’ll say “write a Sega Saturn emulator”. In 10 minutes it’s ready, then you ask it to write a Metal Gear Solid port. In 20 minutes you’ll be playing MGS on your new Saturn AI emulator.

It will easily replace most jobs on the planet after they make some robots with advanced AI. Later on it could decide humans deplete the planet’s resources and have to be restricted lol. This is funny but true all the same.

Jamirus · 23 August 2025 11:12

It’s definitely a useful tool and there’s huge potential. Still, some of the current unreliability makes me laugh, if not wary.

The other day, I was asking about old PC sound cards, and buried in a mountain of info I was told about a PC re-release of the Last Ninja that featured Gravis Ultrasound. I’m like “Wow, I never heard about that, where did you get that from?”

AI: Oh, it turns out there is no evidence for it at all.

DariusG · 23 August 2025 11:24

It’s more like YOU teaching it, e.g. it gave me the wrong NTSC phase, and after I pointed it out it came back with a more correct answer. It has half knowledge and sometimes may even drag you down the wrong path. There is also a lot of false information, like it will say Crash Team Racing was one of the best Saturn racing games.

Jamirus · 23 August 2025 11:44

That’s true, e.g. I was curious what it would say about the aspect ratios of old systems, explicitly noting that I’m referring to the active area, so borders are accounted for. The default answers were still something about filling the 4:3 screen, so ratios are 1.33, which is of course nonsense. So then I kept entering more info: PAL or NTSC screen, take the dot clock into account, whatever.

DariusG · 6 September 2025 20:23

A small experiment I did over the last few days on Gaussian filtering. I believe Lanczos, while looking a lot like a CRT pixel shape, is too sharp. As you would expect there are a ton of tweaks in this; it only pushes an Intel HD 630 GPU in battery mode to 40-45%, while crt-Geom pushes it to its limits at 90%. Gaussian looks more like a CRT when used in small amounts.

Recipe:
crt-Geom Gaussian falloff scanlines (scale to non-integer without trouble), tweaked to avoid multiple pow() calls that will bring any modest GPU to its knees
Gaussian horiz. filter in tiny amounts
Quilez vertical filter, boosted for extra sharpness
Tiny curvature code
Tiny border-smooth code so it's a breeze for the GPU
Glow lifted from crt-consumer
Coarse/Fine mask (CGWG look-Lottes)
Tiny slot mask code

shader:

github.com

metallic77/shaders_glsl-slang/blob/main/crt-simple.glsl

#version 110

/*
   A shader by DariusG 2025
   This program is free software; you can redistribute it and/or modify it
   under the terms of the GNU General Public License as published by the Free
   Software Foundation; either version 2 of the License, or (at your option)
   any later version.
*/
#pragma parameter A_CURV          "Curvature" 0.12 0.0 0.3 0.01
#pragma parameter A_FOCUS         "CRT Focus" 0.8 0.5 1.0 0.01
#pragma parameter SCANLINE_WEIGHT "Scanline Weight" 0.3 0.2 0.6 0.05
#pragma parameter MASK_BR         "Mask Brightness" 0.7 0.0 1.0 0.05
#pragma parameter A_MASK          "Mask Fine/Coarse" 2.0 2.0 3.0 1.0
#pragma parameter A_SLOT          "Slot Mask On/Off" 0.0 0.0 1.0 1.0
#pragma parameter A_LUM           "Luminance" 0.03 0.0 1.0 0.01
#pragma parameter A_GLOW          "Glow strength" 0.08 0.0 1.0 0.01
#pragma parameter A_SAT           "Saturation" 1.0 0.0 2.0 0.05
#pragma parameter A_NTSC_J        "NTSC-Japan Colors" 0.0 0.0 1.0 1.0

(This file has been truncated.)

preset:

shaders = "4"
feedback_pass = "0"
shader0 = "shaders_glsl/crt/shaders/crt-consumer/linearize.glsl"
alias0 = ""
wrap_mode0 = "clamp_to_border"
mipmap_input0 = "false"
filter_linear0 = "false"
float_framebuffer0 = "false"
srgb_framebuffer0 = "false"
scale_type_x0 = "source"
scale_x0 = "1.000000"
scale_type_y0 = "source"
scale_y0 = "1.000000"
shader1 = "shaders_glsl/crt/shaders/crt-consumer/glow_x.glsl"
alias1 = ""
wrap_mode1 = "clamp_to_border"
mipmap_input1 = "false"
filter_linear1 = "true"
float_framebuffer1 = "false"
srgb_framebuffer1 = "false"
scale_type_x1 = "source"
scale_x1 = "1.000000"
scale_type_y1 = "source"
scale_y1 = "1.000000"
shader2 = "shaders_glsl/crt/shaders/crt-consumer/glow_y.glsl"
alias2 = ""
wrap_mode2 = "clamp_to_border"
mipmap_input2 = "false"
filter_linear2 = "true"
float_framebuffer2 = "false"
srgb_framebuffer2 = "false"
scale_type_x2 = "source"
scale_x2 = "1.000000"
scale_type_y2 = "source"
scale_y2 = "1.000000"
shader3 = "shaders_glsl/crt/shaders/crt-simple.glsl"
alias3 = ""
wrap_mode3 = "clamp_to_border"
mipmap_input3 = "false"
filter_linear3 = "true"
float_framebuffer3 = "false"
srgb_framebuffer3 = "false"
g_in = "2.200000"
A_SLOT = "1.000000"

kokoko3k · 7 September 2025 13:52

^ Care to explain this ^ ?

DariusG · 7 September 2025 16:01

Gaussian falloff scanlines will scale to non-integer factors well, without moire etc. sin() scanlines will too. The exp() used in the Gaussian is a bit more expensive, but crt-geom actually slows down by executing a ton of pow() calls in texture reads and scanlines (it could have added an initial linearize pass and avoided something like 8 pow() calls).
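A rough Python sketch of the two ideas in that reply, not the shader's actual code: a Gaussian falloff weight that works at any (non-integer) scale, and a single gamma decode per source pixel instead of one pow() per sample. The names `scanline_weight`, `shade_column` and the `sigma` value are made up for illustration; only the 2.2 gamma comes from `g_in` in the preset.

```python
import math

def scanline_weight(out_y, scale, sigma=0.3):
    """Gaussian falloff for one output row.

    out_y : output row index
    scale : output rows per source line, may be non-integer
    sigma : hypothetical beam width, not the shader's SCANLINE_WEIGHT
    """
    src_pos = (out_y + 0.5) / scale            # position in source-line units
    d = src_pos - math.floor(src_pos) - 0.5    # distance to the line centre, -0.5..0.5
    return math.exp(-(d * d) / (2.0 * sigma * sigma))

def shade_column(column, scale, gamma=2.2):
    """Apply scanlines to one column of gamma-encoded values in 0..1."""
    linear = [c ** gamma for c in column]      # one pow per source pixel ("linearize pass")
    out = []
    for y in range(int(len(column) * scale)):
        src = min(int((y + 0.5) / scale), len(column) - 1)
        out.append(linear[src] * scanline_weight(y, scale))
    return out

# 4 source lines stretched to 10 output rows (scale 2.5)
print([round(v, 3) for v in shade_column([1.0, 1.0, 1.0, 1.0], 2.5)])
```

Because the weight depends only on the fractional distance to the nearest source line, nothing in it assumes an integer scale factor.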

Nesguy · 7 September 2025 22:39

This is cool. Where did you find the data for the different systems?

As far as I know, this is the only NTSC shader that has system-specific outputs. Nice work.

Cyber · 8 September 2025 13:15

This is awesome man!

Just putting my hand up to show my interest in the slang port.

I like this approach. As a matter of fact, maybe Comb Filters should be a standard feature of NTSC Shaders.

Currently aren’t we just adjusting Artifacts, Fringing, NTSC Resolution, Chroma Resolution and Chroma Bleed sort of willy-nilly until we get something that kinda looks NTSC-ish enough to us in our heads?

Removing the artifacts which make up the signal might be less accurate than keeping the signal to spec, then using a properly simulated/emulated (not sure which term is more apt) Comb Filter to mitigate artifacts and other NTSC badness as it might have looked on a real CRT.

I know @guest.r is not about accuracy for accuracy’s sake or simulating things in code just because “CRTs did it this way”, but it would be good to have this approach implemented if it provided a more accurate and tangible improvement in reproducing the true look of an NTSC CRT.

Right now we’re at a sort of uncanny place where CRT shaders can look way more than good enough for the most part, so it’s not necessarily worth it to everyone to keep working harder to get it looking exactly like a CRT. It would be nice if we could get there though.

DariusG · 8 September 2025 13:27

That was the outcome of extensive research on NTSC (a DEEP dive) and of examining various other shaders.

In short:

NTSC’s 3.579545 MHz subcarrier divided by the 15.734 kHz line rate gives 227.5 cycles per scanline. This means 227 in the odd field and 228 in the even field, creating a sort of 1-dot “dithering” pattern originally. NES and SNES produce 227 and 1/3 internally, creating that diagonal ladder pattern.
If you normalize to the active (VISIBLE) portion only (≈ ¾ of the line), you get around 170.666 cycles per visible scanline. That’s the NTSC COLOR CARRIER, but each system WON’T produce the exact NTSC frequency (3.579545 MHz), only variations of it. Check this out:

https://pineight.com/mw/page/Dot_clock_rates.xhtml
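The two figures above can be checked with a few lines of Python. This is only a sanity check of the arithmetic; the 15,734.264 Hz line rate is the unrounded value of the 15.734 kHz quoted in the post, and the ¾ active fraction is the post's own approximation.

```python
SUBCARRIER_HZ = 3_579_545.0
LINE_RATE_HZ = 15_734.264      # NTSC colour line rate, rounded to 15.734 kHz in the post
ACTIVE_FRACTION = 0.75         # the post's "about 3/4 of the line" approximation

cycles_per_line = SUBCARRIER_HZ / LINE_RATE_HZ
visible_cycles = cycles_per_line * ACTIVE_FRACTION

print(f"{cycles_per_line:.3f} cycles per scanline")        # about 227.5
print(f"{visible_cycles:.3f} cycles per visible portion")  # about 170.6
```

With exactly ¾ this lands on 170.625, a hair under the 170.666 quoted above, which is why the post says "around".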

So the NTSC phase multiplier will have to be vtexcoord.x * 170.666 * PI. But to be system accurate (using TextureSize.x in RetroArch instead of 170.666, since each system has a different horiz. resolution) we have to adjust to the dot clock of the emulated system. So it will be vtexcoord.x * TextureSize.x * PI * (3.579545 / system_dot_clock).

This is just scratching the surface a bit lol. PI (=180 degrees out of 360) would repeat every other line, in total 170.666 times if NTSC. Everything there, even the smallest detail, has its own tricky part that you have to be aware of, even RetroArch quirks. If you want to be accurate you would have to write an NTSC shader for each system separately, which is the reason blargg did it for NES.

Also, the people that designed it were absolute GENIUSES, fitting a color signal in a wave on such a limited bandwidth. Probably color CRT is the biggest human invention EVER. And inventing this around 100 years ago is absolutely mind-boggling.
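Here is that phase expression transcribed into Python so it can be tried with real numbers. `ntsc_phase` is a made-up name, and the NES dot clock (21.477272 MHz master clock divided by 4) is an assumption taken from commonly quoted figures, not something stated in the post.

```python
import math

NTSC_SUBCARRIER_MHZ = 3.579545

def ntsc_phase(x, texture_width, system_dot_clock_mhz):
    """Carrier phase (radians) at normalized horizontal coordinate x in 0..1.

    Mirrors: vtexcoord.x * TextureSize.x * PI * (3.579545 / system_dot_clock)
    """
    return x * texture_width * math.pi * (NTSC_SUBCARRIER_MHZ / system_dot_clock_mhz)

# Assumed NES dot clock: 21.477272 MHz master clock / 4
NES_DOT_CLOCK_MHZ = 21.477272 / 4

# Multiplier of PI across a 256-pixel-wide line
print(ntsc_phase(1.0, 256, NES_DOT_CLOCK_MHZ) / math.pi)
```

For a 256-pixel NES line the multiplier comes out at roughly 170.67, which matches the 170.666 constant used in the fixed version of the formula.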

A bit of Comb filter explanation here

https://crtdatabase.com/articles/decoding-240p#comb-filter

DariusG · 8 September 2025 10:49

Further info with some crude multiplication off the top of my head:

Luma 4.2 MHz bandwidth translates to like ~260 pixels resolution horizontally (4.2 MHz / 15.7 kHz). This includes non-visible borders.

I channel 1.3 MHz bandwidth will give us around ~80 pixels resolution.

Q channel 0.5 MHz will do like ~30.
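The three divisions above, written out in Python as a quick check. The 15,734 Hz line rate is the figure used earlier in the thread; the function name is just for illustration.

```python
LINE_RATE_HZ = 15_734.0

def cycles_per_line(bandwidth_hz):
    """Highest number of full signal cycles that fit in one scanline."""
    return bandwidth_hz / LINE_RATE_HZ

for name, bandwidth_hz in (("luma", 4.2e6), ("I", 1.3e6), ("Q", 0.5e6)):
    print(name, round(cycles_per_line(bandwidth_hz)))
```

The exact results are about 267, 83 and 32, so the ~260 / ~80 / ~30 in the post are these numbers rounded down.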

GTUV50 accurately does this calculation in composite mode (afaik it doesn’t carry a true phase signal through modulation/demodulation, it just blurs the image in YIQ and converts back to RGB).

So if our game is 256 px wide, around 4 pixel samples will be merged in “I” and like 8 pixels will be sampled for “Q” to create 1 pixel in that channel. This 0.5 lives inside the 1.3, and the 1.3 lives inside the 4.2, centered on 3.579545 (in the higher frequencies, which is why you see rainbows in sharp luma detail). The 4.2 MHz contains both luma and I/Q inside it.

So when you write an NTSC shader you’ll have to merge 4 pixels for one I and 8 for one Q, with a loop or something. Luma is mostly sharp with a bit of blur, probably something like adding 1 pixel at 0.5 × SourceSize.z, as that 260 contains the whole line, the non-visible part too.
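A minimal Python sketch of the "merge 4 pixels for I and 8 for Q" idea, in the blur-in-YIQ style described above (no real modulation or demodulation). The plain box average and the names `box_blur` and `composite_blur` are assumptions for illustration; an actual shader would more likely use a weighted kernel, and luma is left untouched here for simplicity.

```python
import colorsys

def box_blur(values, width):
    """Average each sample with its neighbours over a window of `width` samples."""
    n = len(values)
    out = []
    for x in range(n):
        lo = max(0, x - width // 2)
        hi = min(n, lo + width)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out

def composite_blur(row, i_width=4, q_width=8):
    """Blur chroma of one row of (r, g, b) tuples in 0..1, keeping luma sharp."""
    yiq = [colorsys.rgb_to_yiq(*px) for px in row]
    y = [p[0] for p in yiq]
    i = box_blur([p[1] for p in yiq], i_width)   # ~4 pixels merged for I
    q = box_blur([p[2] for p in yiq], q_width)   # ~8 pixels merged for Q
    return [colorsys.yiq_to_rgb(*p) for p in zip(y, i, q)]

# Alternating red/blue pixels: chroma smears together, brightness pattern stays
row = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)] * 8
print([tuple(round(c, 2) for c in px) for px in composite_blur(row)[:4]])
```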

Jamirus · 8 September 2025 12:13

Does this mean that PAL potentially handles many more pixels than NTSC? I’ve read the variants go from 5 to 6.0 MHz of bandwidth.

DariusG · 8 September 2025 12:50

Depends on the system’s pixel clock (=horizontal resolution). PAL is sharper because it lowers vertical refresh and gains resolution instead.

Total 480 lines at 30 Hz (NTSC visible)

Total 576 lines at 25 Hz (PAL visible)
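Multiplying those two pairs out shows the trade-off in numbers. The sketch uses the nominal 30 Hz and 25 Hz frame rates from the lines above.

```python
ntsc_lines_per_second = 480 * 30   # visible lines x frames per second
pal_lines_per_second = 576 * 25

print(ntsc_lines_per_second, pal_lines_per_second)
```

Both products come out to 14,400 visible lines per second, so PAL spends the same line budget on more lines per frame and fewer frames.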

As long as you transfer the signal with a composite cable, then yes, it will be restricted by the format specs. Amiga composite PAL via its external modulator is extremely good, I would even prefer it to RGB. Sharp as hell and it also blends dither.

beans · 8 September 2025 15:19

“Luma 4.2 MHz bandwidth translates to like ~260 pixels resolution horizontally (4.2 MHz / 15.7 kHz). This includes non-visible borders.”

I think your math is off by a factor of 2. You get around 260 cycles per line, but you need two pixels to represent a cycle: one light and one dark. So you’d get approximately 520 pixels per line. After accounting for the inactive area it’s maybe more like 444.

Of course equating pixels to cycles is a little imprecise. This is kind of like an application of the sampling theorem, although that would consider samples (which have no width) instead of pixels.
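The factor-of-2 estimate can be reproduced like this. The 52.6 µs active line time is an assumed, commonly quoted NTSC figure and is not taken from the thread, so treat the active-area number as approximate.

```python
LUMA_BANDWIDTH_HZ = 4.2e6
LINE_RATE_HZ = 15_734.0
ACTIVE_LINE_S = 52.6e-6        # assumed active (visible) part of one scanline

cycles_total = LUMA_BANDWIDTH_HZ / LINE_RATE_HZ
pixels_total = 2 * cycles_total                         # one light + one dark pixel per cycle
pixels_active = 2 * LUMA_BANDWIDTH_HZ * ACTIVE_LINE_S

print(round(pixels_total), round(pixels_active))
```

This gives roughly 534 pixels for the whole line and roughly 442 for the active part, in the same range as the ~520 and ~444 mentioned above.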

DariusG · 8 September 2025 16:16

You probably mean the other way around, 2 cycles to represent a pixel. E.g. NES has a 21.47 MHz clock creating 1364 cycles per line, and 4 cycles represent 1 PPU pixel. That’s 341 PPU pixels, of which 256 are active display.

Do you have a link showing that 2 pixels represent 1 cycle (or can you explain how)?

beans · 8 September 2025 16:39

I mean a cycle as in the positive and negative portion of a sine wave in an analog signal, not necessarily relating to the clocks on the console. For example, luma being limited to 4.2MHz means there can be signal components up to 4.2 million cycles per second, and each cycle is both the positive and negative portion of the sine wave.

Consider an image that is alternating black and white pixels horizontally. So it’s a black pixel, then a white pixel, then a black pixel, etc. Let’s say there are 100 black pixels and 100 white pixels. That’s 100 cycles across the image but 200 total pixels.

This is also where the factor of 2 comes from in the sampling theorem. You need at least 2 samples to represent one full cycle.
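The black/white example can be reproduced numerically: sampling a 100-cycle cosine at 200 evenly spaced points gives exactly the alternating pattern described. The function name is made up for illustration.

```python
import math

def sample_cosine(cycles, samples):
    """Sample `cycles` full cosine periods at `samples` evenly spaced points."""
    return [round(math.cos(2 * math.pi * cycles * n / samples)) for n in range(samples)]

row = sample_cosine(100, 200)      # 1 = white pixel, -1 = black pixel
print(row[:6])
print(row.count(1), "white,", row.count(-1), "black")
```

The 200 samples alternate between 1 and -1, so 100 cycles need 200 pixels: two per cycle.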

I may not be explaining this very well and I can’t currently draw an illustration. Here’s an article addressing a similar question, and you can see a factor of 2 being applied:

https://www.cardinalpeak.com/blog/the-math-behind-analog-video-resolution

DariusG · 8 September 2025 17:10

Interesting thought, but afaik luma isn’t transferred using a sine wave, only chroma is. The dot clock is the actual resolution, e.g. the Atari 2600 runs at the NTSC 3.579 MHz and results in ~160 horizontal visible pixels. If composite luma goes up to 4.2 MHz then it should be around my calculations.

A system can have a dot clock that gives something like 640 pixels, like the Amiga, but the format has to support it too, so it’ll be applicable to RGB only.