It’s GLSL for the moment, in the crt folder.
There are some devices, old phones etc., that have a 480p screen. Let’s see which GLSL shaders work well on such a screen (e.g. scanlines render evenly) at 2x scaling.
The ones in bold are those these devices could most likely run.
crt-aperture
crt-nes-mini
crt-potato-warm
fake-crt-geom-potato
fake-crt-geom
fakelottes no mask no curvature
grits
phosphorlut
zfast-composite
I have such an old phone around that I tweak here and there just to see what can be done, and I created this. A mask is not actually needed on such a screen.
Watch out for sign(), it is dog slow on my iGPU.
Another trick you can try is dy=(dy * dy)*(dy * dy), which may trigger parallel computation.
Thanks, I will check that! zfast avoids that by multiplying 3 times. Actually it is even smarter, using a “MAD” (multiply and add) for defining position. Smart coding there, the author knows what he is doing.
Yeah, I misread the code, sorry about that.
But you can still do (dy * dy) * (dy * 8), which should be equivalent to that specific code; or, generally speaking, you could pack several variables into a vecx() and operate on them against another vector, and there is a good chance the operations will be executed in parallel.
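A minimal sketch of both ideas, in case it helps. The function names are made up for illustration and are not taken from zfast or any other shader, and whether either form is actually faster depends on the GPU and the driver's compiler, so it needs benchmarking.

```glsl
// Fourth power written as a product of two squares, as suggested above.
// The two (dy * dy) factors are independent of each other, so the
// compiler may compute one and reuse it, or issue them side by side.
float pow4(float dy)
{
    return (dy * dy) * (dy * dy);
}

// 8 * dy^3 split into two independent products, as in the post above.
float cube8(float dy)
{
    return (dy * dy) * (dy * 8.0);
}

// Packing four unrelated scalars into one vec4 so that a single
// vector multiply handles all of them at once.
vec4 pow4x4(float a, float b, float c, float d)
{
    vec4 v  = vec4(a, b, c, d);
    vec4 v2 = v * v;      // all four squares in one vector operation
    return v2 * v2;       // all four fourth powers in one more
}
```

These are meant to be pasted into a fragment shader and called from main(); on scalar GPU architectures the vec4 version may bring no gain at all.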
Also, I see you count cycles even in the vertex shader; that isn’t really important, since it is executed far fewer times (once per vertex) than the fragment shader (once per ‘pixel’), so the more work you move there (nothing that needs per-pixel granularity ofc), the more you gain.
Just yesterday I benchmarked dozens of native functions (that’s where I discovered sign()’s slowness).
Jump in #programming-shaders on discord’s retroarch server if you wish, coding for slow machines is so much fun
Yeah that’s really fun, trying to squeeze as much as you can
Reduced to 7 cycles with this; pixel is 1.0/TextureSize.x, computed in the vertex shader. Filter set to nearest.
vec3 color1 = 0.50*COMPAT_TEXTURE(Source,vTexCoord).rgb;
vec3 color2 = 0.25*COMPAT_TEXTURE(Source,vTexCoord + vec2(pixel*blur,0.0)).rgb;
vec3 color3 = 0.25*COMPAT_TEXTURE(Source,vTexCoord - vec2(pixel*blur,0.0)).rgb;
vec3 color = (color1 + color2 + color3);
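For anyone following along, here is a rough sketch of how the per-pixel offset can be moved into the vertex stage so the fragment stage only does texture reads and adds. It assumes the usual RetroArch single-file GLSL layout (the file compiled once with VERTEX defined and once with FRAGMENT defined) and the usual attribute and uniform names; blurOffset and the value of blur are hypothetical, picked only for this example.

```glsl
#if defined(VERTEX)

attribute vec4 VertexCoord;
attribute vec2 TexCoord;
uniform mat4 MVPMatrix;
uniform vec2 TextureSize;

varying vec2 vTexCoord;
varying vec2 blurOffset;      // hypothetical name, precomputed tap offset

const float blur = 0.5;       // hypothetical blur strength

void main()
{
    gl_Position = MVPMatrix * VertexCoord;
    vTexCoord   = TexCoord;
    // Done once per vertex instead of once per output pixel.
    blurOffset  = vec2(blur / TextureSize.x, 0.0);
}

#elif defined(FRAGMENT)

#ifdef GL_ES
precision mediump float;
#endif

uniform sampler2D Texture;
varying vec2 vTexCoord;
varying vec2 blurOffset;

void main()
{
    // Same 0.50 / 0.25 / 0.25 horizontal filter as the snippet above.
    vec3 color = 0.50 * texture2D(Texture, vTexCoord).rgb;
    color += 0.25 * texture2D(Texture, vTexCoord + blurOffset).rgb;
    color += 0.25 * texture2D(Texture, vTexCoord - blurOffset).rgb;
    gl_FragColor = vec4(color, 1.0);
}

#endif
```

The offset is constant across the quad, so interpolating it as a varying costs nothing in accuracy.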
Reduced to 3, and gamma fixed too. EDIT: actually 5 cycles.
vec3 color1 = 0.75*COMPAT_TEXTURE(Source,vTexCoord).rgb;
vec3 color2 = COMPAT_TEXTURE(Source,vTexCoord + vec2(pixel*blur,0.0)).rgb;
vec3 color3 = COMPAT_TEXTURE(Source,vTexCoord - vec2(pixel*blur,0.0)).rgb;
vec3 color = (color1 + 0.25*(color2 * color3)); //MAD
860 fps vs 900 stock. That’s 5% speed loss with filter and gamma fixed.
Did a “light” version of crt-geom, check the differences. It keeps the scanlines of the original with Hyllian’s much faster filter (about the same look, a bit sharper), etc.
crt-geom-light
crt-geom
Comparison with “fast” shaders
Seems this line fixes “sine” scanlines for good (brightness problem and scanline alignment).
WEIGHT is the scanline parameter, 0.0 to 1.0.
Multiply res by:
WEIGHT * sin(fract(vTexCoord.y*SourceSize.y)*3.141593) + (1.0-WEIGHT);
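The same line wrapped as a helper, as a sketch only. The function and argument names are invented for the example; sourceHeight stands for SourceSize.y.

```glsl
// Sine scanline mask.
// y            : vertical texture coordinate (vTexCoord.y)
// sourceHeight : source height in pixels (SourceSize.y)
// weight       : WEIGHT, 0.0 = no scanlines, 1.0 = full depth
float sineScanline(float y, float sourceHeight, float weight)
{
    const float PI = 3.141593;
    // 0.0 at the top edge of a source line, 1.0 at the bottom edge,
    // so sin() peaks exactly at the centre of each line.
    float f = fract(y * sourceHeight);
    return weight * sin(f * PI) + (1.0 - weight);
}

// Usage inside main():
//   res *= sineScanline(vTexCoord.y, SourceSize.y, WEIGHT);
```

Since sin averages 2/pi over half a period, the mean brightness over one source line is about 1.0 - 0.363*WEIGHT, which gives an idea of how much a bloom or gain stage has to make up.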
How tiny can a shader be and still be convincing?
float OGL2Pos = vTexCoord.y*SourceSize.y;
float cent = floor(OGL2Pos)+0.5;
float ycoord = cent*SourceSize.w;
vec3 res = texture2D(Source, vec2(vTexCoord.x, ycoord)).rgb;
res *= SCANLINE_WEIGHT*sin(fract(OGL2Pos*0.999)*pi) + 1.0-SCANLINE_WEIGHT ;
float lum = dot(vec3(0.22,0.7,0.08), res);
res *= mix(0.85,BLOOM, lum);
That’s pretty impressive! How does it look at low resolution, like 600p or 400p?
It looks fine on my netbook at 2x integer scale and runs at about 100 fps with snes9x2005, and that has a 9 GFLOPS GPU. Faster than absolutely anything else, and it still has a filter, gamma correction and bloom.
Another formula that creates PVM-look scanlines (I will probably add this to crt-beam glsl)
#define SCANLINE 1.0
vec3 res= texture2D(Source, coords).rgb*vec3(1.0,0.93,1.18); // NTSC-like colors
float lum = dot(res,vec3(0.15,0.55,0.10));
res *= 1.0-(f-0.5)*(f-0.5)*45.0*(SCANLINE*(1.0-lum));
30-40% scanlines look better on low brightness monitors
Nice and simple, but that formula doesn’t scale the scanline width linearly, it jumps a lot in the upper values; I’d try to pow the lum to something around 0.1
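To put numbers on that jump, here is a small sketch that solves the posted factor for the point where it reaches zero. It assumes f is the fractional position inside a source line (the snippet does not show how f is defined), so d = f - 0.5 runs from -0.5 to 0.5; beamHalfWidth is a made-up name.

```glsl
// Distance d from the line centre at which
//   1.0 - d*d*45.0*(scanline*(1.0 - lum))
// drops to zero, i.e. the visible half-width of the beam in source lines.
float beamHalfWidth(float lum, float scanline)
{
    float k = 45.0 * scanline * (1.0 - lum);
    // With k <= 4.0 the factor never reaches zero inside the line
    // (|d| is at most 0.5), so the beam fills the whole line.
    return (k > 4.0) ? inversesqrt(k) : 0.5;
}

// With scanline = 1.0:
//   lum = 0.0  -> 0.149
//   lum = 0.5  -> 0.211
//   lum = 0.8  -> 0.333
//   lum = 0.9  -> 0.471
//   lum >= 0.911 (approx.) -> 0.5, no gap left
```

So the width grows as 1/sqrt(1 - lum): it barely moves over the lower half of the range and then more than doubles between 0.5 and 0.9.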
It all depends on the intended input lum usecase ofc.
This looks really cool! Due to the brightness loss that well-defined scanlines produce, a high-brightness panel (or an OLED) should be used. But that’s ok.
Just one question. White produces the biggest vertical spread; 100% R, G or B on their own look smaller. Is this right? When I look at CRT screenshots, 100% white and a single 100% R, G or B have the same vertical spread. Maybe make lum a vec3?
I can change lum to the max of r/g/b, and that will solve the width issue.
@kokoko3k I am open to suggestions/improvements.
Here it is with the max r/g/b value in play. The weights give some control, otherwise white (or full r/g/b) has no scanlines.
float lum = max(max(res.r * weightr,res.g * weightg),res.b * weightb);
Why don’t you just make vec3 lum=res?
That way, every subpixel would have independent height.
Next, for the width(height) thing, I’d try lum=pow(lum,vec3(0.1)) and see what it gives.
Edit:
You can feed the shader with a linear black to white gradient, and see if the apparent gamma is preserved (it could be altered by the scanline width versus lum, that’s the thing.)
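A possible shape for the per-channel suggestion, as a sketch under assumptions: f is taken to be fract(vTexCoord.y * SourceSize.y), which the original snippet does not show, pvmScanline is an invented name, and the 0.1 exponent is only the starting value proposed above.

```glsl
// Per-channel version of the PVM-style scanline formula.
// res      : sampled colour, expected to be non-negative
// f        : position inside the current source line, 0.0..1.0 (assumed)
// scanline : the SCANLINE strength parameter
vec3 pvmScanline(vec3 res, float f, float scanline)
{
    // Each channel drives its own beam height; the exponent is a value to tune.
    vec3 lum  = pow(res, vec3(0.1));
    float d   = f - 0.5;                       // distance from the line centre
    vec3 mask = 1.0 - d * d * 45.0 * (scanline * (1.0 - lum));
    // Clamp so that texels far from the centre go to black, not negative.
    return res * max(mask, vec3(0.0));
}
```

Running this over a linear black-to-white gradient, as described above, is a quick way to see whether the chosen exponent shifts the apparent gamma.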