Android/GoogleTV compatible shaders - Nitpicky

DariusG · 6 July 2023 07:07

Nice job, some clever tricks there.

Dogway · 26 July 2023 13:21

I uploaded a new version of zfast_crt_geo, this time with blur included. I merged the blur code block from zfast_crt_composite into my geo version and it looks good on PC. On the Chromecast it runs fine (60fps), but for some reason it doesn’t look good: instead of blur I get thicker, blacker black pixels.

zfast_crt_geo_composite.glsl

About the name: I will rename this to S-Video when I finish it, and the normal geo would then be closer to an RGB signal.

Dogway · 26 July 2023 19:03

OK, all I had to do was load the shader through a preset, because directly changing the sampling to Linear wasn’t working.

So these are direct captures, which one do you prefer?

RGB

S-Video

S-Video (with vertical Quilez)

I think the one without Quilez looks like more correct blurring…

EDIT: Here’s the shader. I included a bit of desaturation to better mimic S-Video. It still runs at 60fps, and compared to the original zfast_crt it’s night and day.

DariusG · 26 July 2023 23:02

S-Video is the best, without Quilez. I don’t think you need that clamp on color; the range is still 0.0-1.0, isn’t it? That’s actually the crt-consumer filter, which I reused in zfast-composite. Perhaps you could move some things to the vertex shader. And try that ‘ntsc-feather’ shader I uploaded too.

Dogway · 27 July 2023 01:41

Thanks, yes, I agree it looks better without Quilez. That’s what I did: I merged the “composite” part of zfast-composite into geo. I thought it was SoltanGris42’s code. About the clamp, yes, that was me trying everything in a frenzy until I realized I needed a preset. I will remove it in the next update.

As for ntsc-feather, I think it’s too complex for the Chromecast: I see three sin(), two dot() and some matrix conversions. This shader is already at the performance limit for such underpowered hardware. What do you mean by vertex shader?

The only missing thing I would add is limited chroma bandwidth, aka chroma blur. Is that what you call color bleed? Feel free to play with the shader within the perf constraints.

DariusG · 27 July 2023 10:28

Did some tweaks

#version 110

/*
    zfast_crt_geo_svideo - A simple, fast CRT shader.

    Copyright (C) 2017 Greg Hogan (SoltanGris42)
    Copyright (C) 2023 Jose Linares (Dogway)

    This program is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the Free
    Software Foundation; either version 2 of the License, or (at your option)
    any later version.


Notes:  This shader does scaling with a weighted linear filter
        based on the algorithm by Iñigo Quilez here:
        https://iquilezles.org/articles/texture/
        but modified to be somewhat sharper. Then a scanline effect that varies
        based on pixel brightness is applied along with a monochrome aperture mask.
        This shader runs at ~60fps on the Chromecast HD (10GFlops) on a 1080p display.
        (https://forums.libretro.com/t/android-googletv-compatible-shaders-nitpicky)

Dogway: I modified zfast_crt.glsl shader to include screen curvature,
        vignetting, round corners, S-Video like blur, phosphor*temperature and some desaturation.
        The scanlines and mask are also now performed in the recommended linear light.
        For this to run smoothly on GPU deprived platforms like the Chromecast and
        older consoles, I had to remove several parameters and hardcode them into the shader.
        Another POV is to run the shader on handhelds like the Switch or SteamDeck so they consume less battery.

*/

//For testing compilation
//#define FRAGMENT
//#define VERTEX

// Parameter lines go here:
#pragma parameter SCANLINE_WEIGHT "Scanline Amount"     7.0 0.0 15.0 0.5
#pragma parameter MASK_DARK       "Mask Effect Amount"  0.5 0.0 1.0 0.05
#pragma parameter g_vstr          "Vignette Strength"   20.0 0.0 50.0 1.0
#pragma parameter g_vpower        "Vignette Power"      0.40 0.0 0.5 0.01
#pragma parameter blurx           "Convergence X-Axis"  0.50 -2.0 2.0 0.05
#pragma parameter blury           "Convergence Y-Axis" -0.20 -2.0 2.0 0.05

#if defined(VERTEX)

#if __VERSION__ >= 130
#define COMPAT_VARYING out
#define COMPAT_ATTRIBUTE in
#define COMPAT_TEXTURE texture
#else
#define COMPAT_VARYING varying
#define COMPAT_ATTRIBUTE attribute
#define COMPAT_TEXTURE texture2D
#endif

#ifdef GL_ES
#define COMPAT_PRECISION mediump
#else
#define COMPAT_PRECISION
#endif

COMPAT_ATTRIBUTE vec4 VertexCoord;
COMPAT_ATTRIBUTE vec4 COLOR;
COMPAT_ATTRIBUTE vec4 TexCoord;
COMPAT_VARYING vec4 COL0;
COMPAT_VARYING vec4 TEX0;
COMPAT_VARYING vec2 scale;

vec4 _oPosition1;
uniform mat4 MVPMatrix;
uniform COMPAT_PRECISION int FrameDirection;
uniform COMPAT_PRECISION int FrameCount;
uniform COMPAT_PRECISION vec2 OutputSize;
uniform COMPAT_PRECISION vec2 TextureSize;
uniform COMPAT_PRECISION vec2 InputSize;

// compatibility #defines
#define vTexCoord TEX0.xy

#ifdef PARAMETER_UNIFORM
// All parameter floats need to have COMPAT_PRECISION in front of them
uniform COMPAT_PRECISION float SCANLINE_WEIGHT;
uniform COMPAT_PRECISION float MASK_DARK;
uniform COMPAT_PRECISION float g_vstr;
uniform COMPAT_PRECISION float g_vpower;
uniform COMPAT_PRECISION float blurx;
uniform COMPAT_PRECISION float blury;
#else
#define SCANLINE_WEIGHT 7.0
#define MASK_DARK 0.5
#define g_vstr 20.0
#define g_vpower 0.40
#define blurx 0.50
#define blury -0.20
#endif

void main()
{
    gl_Position = MVPMatrix * VertexCoord;
    TEX0.xy = TexCoord.xy;
    scale = TextureSize.xy/InputSize.xy;
}

#elif defined(FRAGMENT)

#ifdef GL_ES
#ifdef GL_FRAGMENT_PRECISION_HIGH
precision highp float;
#else
precision mediump float;
#endif
#define COMPAT_PRECISION mediump
#else
#define COMPAT_PRECISION
#endif

#if __VERSION__ >= 130
#define COMPAT_VARYING in
#define COMPAT_TEXTURE texture
out COMPAT_PRECISION vec4 FragColor;
#else
#define COMPAT_VARYING varying
#define FragColor gl_FragColor
#define COMPAT_TEXTURE texture2D
#endif

uniform COMPAT_PRECISION int FrameDirection;
uniform COMPAT_PRECISION int FrameCount;
uniform COMPAT_PRECISION vec2 OutputSize;
uniform COMPAT_PRECISION vec2 TextureSize;
uniform COMPAT_PRECISION vec2 InputSize;
uniform sampler2D Texture;
COMPAT_VARYING vec4 TEX0;
COMPAT_VARYING vec2 scale;

// compatibility #defines
#define Source Texture
#define vTexCoord TEX0.xy
#define blur_y blury/(TextureSize.y*2.0)
#define blur_x blurx/(TextureSize.x*2.0)

#ifdef PARAMETER_UNIFORM
// All parameter floats need to have COMPAT_PRECISION in front of them
uniform COMPAT_PRECISION float SCANLINE_WEIGHT;
uniform COMPAT_PRECISION float MASK_DARK;
uniform COMPAT_PRECISION float g_vstr;
uniform COMPAT_PRECISION float g_vpower;
uniform COMPAT_PRECISION float blurx;
uniform COMPAT_PRECISION float blury;
#else
#define SCANLINE_WEIGHT 7.0
#define MASK_DARK 0.5
#define g_vstr 20.0
#define g_vpower 0.40
#define blurx 0.50
#define blury -0.20
#endif


/*
// NTSC-J (D93) -> Rec709 D65 Joint Matrix (with D93 simulation)
// This is compensated for a linearization hack (RGB*RGB and then sqrt())
const mat3 P22D93 = mat3(
     1.00000, 0.00000, -0.06173,
     0.07111, 0.96887, -0.01136,
     0.00000, 0.08197,  1.07280);

// SAT 0.95;0.9
const mat3 SAT95 = mat3(
     0.921259999275207500, 0.07151728868484497, 0.007221288979053497,
     0.022333506494760513, 0.97512906789779660, 0.002537854015827179,
     0.010629460215568542, 0.03575950860977173, 0.953611016273498500);
*/


// P22D93 * SAT95
const mat3 P22D93SAT95 = mat3(
     0.920603871345520000, 0.06930985301733017, -0.051645118743181230,
     0.087028317153453830, 0.94945263862609860, -0.007860664278268814,
     0.013233962468802929, 0.11829412728548050,  1.023241996765136700);



vec2 Warp(vec2 pos)
{
    pos  = pos*2.0-1.0;
    pos *= vec2(1.0 + (pos.y*pos.y)*0.0276, 1.0 + (pos.x*pos.x)*0.0414);
    return pos*0.5 + 0.5;
}


void main()
{
    vec2 vpos = vTexCoord*scale;
    vec2 xy   = Warp(vpos);

    vec2 corn = min(xy,vec2(1.0)-xy); // This is used to mask the rounded
    corn.x = 0.0001/corn.x;           // corners later on

    xy    /= scale;


    COMPAT_PRECISION vec3 colour  = COMPAT_TEXTURE(Source,     xy).rgb;
    // 0.004 is the strength of flickering, common in all CRTs even on RGB
    COMPAT_PRECISION vec3 sample1 = 0.004*sin(float(FrameCount))+COMPAT_TEXTURE(Source,vec2(xy.x + blur_x, xy.y - blur_y)).rgb;
    COMPAT_PRECISION vec3 sample2 = 0.5*colour;
    COMPAT_PRECISION vec3 sample3 = 0.004*sin(float(FrameCount))+COMPAT_TEXTURE(Source,vec2(xy.x - blur_x, xy.y + blur_y)).rgb;

    colour = vec3(      sample1.r*0.50 + sample2.r,
                        sample1.g*0.25 + sample2.g + sample3.g*0.25,
                                         sample2.b + sample3.b*0.50);

    vpos  *= (1.0 - vpos.xy);
    float vig = vpos.x * vpos.y * max(10.0,-1.8*g_vstr+100.0);
    vig = min(pow(vig, g_vpower), 1.0);
    vig = vig >= 0.5 ? smoothstep(0.0,1.0,vig) : vig;


    // Of all the pixels that are mapped onto the texel we are
    // currently rendering, which pixel are we currently rendering?
    float ratio_scale = xy.y * TextureSize.y - 0.5;
    // Snap to the center of the underlying texel.
    float i = floor(ratio_scale) + 0.5;

    // This is just like "Quilez Scaling" but sharper
    float f = ratio_scale - i;
    COMPAT_PRECISION float Y = f*f;

    vec2 MSCL = OutputSize.y > 1499.0 ? vec2(0.30) : vec2(0.499999, 0.5);

    COMPAT_PRECISION float whichmask = floor(vTexCoord.x*scale.x*OutputSize.x)*-MSCL.x;
    COMPAT_PRECISION float mask = 1.0 + float(fract(whichmask) < MSCL.y) * -MASK_DARK;

    vec3 P22 = ((colour*colour) * P22D93SAT95) * vig;
    colour = max(vec3(0.0),P22);

    COMPAT_PRECISION float scanLineWeight = (1.5 - SCANLINE_WEIGHT*(Y - Y*Y));

    if (corn.y <= corn.x || corn.x < 0.0001)
        colour = vec3(0.0);

    FragColor.rgba = vec4(sqrt(colour.rgb*(mix(scanLineWeight*mask, 1.0, dot(colour.rgb,vec3(0.26667))))),1.0);

}
#endif

Also remove the 1.0001 factor in the vertex shader, since you already use 0.4999.

DariusG · 27 July 2023 16:47

I have an old device with an Adreno 205 (GLES 2.0, 8.5 GFLOPS) and your zfast-crt-geo runs at 35 fps on it, so it is a relevant performance reference. It seems your GPU is around 20 GFLOPS, but I can test things on mine. If something runs at more than 30 fps here, it looks like it will run pretty well there.

Dogway · 27 July 2023 17:50

Thanks, I optimized your version a bit, but it still drops to about 45fps.

#version 110

#pragma parameter CURVATURE       "Screen Curvature"    1.0 0.0 1.0 1.0
#pragma parameter SCANLINE_WEIGHT "Scanline Amount"     7.0 0.0 15.0 0.5
#pragma parameter MASK_DARK       "Mask Effect Amount"  0.5 0.0 1.0 0.05
#pragma parameter g_vstr          "Vignette Strength"   20.0 0.0 50.0 1.0
#pragma parameter g_vpower        "Vignette Power"      0.40 0.0 0.5 0.01
#pragma parameter blurx           "Convergence X-Axis"  0.50 -2.0 2.0 0.05
#pragma parameter blury           "Convergence Y-Axis" -0.20 -2.0 2.0 0.05

#if defined(VERTEX)

#if __VERSION__ >= 130
#define COMPAT_VARYING out
#define COMPAT_ATTRIBUTE in
#define COMPAT_TEXTURE texture
#else
#define COMPAT_VARYING varying
#define COMPAT_ATTRIBUTE attribute
#define COMPAT_TEXTURE texture2D
#endif

#ifdef GL_ES
#define COMPAT_PRECISION mediump
#else
#define COMPAT_PRECISION
#endif

COMPAT_ATTRIBUTE vec4 VertexCoord;
COMPAT_ATTRIBUTE vec4 COLOR;
COMPAT_ATTRIBUTE vec4 TexCoord;
COMPAT_VARYING vec4 COL0;
COMPAT_VARYING vec4 TEX0;
COMPAT_VARYING vec2 scale;

vec4 _oPosition1;
uniform mat4 MVPMatrix;
uniform COMPAT_PRECISION int FrameDirection;
uniform COMPAT_PRECISION int FrameCount;
uniform COMPAT_PRECISION vec2 OutputSize;
uniform COMPAT_PRECISION vec2 TextureSize;
uniform COMPAT_PRECISION vec2 InputSize;

// compatibility #defines
#define vTexCoord TEX0.xy

#ifdef PARAMETER_UNIFORM
// All parameter floats need to have COMPAT_PRECISION in front of them
uniform COMPAT_PRECISION float CURVATURE;
uniform COMPAT_PRECISION float SCANLINE_WEIGHT;
uniform COMPAT_PRECISION float MASK_DARK;
uniform COMPAT_PRECISION float g_vstr;
uniform COMPAT_PRECISION float g_vpower;
uniform COMPAT_PRECISION float blurx;
uniform COMPAT_PRECISION float blury;
#else
#define CURVATURE 1.0
#define SCANLINE_WEIGHT 7.0
#define MASK_DARK 0.5
#define g_vstr 20.0
#define g_vpower 0.40
#define blurx 0.50
#define blury -0.20
#endif

void main()
{
    gl_Position = MVPMatrix * VertexCoord;
    TEX0.xy     = TexCoord.xy*1.00001;
    scale       = TextureSize.xy/InputSize.xy;
}

#elif defined(FRAGMENT)

#ifdef GL_ES
#ifdef GL_FRAGMENT_PRECISION_HIGH
precision highp float;
#else
precision mediump float;
#endif
#define COMPAT_PRECISION mediump
#else
#define COMPAT_PRECISION
#endif

#if __VERSION__ >= 130
#define COMPAT_VARYING in
#define COMPAT_TEXTURE texture
out COMPAT_PRECISION vec4 FragColor;
#else
#define COMPAT_VARYING varying
#define FragColor gl_FragColor
#define COMPAT_TEXTURE texture2D
#endif

uniform COMPAT_PRECISION int FrameDirection;
uniform COMPAT_PRECISION int FrameCount;
uniform COMPAT_PRECISION vec2 OutputSize;
uniform COMPAT_PRECISION vec2 TextureSize;
uniform COMPAT_PRECISION vec2 InputSize;
uniform sampler2D Texture;
COMPAT_VARYING vec4 TEX0;
COMPAT_VARYING vec2 scale;

// compatibility #defines
#define Source Texture
#define vTexCoord TEX0.xy
#define blur_y blury/(TextureSize.y*2.0)
#define blur_x blurx/(TextureSize.x*2.0)

#ifdef PARAMETER_UNIFORM
// All parameter floats need to have COMPAT_PRECISION in front of them
uniform COMPAT_PRECISION float CURVATURE;
uniform COMPAT_PRECISION float SCANLINE_WEIGHT;
uniform COMPAT_PRECISION float MASK_DARK;
uniform COMPAT_PRECISION float g_vstr;
uniform COMPAT_PRECISION float g_vpower;
uniform COMPAT_PRECISION float blurx;
uniform COMPAT_PRECISION float blury;
#else
#define CURVATURE 1.0
#define SCANLINE_WEIGHT 7.0
#define MASK_DARK 0.5
#define g_vstr 20.0
#define g_vpower 0.40
#define blurx 0.50
#define blury -0.20
#endif


// P22D93 * SAT95
const mat3 P22D93SAT95 = mat3(
     0.920603871345520000, 0.06930985301733017, -0.051645118743181230,
     0.087028317153453830, 0.94945263862609860, -0.007860664278268814,
     0.013233962468802929, 0.11829412728548050,  1.023241996765136700);



vec2 Warp(vec2 pos)
{
    pos  = pos*2.0-1.0;
    pos *= vec2(1.0 + (pos.y*pos.y)*0.0276, 1.0 + (pos.x*pos.x)*0.0414);
    return pos*0.5 + 0.5;
}


void main()
{

    vec2 vpos = vTexCoord;
    vec2 xy, corn;
    if (CURVATURE > 0.0) {
        vpos  *= scale;
        xy     = Warp(vpos);

        corn   = min(xy,vec2(1.0)-xy); // This is used to mask the rounded
        corn.x = 0.0001/corn.x;        // corners later on

        xy    /= scale;
    } else {
        corn   = vec2(1.0,1.1);
        xy     = vpos;
        vpos  *= scale;
    }


    // 0.004 is the strength of flickering, common in all CRTs even on RGB
    float sm = 0.004*sin(float(FrameCount));

    COMPAT_PRECISION vec2 sample1 = sm  + COMPAT_TEXTURE(Source,vec2(xy.x + blur_x, xy.y - blur_y)).rg;
    COMPAT_PRECISION vec3 sample2 = 0.5 * COMPAT_TEXTURE(Source,     xy).rgb;
    COMPAT_PRECISION vec2 sample3 = sm  + COMPAT_TEXTURE(Source,vec2(xy.x - blur_x, xy.y + blur_y)).gb;

    vec3 colour =    vec3(sample1.r*0.50 + sample2.r,
                          sample1.g*0.25 + sample2.g + sample3.r*0.25,
                                           sample2.b + sample3.g*0.50);

    vpos  *= (1.0 - vpos.xy);
    float vig = vpos.x * vpos.y * max(10.0,-1.8*g_vstr+100.0);
    vig = min(pow(vig, g_vpower), 1.0);
    vig = vig >= 0.5 ? smoothstep(0.0,1.0,vig) : vig;


    // Of all the pixels that are mapped onto the texel we are
    // currently rendering, which pixel are we currently rendering?
    float ratio_scale = xy.y * TextureSize.y - 0.5;
    // Snap to the center of the underlying texel.
    float i = floor(ratio_scale) + 0.5;

    // This is just like "Quilez Scaling" but sharper
    float f = ratio_scale - i;
    COMPAT_PRECISION float Y = f*f;

    vec2 MSCL = OutputSize.y > 1499.0 ? vec2(0.30) : vec2(0.5);

    COMPAT_PRECISION float whichmask = floor(vTexCoord.x*4.0*OutputSize.x)*-MSCL.x;
    COMPAT_PRECISION float mask = 1.0 + float(fract(whichmask) < MSCL.y) * -MASK_DARK;

    vec3 P22 = ((colour*colour) * P22D93SAT95) * vig;
    colour = max(vec3(0.0),P22);

    COMPAT_PRECISION float scanLineWeight = (1.5 - SCANLINE_WEIGHT*(Y - Y*Y));

    if (corn.y <= corn.x || corn.x < 0.0001)
        colour = vec3(0.0);

    FragColor.rgba = vec4(sqrt(colour.rgb*(mix(scanLineWeight*mask, 1.0, dot(colour.rgb,vec3(0.26667))))),1.0);

}
#endif

As you can see, I also included a Curvature option, but this creates branching, so it is also slow, even without the flickering code. sin() is too slow as well, even when hoisted into a variable so that it is computed only once per fragment.

On another note, scale.x wasn’t working for me, so I had to put the 4.0 back.

Maybe this is as far as it gets for 10GFlops (20 in MT).

DariusG · 27 July 2023 18:50

You have way too many user parameters, which surely hurts performance. Perhaps hardcode the vignette and expose only an on/off switch to save one parameter?

Dogway · 27 July 2023 19:01

But an on/off switch creates branching through if/else blocks. Also, a simple ternary later on won’t prevent the vignette code from being evaluated. Yes, it’s a matter of compromises now.

DariusG · 27 July 2023 19:03

I would do this too (add a size parameter of 1.0 or 2.0):

vec2 MSCL = vec2(0.5/size);

COMPAT_PRECISION float whichmask = floor(vTexCoord.x*4.0*OutputSize.x)*-MSCL.x;
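
To see what a `size` of 1.0 or 2.0 would do to the mask, the two lines above can be evaluated on the CPU. This is only a sketch, written in Python rather than GLSL so it can be run directly. It assumes the mask line from the shaders earlier in the thread stays unchanged (`mask = 1.0 + float(fract(whichmask) < MSCL.y) * -MASK_DARK`) and treats the floored pixel column as an integer `n`; `mask_row` is a hypothetical helper name, not part of any shader.

```python
import math

def fract(x):
    # GLSL fract(): x - floor(x), always in [0, 1)
    return x - math.floor(x)

def mask_row(size, mask_dark=0.5, columns=8):
    """Mask multiplier for the first `columns` output pixel columns.

    Assumption: n stands in for floor(vTexCoord.x*4.0*OutputSize.x),
    and the mask line from the earlier shaders is kept as is.
    """
    mscl = 0.5 / size                      # vec2 MSCL = vec2(0.5/size);
    row = []
    for n in range(columns):
        whichmask = n * -mscl              # floor(...) * -MSCL.x
        dark = fract(whichmask) < mscl     # fract(whichmask) < MSCL.y
        row.append(1.0 - mask_dark if dark else 1.0)
    return row

print(mask_row(1.0))  # every other column is darkened
print(mask_row(2.0))  # one column in four is darkened
```

Under these assumptions, size 1.0 alternates dark and bright columns, while size 2.0 darkens one column in four: the dark stripe stays one pixel wide and only the gap between stripes grows.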

Dogway · 27 July 2023 19:37

By size, do you mean OutputSize.y?

Anyway, I tried that before, but I think non-integer values would give a bad mask, so I don’t see the point.

You could use this linear fit, which is good for 1080p and for 2160p (a fract of 0.0 or so), but what about 1440p or 720p? I don’t know.

vec2 MSCL = vec2(-0.000185185*OutputSize.y+0.7);
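
As a quick check of the linear fit above, here is the same expression evaluated on the CPU for the four resolutions mentioned. It is a plain Python transcription for illustration only; `mscl_linear_fit` is a hypothetical name.

```python
def mscl_linear_fit(output_height):
    # Transcription of: vec2 MSCL = vec2(-0.000185185*OutputSize.y+0.7);
    return -0.000185185 * output_height + 0.7

for height in (720, 1080, 1440, 2160):
    print(height, round(mscl_linear_fit(height), 4))
```

The fit passes through the two hardcoded values used in the shaders (0.5 at 1080p and 0.30 at 2160p). At 720p and 1440p it gives roughly 0.567 and 0.433, which are not simple fractions of a pixel period, so the concern about a bad mask at those resolutions seems plausible.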

DariusG · 28 July 2023 20:30

Check this out too, tell me how it works and if you like it.

#version 110



// Parameter lines go here:

#pragma parameter MASK "Mask Strength" 0.3 0.0 1.0 0.05

#define PI 3.141592

#if defined(VERTEX)

#if __VERSION__ >= 130
#define COMPAT_VARYING out
#define COMPAT_ATTRIBUTE in
#define COMPAT_TEXTURE texture
#else
#define COMPAT_VARYING varying
#define COMPAT_ATTRIBUTE attribute
#define COMPAT_TEXTURE texture2D
#endif

#ifdef GL_ES
#define COMPAT_PRECISION mediump
#else
#define COMPAT_PRECISION
#endif

COMPAT_ATTRIBUTE vec4 VertexCoord;
COMPAT_ATTRIBUTE vec4 COLOR;
COMPAT_ATTRIBUTE vec4 TexCoord;
COMPAT_VARYING vec4 COL0;
COMPAT_VARYING vec4 TEX0;
COMPAT_VARYING float omega;
COMPAT_VARYING float omega2;
COMPAT_VARYING vec2 cent;
COMPAT_VARYING vec2 scale;

vec4 _oPosition1;
uniform mat4 MVPMatrix;
uniform COMPAT_PRECISION int FrameDirection;
uniform COMPAT_PRECISION int FrameCount;
uniform COMPAT_PRECISION vec2 OutputSize;
uniform COMPAT_PRECISION vec2 TextureSize;
uniform COMPAT_PRECISION vec2 InputSize;

// compatibility #defines
#define vTexCoord TEX0.xy

void main()
{
    gl_Position = MVPMatrix * VertexCoord;
    TEX0.xy = TexCoord.xy*1.0001;
    scale = TextureSize.xy/InputSize.xy;
    omega = TEX0.x*OutputSize.x*scale.x*PI;
    omega2 = TEX0.x*OutputSize.x*0.6667*scale.x*PI;
    cent = floor(TEX0.xy*TextureSize.xy)+0.5;

}

#elif defined(FRAGMENT)

#ifdef GL_ES
#ifdef GL_FRAGMENT_PRECISION_HIGH
precision highp float;
#else
precision mediump float;
#endif
#define COMPAT_PRECISION mediump
#else
#define COMPAT_PRECISION
#endif

#if __VERSION__ >= 130
#define COMPAT_VARYING in
#define COMPAT_TEXTURE texture
out COMPAT_PRECISION vec4 FragColor;
#else
#define COMPAT_VARYING varying
#define FragColor gl_FragColor
#define COMPAT_TEXTURE texture2D
#endif

uniform COMPAT_PRECISION int FrameDirection;
uniform COMPAT_PRECISION int FrameCount;
uniform COMPAT_PRECISION vec2 OutputSize;
uniform COMPAT_PRECISION vec2 TextureSize;
uniform COMPAT_PRECISION vec2 InputSize;
uniform sampler2D Texture;
COMPAT_VARYING vec4 TEX0;
COMPAT_VARYING float omega;
COMPAT_VARYING float omega2;
COMPAT_VARYING vec2 cent;
COMPAT_VARYING vec2 scale;

// compatibility #defines
#define Source Texture
#define vTexCoord TEX0.xy

#ifdef PARAMETER_UNIFORM
// All parameter floats need to have COMPAT_PRECISION in front of them
uniform COMPAT_PRECISION float SHARPX;
uniform COMPAT_PRECISION float SHARPY;
uniform COMPAT_PRECISION float MASK;

#else
#define SHARPX 0.5
#define SHARPY 0.5
#define MASK 0.3  // matches the #pragma parameter default

#endif

#define SourceSize vec4(TextureSize.xy, 1.0/TextureSize.xy)

vec2 Warp(vec2 pos)
{
    pos  = pos*2.0-1.0;    
    pos *= 1.0 +vec2((pos.y*pos.y),(pos.x*pos.x))*0.04;
    
    return pos*0.5 + 0.5;
}

void main()
{   
    vec2 linear = vTexCoord;
    vec2 nearest = cent * SourceSize.zw;    
    float x = mix(linear.x, nearest.x, 0.2);
    float y = mix(linear.y, nearest.y, 0.35);
    vec2 coords = vec2(x,y);
    coords = Warp(coords*scale)/scale;
    vec3 res = COMPAT_TEXTURE(Source,coords).rgb;
    
    float scanline =  sin(fract(coords.y*SourceSize.y)*PI);
    float mask = InputSize.y < 224.0 ? MASK*sin(omega2)+1.0-MASK : MASK*sin(omega)+1.0-MASK;
    res *= mix(mask*scanline, 1.0, dot(res,vec3(0.2666)));
    
    FragColor = vec4(res,1.0);
}
#endif

Dogway · 28 July 2023 21:47

Thanks. I didn’t try it on the Chromecast, but on my 4K monitor the mask is not visible. Also, are the rounded corners missing? Anyway, I have already settled on the version I uploaded to the repo. I noticed you are metallic77 on GitHub, very active lol. Nice.

Dogway · 31 July 2023 15:37

I was annoyed that raising the scanlines and mask made the image so much darker, so I created a compensation gamma that preserves the gamma/brightness. Now, pow() is so expensive that I had to not only optimize the rest of the shader but also be creative, yet for some cores like Genesis Plus GX this is still not enough. Is there a really lightweight core for Sega Genesis? (EDIT: PicoDrive?)

As for the shader this is the function:

// Returns gamma corrected output, compensated for scanline+mask embedded gamma
vec3 inv_gamma(vec3 col, vec3 power)
{
    vec3 cir  = col-1.0;
         cir *= cir;
         col  = mix(sqrt(col),sqrt(1.0-cir),power);
    return col;
}

The power bias depends on the scanline and mask implementation. For zfast_crt this is the mapping:

#define pwr vec3(1.0/((-0.0325*SCANLINE_WEIGHT+1.0)*(-0.311*MASK_DARK+1.0))-1.2)
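
To get a feel for the numbers, the function and the mapping above can be evaluated per channel on the CPU. This is a rough Python sketch, not shader code: `mix` mirrors the GLSL built-in, and the scalar `inv_gamma` and `pwr` are direct transcriptions of the vec3 versions above.

```python
import math

def mix(a, b, t):
    # GLSL mix(): linear interpolation a*(1-t) + b*t
    return a * (1.0 - t) + b * t

def inv_gamma(col, power):
    # Scalar version of the vec3 function above, for one channel in [0, 1]
    cir = col - 1.0
    cir *= cir
    return mix(math.sqrt(col), math.sqrt(1.0 - cir), power)

def pwr(scanline_weight, mask_dark):
    # Scalar version of the pwr macro above
    return 1.0 / ((-0.0325 * scanline_weight + 1.0)
                  * (-0.311 * mask_dark + 1.0)) - 1.2

p = pwr(7.0, 0.5)   # shader defaults: SCANLINE_WEIGHT=7.0, MASK_DARK=0.5
print(round(p, 4))  # about 0.3329
for c in (0.0, 0.25, 0.5, 1.0):
    print(c, round(inv_gamma(c, p), 4))
```

With power 0 the result is plain sqrt(col); with power 1 it is sqrt(1-(col-1)^2), a quarter circle centred at (1, 0). Black and white are left untouched for any power, so the midtones are lifted without clipping the endpoints.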

I have almost finished; I’m still optimizing for Sega Genesis. On SNES, only zfast_crt_geo_svideo, which is the all-features shader, is slow for 8:7 games.

EDIT: Nice, I changed a few settings in RA for the Chromecast and now everything runs at full speed. I tested enabling multithreading, but it added some judder, so instead I disabled GPU screenshots, added 50ms of shader load delay, and enabled CPU and GPU sync. I’m going to add vignetting back to the S-Video variants and see how they fare.

Cyber · 1 August 2023 11:26

If all CRT shaders could do this without clipping, life would be much better for people like me.

It could be as simple as measuring peak brightness before and after and then compensating, or as complex as measuring average brightness, but any form of compensation with a clipping mitigation algorithm would be a godsend.

Disabling GPU Screenshots improves performance?

All of this is while keeping multithreading enabled?

Dogway · 31 July 2023 20:40

Yes, I mentioned this long ago, but everybody seems to forget about the gamma introduced by the emulated scanlines and mask, so the system (end-to-end) gamma becomes not 1 or 1.1 but much higher.

My function above is basically a mix of sqrt(), to re-encode the hacky ungamma, and an inverted gamma, or no-clip brightness function, whichever you prefer. This is why nesguy recommended raising display brightness, but that is more or less a gain control, not an inverted gamma operator, so the look won’t be fully recovered.

A power function is not symmetrical: it gives more weight to darks. So for the opposite effect (the behaviour of the scanline+mask gamma) you invert the source, apply a power function, then invert back.

Here is a comparison of sqrt() and the inverted gamma, which exactly models the zfast scanline and mask gamma. At a=1 the effect is like sqrt(); at a=2 the curve is exactly symmetrical, a quarter of a circle, just like adding brightness to a sqrt() but in one go.

https://www.desmos.com/calculator/eo4ardp927

I don’t know, I tried disabling a bunch of things and it seemed to do the trick. This is all with multithreading disabled.

Just uploaded the updated shaders. Only zfast_crt_geo_svideo needs further optimizations.

Here is a capture of Sonic with Scanline=9.0, mask=0.15 and blur_x=1.0

DariusG · 1 August 2023 05:28

Great job, that geo_composite hack runs at full speed on my old HTC One M7, while my zfast_compo runs at 52-55 fps.

DariusG · 1 August 2023 15:18

If the Chromecast can handle it without frame drops, here is a matrix to convert to NTSC colors in linear space. I am thinking of including this in some shaders I wrote.

// linear space (exact)
const mat3 NTSC = mat3(
     1.5073, -0.3725, -0.0832,
    -0.0275,  0.9350,  0.0670,
    -0.0272, -0.0401,  1.1677);


// non-linear space (approximate, not exact)
const mat3 NTSC2 = mat3(
     1.5164, -0.4945, -0.0200,
    -0.0372,  0.9571,  0.0802,
    -0.0192, -0.0309,  1.0500);

Also, a quote from an old NewRisingSun post:

“Japanese NTSC (“NTSC-J”) does not use a black setup of 7.5 IRE, but 0 IRE. Without compensation for the black setup, you’ll get the NTSC-J “look”. To get the “American” look, use signal = (signal - 0.075) / (1 - 0.075).” It is actually a mix, because NTSC-J has colder colors too, but developers who used it definitely took that into account, so games made for NTSC-J would look warmer on NTSC-U.
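
The setup compensation from the quote is a simple remap, and a few sample values show what it does. This is a minimal Python sketch for illustration; `remove_setup` is a hypothetical name, and the signal is assumed to be normalized to 0.0-1.0.

```python
def remove_setup(signal, setup=0.075):
    # signal = (signal - 0.075) / (1 - 0.075), as given in the quote:
    # maps the 7.5 IRE setup level to black and keeps white at 1.0
    return (signal - setup) / (1.0 - setup)

for s in (0.0, 0.075, 0.5, 1.0):
    print(s, round(remove_setup(s), 4))
```

Inputs below 0.075 come out negative, so in a shader the result would presumably need a clamp such as max(signal, 0.0) afterwards; that clamp is an assumption here, not something stated in the quote.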

Dogway · 2 August 2023 03:28

The full implementation is in grade; here, for performance, I omitted many things but kept the important ones.

What are the primaries of that transformation? In other words, which phosphors does it replicate?

You have to keep in mind that phosphors are the last thing in the chain, not part of the signal. So what is always presented to the display before the phosphors is a normalized signal, regardless of region. Then you have to account for the phosphors depending on the region or CRT unit.