Android/GoogleTV compatible shaders - Nitpicky

I noticed my Grade GLSL shader fails to load in RetroArch on GoogleTV, and as I found out, many other shaders fail as well.

After some hours of painstaking debugging, these are my findings:

  • Non-square matrices are not supported

  • In a declaration such as mat3 YIQtoRGB = mat3(...), the mat3( part must come immediately after the =, on the same line

  • Types must be respected for pow(), so for a vec3 use pow(RGB, vec3(2.2)). This is not the case for other operations, where you can + - * / a scalar and a vector

  • So you can mix scalars and vectors, but not floats with ints. Write 1.0 + 0.5 and not 1 + 0.5

  • It doesn’t like functions that return mat3, nor mat3 arguments in functions
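The findings above can be condensed into a small fragment shader written the way the strict compiler seems to want it. This is only a sketch and was not tested on the Chromecast: the sampler and varying names are placeholders, and the matrix holds the commonly quoted approximate YIQ to RGB coefficients rather than the values Grade uses.

```glsl
#version 100
precision mediump float;

uniform sampler2D Texture;   // placeholder sampler name
varying vec2 vUV;            // placeholder varying name

void main()
{
    // Square matrix, constructor on the same line as the '='.
    // GLSL matrices are column-major: the columns are the Y, I and Q
    // contributions to R, G and B (approximate coefficients).
    mat3 YIQtoRGB = mat3(1.000,  1.000,  1.000,
                         0.956, -0.272, -1.106,
                         0.621, -0.647,  1.703);

    vec3 RGB = texture2D(Texture, vUV).rgb;

    // pow() needs matching types: vec3 base with a vec3 exponent.
    vec3 lin = pow(RGB, vec3(2.2));

    // Scalars and vectors may be mixed, but every literal is a float.
    vec3 boosted = lin * (1.0 + 0.5);   // not (1 + 0.5)

    // Pure luma (Y = 1, I = Q = 0) maps to white, so this multiplies by 1.0.
    vec3 white = YIQtoRGB * vec3(1.0, 0.0, 0.0);

    gl_FragColor = vec4(boosted * white, 1.0);
}
```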

That is what I have found so far. I am currently stuck on the block statement below and have tried several alternatives, all without success.

       mat3 LimThres;

    if (g_space_out < 1.0) {

       LimThres =  g_crtgamut == 3.0 ? mat3( 0.000000,0.044065,0.000000,
                                 0.000000,0.095638,0.000000,
                                 0.000000,0.000000,0.000000) : \
       g_crtgamut == 2.0 ? mat3( 0.006910,0.092133,0.000000,
                                 0.039836,0.121390,0.000000,
                                 0.000000,0.000000,0.000000) : \
       g_crtgamut == 1.0 ? mat3( 0.018083,0.059489,0.017911,
                                 0.066570,0.105996,0.066276,
                                 0.000000,0.000000,0.000000) : \
                           mat3( 0.100000,0.100000,0.100000,
                                 0.125000,0.125000,0.125000,
                                 0.000000,0.000000,0.000000);
    } else if (g_space_out > 0.0) {

       LimThres =  g_crtgamut == 3.0 ? mat3( 0.000000,0.234229,0.007680,
                                 0.000000,0.154983,0.042446,
                                 0.000000,0.000000,0.000000) : \
       g_crtgamut == 2.0 ? mat3( 0.078526,0.108432,0.006143,
                                 0.115731,0.127194,0.037039,
                                 0.000000,0.000000,0.000000) : \
       g_crtgamut == 1.0 ? mat3( 0.021531,0.237184,0.013466,
                                 0.072018,0.155438,0.057731,
                                 0.000000,0.000000,0.000000) : \
                           mat3( 0.100000,0.100000,0.100000,
                                 0.125000,0.125000,0.125000,
                                 0.000000,0.000000,0.000000);
    } else {

         LimThres =        mat3( 0.100000,0.100000,0.100000,
                                 0.125000,0.125000,0.125000,
                                 0.000000,0.000000,0.000000);
    }
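One possible culprit, offered as an assumption rather than a confirmed diagnosis: the GLSL ES 1.00 specification has no line-continuation character, so the trailing backslashes may be rejected outright. A sketch of the same selection without backslashes or chained ternaries, assuming g_space_out and g_crtgamut are float parameters, could look like this. The matrix is initialized to the default first, and the plain else covers g_space_out >= 1.0 because the final else of the original can never be reached.

```glsl
#version 100
precision mediump float;

// Assumed to be float parameters, as the comparisons in the snippet suggest.
uniform float g_space_out;
uniform float g_crtgamut;

void main()
{
    // Start from the default thresholds so the variable is never undefined.
    mat3 LimThres = mat3(0.100000, 0.100000, 0.100000,
                         0.125000, 0.125000, 0.125000,
                         0.000000, 0.000000, 0.000000);

    if (g_space_out < 1.0) {
        if (g_crtgamut == 3.0) {
            LimThres = mat3(0.000000, 0.044065, 0.000000,
                            0.000000, 0.095638, 0.000000,
                            0.000000, 0.000000, 0.000000);
        } else if (g_crtgamut == 2.0) {
            LimThres = mat3(0.006910, 0.092133, 0.000000,
                            0.039836, 0.121390, 0.000000,
                            0.000000, 0.000000, 0.000000);
        } else if (g_crtgamut == 1.0) {
            LimThres = mat3(0.018083, 0.059489, 0.017911,
                            0.066570, 0.105996, 0.066276,
                            0.000000, 0.000000, 0.000000);
        }
    } else {
        if (g_crtgamut == 3.0) {
            LimThres = mat3(0.000000, 0.234229, 0.007680,
                            0.000000, 0.154983, 0.042446,
                            0.000000, 0.000000, 0.000000);
        } else if (g_crtgamut == 2.0) {
            LimThres = mat3(0.078526, 0.108432, 0.006143,
                            0.115731, 0.127194, 0.037039,
                            0.000000, 0.000000, 0.000000);
        } else if (g_crtgamut == 1.0) {
            LimThres = mat3(0.021531, 0.237184, 0.013466,
                            0.072018, 0.155438, 0.057731,
                            0.000000, 0.000000, 0.000000);
        }
    }

    // Only here so the sketch compiles on its own and LimThres is used.
    gl_FragColor = vec4(LimThres[0] + LimThres[1], 1.0);
}
```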

The Chromecast HD has a Mali-G31 GPU with support for:

OpenGL®ES 1.1, 2.0, 3.2
Vulkan 1.2

Yes, GLES can be a real pain in the butt. That’s why the crt-royale glsl ports don’t work on mobile :wink:

Is that snippet giving an error? If so, what? I usually start by testing shaders with an explicit #version 100 directive, which has a lot of the same pickiness.

No implicit casting is indeed tedious/frustrating, and yeah, the matrix stuff is very barebones on those older #version levels and GLES. EDIT: oh, and initializing variables to zero rather than just declaring them can help avoid spooky behavior.
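As a rough illustration of those two habits (explicit conversions and initializing on declaration), here is a minimal sketch under an explicit #version 100. FrameCount is the int uniform RetroArch GLSL shaders normally declare; the rest is made up for the example.

```glsl
#version 100
precision mediump float;

uniform int FrameCount;

void main()
{
    // Declared and initialized in one go instead of being left undefined.
    vec3 col = vec3(0.0);
    float phase = 0.0;

    // No implicit int -> float conversion: wrap the int in float().
    phase = float(FrameCount) * 0.01;

    col.r = fract(phase);
    gl_FragColor = vec4(col, 1.0);
}
```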


Yes, I wanted to fix it to work on Android, but I didn’t want to interfere with other active coders’ code. That’s why I did “chromaticity” and “simple color controls”, to somehow replace Grade on my Androids (combined).


So this is very GPU dependent, right? Meaning it heavily depends on the ARM GPU feature set. A recent premium device would have no issues with GLSL 1.30 or even slang shaders, while console classics, RPi, Chromecast or weak tablets are very limited. In this case, should it be a #version 110 directive?

I was testing several shaders on the Chromecast and OMG, it is so weak. I realized a port of Grade would not run smoothly enough however I twist it, so I modified zfast_crt to make it “complete” with vignetting, corner size, border smoothness, distortion, and a joint phosphor * temperature matrix. Here is the first version. It uses pow for linearization (for the matrix), but I will change that to a simple product + sqrt after I compensate for the temperature increase.
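The product + sqrt idea can be sketched as follows. It approximates a gamma of 2.0 rather than 2.2 or 2.4, so the matrix would need to be compensated for that difference. The identity matrix is a stand-in for the real phosphor * temperature matrix, and the sampler and varying names are placeholders.

```glsl
#version 100
precision mediump float;

uniform sampler2D Texture;   // placeholder sampler name
varying vec2 vUV;            // placeholder varying name

void main()
{
    vec3 c = texture2D(Texture, vUV).rgb;

    // Hypothetical stand-in for the joint phosphor * temperature matrix.
    mat3 joint = mat3(1.0, 0.0, 0.0,
                      0.0, 1.0, 0.0,
                      0.0, 0.0, 1.0);

    // Cheap linearization: c * c instead of pow(c, vec3(2.2)).
    vec3 lin = c * c;

    lin = joint * lin;

    // Cheap re-encoding: sqrt instead of pow(lin, vec3(1.0 / 2.2)).
    // max() guards against negative values coming out of the matrix.
    vec3 enc = sqrt(max(lin, vec3(0.0)));

    gl_FragColor = vec4(enc, 1.0);
}
```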

It runs between 35 and 40fps, not too bad, but there is a bit more work to do. I noticed that runtime toggles don’t make a difference in performance, so hard-coded #if/#else blocks should be used instead.
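A compile-time switch of that kind might look like the sketch below. USE_VIGNETTE is a hypothetical name, and the vignette itself is just a placeholder effect. The disabled branch is removed by the preprocessor, so it costs nothing at runtime, at the price of editing the file to change the setting.

```glsl
#version 100
precision mediump float;

// Compile-time switch (hypothetical name): set to 0 and reload to disable.
#define USE_VIGNETTE 1

uniform sampler2D Texture;   // placeholder sampler name
varying vec2 vUV;            // placeholder varying name

void main()
{
    vec3 c = texture2D(Texture, vUV).rgb;

#if USE_VIGNETTE
    // Darken towards the edges; this code only exists when the switch is 1.
    vec2 d = vUV - 0.5;
    c *= 1.0 - dot(d, d);
#endif

    gl_FragColor = vec4(c, 1.0);
}
```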

On another note, I bought the Chromecast because it was very cheap and also a very convenient device for Kodi (via SMB), TV Channel APPs for on demand programs, and of course RetroArch. I bought it with the idea of a stopgap for a future GoogleTV Set-Top Box release, which would be very dumb if they don’t capitalize on that.


If you need certain features, you can force different versions, like 120 and 130 (RetroArch will transparently convert those version directives to the appropriate GLES directive) and they’ll just fail to compile on incompatible hardware. The GPU-specific stuff is typically more of an issue with precision, which is a huge drag.

The more configurable options you add, the slower it will become. For an absolute mini shader with filter, scanlines and mask, check fake-crt-geom-potato. For what it’s worth, after tweaking multiple shaders I believe the best scanlines are crt-geom’s with multiple oversample; the best filter is a matter of personal taste. Catmull and spline16-fast are fine. You can run spline16-fast and fake-crt-geom-potato on top (nearest, it has its own filter on linear).

PS: the more masks you add, the slower it gets too. Clamp, pow, sign and “while this -> do that” loops absolutely kill performance, it hits rock bottom lol. Especially when the while is very long.


Thanks. Yes, ternaries cost a lot, right? I think I will hard-code a few things, namely corner size and border smoothness. I just optimized the linearization part and compensated the matrix, which again is hard-coded to the NTSC-J gamut and D93 (8504K). On a potato the target is mostly 8-bit and 16-bit emulation, so I’m fine with that.

I tested fake-crt-geom-potato, but it lacked a few important things like beam dynamics, even a cheap version, and the mask is too narrow, unlike consumer CRTs. What I liked most among the fast scanline+mask shaders were crt-sim and zfast_crt. I wanted to give crt_guest-mini/fast a go, but I think there’s no glsl version(?)


It’s there, it’s called crt-Guest.r-mini. That Mali GPU should have around 21 GFLOPS, which is very low, if I am looking at the same GPU. It could probably run fake-crt-geom-potato with a few add-ons, scanline dynamics and a bit more. Gdv mini should require around 80 GFLOPS; my old HTC One has 55-something and can’t run it at full speed, only around 45 fps. But it can run fake-crt-geom.

From my notes, the Chromecast GPU is about 10 GFLOPS.

My notes:

The SNES Classic GPU is a Mali-400 MP2 from 2008 (5 GFLOPS)
The Chromecast GPU is a Mali-G31 MP2 from 2018 (10 GFLOPS)

Now with the latest optimizations I gained 5 fps, roughly a 15% performance increase, so it now sits at ~40fps for snes9x. For the NES, QuickNES is faster than Mesen and also sits at ~40fps. I guess anything 1.5x faster than this GPU would run fine at 60fps, maybe some cheap tablets/phones and probably some Raspberry Pi models? Well, I’m also forgetting some platforms whose GPU capabilities I don’t know, like the Xbox OG, 3DS, what else, GC, Wii, PS2, and PSP.

EDIT: Wow, so deprived

Xbox OG (20 GFLOPS)
Wii (12 GFLOPS)
Chromecast HD (10 GFLOPS)
Raspberry Pi 4B (4GB) (ARMv8: 13.5 GFLOPS, ARMv7: 9.69 GFLOPS)
Raspberry Pi 4B (1GB) (9.92 GFLOPS)
GC (9.4 GFLOPS)
PS2 (6.2 GFLOPS)
Raspberry Pi 3B+ (5.3 GFLOPS)
Raspberry Pi Zero 2W (5.1 GFLOPS)
Raspberry Pi 3A+ (5.0 GFLOPS)
SNSC (5 GFLOPS)
3DS (4.8 GFLOPS)
Raspberry Pi 2B v1.2 (4.43 GFLOPS)
Raspberry Pi 3 (3.62 GFLOPS)
PSP (2.6 GFLOPS)

I guess I need to leave something out, probably geometry distortion to target at least the GC and up.


It’s 10.4 GFLOPS per core, and that GPU should have 2 cores. Check this out and see if it runs well. After so many hours studying, tweaking, altering, etc. I think I can match z-fast :smile: :smile:

// RUN IN LINEAR (for the filter to work)
// Simple scanlines and mask effect
// original by hunterk, edit by DariusG

///////////////////////  Runtime Parameters  ///////////////////////
#pragma parameter SCANLOW "Scanline Intensity Low" 0.70 0.0 1.0 0.05
#pragma parameter SCANHIGH "Scanline Intensity High" 0.35 0.0 1.0 0.05
#pragma parameter MSK "Mask Brightness" 0.7 0.0 1.0 0.05
#pragma parameter BRIGHTBOOSTH "Boost Bright" 1.25 1.0 2.0 0.05
#pragma parameter BRIGHTBOOSTL "Boost Dark" 1.15 1.0 2.0 0.05



#define pi 3.141592

#if defined(VERTEX)

#if __VERSION__ >= 130
#define COMPAT_VARYING out
#define COMPAT_ATTRIBUTE in
#define COMPAT_TEXTURE texture
#else
#define COMPAT_VARYING varying 
#define COMPAT_ATTRIBUTE attribute 
#define COMPAT_TEXTURE texture2D
#endif

#ifdef GL_ES
#define COMPAT_PRECISION mediump
#else
#define COMPAT_PRECISION
#endif

COMPAT_ATTRIBUTE vec4 VertexCoord;
COMPAT_ATTRIBUTE vec4 COLOR;
COMPAT_ATTRIBUTE vec4 TexCoord;
COMPAT_VARYING vec4 COL0;
COMPAT_VARYING vec4 TEX0;
COMPAT_VARYING float fragpos;

vec4 _oPosition1; 
uniform mat4 MVPMatrix;
uniform COMPAT_PRECISION int FrameDirection;
uniform COMPAT_PRECISION int FrameCount;
uniform COMPAT_PRECISION vec2 OutputSize;
uniform COMPAT_PRECISION vec2 TextureSize;
uniform COMPAT_PRECISION vec2 InputSize;

// compatibility #defines
#define vTexCoord TEX0.xy
#define SourceSize vec4(TextureSize, 1.0 / TextureSize) //either TextureSize or InputSize
#define OutSize vec4(OutputSize, 1.0 / OutputSize)

#ifdef PARAMETER_UNIFORM
uniform COMPAT_PRECISION float WHATEVER;
#else
#define WHATEVER 0.0
#endif

void main()
{
    gl_Position = MVPMatrix * VertexCoord;
    TEX0.xy = TexCoord.xy*1.0001;
    fragpos=TEX0.x*OutputSize.x*TextureSize.x/InputSize.x;
}

#elif defined(FRAGMENT)

#if __VERSION__ >= 130
#define COMPAT_VARYING in
#define COMPAT_TEXTURE texture
out vec4 FragColor;
#else
#define COMPAT_VARYING varying
#define FragColor gl_FragColor
#define COMPAT_TEXTURE texture2D
#endif

#ifdef GL_ES
#ifdef GL_FRAGMENT_PRECISION_HIGH
precision highp float;
#else
precision mediump float;
#endif
#define COMPAT_PRECISION mediump
#else
#define COMPAT_PRECISION
#endif

uniform COMPAT_PRECISION int FrameDirection;
uniform COMPAT_PRECISION int FrameCount;
uniform COMPAT_PRECISION vec2 OutputSize;
uniform COMPAT_PRECISION vec2 TextureSize;
uniform COMPAT_PRECISION vec2 InputSize;
uniform sampler2D Texture;
COMPAT_VARYING vec4 TEX0;
COMPAT_VARYING float fragpos;

// compatibility #defines
#define Source Texture
#define vTexCoord TEX0.xy

#define SourceSize vec4(TextureSize, 1.0 / TextureSize) //either TextureSize or InputSize
#define OutSize vec4(OutputSize, 1.0 / OutputSize)

#ifdef PARAMETER_UNIFORM
uniform COMPAT_PRECISION float SCANLOW;
uniform COMPAT_PRECISION float SCANHIGH;
uniform COMPAT_PRECISION float MSK;
uniform COMPAT_PRECISION float BRIGHTBOOSTH;
uniform COMPAT_PRECISION float BRIGHTBOOSTL;


#else
// Fallback values, kept in sync with the #pragma parameter defaults above
#define SCANLOW 0.70
#define SCANHIGH 0.35
#define MSK 0.70
#define BRIGHTBOOSTL 1.15
#define BRIGHTBOOSTH 1.25

#endif

// MSK mask calculation
     
      float Mask(float pos)
      {
      float mf = fract(pos * 0.3333);

      if (mf < 0.3333) return (MSK);
      else return (1.0);
  
      }

vec2 Warp(vec2 pos)
{
    pos  = pos*2.0-1.0;    
    pos *= vec2(1.0 + (pos.y*pos.y)*0.03, 1.0 + (pos.x*pos.x)*0.05);
    
    return pos*0.5 + 0.5;
}


void main()
{   
    vec2 pos = Warp(vTexCoord*TextureSize/InputSize)*(InputSize/TextureSize);
    float OGL2Pos = pos.y*SourceSize.y;

// return y axis to nearest, exploit bilinear to create a typical filter    
    float cent = floor(OGL2Pos)+0.5;
    float ycoord = cent*SourceSize.w; 

   // COMPAT_TEXTURE resolves to texture2D or texture depending on __VERSION__
   vec3 res = COMPAT_TEXTURE(Source, vec2(pos.x, ycoord)).rgb;
   float l = max(max(res.r,res.g),res.b);

// add some beam dynamics
    float SCANLINE = mix(SCANLOW,SCANHIGH,l);
     res *= SCANLINE*sin(fract(pos.y*SourceSize.y)*pi)+1.0-SCANLINE ; 
// add oversample here     
     res *= SCANLINE*sin(fract(1.0-pos.y*SourceSize.y)*pi)+1.0-SCANLINE ; 
     res *= SCANLINE*sin(fract(1.0+pos.y*SourceSize.y)*pi)+1.0-SCANLINE ; 

//cheap gamma in/out, mask looks better   
     res *=res;
     res *= Mask(fragpos);
     res = sqrt(res);
//some typical brightboost, looks better     
     res *= mix(BRIGHTBOOSTL,BRIGHTBOOSTH,l);
    FragColor = vec4(res,1.0);
} 
#endif


So strange, I tried to optimize your code a little bit, and it turned out 10% slower or so.

Your original:

// add some beam dynamics
    float SCANLINE = mix(SCANLOW,SCANHIGH,l);
     res *= SCANLINE*sin(fract(pos.y*SourceSize.y)*pi)+1.0-SCANLINE ; 
// add oversample here     
     res *= SCANLINE*sin(fract(1.0-pos.y*SourceSize.y)*pi)+1.0-SCANLINE ; 
     res *= SCANLINE*sin(fract(1.0+pos.y*SourceSize.y)*pi)+1.0-SCANLINE ; 

My change, which should make sense as an optimization but doesn’t help:

// add some beam dynamics
    float SCANLINE  = mix(SCANLOW,SCANHIGH,l);
    float SCANLINEM = 1.0-SCANLINE;
     res *= SCANLINE*sin(fract(    OGL2Pos)*pi)+SCANLINEM ; 
// add oversample here     
     res *= SCANLINE*sin(fract(1.0-OGL2Pos)*pi)+SCANLINEM ; 
     res *= SCANLINE*sin(fract(1.0+OGL2Pos)*pi)+SCANLINEM ; 
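A further variation that might be worth measuring: mathematically the three factors are the same value, since fract(1.0+x) equals fract(x), and sin((1.0-f)*pi) equals sin(f*pi) for f = fract(x). The sketch below therefore evaluates the sine once and cubes the result. Whether it is actually faster on the Mali-G31 is untested, and rounding may differ slightly from the three-line version. The varying name is a placeholder.

```glsl
#version 100
precision mediump float;

uniform sampler2D Texture;
uniform vec2 TextureSize;
varying vec2 vUV;            // placeholder for TEX0.xy

#define pi 3.141592
#define SCANLOW 0.70
#define SCANHIGH 0.35

void main()
{
    vec3 res = texture2D(Texture, vUV).rgb;
    float l = max(max(res.r, res.g), res.b);

    // Beam dynamics: brighter pixels get a weaker scanline.
    float SCANLINE = mix(SCANLOW, SCANHIGH, l);

    // One sine evaluation instead of three identical ones.
    float s = SCANLINE * sin(fract(vUV.y * TextureSize.y) * pi) + 1.0 - SCANLINE;
    res *= s * s * s;

    gl_FragColor = vec4(res, 1.0);
}
```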

Anyway, I could reuse your optimized Warp() function and now my shader runs at around 60fps :smiley: I ditched corners and hard-coded some parameters, but I could keep everything else: vignetting, temperature, blurriness (kinda S-Video emulation), etc.

One thing to note is that the left side is not clamped, so pixels there are dilated instead of being alpha masked by the warp. Also, I think I now kinda prefer your mask effect (TVL), the stripes are wider.
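One common way to mask the area outside the warped image, shown here only as a sketch and not as the fix that ended up in the shader, is to test the warped coordinate against the unit square with step(). Warp() is copied from the shader posted earlier in the thread; the varying name is a placeholder.

```glsl
#version 100
precision mediump float;

uniform sampler2D Texture;
uniform vec2 TextureSize;
uniform vec2 InputSize;
varying vec2 vUV;            // placeholder for TEX0.xy

// Warp() as posted earlier in the thread.
vec2 Warp(vec2 pos)
{
    pos  = pos * 2.0 - 1.0;
    pos *= vec2(1.0 + (pos.y * pos.y) * 0.03, 1.0 + (pos.x * pos.x) * 0.05);
    return pos * 0.5 + 0.5;
}

void main()
{
    // Normalize to 0..1 over the visible image before warping.
    vec2 scale = TextureSize / InputSize;
    vec2 w = Warp(vUV * scale);

    // 1.0 where the warped coordinate is inside [0, 1] on both axes, else 0.0.
    vec2 inside = step(vec2(0.0), w) * step(w, vec2(1.0));
    float mask = inside.x * inside.y;

    vec3 res = texture2D(Texture, w / scale).rgb;
    gl_FragColor = vec4(res * mask, 1.0);
}
```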

Both shaders seem to average around 58fps or so, so some further optimization is probably due.


I think you should remove the color temp too; it looked way too blue, at least in the previous version that I checked.


The opposite happens to me. You need your eyes to adapt to the new “white”, when that happens everything not corrected will look too yellow. You can tune the white point in Grade but unfortunately here I have to hard code it. Here are other screenshots where at least for me Japanese games look better at 8500K. Something not implemented here though is the saturation loss that should happen in CRTs, at least 10% down, that also helps to sell the temperature change.

EDIT: You are right, it’s too blue! I will revise the joint matrix.

On another note, if someone could test this shader on platforms between 9 and 15 GFLOPS, that would be welcome, to see whether it needs further optimization, and whether it’s OK to upload it to the official repo under zfast_crt_geo or another name.

EDIT2: Updated with more optimizations and the matrix fix. Here’s a screenshot. As you can see, the last missing bit is the mask for the warp distortion.


I would use the scanlines I gave you, imo looking better, but these look good too.

Slower because “res” is an LCD pixel, GPU starts calculating what to do with it, so it starts, sees a SCANLINE, it calculates it with “l” (luminance) in mind, then later it sees a float so it jumps out and re-calculates the SCANLINE because you took it out :stuck_out_tongue:

It sees another float, it’s something different, it guesses so, the compiler is not smart, so it has to re-calculate.

Maybe. I liked the mask more, and I was already working with zfast, which is good. I especially liked how elegant the horizontal blur is: it uses a Quilez algorithm that, if I’m not wrong, resembles a Kawase blur, which is fast, so in a way it emulates an S-Video kind of signal. I think your snippet was too sharp, maybe good for a PVM?

Anyway, I implemented an automatic check for UHD displays, so in those cases the mask size is increased. Just be aware that more pixels means slower, so this is targeted at 1080p displays at most (or rendering at 1080p within RA), or at larger displays and more capable hardware where you want to save battery (Switch, Steam Deck, etc.).

The screen curvature mask is also fixed and the scanline is now performed on linear light which allegedly looks better.

This is the new link. Most likely the last version. If one was to use this for Genesis games in its correct DAR of 10:7 it would use too many pixels to be performant on the Chromecast (~45fps), probably the screen distortion function should be removed to target 60fps.


I believe that’s more like RGB than S-Video. S-Video should be blurrier, with artifacts. Also, the colors look a bit desaturated (?). The scanline shape looks the same all around, without any “shape change” due to bright emulated pixels. More like a high-end consumer TV. It definitely needs some extra saturation.

This is a screen cap from my UHD monitor:

I don’t see any saturation loss, just darkening due to scanlines+mask, which can make it seem like saturation is decreased. Also, matrixing the phosphor gamut may make colors look less pure than before, but that’s how it was in reality; moreover, the chroma signal was attenuated, so full saturation was rarely achieved in composite or S-Video. I should decrease saturation by 10%, but for performance I’m not doing it here. It could be implemented in the joint matrix though, as RGB -> YUV -> Desaturation -> RGB -> Phosphor -> Temperature. But I’m fine with this already, need to move on.
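For reference, the desaturation step can be folded into a single 3x3 matrix, so it costs nothing extra once multiplied into the joint matrix. The sketch below assumes Rec. 601 luma weights and a 10% reduction. Scaling the chroma by s while keeping luma is the same as mix(vec3(luma), rgb, s), which is what the matrix encodes. The identity matrix stands in for the real phosphor * temperature matrix, and in practice the product would be computed offline and hard-coded.

```glsl
#version 100
precision mediump float;

uniform sampler2D Texture;   // placeholder sampler name
varying vec2 vUV;            // placeholder varying name

void main()
{
    float s = 0.9;                                     // keep 90% of the chroma
    vec3 w = vec3(0.299, 0.587, 0.114) * (1.0 - s);    // assumed Rec. 601 luma weights

    // Column-major: Sat * rgb == (1.0 - s) * luma + s * rgb
    mat3 Sat = mat3(w.r + s, w.r,     w.r,
                    w.g,     w.g + s, w.g,
                    w.b,     w.b,     w.b + s);

    // Hypothetical stand-in for the phosphor * temperature matrix.
    mat3 PhosphorTemp = mat3(1.0, 0.0, 0.0,
                             0.0, 1.0, 0.0,
                             0.0, 0.0, 1.0);

    // Desaturation is applied first, so it sits on the right.
    mat3 joint = PhosphorTemp * Sat;

    vec3 c = texture2D(Texture, vUV).rgb;
    gl_FragColor = vec4(joint * c, 1.0);
}
```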

EDIT: Updated zfast_crt_nogeo with the screen curvature removed. This is faster so it can run 320x240 “widescreen” games like Sega Genesis, NeoGeo, etc.

If you think about it adding gray lines (scanlines) and gray mask desaturates the image.


I found some shaders on my PC which I forgot about: crt-pi-curvature and zfast_crt_curve from SoltanGris42. Anyway, they create moiré on my 1080p TV.

I also tested 4:3 DAR on some SNES games like Street Fighter, F-Zero, etc. Unfortunately it’s sub 60fps (~55-59). This is still faster than Sega Genesis at 10:7 (~45fps), probably due to snes9x being a faster core. I didn’t find an improvement by going down to snes9x 2010 or 2005 though.

Lastly, I tested enabling the blargg Composite or S-Video filters in the snes9x core options. They don’t slow down performance, BUT for some reason they mangle the framebuffer in such a way that the shader’s scanlines+mask no longer work as intended.

EDIT: Just updated the shader again. I brought back round corners without a hit in performance, and exposed the scanline amount. Mind you, if you raise scanlines too much it will create moiré.

This is how it’s looking now. 60fps on Chromecast HD
