ScaleNx - Artifact Removal and Algorithm Improvement


It doesn’t look like anything I’d associate with interpolation in RGB, but if you want to experiment with other color spaces you could bake all the transforms down to 3D LUTs, which ought to be cheaper than running some of the transforms directly.

Here’s RGB to LAB and here’s LAB to RGB

Blues come out a little differently with these though, so these files themselves aren’t worth much more than a proof of concept. Really it’d need some kind of non-linear lookup, since converting from sRGB to something like JaCbC with the ciecam02 Photoshop plugin results in most of the values concentrated in the center, and this detail gets lost in a 32 or 16 point 3D LUT.

Reckon it’s a matter of making a curve that evens out the distribution of values, applying it to a giant 3D LUT like this after the color space conversion, and then baking that down to a size better suited to video games, especially with rAA running. The same curve would then be built into the lookup coordinates so it all cancels out, offering higher detail in the midtones without boosting the 3D LUT to the absurd size the tight midrange distribution of the sRGB -> JaCbC conversion would otherwise require.
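The shaper-curve idea can be sketched numerically. Everything here is an assumption for illustration (the exponent `G`, the 0..1 coordinate convention, and the function names are invented); the point is only that the bake-time curve and the lookup-time curve cancel out while stretching the crowded midrange across more LUT cells:

```python
import math

# exponent < 1 stretches values away from the 0.5 midpoint; purely an assumed value
G = 0.5

def shaper(x: float) -> float:
    """Spread 0..1 coordinates away from the crowded midpoint (bake step)."""
    u = 2.0 * x - 1.0
    return 0.5 + math.copysign(abs(u) ** G, u) / 2.0

def inverse_shaper(x: float) -> float:
    """Inverse curve, applied to the lookup coordinates so the two cancel."""
    u = 2.0 * x - 1.0
    return 0.5 + math.copysign(abs(u) ** (1.0 / G), u) / 2.0

# the round trip cancels, and the midrange now occupies more LUT cells
for v in (0.0, 0.3, 0.5, 0.7, 1.0):
    assert abs(inverse_shaper(shaper(v)) - v) < 1e-9
assert shaper(0.55) - shaper(0.45) > 0.55 - 0.45
```

The midrange stretch is visible in the last assertion: the 0.45–0.55 band maps to roughly 0.34–0.66, so the tightly packed values get far more LUT cells before the inverse curve undoes the distortion at lookup time.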

Edit: I could just bake all that mess into the second conversion back from it, though with this method it’s no longer the JaCbC color space; it’s JaCbC with compressed chroma coordinates. It converts in and out well enough, it’s just no longer as useful as plain JaCbC. Here are the LUTs in an Imgur album though, for funsies.


ok, so here is the thing. I don’t know if you looked into the original rAA shader which this one is based on, but the filter is not a warp filter but more like a traditional upscaler, think bicubic, only with adaptive limits so that hard edges are affected less. this way you still end up with smooth color transitions instead of just thinning lines.

seems you were right, color space doesn’t seem to be the issue here. so I tested around a little bit and it seems the default sharpness value causes the transition to overshoot. let’s have a look at the RGB values of the eyebrow in the original image, from top to bottom: (0,0,0), (115,25,0), (247,123,41). the tilt vector is big enough that the upper subpixel becomes black. the lower subpixel becomes (228,58,0) though, which isn’t really in between these 2 pixels. red is rather big due to the high sharpness value. green on the other hand is not, because of how the filter limited the tilt vector by the difference it determined to the upper pixel, where it is only 25. this causes a color which doesn’t really fit in.

try setting the sharpness to 1.0 or less and it shouldn’t be an issue anymore. I don’t know if I should lower the default value though. while it can cause these kinds of issues, the sharpness parameter also helps with the erosion effect. in the original rAA implementation the default value was 2.0. I would need to see how the general effect is with other content if it is 1.0 instead. maybe there is something I can do with how the adaptive limitation works to avoid this even with higher sharpness values.


Taking the absolute difference between pre- and post-rAA, both converted to YIQ with Y ignored so it’s only looking at the IQ chroma, and then using a float of I * 0.5 + Q * 0.5 to mix the pre-rAA back into the post-rAA when it exceeds a certain level, causes the glowing parts to fade away quickly while the smoother areas stick around longer. This has an alright effect, for the nearest-neighbor Shantae at least. The difference between the chroma change in the burnt areas and in the fine areas is a very small absolute number, but it’s relatively large.

The root of the idea, limiting the chroma change, could be done like avg(abs(pre.i-post.i),abs(pre.q-post.q)) and then using Inigo Quilez’s idea for smooth thresholds with a cubic polynomial to mix back to the original smoothly yet swiftly.
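That mixing rule can be sketched in a few lines. The `lo`/`hi` thresholds and the function names here are invented for illustration, not tuned against the shader; the cubic is the classic smoothstep polynomial Quilez describes:

```python
def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """Cubic smooth threshold (3t^2 - 2t^3), as popularized by Inigo Quilez."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def chroma_guard(pre_iq, post_iq, lo=0.02, hi=0.06):
    """Mix factor toward the pre-rAA pixel from the average IQ chroma change.
    lo/hi are made-up thresholds for the sketch."""
    d = (abs(pre_iq[0] - post_iq[0]) + abs(pre_iq[1] - post_iq[1])) / 2.0
    return smoothstep(lo, hi, d)

# tiny chroma change: keep the filtered pixel; big change: fall back to pre
assert chroma_guard((0.10, 0.10), (0.11, 0.10)) == 0.0
assert chroma_guard((0.10, 0.10), (0.30, 0.30)) == 1.0
```

The result would be used as the lerp weight between the post-rAA and pre-rAA pixels, so the transition from "keep" to "revert" is smooth yet swift rather than a hard cutoff.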

Though there can be situations that call for sharpened edges in textures and such, so perhaps using a highpass of the pre and post images for the chroma-change detection would be better for those niche situations, just to locally normalize the image. Maybe a bilinearly downscaled image would work fast enough and well enough. If it isn’t enough, there’s an extremely good Gaussian blur from Sunset Lake Software that far surpasses Kawase in performance and appearance.
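The local-normalization idea is just the image minus a blurred copy of itself, so the chroma-change detection compares local detail rather than absolute values. A 1D box blur stands in here for the bilinear/Gaussian downscale (an assumption, kept minimal on purpose):

```python
def box_blur(signal, radius):
    """Naive 1D box blur with edge clamping."""
    out = []
    n = len(signal)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def highpass(signal, radius=2):
    """Highpass = signal minus its blurred copy."""
    return [s - b for s, b in zip(signal, box_blur(signal, radius))]

# a flat region has no highpass response; an edge does
assert all(abs(v) < 1e-12 for v in highpass([5.0] * 8))
assert any(abs(v) > 0.5 for v in highpass([0.0] * 4 + [10.0] * 4))
```

Run per channel on the pre and post images, this would make the deviation check respond to local chroma spikes instead of absolute chroma levels.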

tl;dr localized chroma spikes drive the mix between pre and post rAA toward pre.


Though it’d probably be better to set up a debug shader that outputs some different variables to the screen, to see which values are at an extreme in the problem areas and then know what could be squished down to avoid the burned colors. I wouldn’t know which ones to check in rAA post-3x though.


rAA determines the tilt t of a pixel, which is then added or subtracted depending on the direction to calculate the subpixel. since the algorithm limits t channel-wise (depending on the difference vectors with the neighbor pixels) it can change the direction of the vector. it might be better to find a common factor a so that a*t satisfies the limits for every channel. this way it only alters the length. but this could easily cause the vector to become trivial if one of the limits forces the absolute value to be very small in comparison to the others. that’s just me thinking out loud, I haven’t tested it so far. I’m open to other suggestions.
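The difference between the two clamping strategies is easy to show with toy numbers. The vectors below are hypothetical, not taken from the shader; the point is that the per-channel clamp changes the channel ratios (a hue shift) while a single common factor only shortens the vector:

```python
def clamp_channelwise(t, m):
    """Per-channel clamp of the tilt: can change the vector's direction."""
    return tuple(max(-mi, min(ti, mi)) for ti, mi in zip(t, m))

def clamp_common_factor(t, m):
    """One factor a for all channels, so a*t only changes in length."""
    a = min((mi / abs(ti) if ti != 0.0 else 1.0) for ti, mi in zip(t, m))
    a = min(1.0, a)
    return tuple(a * ti for ti in t)

t = (0.4, 0.1, 0.0)   # hypothetical tilt dominated by red
m = (0.1, 0.2, 0.2)   # limits: red is restricted the hardest

# channel-wise clamping turns the 4:1 red/green ratio into 1:1 (hue shift)
assert clamp_channelwise(t, m) == (0.1, 0.1, 0.0)
# the common factor keeps the 4:1 ratio, the vector just gets shorter
res = clamp_common_factor(t, m)
assert all(abs(x - y) < 1e-12 for x, y in zip(res, (0.1, 0.025, 0.0)))
```

It also shows the downside mentioned above: if any channel’s limit is 0 while that channel’s tilt is non-zero, the common factor collapses to 0 and the whole tilt vanishes.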

edit: so finding a general factor like described above does indeed fix the issue. but as expected, in some cases pixels are now completely unaffected because a = 0. I’m on the right track though. I think I’ll scale the factor depending on how much t overshoots m.

as a reminder, this is the core function of rAA which my post-3x variant of the filter is also using:

float3 res2x(float3 pre2, float3 pre1, float3 px, float3 pos1, float3 pos2)
{
    float3 t, m;
    float4x3 pre = float4x3(pre2, pre1,   px, pos1);
    float4x3 pos = float4x3(pre1,   px, pos1, pos2);
    float4x3  df = pos - pre;

    m = (px < 0.5) ? px : (1.0-px);
    m = SHARPNESS * min(m, min(abs(df[1]), abs(df[2])));
    t = (7 * (df[1] + df[2]) - 3 * (df[0] + df[3])) / 16;
    t = clamp(t, -m, m);

    return t;
}

px is the pixel we are currently looking at. pre1 and pre2 are the 2 left neighbors, pos1 and pos2 the 2 right neighbors.

ps: here is the original article presenting the idea and an implementation by the author with some insightful comments

and that’s the RetroArch Cg implementation by Hyllian


I think I know what the problem is: min/max is per component of vectors.

min/maxing an RGB value gets you the min/max of each individual channel, not of the whole color. This results in the color-burn appearance in places like Shantae’s nose, where new colors are made from the min/max of neighboring colors, with extremes like 247 red, 0 green, 0 blue that didn’t previously exist. There was a 247,123,41, a 226,82,0, a 115,25,0, and a 0,0,0 in the area. None of those look like full-on red, but that appears to be what the per-component operations acting on the RGB values have created, since they don’t look at an RGB color’s 3 values as a whole.

min/max needs to work on the color as a whole, which can be subjective. I reckon a perceptual luminosity function might work well enough as a single value for deciding which colors are the limits.
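A sketch of the difference, with Rec. 601 luma as one possible "whole color" ordering. The color pair below is hypothetical (chosen so the channels disagree, not taken from the screenshot), and the function names are made up:

```python
def luma(c):
    """Rec. 601 luminosity weights as one possible whole-color ordering."""
    return 0.299 * c[0] + 0.587 * c[1] + 0.114 * c[2]

def min_channelwise(a, b):
    """Per-channel min: can fabricate a color neither input contains."""
    return tuple(min(x, y) for x, y in zip(a, b))

def color_min(a, b):
    """Pick the darker of the two actual colors instead of mixing channels."""
    return a if luma(a) <= luma(b) else b

# hypothetical pair whose channels disagree
a, b = (247, 0, 0), (0, 123, 41)

assert min_channelwise(a, b) == (0, 0, 0)   # a brand-new color
assert color_min(a, b) == (247, 0, 0)       # an existing color, darker by luma
```

The whole-color version always returns one of the inputs, so no new extremes can appear; the trade-off is that "darker" is now defined by whichever scalar ordering you pick.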

There may also be other things acting on RGB colors per-channel resulting in new colors with wide extremes like this.


yes, that’s what I meant when I said that the channel-wise limiting of the tilt vector causes its direction to change. I’m currently looking into the solution I described above: finding one common factor for every channel so that the limits are satisfied, rather than clamping each channel separately. while it does work, doing this alone causes the tilt vector to become very small or vanish in some cases. for example, in the Shantae picture the eyebrows aren’t affected anymore at all, because the tilt vector has a non-trivial blue value but the difference vectors force the limit to be 0, so with this approach the overall length factor applied to all channels becomes 0 as well.

channel wise


overall factor


this might be fine, but I’m currently looking into scaling the factor depending on how big the overshoot is.


The eyebrows actually look smoother to me in the overall-factor image. In the channel-wise image they look extremely sharp, especially compared to the rest of her face. If everything looked like that it might be fine, but as it stands, overall factor is my personal preference.

If the tilt vector is too small and you want to limit overshoots, perhaps a parabolic curve, or a plain sqrt or fast sqrt? Small values are pushed up and then taper out toward the max level. I’m not especially versed in this, so I’m thinking of it like transient spikes in an audio signal that get tamed with a soft-clipper; that’s what it reminds me of, and that’s how those get solved quickly without affecting the values around them.
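The sqrt idea can be sketched as a shaping curve: small magnitudes are boosted, large ones taper toward the limit rather than hard-clipping, much like an audio soft-clipper. This is an illustration of the shape, not code from the filter, and the function name is invented:

```python
import math

def sqrt_limit(x: float, limit: float) -> float:
    """Taper a non-negative magnitude toward `limit` with a sqrt curve:
    small values get pushed up, large ones flatten out at the limit."""
    if limit <= 0.0:
        return 0.0
    u = min(x / limit, 1.0)   # normalize and cap at the limit
    return limit * math.sqrt(u)

# small inputs get boosted, and nothing can exceed the limit
assert sqrt_limit(0.04, 1.0) > 0.04
assert sqrt_limit(2.0, 1.0) == 1.0
```

Applied to the length factor, this would keep weak tilt vectors from vanishing entirely while still respecting the channel limits at the top end.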


thing is, with the overall factor, at least in this case the picture becomes very similar to the original unfiltered image, so the filter loses its effect. if I’m gonna scale the length factor I’ll probably repurpose the sharpness parameter for that so the user can decide what he prefers.


Have you looked into mean curvature filters? I don’t know much about them, but from using them in GMIC in GIMP they appear to follow the flow of an image using tensors and smooth things along the path. The curvature-following aspect seems like it could be an alternate method with good results; it looks alright when I use it in GMIC on ScaleNx output. I imagine it’d be better if it followed the curvature of the image and pushed pixels outward tangentially from the curvature to push or blend the AA pixels back in.

Right now the best implementation of it I can find outside of GMIC is this GitHub project, where it appears to be even better than mean curvature and allegedly doesn’t calculate the curvature at all.


here is the new version:


I adjusted some minor things and renamed the “gradient protection” parameter to “smoothness”. but the major point is that the new version now prevents these weird color shifts rAA sometimes produces, which torridgristle reported. in small doses it actually helps rAA achieve some of its erosion and blending effect though. so there is a new parameter called “deviation”. it basically determines how high the error tolerance is.

here is an example. this is maybe not a game where you wanna use the filter, but it demonstrates the issue well. notice the weird pixels in the faces of both characters.

old version


new version (deviation = 1.0)


let me show you the code:

float3 res2x(float3 pre2, float3 pre1, float3 px, float3 pos1, float3 pos2)
{
    float d1, d2, w;
    float3 a, m, t, t1, t2;
    float4x3 pre = float4x3(pre2, pre1,   px, pos1);
    float4x3 pos = float4x3(pre1,   px, pos1, pos2);
    float4x3  df = pos - pre;

    m = (px < 0.5) ? px : (1.0-px);
    m = RAA_SHR0 * min(m, min(abs(df[1]), abs(df[2])));   // magnitude
    t = (7 * (df[1] + df[2]) - 3 * (df[0] + df[3])) / 16; // tilt
    a = t == 0.0 ? 1.0 : m/abs(t);
    t1 = clamp(t, -m, m);                       // limit channels
    t2 = min(1.0, min(min(a.x, a.y), a.z)) * t; // limit length
    d1 = length(df[1]); d2 = length(df[2]);
    d1 = d1 == 0.0 ? 0.0 : length(cross(df[1], t1))/d1; // distance between line (px, pre1) and point px-t1
    d2 = d2 == 0.0 ? 0.0 : length(cross(df[2], t1))/d2; // distance between line (px, pos1) and point px+t1

    w = min(1.0, max(d1,d2)/0.8125); // color deviation from optimal value
    return lerp(t1, t2, pow(w, RAA_DVT0));
}

the tilt vector is now mixed between one that is simply clamped channel-wise (t1) and one whose length is adjusted to satisfy the limits given by m (t2). assuming that an optimal interpolation between px and pre1 lies on the line (px, pre1), we can determine the error by calculating the distance from that line to the result produced by the tilt vector, px-t1, using some vector magic (*). we do the same with the right neighbor, pick the maximum and use that as the mix factor between t1 and t2, with the new parameter as an exponent to adjust the behavior. the default value is 1.0. if you insist on using this filter for systems such as the GameBoy or NES you might wanna lower the value.
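the “vector magic” is the standard point-to-line distance via a cross product: the distance from the point px±t1 to the line through px with direction df is |df × t1| / |df|, since px itself cancels out. a quick numeric check of that identity in plain Python (names made up):

```python
import math

def cross(a, b):
    """3D cross product of two vectors given as tuples."""
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def norm(a):
    return math.sqrt(sum(x*x for x in a))

def dist_point_line(d, t):
    """Distance from point p+t to the line through p with direction d:
    |d x t| / |d|  (the base point p cancels, so only the vectors matter)."""
    n = norm(d)
    return 0.0 if n == 0.0 else norm(cross(d, t)) / n

# a tilt parallel to the difference vector has zero deviation ...
assert dist_point_line((1.0, 2.0, 3.0), (2.0, 4.0, 6.0)) < 1e-12
# ... while a perpendicular tilt deviates by its full length
assert abs(dist_point_line((1.0, 0.0, 0.0), (0.0, 2.0, 0.0)) - 2.0) < 1e-12
```

this matches the d1/d2 lines in the shader: a tilt that stays on the interpolation line contributes 0 deviation, and the further it swings off that line, the closer w gets to 1 and the more the length-limited t2 takes over.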

(*) reference:

@hunterk could you help me out again with the upload and shader conversion? uh, just noticed… you said there was something wrong with the slang version, is that still happening? btw, there is still this bugfix in the vertex shader for AMD cards. I don’t know if it is still necessary, I just took that part over from Hyllian.


Sure, I’ll get it converted/updated. I don’t recall what the issue was. It might have been that slang doesn’t support the “passprev” stuff for putting other shaders in front of it (such as color modifiers, etc.), which I can work around by putting a stock pass as the first shader. I wasn’t sure if that was something enough people wanted.

EDIT: @Sp00kyFox I got it ported to GLSL with no issues and it looks very nice while also being significantly faster than the previous version. Adding a pass of FXAA after the rAA passes smooths out the rough edges quite a bit:

However, I ran into an issue when porting to slang, whereby all of the lines are fat:

Any advice? Pretty much the only difference between regular GLSL and slang has to do with the POT/NPOT textures and the resulting texCoords, so I figure it has something to do with that /shrug

The Cg version you posted unfortunately doesn’t work on my AMD (and probably Intel, too) GPU because it requires a higher Cg profile than they expose on non-Nvidia cards…


this happened to me once during development. I think it was connected to some variables becoming NaN. if you PM me the slang version I’ll take a look at it. regarding AMD compatibility, I don’t have a clue what to do there. any specific error messages? it would be no problem to replace some functions or paradigms if I know which ones.

ps: this combination looks very nice. I should probably mention that you don’t wanna display rAA without any further steps; at least bilinear scaling to the target resolution is recommended.


Oh yeah, using straight-up bilinear looks quite similar without the added FXAA computation.

Here’s the slang version: pass0: pass1:

The errors I’m getting with the Cg version are: : warning C7050: "i2" might be used before being initialized

and : error C5043: profile requires index expression to be compile-time constant

The first one’s likely not a big deal and just requires some shuffling around, but I didn’t bother messing with it since the second was more serious.


for cg, try these. I bypassed the usage of dynamic array indices.


eyyy, success!

I’ll get the updated Cg and GLSL versions pushed up to the repos soon.

EDIT: I started getting the fat lines on a different GPU using the GLSL version as well, which suggested that it could be a GPU-specific rounding issue (of course…). Poking around, it seems to be this line:

float dw = (i1.x == 0 || i1.y == 0) ? 0.0 : 2.0 * ((i1.x-1.0)/(i1.x+i1.y-2.0)) - 1.0;

Changing it to:

float dw = (i1.x == 0 || i1.y == 0) ? 0.0 : 2.0 * ((i1.x-0.9999)/(i1.x+i1.y-1.9999)) - 1.0;

gets rid of the fat lines, which works well for GLSL AFAICT but the slang version ends up having no effect at all :confused: (that is, the fat lines are gone but the output is identical to scalefx alone)


edit: never mind, it’s not a rounding issue. the problem is that GPUs handle undefined variables differently. I simply assumed they would be 0 but that’s not the case. the dw line seems to be the culprit at first sight, but unlike with cg I actually got an unstable image with slang (that is, with a fixed test picture). the issue is that i1 and i2 aren’t properly initialized and contain random memory values. try this instead:



glsl - just initialize i1 and i2 with zeros in their declaration line, see cg and slang

well, turns out that warning we got from the cg version wasn’t pointless after all :sweat_smile:

ps: changing the dw line is not necessary since division by 0 can’t occur. i1.x==1 && i1.y==1 is contradictory.


w00t, success! I’ll get these pushed up to the repo ASAP.


Is it possible to use the slang version of ScaleFX with shaders before ScaleFX, for example using GDAPT first and then ScaleFX? In the past there was the issue that this did not work (ScaleFX was not applied to the image). But due to the focus of some cores on Vulkan, like the PSX core, it would be great to add this feature to the slang shaders as well.

Edit: hm, it seems not to work for Vulkan. I would like to use dithering before ScaleFX, but the Vulkan version has the issue the GLSL one had months before.

Edit2: hizzlekizzle and hunterk updated the slang shader. Thank you so much!!! Now I can play PS1 games with the awesome shaders. Thanks again!