Accessing any output in multipass shaders

That sounds very strange. I don’t see why that should happen. I’ll see if I can reproduce it here. For reference, can you post the exact minimal test case that has this issue?

I noticed the same thing the other day but didn’t think anything of it. I figured it was just something weird I was doing, but it indeed shifted everything down by 1 pixel.

Please post screenshot and shader code. Tried some 2-pass tests doing ORIG - PASS1 (pass1 being dummy), and it’s a perfect overlap here :v

OK, get it here: https://anonfiles.com/file/044854f6f9329a7a141116d8957944a2

First test, put:

shader0: dummy.cg scale0:1x shader1: pass1 - dummy.cg scale1: don’t care

second test:

shader0: dummy.cg scale0: 1x shader1: pass1 - ORIG.cg scale1: don’t care

You’ll see the image shift down or up by one line (I can’t remember which).

This was resolved on IRC yesterday. The issue stems from Cg “optimizing” the vertex/fragment struct for some reason, which leads to the wrong coordinate being used (very dangerous behavior, but oh well). Commenting out all unused coords fixed it.

Maister,

I’m drawing an example of a SNES game passing through a shader. It’s meant to help me better understand the meaning of some variables commonly used in Cg shaders for RetroArch. I have some questions, though. Look at this:

About the variable bound to TEXCOORD0 and passed to the shader:

  1. Is this normalized? (values between 0 and 1).

  2. Is this value referenced to the input texture origin [0;0]?

  3. What’s the difference between the variable bound to TEXCOORD0 as passed to the vertex shader and as passed to the fragment shader?

  4. How many times is main_vertex called for each SNES frame in the example above?

I suppose I’ll have to write a more detailed post when I’m not sleepy.

Ok, so here’s how it all works. Dunno how much you actually know, but I’ll try to explain it all.

The shader pipeline is: vertex attribute streams ----> vertex shader ----> transformed position and other values (varyings) ----> rasterizer fills in pixels and interpolates all the values that were passed from the vertex shader ----> fragment shader is invoked on every pixel with the correctly interpolated values.

To be more concrete, let’s take the SNES example. The input frame is 256x224. This texture is uploaded into a POT (power-of-two) texture, here 256x256, to improve accuracy on texture coordinates. We scale by 2x, so we render to 512x448. We make sure that texel (0, 0) corresponds to the top-left of the image.

This causes the IN.* parameters to be set like:

  • IN.video_size = (256, 224)
  • IN.texture_size = (256, 256)
  • IN.output_size = (512, 448)

To blit a quad on the screen, we use a triangle strip (because QUADs aren’t natively supported by modern GPUs). We need 4 vertex coordinates. Now a bit about vertex attribute streams. The vertex shader’s job is to eat data from vertex attribute streams, do some computation on it, and return:

  • Clip space position in gl_Position (POSITION semantic). This is done with modelViewProj * pos.
  • Optional varyings to the fragment shading stage. We have to pass texcoords somehow to the fragment shader as they have to be interpolated. By convention, texcoords are sent to vertex shader using TEXCOORD0 semantic.
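To make the modelViewProj * pos step concrete, here is a small Python sketch (not Cg). It assumes the frontend supplies an orthographic projection over the unit square; that matrix is an assumption for illustration, not something stated in this thread, and `ortho`, `mul` and `vertex_position` are hypothetical helper names.

```python
def ortho(left, right, bottom, top, near=-1.0, far=1.0):
    """Row-major 4x4 orthographic projection matrix (OpenGL convention)."""
    return [
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def mul(m, v):
    """Matrix * column vector, like mul(modelViewProj, position) in Cg."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


# Assumed matrix: maps the unit square [0, 1] x [0, 1] onto clip space [-1, 1].
model_view_proj = ortho(0.0, 1.0, 0.0, 1.0)


def vertex_position(x, y):
    """Clip-space position for a 2D vertex, with z = 0 and w = 1."""
    return mul(model_view_proj, [x, y, 0.0, 1.0])
```

With this matrix the four quad corners land on the corners of clip space, which is what makes the quad fill the viewport.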

The vertex position stream could look like: { (0, 0), (1, 0), (0, 1), (1, 1) } (triangle strip, looks like a ‘Z’ flipped horizontally)

Texcoord streams could look like: float w = 256.0 / 256.0; float h = 224.0 / 256.0; { (0, h), (w, h), (0, 0), (w, 0) } (flipped vertically to maintain top-left semantics; GL itself is bottom-left, the mathematical convention).
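As a rough illustration of the two streams above, this Python sketch builds them for the SNES numbers and splits the 4-vertex strip into its two triangles. The function names are made up for the example, and winding order is ignored.

```python
VIDEO_SIZE = (256, 224)
TEXTURE_SIZE = (256, 256)


def build_streams(video_size, texture_size):
    """Return (positions, texcoords) for a full-screen triangle strip."""
    w = video_size[0] / texture_size[0]
    h = video_size[1] / texture_size[1]
    positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    # Flipped vertically so that texel (0, 0) ends up at the top-left.
    texcoords = [(0.0, h), (w, h), (0.0, 0.0), (w, 0.0)]
    return positions, texcoords


def strip_to_triangles(vertex_count):
    """Index triples covered by a triangle strip (winding ignored)."""
    return [(i, i + 1, i + 2) for i in range(vertex_count - 2)]


positions, texcoords = build_streams(VIDEO_SIZE, TEXTURE_SIZE)
triangles = strip_to_triangles(len(positions))
```

Four vertices give two triangles that share the diagonal edge, so the vertex shader only ever sees those four inputs.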

The normalized texcoords are fairly simple. (0, 0) corresponds to the bottom-left of the texture, (1, 1) to the top-right. It’s important to note that texel centers are in the center of each pixel. This is a critical realization. If your texture is 1x1, the pixel sampling point would be at (0.5, 0.5); similarly, a 2x2 texture would have sampling points at (0.25, 0.25), (0.25, 0.75), … etc. If you used bilinear and sampled at (0.5, 0.5) you’d get the average of the four pixels. The same is the case for screen coordinates.
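The texel-center rule can be checked with a few lines of Python. This is only a sketch: `texel_center` and `bilinear` are hypothetical helpers, and clamp-to-edge addressing is an assumption.

```python
import math


def texel_center(index, size):
    """Normalized coordinate of the center of texel `index` along one axis."""
    return (index + 0.5) / size


def bilinear(tex, u, v):
    """Bilinear sample of tex[row][col] at normalized (u, v), clamp-to-edge."""
    height, width = len(tex), len(tex[0])
    # Shift by half a texel so integer coordinates fall on texel centers.
    x = u * width - 0.5
    y = v * height - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def fetch(col, row):
        col = min(max(col, 0), width - 1)
        row = min(max(row, 0), height - 1)
        return tex[row][col]

    top = fetch(x0, y0) * (1.0 - fx) + fetch(x0 + 1, y0) * fx
    bottom = fetch(x0, y0 + 1) * (1.0 - fx) + fetch(x0 + 1, y0 + 1) * fx
    return top * (1.0 - fy) + bottom * fy
```

Sampling a 2x2 texture at (0.5, 0.5) sits exactly between the four centers, so all four texels get a weight of 0.25; sampling at (0.25, 0.25) returns the first texel unmodified.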

So, the vertex shader is run 4 times and transforms the vertex positions to fill the viewport completely. Each vertex has varyings associated with it. Often, it’s just a texture coordinate copied directly from the attribute stream (but GL doesn’t care that it’s a texture coordinate). When the rasterizer kicks in, it’ll generate a fragment in the center of every screen pixel that is inside the two triangles forming the quad.
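For the SNES numbers, the texcoord each fragment receives can be worked out by hand. The Python sketch below does that under two simplifying assumptions: a top-left origin for both screen and texture (the vertical flip is left out), and a quad that covers the viewport exactly. The names are illustrative only.

```python
VIDEO_SIZE = (256, 224)
TEXTURE_SIZE = (256, 256)
OUTPUT_SIZE = (512, 448)

VERTEX_INVOCATIONS = 4
FRAGMENT_INVOCATIONS = OUTPUT_SIZE[0] * OUTPUT_SIZE[1]


def fragment_texcoord(px, py):
    """Interpolated texcoord at the center of output pixel (px, py)."""
    u_max = VIDEO_SIZE[0] / TEXTURE_SIZE[0]
    v_max = VIDEO_SIZE[1] / TEXTURE_SIZE[1]
    u = (px + 0.5) / OUTPUT_SIZE[0] * u_max
    v = (py + 0.5) / OUTPUT_SIZE[1] * v_max
    return u, v


def source_texel(px, py):
    """Integer texel that a nearest-neighbour fetch would hit."""
    u, v = fragment_texcoord(px, py)
    return int(u * TEXTURE_SIZE[0]), int(v * TEXTURE_SIZE[1])
```

At 2x scale each source texel is covered by a 2x2 block of fragments, so output pixels (0, 0) and (1, 1) both read texel (0, 0), and the last fragment reads texel (255, 223).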

All the varyings associated with each vertex will now be interpolated accordingly, and the fragment shader is run for each fragment. That’s why you can’t just read, say, ORIG.tex_coord in the fragment shader. It only belongs in the vertex shader, because there’s an attribute stream associated with it. It cannot be read in the fragment shader. You can only get interpolated values from the vertex stage.

How are these interpolated values generated?

I mean, what’s the difference between the texcoord taken from ORIG and passed to the fragment shader and the texcoord passed to the fragment shader from the IN texture?

I’m not sure what you mean. ORIG.tex_coord and TEXCOORD0 are the same kind of thing: the *.tex_coord stuff is bound by name, while TEXCOORD0 is bound by semantic. They’re still just attribute streams.

In a triangle, every vertex has varying data associated with it, say, a texcoord. When that triangle is being rasterized, it’ll just interpolate from these three vertices.
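A minimal Python sketch of that interpolation, using barycentric weights over a 2D triangle. It ignores perspective correction, which does not matter here since w is 1 for a flat quad; `barycentric` and `interpolate` are names invented for the example.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb


def interpolate(p, positions, varyings):
    """Blend the per-vertex varyings with the weights of point p."""
    wa, wb, wc = barycentric(p, *positions)
    return tuple(wa * va + wb * vb + wc * vc for va, vb, vc in zip(*varyings))


# First triangle of the quad, with the SNES texcoords as the varying.
tri_positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
tri_texcoords = [(0.0, 0.875), (1.0, 0.875), (0.0, 0.0)]
```

At a vertex the result is exactly that vertex’s texcoord; halfway along the diagonal edge it is the midpoint of the two texcoords on that edge, here (0.5, 0.4375).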

http://www.arcsynthesis.org/gltut/Basics/Tut02%20Vertex%20Attributes.html explains it better

Tks for the link. I suppose I need some theory before asking what I need. :stuck_out_tongue:

The ORIG and IN textures will in most cases not be filled to the same degree, so the texcoords for ORIG might range from 0/0 to 0.5/0.4375, while those for IN could range from 0/0 to 1/0.875.
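Those ranges are just video size divided by texture size. The Python sketch below reproduces the two example ranges with hypothetical sizes (a 256x224 frame in a 512x512 texture for ORIG, and in a 256x256 texture for IN) and shows why a texcoord meant for one texture cannot be reused as-is on the other.

```python
def texcoord_extent(video_size, texture_size):
    """Largest texcoord that still falls inside the filled part of a texture."""
    return (video_size[0] / texture_size[0], video_size[1] / texture_size[1])


def remap(uv, from_extent, to_extent):
    """Convert a texcoord from one texture's filled range to another's."""
    return (uv[0] / from_extent[0] * to_extent[0],
            uv[1] / from_extent[1] * to_extent[1])


# Hypothetical sizes chosen to match the ranges quoted above.
orig_extent = texcoord_extent((256, 224), (512, 512))
in_extent = texcoord_extent((256, 224), (256, 256))
```

The bottom-right corner is (1, 0.875) in IN coordinates but (0.5, 0.4375) in ORIG coordinates, which is why each texture comes with its own tex_coord stream.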

I don’t understand yet.

I’ll put my problem here:

1st pass shader:


/*
   Hyllian's xBR LVL1 pass0 beta
   
   Copyright (C) 2011/2012 Hyllian/Jararaca - [email][email protected][/email]

   This program is free software; you can redistribute it and/or
   modify it under the terms of the GNU General Public License
   as published by the Free Software Foundation; either version 2
   of the License, or (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.


   Incorporates some of the ideas from SABR shader. Thanks to Joshua Street.
*/

const static float coef           = 2.0;
const static float4 eq_threshold  = float4(15.0);
const static float y_weight        = 48.0;
const static float u_weight        = 7.0;
const static float v_weight        = 6.0;
const static float3x3 yuv          = float3x3(0.299, 0.587, 0.114, -0.169, -0.331, 0.499, 0.499, -0.418, -0.0813);
const static float3x3 yuv_weighted = float3x3(y_weight*yuv[0], u_weight*yuv[1], v_weight*yuv[2]);
const static float4 delta       = float4(0.3);

//const static float4 xbr_info  = float4(0.01, 0.02, 0.04, 0.08);


float4 df(float4 A, float4 B)
{
    return float4(abs(A-B));
}




bool4 eq(float4 A, float4 B)
{
    return (df(A, B) < eq_threshold);
}

float4 weighted_distance(float4 a, float4 b, float4 c, float4 d, float4 e, float4 f, float4 g, float4 h)
{
    return (df(a,b) + df(a,c) + df(d,e) + df(d,f) + 4.0*df(g,h));
}



struct input
{
    float2 video_size;
    float2 texture_size;
    float2 output_size;
};


struct out_vertex {
    float4 position : POSITION;
    float4 color    : COLOR;
    float2 texCoord : TEXCOORD0;
    float4 t1;
    float4 t2;
    float4 t3;
    float4 t4;
    float4 t5;
    float4 t6;
    float4 t7;
};

/*    VERTEX_SHADER    */
out_vertex main_vertex
(
    float4 position    : POSITION,
    float4 color    : COLOR,
    float2 texCoord : TEXCOORD0,

       uniform float4x4 modelViewProj,
    uniform input IN
)
{
    out_vertex OUT;

    OUT.position = mul(modelViewProj, position);
    OUT.color = color;

    float2 ps = float2(1.0/IN.texture_size.x, 1.0/IN.texture_size.y);
    float dx = ps.x;
    float dy = ps.y;

    //    A1 B1 C1
    // A0  A  B  C C4
    // D0  D  E  F F4
    // G0  G  H  I I4
    //    G5 H5 I5

    OUT.t1 = texCoord.xxxy + float4( -dx, 0, dx,-2.0*dy); // A1 B1 C1
    OUT.t2 = texCoord.xxxy + float4( -dx, 0, dx,    -dy); //  A  B  C
    OUT.t3 = texCoord.xxxy + float4( -dx, 0, dx,      0); //  D  E  F
    OUT.t4 = texCoord.xxxy + float4( -dx, 0, dx,     dy); //  G  H  I
    OUT.t5 = texCoord.xxxy + float4( -dx, 0, dx, 2.0*dy); // G5 H5 I5
    OUT.t6 = texCoord.xyyy + float4(-2.0*dx,-dy, 0,  dy); // A0 D0 G0
    OUT.t7 = texCoord.xyyy + float4( 2.0*dx,-dy, 0,  dy); // C4 F4 I4

    return OUT;
}


/*    FRAGMENT SHADER    */
float4 main_fragment(in out_vertex VAR, uniform sampler2D decal : TEXUNIT0) : COLOR
{
    bool4 edr;
    bool4 interp_restriction_lv1;


    float3 A1 = tex2D(decal, VAR.t1.xw).rgb;
    float3 B1 = tex2D(decal, VAR.t1.yw).rgb;
    float3 C1 = tex2D(decal, VAR.t1.zw).rgb;

    float3 A  = tex2D(decal, VAR.t2.xw).rgb;
    float3 B  = tex2D(decal, VAR.t2.yw).rgb;
    float3 C  = tex2D(decal, VAR.t2.zw).rgb;

    float3 D  = tex2D(decal, VAR.t3.xw).rgb;
    float3 E  = tex2D(decal, VAR.t3.yw).rgb;
    float3 F  = tex2D(decal, VAR.t3.zw).rgb;

    float3 G  = tex2D(decal, VAR.t4.xw).rgb;
    float3 H  = tex2D(decal, VAR.t4.yw).rgb;
    float3 I  = tex2D(decal, VAR.t4.zw).rgb;

    float3 G5 = tex2D(decal, VAR.t5.xw).rgb;
    float3 H5 = tex2D(decal, VAR.t5.yw).rgb;
    float3 I5 = tex2D(decal, VAR.t5.zw).rgb;

    float3 A0 = tex2D(decal, VAR.t6.xy).rgb;
    float3 D0 = tex2D(decal, VAR.t6.xz).rgb;
    float3 G0 = tex2D(decal, VAR.t6.xw).rgb;

    float3 C4 = tex2D(decal, VAR.t7.xy).rgb;
    float3 F4 = tex2D(decal, VAR.t7.xz).rgb;
    float3 I4 = tex2D(decal, VAR.t7.xw).rgb;

    float4 b = mul( float4x3(B, D, H, F), yuv_weighted[0] );
    float4 c = mul( float4x3(C, A, G, I), yuv_weighted[0] );
    float4 e = mul( float4x3(E, E, E, E), yuv_weighted[0] );
    float4 d = b.yzwx;
    float4 f = b.wxyz;
    float4 g = c.zwxy;
    float4 h = b.zwxy;
    float4 i = c.wxyz;

    float4 i4 = mul( float4x3(I4, C1, A0, G5), yuv_weighted[0] );
    float4 i5 = mul( float4x3(I5, C4, A1, G0), yuv_weighted[0] );
    float4 h5 = mul( float4x3(H5, F4, B1, D0), yuv_weighted[0] );
    float4 f4 = h5.yzwx;

    interp_restriction_lv1      = ((e!=f) && (e!=h)  && ( !eq(f,b) && !eq(f,c) || !eq(h,d) && !eq(h,g) || eq(e,i) && (!eq(f,f4) && !eq(f,i4) || !eq(h,h5) && !eq(h,i5)) || eq(e,g) || eq(e,c)) );

    edr      = (weighted_distance( e, c, g, i, h5, f4, h, f) < weighted_distance( h, d, i5, f, i4, b, e, i)) && interp_restriction_lv1;

         if (edr.x == false && edr.y == false && edr.z == false && edr.w == false) return float4(0.0, 0.0, 0.0, 1.0);
    else if (edr.x == false && edr.y == false && edr.z == false && edr.w == true ) return float4(0.0, 0.2, 0.0, 1.0);
    else if (edr.x == false && edr.y == false && edr.z == true  && edr.w == false) return float4(0.0, 0.4, 0.0, 1.0);
    else if (edr.x == false && edr.y == false && edr.z == true  && edr.w == true ) return float4(0.0, 0.6, 0.0, 1.0);

    else if (edr.x == false && edr.y == true  && edr.z == false && edr.w == false) return float4(0.2, 0.0, 0.0, 1.0);
    else if (edr.x == false && edr.y == true  && edr.z == false && edr.w == true ) return float4(0.2, 0.2, 0.0, 1.0);
    else if (edr.x == false && edr.y == true  && edr.z == true  && edr.w == false) return float4(0.2, 0.4, 0.0, 1.0);
    else if (edr.x == false && edr.y == true  && edr.z == true  && edr.w == true ) return float4(0.2, 0.6, 0.0, 1.0);

    else if (edr.x == true  && edr.y == false && edr.z == false && edr.w == false) return float4(0.4, 0.0, 0.0, 1.0);
    else if (edr.x == true  && edr.y == false && edr.z == false && edr.w == true ) return float4(0.4, 0.2, 0.0, 1.0);
    else if (edr.x == true  && edr.y == false && edr.z == true  && edr.w == false) return float4(0.4, 0.4, 0.0, 1.0);
    else if (edr.x == true  && edr.y == false && edr.z == true  && edr.w == true ) return float4(0.4, 0.6, 0.0, 1.0);

    else if (edr.x == true  && edr.y == true  && edr.z == false && edr.w == false) return float4(0.6, 0.0, 0.0, 1.0);
    else if (edr.x == true  && edr.y == true  && edr.z == false && edr.w == true ) return float4(0.6, 0.2, 0.0, 1.0);
    else if (edr.x == true  && edr.y == true  && edr.z == true  && edr.w == false) return float4(0.6, 0.4, 0.0, 1.0);
    else if (edr.x == true  && edr.y == true  && edr.z == true  && edr.w == true ) return float4(0.6, 0.6, 0.0, 1.0);
    else return float4(1.0, 1.0, 1.0, 1.0);
}

2nd pass shader:


/*
   Hyllian's xBR LVL1 pass1 beta
   
   Copyright (C) 2011/2012 Hyllian/Jararaca - [email][email protected][/email]

   This program is free software; you can redistribute it and/or
   modify it under the terms of the GNU General Public License
   as published by the Free Software Foundation; either version 2
   of the License, or (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.


   Incorporates some of the ideas from SABR shader. Thanks to Joshua Street.
*/

const static float y_weight        = 48.0;
const static float u_weight        = 7.0;
const static float v_weight        = 6.0;
const static float3x3 yuv          = float3x3(0.299, 0.587, 0.114, -0.169, -0.331, 0.499, 0.499, -0.418, -0.0813);
const static float3x3 yuv_weighted = float3x3(y_weight*yuv[0], u_weight*yuv[1], v_weight*yuv[2]);
const static float4 delta       = float4(0.4, 0.4, 0.4, 0.4);

float4 df(float4 A, float4 B)
{
    return float4(abs(A-B));
}


struct orig
{
    float2 tex_coord;
    uniform float2 texture_size;
    uniform sampler2D texture;
};



struct out_vertex {
    float4 position : POSITION;
    float4 color    : COLOR;
    float2 texCoord : TEXCOORD0;
    float4 t2;
    float4 t3;
    float4 t4;
    float2 pos;
    float2 orig_tex;
};

/*    VERTEX_SHADER    */
out_vertex main_vertex
(
    float4 position    : POSITION,
    float4 color    : COLOR,
    float2 texCoord : TEXCOORD0,

       uniform float4x4 modelViewProj,
    orig ORIG
)
{
    out_vertex OUT;

    OUT.position = mul(modelViewProj, position);
    OUT.color = color;

    float2 ps = float2(1.0/ORIG.texture_size.x, 1.0/ORIG.texture_size.y);
    float dx = ps.x;
    float dy = ps.y;

    //    A1 B1 C1
    // A0  A  B  C C4
    // D0  D  E  F F4
    // G0  G  H  I I4
    //    G5 H5 I5

    OUT.texCoord = texCoord;
    OUT.orig_tex = ORIG.tex_coord;
    OUT.t2 = ORIG.tex_coord.xxxy + float4( -dx, 0, dx,  -dy); //  A  B  C
    OUT.t3 = ORIG.tex_coord.xxxy + float4( -dx, 0, dx,    0); //  D  E  F
    OUT.t4 = ORIG.tex_coord.xxxy + float4( -dx, 0, dx,   dy); //  G  H  I
    OUT.pos= ORIG.tex_coord*ORIG.texture_size;

    return OUT;
}


/*    FRAGMENT SHADER    */
float4 main_fragment(in out_vertex co, uniform sampler2D decal : TEXUNIT0, orig ORIG) : COLOR
{
    bool4 edr, px; // px = pixel, edr = edge detection rule
    bool4 nc45; // new_color
    float4 fx; // inequations of straight lines.
    float3 res1, pix1;
    float blend1;

    float2 fp = frac(co.pos);


    float3 B  = tex2D(ORIG.texture, co.t2.yw).rgb;
    float3 D  = tex2D(ORIG.texture, co.t3.xw).rgb;
    float3 E  = tex2D(ORIG.texture, co.t3.yw).rgb;
    float3 F  = tex2D(ORIG.texture, co.t3.zw).rgb;
    float3 H  = tex2D(ORIG.texture, co.t4.yw).rgb;

    float4 b = mul( float4x3(B, D, H, F), yuv_weighted[0] );
    float4 e = mul( float4x3(E, E, E, E), yuv_weighted[0] );
    float4 d = b.yzwx;
    float4 f = b.wxyz;
    float4 h = b.zwxy;

    float4 Ao = float4( 1.0, -1.0, -1.0, 1.0 );
    float4 Bo = float4( 1.0,  1.0, -1.0,-1.0 );
    float4 Co = float4( 1.5,  0.5, -0.5, 0.5 );

    // These inequations define the line below which interpolation occurs.
    fx      = (Ao*fp.y+Bo*fp.x); 

    float4 fx45 = smoothstep(Co - delta, Co + delta, fx);


    float4 info  = tex2D(decal, co.texCoord);

         if (info.x == 0.0 && info.y == 0.0) edr = bool4(false, false, false, false);
    else if (info.x == 0.0 && info.y == 0.2) edr = bool4(false, false, false, true );
    else if (info.x == 0.0 && info.y == 0.4) edr = bool4(false, false, true , false);
    else if (info.x == 0.0 && info.y == 0.6) edr = bool4(false, false, true , true );

    else if (info.x == 0.2 && info.y == 0.0) edr = bool4(false, true , false, false);
    else if (info.x == 0.2 && info.y == 0.2) edr = bool4(false, true , false, true );
    else if (info.x == 0.2 && info.y == 0.4) edr = bool4(false, true , true , false);
    else if (info.x == 0.2 && info.y == 0.6) edr = bool4(false, true , true , true );

    else if (info.x == 0.4 && info.y == 0.0) edr = bool4(true , false, false, false);
    else if (info.x == 0.4 && info.y == 0.2) edr = bool4(true , false, false, true );
    else if (info.x == 0.4 && info.y == 0.4) edr = bool4(true , false, true , false);
    else if (info.x == 0.4 && info.y == 0.6) edr = bool4(true , false, true , true );

    else if (info.x == 0.6 && info.y == 0.0) edr = bool4(true , true , false, false);
    else if (info.x == 0.6 && info.y == 0.2) edr = bool4(true , true , false, true );
    else if (info.x == 0.6 && info.y == 0.4) edr = bool4(true , true , true , false);
    else if (info.x == 0.6 && info.y == 0.6) edr = bool4(true , true , true , true );
    else edr = bool4(false , false , false , false );
    

    nc45 = ( edr && bool4(fx45));

    px = (df(e,f) <= df(e,h));


    float4 maximo = nc45*fx45;


         if (nc45.x) {pix1 = px.x ? F : H; blend1 = maximo.x;}
    else if (nc45.y) {pix1 = px.y ? B : F; blend1 = maximo.y;}
    else if (nc45.z) {pix1 = px.z ? D : B; blend1 = maximo.z;}
    else if (nc45.w) {pix1 = px.w ? H : D; blend1 = maximo.w;}
    else {pix1 = E; blend1 = 0.0;}

    float3 res = lerp(E, pix1, blend1);

    return float4(res, 1.0);
}


What I’m trying to do: split an xBR shader into two passes. The first one calculates the four edges of each pixel and packs them into the color output components (only .r and .g are used, the others are ignored). In the second pass, I need to read from ORIG to get the correct colors of the central pixel and some of its neighbors, and from the IN texture to get the info (packed in colors) from the 1st pass output.

If I run just the first pass, it works as expected, because the image comes out in black, green and red only. But when I run the two passes, the info read in the second pass isn’t right. This is why I think that, for some reason, I’m not using the texcoords correctly…

EUREKA! AT LAST!!

I didn’t find the bug in the last shaders, so I changed the way I pack the info from the first pass, and now it works. Something strange happens, though (see the comment in the 2nd pass code).

This is a LVL1 xBR split into two passes. It isn’t optimized yet, though it already runs very fast!

1st pass: (use 1x and nearest)


/*
   Hyllian's xBR LVL1 pass0 beta
   
   Copyright (C) 2011/2012 Hyllian/Jararaca - [email][email protected][/email]

   This program is free software; you can redistribute it and/or
   modify it under the terms of the GNU General Public License
   as published by the Free Software Foundation; either version 2
   of the License, or (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.


   Incorporates some of the ideas from SABR shader. Thanks to Joshua Street.
*/

const static float coef           = 2.0;
const static float4 eq_threshold  = float4(15.0);
const static float y_weight        = 48.0;
const static float u_weight        = 7.0;
const static float v_weight        = 6.0;
const static float3x3 yuv          = float3x3(0.299, 0.587, 0.114, -0.169, -0.331, 0.499, 0.499, -0.418, -0.0813);
const static float3x3 yuv_weighted = float3x3(y_weight*yuv[0], u_weight*yuv[1], v_weight*yuv[2]);
const static float4 delta       = float4(0.3);



float remapTo01(float v, float low, float high)
{
    return saturate((v - low)/(high-low));
}


float4 df(float4 A, float4 B)
{
    return float4(abs(A-B));
}




bool4 eq(float4 A, float4 B)
{
    return (df(A, B) < eq_threshold);
}

float4 weighted_distance(float4 a, float4 b, float4 c, float4 d, float4 e, float4 f, float4 g, float4 h)
{
    return (df(a,b) + df(a,c) + df(d,e) + df(d,f) + 4.0*df(g,h));
}



struct input
{
    float2 video_size;
    float2 texture_size;
    float2 output_size;
};


struct out_vertex {
    float4 position : POSITION;
    float4 color    : COLOR;
    float2 texCoord : TEXCOORD0;
    float4 t1;
    float4 t2;
    float4 t3;
    float4 t4;
    float4 t5;
    float4 t6;
    float4 t7;
};

/*    VERTEX_SHADER    */
out_vertex main_vertex
(
    float4 position    : POSITION,
    float4 color    : COLOR,
    float2 texCoord : TEXCOORD0,

       uniform float4x4 modelViewProj,
    uniform input IN
)
{
    out_vertex OUT;

    OUT.position = mul(modelViewProj, position);
    OUT.color = color;

    float2 ps = float2(1.0/IN.texture_size.x, 1.0/IN.texture_size.y);
    float dx = ps.x;
    float dy = ps.y;

    //    A1 B1 C1
    // A0  A  B  C C4
    // D0  D  E  F F4
    // G0  G  H  I I4
    //    G5 H5 I5

    OUT.t1 = texCoord.xxxy + float4( -dx, 0, dx,-2.0*dy); // A1 B1 C1
    OUT.t2 = texCoord.xxxy + float4( -dx, 0, dx,    -dy); //  A  B  C
    OUT.t3 = texCoord.xxxy + float4( -dx, 0, dx,      0); //  D  E  F
    OUT.t4 = texCoord.xxxy + float4( -dx, 0, dx,     dy); //  G  H  I
    OUT.t5 = texCoord.xxxy + float4( -dx, 0, dx, 2.0*dy); // G5 H5 I5
    OUT.t6 = texCoord.xyyy + float4(-2.0*dx,-dy, 0,  dy); // A0 D0 G0
    OUT.t7 = texCoord.xyyy + float4( 2.0*dx,-dy, 0,  dy); // C4 F4 I4

    return OUT;
}


/*    FRAGMENT SHADER    */
float4 main_fragment(in out_vertex VAR, uniform sampler2D decal : TEXUNIT0) : COLOR
{
    bool4 edr;
    bool4 interp_restriction_lv1;


    float3 A1 = tex2D(decal, VAR.t1.xw).rgb;
    float3 B1 = tex2D(decal, VAR.t1.yw).rgb;
    float3 C1 = tex2D(decal, VAR.t1.zw).rgb;

    float3 A  = tex2D(decal, VAR.t2.xw).rgb;
    float3 B  = tex2D(decal, VAR.t2.yw).rgb;
    float3 C  = tex2D(decal, VAR.t2.zw).rgb;

    float3 D  = tex2D(decal, VAR.t3.xw).rgb;
    float3 E  = tex2D(decal, VAR.t3.yw).rgb;
    float3 F  = tex2D(decal, VAR.t3.zw).rgb;

    float3 G  = tex2D(decal, VAR.t4.xw).rgb;
    float3 H  = tex2D(decal, VAR.t4.yw).rgb;
    float3 I  = tex2D(decal, VAR.t4.zw).rgb;

    float3 G5 = tex2D(decal, VAR.t5.xw).rgb;
    float3 H5 = tex2D(decal, VAR.t5.yw).rgb;
    float3 I5 = tex2D(decal, VAR.t5.zw).rgb;

    float3 A0 = tex2D(decal, VAR.t6.xy).rgb;
    float3 D0 = tex2D(decal, VAR.t6.xz).rgb;
    float3 G0 = tex2D(decal, VAR.t6.xw).rgb;

    float3 C4 = tex2D(decal, VAR.t7.xy).rgb;
    float3 F4 = tex2D(decal, VAR.t7.xz).rgb;
    float3 I4 = tex2D(decal, VAR.t7.xw).rgb;

    float4 b = mul( float4x3(B, D, H, F), yuv_weighted[0] );
    float4 c = mul( float4x3(C, A, G, I), yuv_weighted[0] );
    float4 e = mul( float4x3(E, E, E, E), yuv_weighted[0] );
    float4 d = b.yzwx;
    float4 f = b.wxyz;
    float4 g = c.zwxy;
    float4 h = b.zwxy;
    float4 i = c.wxyz;

    float4 i4 = mul( float4x3(I4, C1, A0, G5), yuv_weighted[0] );
    float4 i5 = mul( float4x3(I5, C4, A1, G0), yuv_weighted[0] );
    float4 h5 = mul( float4x3(H5, F4, B1, D0), yuv_weighted[0] );
    float4 f4 = h5.yzwx;

    interp_restriction_lv1      = ((e!=f) && (e!=h)  && ( !eq(f,b) && !eq(f,c) || !eq(h,d) && !eq(h,g) || eq(e,i) && (!eq(f,f4) && !eq(f,i4) || !eq(h,h5) && !eq(h,i5)) || eq(e,g) || eq(e,c)) );

    edr      = (weighted_distance( e, c, g, i, h5, f4, h, f) < weighted_distance( h, d, i5, f, i4, b, e, i)) && interp_restriction_lv1;

    
    float info = dot(float4(edr), float4(8.0f, 4.0f, 2.0f, 1.0f));

    return float4(remapTo01(info, 0.0f, 15.0f), 0.0, 0.0, 1.0);
}


2nd pass: (use don’t care and nearest)


/*
   Hyllian's xBR LVL1 pass1 beta
   
   Copyright (C) 2011/2012 Hyllian/Jararaca - [email][email protected][/email]

   This program is free software; you can redistribute it and/or
   modify it under the terms of the GNU General Public License
   as published by the Free Software Foundation; either version 2
   of the License, or (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.


   Incorporates some of the ideas from SABR shader. Thanks to Joshua Street.
*/

const static float y_weight        = 48.0;
const static float u_weight        = 7.0;
const static float v_weight        = 6.0;
const static float3x3 yuv          = float3x3(0.299, 0.587, 0.114, -0.169, -0.331, 0.499, 0.499, -0.418, -0.0813);
const static float3x3 yuv_weighted = float3x3(y_weight*yuv[0], u_weight*yuv[1], v_weight*yuv[2]);
const static float4 delta       = float4(0.4, 0.4, 0.4, 0.4);

bool4 EDR[16] =
{
   { false, false, false, false },
   { false, false, false, true  },
   { false, false, true , false },
   { false, false, true , true  },
   { false, true , false, false },
   { false, true , false, true  },
   { false, true , true , false },
   { false, true , true , true  },
   { true , false, false, false },
   { true , false, false, true  },
   { true , false, true , false },
   { true , false, true , true  },
   { true , true , false, false },
   { true , true , false, true  },
   { true , true , true , false },
   { true , true , true , true  },
};


float4 df(float4 A, float4 B)
{
    return float4(abs(A-B));
}

float remapTo01(float v, float low, float high)
{
    return saturate((v - low)/(high-low));
}

float remapFrom01(float v, float low, float high)
{
    return lerp(low, high, v);
}


struct orig
{
    float2 tex_coord;
    uniform float2 texture_size;
    uniform sampler2D texture;
};


struct input
{
  float2 video_size;
  float2 texture_size;
  float2 output_size;
  float frame_count;
  float frame_direction;
  float frame_rotation;
};



struct out_vertex {
    float4 position : POSITION;
    float4 color    : COLOR;
    float2 texCoord : TEXCOORD0;
    float4 t2;
    float4 t3;
    float4 t4;
    float2 orig_tex;
};

/*    VERTEX_SHADER    */
void main_vertex
(
    float4 position    : POSITION,
    float4 color    : COLOR,
    float2 texCoord : TEXCOORD0,

       uniform float4x4 modelViewProj,
    orig ORIG,
    out out_vertex co
)
{
    co.position = mul(modelViewProj, position);
    co.color = color;

    float2 ps = float2(1.0/ORIG.texture_size.x, 1.0/ORIG.texture_size.y);
    float dx = ps.x;
    float dy = ps.y;

    //    A1 B1 C1
    // A0  A  B  C C4
    // D0  D  E  F F4
    // G0  G  H  I I4
    //    G5 H5 I5

    co.texCoord = texCoord;
    co.orig_tex = ORIG.tex_coord;
    co.t2 = ORIG.tex_coord.xxxy + float4( -dx, 0, dx,  -dy); //  A  B  C
    co.t3 = ORIG.tex_coord.xxxy + float4( -dx, 0, dx,    0); //  D  E  F
    co.t4 = ORIG.tex_coord.xxxy + float4( -dx, 0, dx,   dy); //  G  H  I
}


/*    FRAGMENT SHADER    */
float4 main_fragment(in out_vertex co, uniform sampler2D decal : TEXUNIT0, orig ORIG, uniform input IN) : COLOR
{
    bool4 edr, px; // px = pixel, edr = edge detection rule
    bool4 nc45; // new_color
    float4 fx; // inequations of straight lines.
    float3 res1, pix1;
    float blend1;

    float2 fp = frac(co.texCoord*IN.texture_size);


    float3 B  = tex2D(ORIG.texture, co.t2.yw).rgb;
    float3 D  = tex2D(ORIG.texture, co.t3.xw).rgb;
    float3 E  = tex2D(ORIG.texture, co.t3.yw).rgb;
    float3 F  = tex2D(ORIG.texture, co.t3.zw).rgb;
    float3 H  = tex2D(ORIG.texture, co.t4.yw).rgb;


    float4 b = mul( float4x3(B, D, H, F), yuv_weighted[0] );
    float4 e = mul( float4x3(E, E, E, E), yuv_weighted[0] );
    float4 d = b.yzwx;
    float4 f = b.wxyz;
    float4 h = b.zwxy;

    float4 Ao = float4( 1.0, -1.0, -1.0, 1.0 );
    float4 Bo = float4( 1.0,  1.0, -1.0,-1.0 );
    float4 Co = float4( 1.5,  0.5, -0.5, 0.5 );

    // These inequations define the line below which interpolation occurs.
    fx      = (Ao*fp.y+Bo*fp.x); 

    float4 fx45 = smoothstep(Co - delta, Co + delta, fx);


    float4 info  = tex2D(decal, co.texCoord);

    int i = remapFrom01(info.x, 0.0f, 16.0f); // Really don't know why 15.0f instead of 16.0f doesn't work! WTF!?

         if (i ==  0) edr = EDR[ 0];
    else if (i ==  1) edr = EDR[ 1];
    else if (i ==  2) edr = EDR[ 2];
    else if (i ==  3) edr = EDR[ 3];

    else if (i ==  4) edr = EDR[ 4];
    else if (i ==  5) edr = EDR[ 5];
    else if (i ==  6) edr = EDR[ 6];
    else if (i ==  7) edr = EDR[ 7];

    else if (i ==  8) edr = EDR[ 8];
    else if (i ==  9) edr = EDR[ 9];
    else if (i == 10) edr = EDR[10];
    else if (i == 11) edr = EDR[11];

    else if (i == 12) edr = EDR[12];
    else if (i == 13) edr = EDR[13];
    else if (i == 14) edr = EDR[14];
    else if (i == 15) edr = EDR[15];


    nc45 = ( edr && bool4(fx45));

    px = (df(e,f) <= df(e,h));


    float4 maximo = nc45*fx45;


         if (nc45.x) {pix1 = px.x ? F : H; blend1 = maximo.x;}
    else if (nc45.y) {pix1 = px.y ? B : F; blend1 = maximo.y;}
    else if (nc45.z) {pix1 = px.z ? D : B; blend1 = maximo.z;}
    else if (nc45.w) {pix1 = px.w ? H : D; blend1 = maximo.w;}
    else {pix1 = E; blend1 = 0.0;}

    float3 res = lerp(E, pix1, blend1);

    return float4(res, 1.0);
}
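About the 15.0f versus 16.0f comment in the second pass: one plausible explanation, not verified on the hardware in question, is the combination of an 8-bit render target and the float-to-int truncation. The Python sketch below models that. It assumes the first pass output is stored in an 8-bit normalized channel, and all function names are made up for the illustration.

```python
import math

WEIGHTS = (8, 4, 2, 1)


def pack(edr):
    """First pass: dot(edr, (8, 4, 2, 1)) remapped from 0..15 to 0..1."""
    info = sum(w for w, bit in zip(WEIGHTS, edr) if bit)
    return info / 15.0


def store_8bit(value):
    """Round trip through an 8-bit normalized channel (assumed target format)."""
    return round(value * 255.0) / 255.0


def decode_truncate(stored, scale):
    """What int i = remapFrom01(info.x, 0, scale) does: scale, then truncate."""
    return int(stored * scale)


def decode_round(stored):
    """Alternative decode: scale by 15 and round to the nearest integer."""
    return int(math.floor(stored * 15.0 + 0.5))


def unpack(index):
    """Rebuild the four booleans, equivalent to the EDR[] lookup table."""
    return tuple(bool(index & w) for w in WEIGHTS)
```

With a scale of 15, every stored code decodes to a value that is mathematically an exact integer, so the tiniest rounding error on the low side makes the truncation drop to the previous index. A scale of 16 adds a margin of k/15 that hides this, but it also decodes the all-true code (1.0) to 16, which matches none of the branches and leaves edr unassigned. Rounding with a scale of 15, as in `decode_round`, would avoid both problems in this model.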

Congratulations, dude :smiley:

It looks great!

Tks!

Now a LVL2 in two passes. It’s already faster than the single-pass version, and it’s still unoptimized! It runs fullscreen on PS3 at 60fps!!!

I’ll optimize it a bit more before uploading to the repo. And I have some ideas I need to test.

It’s an xBR LVL2 3.8c:

1st pass (use 1x and nearest)


/*
   Hyllian's xBR v3.8c (squared) Shader
   
   Copyright (C) 2011/2013 Hyllian/Jararaca - [email][email protected][/email]

   This program is free software; you can redistribute it and/or
   modify it under the terms of the GNU General Public License
   as published by the Free Software Foundation; either version 2
   of the License, or (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.


   Incorporates some of the ideas from SABR shader. Thanks to Joshua Street.
*/

const static float coef           = 2.0;
const static float4 eq_threshold  = float4(15.0);
const static float y_weight        = 48.0;
const static float u_weight        = 7.0;
const static float v_weight        = 6.0;
const static float3x3 yuv          = float3x3(0.299, 0.587, 0.114, -0.169, -0.331, 0.499, 0.499, -0.418, -0.0813);
const static float3x3 yuv_weighted = float3x3(y_weight*yuv[0], u_weight*yuv[1], v_weight*yuv[2]);
const static float4 delta       = float4(0.4);


float3 remapTo01(float3 v, float3 low, float3 high)
{
    return saturate((v - low)/(high-low));
}


float4 df(float4 A, float4 B)
{
    return float4(abs(A-B));
}

float c_df(float3 c1, float3 c2) {
    float3 df = abs(c1 - c2);
    return df.r + df.g + df.b;
}




bool4 eq(float4 A, float4 B)
{
    return (df(A, B) < eq_threshold);
}

float4 weighted_distance(float4 a, float4 b, float4 c, float4 d, float4 e, float4 f, float4 g, float4 h)
{
    return (df(a,b) + df(a,c) + df(d,e) + df(d,f) + 4.0*df(g,h));
}



struct input
{
    float2 video_size;
    float2 texture_size;
    float2 output_size;
};


struct out_vertex {
    float4 position : POSITION;
    float4 color    : COLOR;
    float2 texCoord : TEXCOORD0;
    float4 t1;
    float4 t2;
    float4 t3;
    float4 t4;
    float4 t5;
    float4 t6;
    float4 t7;
};

/*    VERTEX_SHADER    */
out_vertex main_vertex
(
    float4 position    : POSITION,
    float4 color    : COLOR,
    float2 texCoord : TEXCOORD0,

    uniform float4x4 modelViewProj,
    uniform input IN
)
{
    out_vertex OUT;

    OUT.position = mul(modelViewProj, position);
    OUT.color = color;

    float2 ps = float2(1.0/IN.texture_size.x, 1.0/IN.texture_size.y);
    float dx = ps.x;
    float dy = ps.y;

    //    A1 B1 C1
    // A0  A  B  C C4
    // D0  D  E  F F4
    // G0  G  H  I I4
    //    G5 H5 I5

    OUT.texCoord = texCoord;
    OUT.t1 = texCoord.xxxy + float4( -dx, 0, dx,-2.0*dy); // A1 B1 C1
    OUT.t2 = texCoord.xxxy + float4( -dx, 0, dx,    -dy); //  A  B  C
    OUT.t3 = texCoord.xxxy + float4( -dx, 0, dx,      0); //  D  E  F
    OUT.t4 = texCoord.xxxy + float4( -dx, 0, dx,     dy); //  G  H  I
    OUT.t5 = texCoord.xxxy + float4( -dx, 0, dx, 2.0*dy); // G5 H5 I5
    OUT.t6 = texCoord.xyyy + float4(-2.0*dx,-dy, 0,  dy); // A0 D0 G0
    OUT.t7 = texCoord.xyyy + float4( 2.0*dx,-dy, 0,  dy); // C4 F4 I4

    return OUT;
}


/*    FRAGMENT SHADER    */
float4 main_fragment(in out_vertex VAR, uniform sampler2D decal : TEXUNIT0, uniform input IN) : COLOR
{
    bool4 edr, edr_left, edr_up; 
    bool4 interp_restriction_lv1, interp_restriction_lv2_left, interp_restriction_lv2_up;


    float3 A1 = tex2D(decal, VAR.t1.xw).rgb;
    float3 B1 = tex2D(decal, VAR.t1.yw).rgb;
    float3 C1 = tex2D(decal, VAR.t1.zw).rgb;

    float3 A  = tex2D(decal, VAR.t2.xw).rgb;
    float3 B  = tex2D(decal, VAR.t2.yw).rgb;
    float3 C  = tex2D(decal, VAR.t2.zw).rgb;

    float3 D  = tex2D(decal, VAR.t3.xw).rgb;
    float3 E  = tex2D(decal, VAR.t3.yw).rgb;
    float3 F  = tex2D(decal, VAR.t3.zw).rgb;

    float3 G  = tex2D(decal, VAR.t4.xw).rgb;
    float3 H  = tex2D(decal, VAR.t4.yw).rgb;
    float3 I  = tex2D(decal, VAR.t4.zw).rgb;

    float3 G5 = tex2D(decal, VAR.t5.xw).rgb;
    float3 H5 = tex2D(decal, VAR.t5.yw).rgb;
    float3 I5 = tex2D(decal, VAR.t5.zw).rgb;

    float3 A0 = tex2D(decal, VAR.t6.xy).rgb;
    float3 D0 = tex2D(decal, VAR.t6.xz).rgb;
    float3 G0 = tex2D(decal, VAR.t6.xw).rgb;

    float3 C4 = tex2D(decal, VAR.t7.xy).rgb;
    float3 F4 = tex2D(decal, VAR.t7.xz).rgb;
    float3 I4 = tex2D(decal, VAR.t7.xw).rgb;

    float4 b = mul( float4x3(B, D, H, F), yuv_weighted[0] );
    float4 c = mul( float4x3(C, A, G, I), yuv_weighted[0] );
    float4 e = mul( float4x3(E, E, E, E), yuv_weighted[0] );
    float4 d = b.yzwx;
    float4 f = b.wxyz;
    float4 g = c.zwxy;
    float4 h = b.zwxy;
    float4 i = c.wxyz;

    float4 i4 = mul( float4x3(I4, C1, A0, G5), yuv_weighted[0] );
    float4 i5 = mul( float4x3(I5, C4, A1, G0), yuv_weighted[0] );
    float4 h5 = mul( float4x3(H5, F4, B1, D0), yuv_weighted[0] );
    float4 f4 = h5.yzwx;

    interp_restriction_lv1      = ((e!=f) && (e!=h)  && ( !eq(f,b) && !eq(f,c) || !eq(h,d) && !eq(h,g) || eq(e,i) && (!eq(f,f4) && !eq(f,i4) || !eq(h,h5) && !eq(h,i5)) || eq(e,g) || eq(e,c)) );
    interp_restriction_lv2_left = ((e!=g) && (d!=g));
    interp_restriction_lv2_up   = ((e!=c) && (b!=c));

    edr      = (weighted_distance( e, c, g, i, h5, f4, h, f) < weighted_distance( h, d, i5, f, i4, b, e, i)) && interp_restriction_lv1;
    edr_left = ((coef*df(f,g)) <= df(h,c)) && interp_restriction_lv2_left && edr;
    edr_up   = (df(f,g) >= (coef*df(h,c))) && interp_restriction_lv2_up   && edr;

    float3 info;

    info.x = dot(float4(edr     ), float4(8.0f, 4.0f, 2.0f, 1.0f));
    info.y = dot(float4(edr_left), float4(8.0f, 4.0f, 2.0f, 1.0f));
    info.z = dot(float4(edr_up  ), float4(8.0f, 4.0f, 2.0f, 1.0f));

    return float4(remapTo01(info, float3(0.0f), float3(15.0f)), 1.0);

}




2nd pass (scale: don’t care; filter: nearest)


/*
   Hyllian's xBR v3.8c (squared) Shader
   
   Copyright (C) 2011/2013 Hyllian/Jararaca - [email protected]

   This program is free software; you can redistribute it and/or
   modify it under the terms of the GNU General Public License
   as published by the Free Software Foundation; either version 2
   of the License, or (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.


   Incorporates some of the ideas from SABR shader. Thanks to Joshua Street.
*/

const static float coef           = 2.0;
const static float4 eq_threshold  = float4(15.0);
const static float y_weight        = 48.0;
const static float u_weight        = 7.0;
const static float v_weight        = 6.0;
const static float3x3 yuv          = float3x3(0.299, 0.587, 0.114, -0.169, -0.331, 0.499, 0.499, -0.418, -0.0813);
const static float3x3 yuv_weighted = float3x3(y_weight*yuv[0], u_weight*yuv[1], v_weight*yuv[2]);
const static float4 delta       = float4(0.3);

bool4 EDR[16] =
{
   { false, false, false, false },
   { false, false, false, true  },
   { false, false, true , false },
   { false, false, true , true  },
   { false, true , false, false },
   { false, true , false, true  },
   { false, true , true , false },
   { false, true , true , true  },
   { true , false, false, false },
   { true , false, false, true  },
   { true , false, true , false },
   { true , false, true , true  },
   { true , true , false, false },
   { true , true , false, true  },
   { true , true , true , false },
   { true , true , true , true  }
};


float4 df(float4 A, float4 B)
{
    return float4(abs(A-B));
}

float c_df(float3 c1, float3 c2) {
    float3 df = abs(c1 - c2);
    return df.r + df.g + df.b;
}




bool4 eq(float4 A, float4 B)
{
    return (df(A, B) < eq_threshold);
}

float4 weighted_distance(float4 a, float4 b, float4 c, float4 d, float4 e, float4 f, float4 g, float4 h)
{
    return (df(a,b) + df(a,c) + df(d,e) + df(d,f) + 4.0*df(g,h));
}



float3 remapFrom01(float3 v, float3 low, float3 high)
{
    return lerp(low, high, v);
}


struct orig
{
    float2 tex_coord;
    uniform float2 texture_size;
    uniform sampler2D texture;
};


struct input
{
  float2 video_size;
  float2 texture_size;
  float2 output_size;
  float frame_count;
  float frame_direction;
  float frame_rotation;
};



struct out_vertex {
    float4 position : POSITION;
    float4 color    : COLOR;
    float2 texCoord : TEXCOORD0;
    float4 t2;
    float4 t3;
    float4 t4;
    float2 orig_tex;
};

/*    VERTEX_SHADER    */
void main_vertex
(
    float4 position    : POSITION,
    float4 color    : COLOR,
    float2 texCoord : TEXCOORD0,

    uniform float4x4 modelViewProj,
    orig ORIG,
    out out_vertex co
)
{
    co.position = mul(modelViewProj, position);
    co.color = color;

    float2 ps = float2(1.0/ORIG.texture_size.x, 1.0/ORIG.texture_size.y);
    float dx = ps.x;
    float dy = ps.y;

    //    A1 B1 C1
    // A0  A  B  C C4
    // D0  D  E  F F4
    // G0  G  H  I I4
    //    G5 H5 I5

    co.texCoord = texCoord;
    co.orig_tex = ORIG.tex_coord;
    co.t2 = ORIG.tex_coord.xxxy + float4( -dx, 0, dx,  -dy); //  A  B  C
    co.t3 = ORIG.tex_coord.xxxy + float4( -dx, 0, dx,    0); //  D  E  F
    co.t4 = ORIG.tex_coord.xxxy + float4( -dx, 0, dx,   dy); //  G  H  I
}


/*    FRAGMENT SHADER    */
float4 main_fragment(in out_vertex co, uniform sampler2D decal : TEXUNIT0, orig ORIG, uniform input IN) : COLOR
{

    bool4 edr, edr_left, edr_up, px; // px = pixel, edr = edge detection rule
    bool4 interp_restriction_lv1, interp_restriction_lv2_left, interp_restriction_lv2_up;
    bool4 nc, nc30, nc60, nc45; // new_color
    float4 fx, fx_left, fx_up, final_fx; // inequations of straight lines.
    float3 res1, res2, pix1, pix2;
    float blend1, blend2;

    float2 fp = frac(co.texCoord*IN.texture_size);


    float3 B  = tex2D(ORIG.texture, co.t2.yw).rgb;
    float3 D  = tex2D(ORIG.texture, co.t3.xw).rgb;
    float3 E  = tex2D(ORIG.texture, co.t3.yw).rgb;
    float3 F  = tex2D(ORIG.texture, co.t3.zw).rgb;
    float3 H  = tex2D(ORIG.texture, co.t4.yw).rgb;


    float4 b = mul( float4x3(B, D, H, F), yuv_weighted[0] );
    float4 e = mul( float4x3(E, E, E, E), yuv_weighted[0] );
    float4 d = b.yzwx;
    float4 f = b.wxyz;
    float4 h = b.zwxy;

    float4 Ao = float4( 1.0, -1.0, -1.0, 1.0 );
    float4 Bo = float4( 1.0,  1.0, -1.0,-1.0 );
    float4 Co = float4( 1.5,  0.5, -0.5, 0.5 );
    float4 Ax = float4( 1.0, -1.0, -1.0, 1.0 );
    float4 Bx = float4( 0.5,  2.0, -0.5,-2.0 );
    float4 Cx = float4( 1.0,  1.0, -0.5, 0.0 );
    float4 Ay = float4( 1.0, -1.0, -1.0, 1.0 );
    float4 By = float4( 2.0,  0.5, -2.0,-0.5 );
    float4 Cy = float4( 2.0,  0.0, -1.0, 0.5 );

    // These inequations define the line below which interpolation occurs.
    fx      = (Ao*fp.y+Bo*fp.x); 
    fx_left = (Ax*fp.y+Bx*fp.x);
    fx_up   = (Ay*fp.y+By*fp.x);

    float4 fx45 = smoothstep(Co - delta, Co + delta, fx);
    float4 fx30 = smoothstep(Cx - delta, Cx + delta, fx_left);
    float4 fx60 = smoothstep(Cy - delta, Cy + delta, fx_up);

    float4 info  = tex2D(decal, co.orig_tex);

    int3 i = round(remapFrom01(info.xyz, float3(0.0f), float3(15.0f)));

    edr.w = bool(fmod(i.x, 2)); i.x = i.x/2;
    edr.z = bool(fmod(i.x, 2)); i.x = i.x/2;
    edr.y = bool(fmod(i.x, 2)); i.x = i.x/2;
    edr.x = bool(fmod(i.x, 2));


    edr_left.w = bool(fmod(i.y, 2)); i.y = i.y/2;
    edr_left.z = bool(fmod(i.y, 2)); i.y = i.y/2;
    edr_left.y = bool(fmod(i.y, 2)); i.y = i.y/2;
    edr_left.x = bool(fmod(i.y, 2));


    edr_up.w = bool(fmod(i.z, 2)); i.z = i.z/2;
    edr_up.z = bool(fmod(i.z, 2)); i.z = i.z/2;
    edr_up.y = bool(fmod(i.z, 2)); i.z = i.z/2;
    edr_up.x = bool(fmod(i.z, 2));


    nc45 = ( edr &&      bool4(fx45));
    nc30 = ( edr_left && bool4(fx30));
    nc60 = ( edr_up   && bool4(fx60));

    px = (df(e,f) <= df(e,h));

    nc = (nc30 || nc60 || nc45);

    float4 final45 = nc45*fx45;
    float4 final30 = nc30*fx30;
    float4 final60 = nc60*fx60;

    float4 maximo = max(max(final30, final60), final45);

         if (nc.x) {pix1 = px.x ? F : H; blend1 = maximo.x;}
    else if (nc.y) {pix1 = px.y ? B : F; blend1 = maximo.y;}
    else if (nc.z) {pix1 = px.z ? D : B; blend1 = maximo.z;}
    else if (nc.w) {pix1 = px.w ? H : D; blend1 = maximo.w;}
    else {pix1 = E; blend1 = 0.0;} // no edge detected: keep the center pixel, as in the LVL1 code

         if (nc.w) {pix2 = px.w ? H : D; blend2 = maximo.w;}
    else if (nc.z) {pix2 = px.z ? D : B; blend2 = maximo.z;}
    else if (nc.y) {pix2 = px.y ? B : F; blend2 = maximo.y;}
    else if (nc.x) {pix2 = px.x ? F : H; blend2 = maximo.x;}
    else {pix2 = E; blend2 = 0.0;} // same fallback, so pix2/blend2 are never read uninitialized

    res1 = lerp(E, pix1, blend1);
    res2 = lerp(E, pix2, blend2);

    float3 res = lerp(res1, res2, step(c_df(E, res1), c_df(E, res2)));

    return float4(res, 1.0);
}
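
For anyone following the two listings, here is a small Python model (not shader code) of how the edge flags travel from pass 1 to pass 2. Pass 1 turns each bool4 into a number from 0 to 15 with a dot product against (8, 4, 2, 1) and remaps it to 0..1 so it fits a color channel. Pass 2 scales it back, rounds, and peels the bits off from w up to x. The function names are made up for illustration, and the 8-bit channel in `store_8bit` is an assumption about the intermediate texture, not something stated in the posts.

```python
import math

WEIGHTS = (8.0, 4.0, 2.0, 1.0)  # x is the most significant bit, w the least


def pack_flags(flags):
    """Pass 1: dot(float4(flags), (8, 4, 2, 1)), remapped from 0..15 to 0..1."""
    code = sum(w * (1.0 if f else 0.0) for w, f in zip(WEIGHTS, flags))
    return code / 15.0


def store_8bit(v):
    """Assumed 8-bit channel: the value is quantized to one of 256 levels."""
    return math.floor(v * 255.0 + 0.5) / 255.0


def unpack_flags(v):
    """Pass 2: scale back to 0..15, round, then extract bits from w up to x."""
    i = int(math.floor(v * 15.0 + 0.5))
    w = bool(i % 2); i //= 2
    z = bool(i % 2); i //= 2
    y = bool(i % 2); i //= 2
    x = bool(i % 2)
    return (x, y, z, w)


edr = (True, False, True, True)  # x, y, z, w
stored = store_8bit(pack_flags(edr))
print(unpack_flags(stored))  # the same four flags come back
```

Since 255/15 is exactly 17, every one of the 16 codes lands on an exact 8-bit level in this model, which is why three bool4 vectors fit in the r, g and b channels of one intermediate texture.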




Figured out the strange behavior pointed out above (LVL1 code). I had to use round to get proper integer values.

So, replacing this line:


int i = remapFrom01(info.x, 0.0f, 16.0f); // Really don't know why 15.0f instead of 16.0f doesn't work! WTF!?

with this:


int i = round(remapFrom01(info.x, 0.0f, 15.0f));

worked as it was supposed to. That’s because an implicit float->int cast just discards the fractional part, and I needed a properly rounded integer.
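
To make the truncation point concrete, here is a rough Python model of the decode step. It assumes the value sampled from the texture comes back a hair below the ideal k/15. The size of that error (`READBACK_ERROR`) is invented for the example, and the helper names are hypothetical too.

```python
import math

READBACK_ERROR = 1e-4  # hypothetical: how far below k/15 the sampled value lands


def sampled(k):
    """Value read back for code k (0..15), nudged slightly low and clamped at 0."""
    return max(0.0, k / 15.0 - READBACK_ERROR)


def decode_truncate(v, scale):
    """Implicit float->int cast: the fractional part is simply dropped."""
    return int(v * scale)


def decode_round(v):
    """round() before the cast: nearest integer instead of the one below."""
    return int(math.floor(v * 15.0 + 0.5))


for k in (1, 7, 15):
    v = sampled(k)
    # columns: code, truncate with 15, truncate with 16, round with 15
    print(k, decode_truncate(v, 15.0), decode_truncate(v, 16.0), decode_round(v))
```

Under that assumption, truncating with a scale of 15 comes out one too low for every nonzero code. A scale of 16 only appears to work because the extra k/15 pushes the value past the integer before the cast drops the fraction. Rounding with 15 gives the intended code without relying on that accident.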

Maister,

Is it possible to add a new kind of pass-output access that lets me read the output from two passes behind my current pass? I know I can use pass%u, but only if I know the exact index of the pass two steps back.

I’d like some way to access the output of the pass N passes behind the current one. This would help when I need to combine two groups of multipass shaders, for example an mdapt cgp with an xbr cgp.
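
A toy Python sketch of why a relative lookup would help when presets are chained. Everything here is hypothetical: the pass labels, the three-pass xbr chain, and both helper functions are made up to illustrate the indexing problem and are not RetroArch APIs.

```python
def resolve_absolute(pass_outputs, index):
    """Absolute lookup: the index counts from the first pass of the whole chain."""
    return pass_outputs[index]


def resolve_relative(pass_outputs, current, back):
    """Hypothetical relative lookup: output of the pass `back` steps behind `current`."""
    if back < 1 or back > current:
        raise IndexError("no pass that far behind")
    return pass_outputs[current - back]


# Made-up chains: labels stand for the output of each pass.
xbr = ["xbr-pass0", "xbr-pass1", "xbr-pass2"]
mdapt = ["mdapt-pass0", "mdapt-pass1"]
combined = mdapt + xbr  # mdapt runs first, then xbr

# Standalone, the last xbr pass (index 2) finds its input either way.
print(resolve_absolute(xbr, 0), resolve_relative(xbr, 2, 2))

# Chained, the same pass sits at index 4: the absolute index now points at mdapt.
print(resolve_absolute(combined, 0), resolve_relative(combined, 4, 2))
```

With an absolute index the xbr shader would have to be edited every time passes are prepended, while the relative offset stays at 2 in both chains.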

So do you think you’ll be able to add additional detection levels without too much speed impact using even more passes?

Also, this may get worked out in the optimization step, but just so you know about it:

my Intel HD4000 doesn’t like this part (2nd pass, lines 198-213):

    edr.w = bool(fmod(i.x, 2)); i.x = i.x/2;
    edr.z = bool(fmod(i.x, 2)); i.x = i.x/2;
    edr.y = bool(fmod(i.x, 2)); i.x = i.x/2;
    edr.x = bool(fmod(i.x, 2));


    edr_left.w = bool(fmod(i.y, 2)); i.y = i.y/2;
    edr_left.z = bool(fmod(i.y, 2)); i.y = i.y/2;
    edr_left.y = bool(fmod(i.y, 2)); i.y = i.y/2;
    edr_left.x = bool(fmod(i.y, 2));


    edr_up.w = bool(fmod(i.z, 2)); i.z = i.z/2;
    edr_up.z = bool(fmod(i.z, 2)); i.z = i.z/2;
    edr_up.y = bool(fmod(i.z, 2)); i.z = i.z/2;
    edr_up.x = bool(fmod(i.z, 2));

It fails with:

error C1101: ambiguous overloaded function reference "fmod(int, int)"
    (0) : fixed fmod(fixed, fixed)
    (0) : half fmod(half, half)
    (0) : float fmod(float, float)

I hope so. I have many ideas to try now that are only possible with multipass.

So, your card doesn’t support int?

From the Cg spec: “The int type is preferably 32-bit two’s complement. Profiles may optionally treat int as float.”

I’ll use float instead of int and see if it works.
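
A quick Python model of what a float-only version of the bit extraction could look like. This is only a sketch of the idea, not the code that ended up in the shader, and `unpack_with_floats` is a made-up name. The detail to watch is that once the value is a float, dividing by 2 no longer drops the fraction, so an explicit floor has to stand in for the integer division.

```python
import math


def unpack_with_floats(code):
    """Peel four flags off a 0..15 code using only float operations."""
    value = float(code)
    flags = []
    for _ in range(4):
        flags.append(math.fmod(value, 2.0) == 1.0)  # lowest remaining bit
        value = math.floor(value / 2.0)              # float stand-in for i = i/2
    w, z, y, x = flags                               # bits come out from w up to x
    return (x, y, z, w)


print(unpack_with_floats(11.0))  # (True, False, True, True)
```

With float arguments, fmod also resolves to a single overload, which sidesteps the ambiguity the HD4000 compiler complained about.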