Lanczos2 sharp shader

I was reading about image processing and came up with some ideas to apply to Lanczos. The usual way to implement it is by separating the math into two dimensions, X and Y. Instead, I decided to use the sinc function in cylindrical coordinates and got something very sharp, though with more ringing. The result is interesting when compared to lanczos16 in common-shaders.

Here it is, still in beta form: http://pastebin.com/RMuQmbvi

Some comparative screenshots:

Nearest neighbor -> Bilinear -> Bicubic-sharper

Reverse AA -> xBR-hybrid-v4 (2 times) -> Lanczos16

Lanczos2-sharp

Nice! That is seriously sharp.

Yeah!

I like how it only needs 12 texture lookups to look like that. Lanczos16 needs 16 texture lookups! :open_mouth:

The Lanczos shaders would benefit from 2-passing (separable filters and all). I assume the Lanczos16 shader is really a 4x4 filter kernel?

Lanczos16 is already very fast in one pass. I don’t think the speed gain from two passes outweighs the simplicity of using only one shader.

I found another kernel that is extremely sharp and, at the same time, smooths the jaggies. I don’t know which filter this is.

The kernel: cos(2x/pi) * sin(pi*x) / (pi*x)

Here’s the lanczos shader modified to use that kernel: http://pastebin.com/XYGBxdBr
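For reference, here is that kernel written out as a Cg weight function (just a minimal sketch; the helper name and the hard cutoff at radius 2, for the two lobes, are my own choices):

```
// Sinc windowed by cos(2x/pi), cut off at radius 2 (two lobes).
// x is the distance from the sample point, measured in source texels.
float kernel_weight(float x)
{
    const float pi = 3.141592653589793;
    x = abs(x);
    if (x < 1e-6) return 1.0;   // limit of sin(pi*x)/(pi*x) as x -> 0
    if (x >= 2.0) return 0.0;   // outside the two-lobe support
    return cos(2.0 * x / pi) * sin(pi * x) / (pi * x);
}
```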

The ringing is big, though. But the way it smooths hard edges reminds me of xBR lvl1!!

That’s a sinc filter with a cos(2x/pi) window…not sure if there’s a name for it, since it’s different from the cosine window. You have a lot of flexibility with experimental windowing when you only have two lobes though, since you don’t take samples far enough out to worry about what the window function is doing beyond a single period or so. If you want to play with well-known window functions, check this out: https://en.wikipedia.org/wiki/Window_function

Anyway, it’s quite difficult to perform deringing (or even simple dehaloing) after the damage is done. As of the last time I checked, all of the Avisynth filters for it are pretty destructive, and they don’t even have to be real-time. It’s easier to compensate for ringing early on, and you might want to try adapting something like this: http://forum.doom9.org/showthread.php?t=145358

The author of madVR wrote that, and if I recall correctly, someone else has independently developed something very similar (there might be discussion of it somewhere in that thread).

I just found out which filter it is that I got by tweaking the lanczos shader: it’s a windowed filter. More specifically, it’s a sinc-windowed-by-cosine filter. That is, sinc is the ideal filter for a 1D signal, but since it has infinite terms, for a practical realization you need to multiply it by a window that vanishes to zero at whatever radius you wish (in my case, it vanishes at 2, for two sinc lobes). Here the sinc is windowed by the cosine shown in the kernel above. And, as in the lanczos, it’s using cylindrical coordinates, hence the aliasing at 45º is smoothed nicely.

EDIT: LOL, beaten by 12 min! =D

Thanks for the link. I already knew about that, though I didn’t understand very well what it was doing. I’m more inclined to learn about the jinc filter, as it seems very promising for getting rid of jaggies without much haloing.

Yeah, the Jinc filter can be pretty good even without a window depending on the material, but I guess that’s what you already tried. I suppose you’ve seen the windowing comparison at the bottom link? http://www.imagemagick.org/Usage/filter/#jinc http://www.imagemagick.org/Usage/filter/nicolas/

Actually, madshi’s technique works best WITH a Jinc filter, if I read his post correctly: It seems like he calculates the values of the highest/lowest contributing samples (raw samples, sinced values, or windowed sinc values?) and does simple limiting: result = min(max(result, min_sample), max_sample), but there are artifacts for a separable filter. Since you’re using a one-pass filter, avoiding those should be easier.
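For illustration, a minimal sketch of that kind of limiting in Cg (not necessarily madshi’s exact method; here the clamp range comes from the four raw samples nearest the output pixel, and the helper name is made up):

```
// Anti-ringing clamp: restrict the filtered result to the range spanned
// by the four raw samples closest to the output pixel.
float3 anti_ring(float3 result, float3 s00, float3 s10, float3 s01, float3 s11)
{
    float3 lo = min(min(s00, s10), min(s01, s11));
    float3 hi = max(max(s00, s10), max(s01, s11));
    return clamp(result, lo, hi);   // = min(max(result, lo), hi), per channel
}
```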

Yes, ImageMagick is almost the only source of info about jinc applied to image processing, so I’ve already read almost all the threads about it. The discussions are lengthy and very technical. I’m still learning about it.

I’m looking for a fast way to implement jinc in a shader. I haven’t found anything fast enough yet. If you have a fast implementation, please let me know.

The bottleneck with Jinc is taking all 16 samples. There are two ways to fudge that, but I think only one works for a filter that oscillates as much as Jinc:

First, I’m working on an sRGB texture read/write option for .cgp shaders. That will allow for gamma-correct linear filtering, and there are ways to build faster 1D cubic filters and 1D/2D blur kernels from the results of linear filtering, but 1D sinc and especially 2D jinc seem too complicated to meaningfully utilize the results of getting 4-samples-for-1 with bilinear filtering.

Second, GPU’s render 2x2 fragment quads simultaneously, and there’s a method of using high-quality derivatives with ddx/ddy (on hardware supporting high-quality derivatives) to obtain the results of texture samples in other fragments. This method is detailed in GPU Pro 2 Chapter VI.2, “Shader Amortization using Pixel Quad Message Passing.” If you arrange things correctly, you can do a 5x5 (25-tap) Jinc filter with only 9 texture samples per fragment and calculate the other 16 from the neighboring fragments using quad-pixel communication. It won’t work on all GPU’s, but it should be much faster than a two-pass 5+5-tap filter when it does work. (Older ATI cards don’t have high-quality derivatives, and Retroarch picks a Cg profile without ANY derivatives for open-source radeon drivers.)
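To give a rough idea of the quad-pixel communication trick (this is just the basic principle, not the GPU Pro 2 recipe itself, and the helper name is made up): with high-quality per-pixel derivatives, ddx(value) is the signed difference between this fragment’s value and its horizontal neighbor’s value within the same 2x2 quad, so the neighbor’s value can be reconstructed without re-sampling:

```
// 'value' is something this fragment computed (e.g. a texture sample).
// 'screen_coord' is the window-space pixel position (WPOS semantic).
float3 neighbor_value_x(float3 value, float2 screen_coord)
{
    // Even pixel column = left half of the 2x2 quad, odd = right half.
    float is_right = fmod(floor(screen_coord.x), 2.0);
    float dir = (is_right < 0.5) ? 1.0 : -1.0;
    return value + dir * ddx(value);   // ddx(value) = right value - left value
}
```

The same idea works vertically with ddy(), and combining both lets a fragment read values computed by all three of its quad neighbors.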

I’ve implemented a one-pass 2D 10x10 semi-Gaussian blur using this method that only requires 9 samples per fragment. I haven’t released it yet though for three reasons: 1.) I wanted to originally release it as just one pass of a new complex shader I’m working on. 2.) It relies on leveraging bilinear samples, but this requires gamma-correct bilinear filtering, and I haven’t implemented the .cgp option for sRGB writes/reads yet, so it looks NOTHING like it should. 3.) After I implement that, I need to tweak it: It’s possible to get perfect 1D Gaussian filtering by leveraging bilinear samples, but you have to make concessions with 2D filtering, and I need to test/tweak some tradeoffs before releasing it…which I can’t do until sRGB is implemented. :wink: HOWEVER, I can email you the unreleased shader pass, and it shouldn’t be too hard to figure out the quad-pixel communication technique from it. I take 36 samples across 4 pixel fragments, i.e. a 6x6 block, and I take 9 samples from each fragment and build a 5x5 filter from them. This becomes a virtual 10x10 filter by carefully placing the sample points and leveraging bilinear filtering. You however can strip out the sample-placement code, add in nearest-neighbor sample-snapping code, and change the filter weights from Gaussian to windowed Jinc.
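To give an idea of the bilinear-leveraging part (and why gamma-correct filtering matters for it), here’s a minimal sketch of how two adjacent 1D Gaussian taps can be collapsed into one bilinear fetch; the helper name is made up, and it only handles the horizontal direction:

```
// Two adjacent taps at integer offsets off0 and off1 = off0 + 1, with weights
// w0 and w1, are replaced by a single bilinear fetch at their weighted-average
// offset, scaled by (w0 + w1). Only exact if filtering happens in linear light.
float3 gaussian_pair(sampler2D tex, float2 uv, float2 texel_size,
                     float off0, float off1, float w0, float w1)
{
    float merged_off = (off0 * w0 + off1 * w1) / (w0 + w1);
    return (w0 + w1) * tex2D(tex, uv + float2(merged_off * texel_size.x, 0.0)).rgb;
}
```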

Why a 5x5 filter instead of 4x4? This technique only works neatly with odd window sizes. You can leverage the same technique for a 3x3 filter with 4 samples per fragment (using a 4x4 sample window across the 2x2 fragment quad, for 16 samples total, divided by 4 fragments for 4 samples per fragment). Unfortunately, it would be very tricky to do a 4x4 filter: You’d have to use a 5x5 window across the 2x2 pixel quad, but that’s a total of 25 taps, which is not divisible by 4…and I’m not sure if it’s possible to improve on this without using all 9 taps of a 5x5 filter.

Anyway, what’s your email address?

Scratch that actually: Since I’m more familiar with the technique, I’ll just adapt it into a 2D Jinc filter, and you can improve on it with whatever windowing function you want! It shouldn’t be too long here…

I don’t think it’s too much trouble for today’s GPUs. I take 25 samples in the xBR shaders and nobody complains. :stuck_out_tongue:

But any tricks to speed things up are great! So I’m curious about your shader implementations.

Could you explain the differences between RGB and sRGB? I always work with RGB in my shaders.

Go ahead. I’m still learning how jinc works…

With sRGB, a ~2.2 gamma decode is applied when the texture is sampled, so the shader gets linear values. This is the correct way to do it if the input texture is designed to be displayed on screen (which is roughly like sRGB).

As for sRGB support, I suppose this is only needed for the input texture and perhaps LUTs? The rest of the pipeline could remain linear, and final pass would require gamma correction anyways.

So, it’s the same as this?

sRGB = GAMMA_IN(RGB);

Yeah, today’s GPU’s can easily handle 25 samples, but their ALU capabilities seem to be growing a lot faster than their memory bandwidth and texture throughput. It’s not bad at all to have 25 samples in an XBR shader, especially if you’re sampling from a small (cache-friendly) source image, but imagine stringing together a really complex shader with 8 passes and 25 taps per pass loading from huge HD FBO’s, and things start to add up a bit more. The shader I’ve been working on (on and off for months) can be a hog (depending on options), which is why I went through all the trouble of looking into these methods in the first place. :stuck_out_tongue:

Thanks for your address, by the way. You may want to edit your post and obfuscate it a bit though, in case some spammer’s automated spider parses email addresses on this forum!

This is a bit of a complex subject, because it relates to both colorspaces and gamma-correctness. Old gaming consoles output an NTSC signal (at least in America and Japan, the only countries in the world ;)), which has a few different steps of encoding.

Colorimetry in General and for NTSC: First, the actual desired light output of the signal is specified in terms of a very specific RGB colorspace: RGB values themselves only have an unambiguous physical meaning when they’re specified in terms of specific RGB “primaries” and a white point. The “master colorspace” that defines absolute colors is the CIE 1931 XYZ colorspace, and these colors can be converted into a CIE 1931 xyY colorspace to separate chromaticity (the xy plane) from intensity (the Y value). NTSC 1953 specified RGB values in terms of RGB primaries corresponding to the measured chromaticity of RGB CRT phosphors in use at the time, but keep in mind that different-colored phosphors also emit different intensities of light when subjected to the same stimulus as well. Therefore, NTSC 1953 also had to specify the chromatic coordinates of a “white point,” the color you get when the output primaries are mixed in equal amounts like RGB (1.0, 1.0, 1.0). (I suppose if this doesn’t happen naturally, engineers apply a linear scaling factor to ensure the actual white point is as close as possible to the specified white point.)

Together, RGB primaries and a white point define an absolute RGB colorspace that gives RGB values physical meaning. The specification of this colorspace actually changed over time for two reasons: 1.) People discovered better phosphor sources with different characteristics (and/or as supply of the old ones became harder to come by). 2.) People came to the consensus that the original C illuminant was a poor approximation of what people perceive as white light (there’s no objective definition of true white light, only standards that we come to agree upon by consensus for subjective reasons). The D65 illuminant is a lot better, corresponding to average sunlight (for some specified definition of average), and it’s what we usually consider today as the “one true white.” https://en.wikipedia.org/wiki/NTSC#Colorimetry

I think that NTSC video game consoles are based on the SMPTE “C” phosphor specification, which used updated phosphors with the D65 illuminant for the white point. (You might note that “white” looks off in really old NTSC broadcasts that haven’t been fixed, or that REALLY old color TV’s play modern analog signals with an off-white.)

ANYWAY, the linear colorspace underlying NTSC RGB is based on the SMPTE “C” standard I think.

Colorspace Gamma in General and for NTSC: However, back in the day, the physics of CRT screens also formed part of the basis of video standards. They had a “gamma” of about 2.2-2.4 or so, which meant that their physical light output did not correspond linearly to the electrical strength of the input signal. They operated something like this:

light_output_intensity = pow(signal_strength, 2.4ish);

This was pretty much just a natural consequence of the process. Instead of trying to make the technology operate linearly, engineers decided to encode the signals themselves in nonlinear light:

signal_strength = pow(intended_linear_light_intensity, 1/2.4ish);

Therefore, on a scale of [0.0, 1.0], an NTSC RGB value of 0.5 is not 2/3 the intensity of an NTSC RGB value of 0.75. In fact, the true linear RGB light intensities (relative to the primaries and white point) are more like:

NTSC RGB 0.5 ≈ 0.5^2.4 ≈ 0.189
NTSC RGB 0.75 ≈ 0.75^2.4 ≈ 0.501

That’s an approximation, since I think the NTSC gamma curve is actually specified at 2.2 rather than 2.4 (and if CRT’s have a gamma of 2.4, they’ll display images a little darker than intended by the NTSC encoding), but you get the picture: An NTSC RGB value of 0.75 corresponds to light intensity that’s much greater than that of an NTSC RGB value of 0.5. In this sense, NTSC RGB values are considered “gamma corrected” for the output display. After this, the NTSC RGB values are converted by means of a matrix multiplication into YIQ values, where the Y channel corresponds to a gamma-corrected luma channel (roughly the gamma-corrected luminance of the signal, where luminance is a specified linear measure of light intensity). These values are then converted into an actual electrical signal, and the limited chroma bandwidth and composite crosstalk and crap add in a bunch of artifacts. Then this YIQ signal is decoded on the CRT’s end back into NTSC RGB (which is nonlinear and gamma-corrected, i.e. gamma-encoded), and the physics of the CRT’s scanline beam and phosphor output nonlinearly “decode” this gamma encoding into linear light.

However - interestingly enough - our EYES don’t perceive linear intensities of light as linear either. For one thing, our pupils are always adjusting the amount of light coming in. Thanks to a bizarre coincidence though, for any given scene, our eyes roughly correspond to the gamma curve of a CRT anyway (they appear to have a gamma of about 2.33 or so: https://en.wikipedia.org/wiki/Gamma_cor … eo_display). As a result, the NTSC-encoded RGB values on an SNES (for instance) actually LOOK to increase pretty linearly to our eyes, even though they’re actually encoded with a nonlinear curve. That made things a hell of a lot easier for artists on old consoles, since they didn’t have to think much about gamma.

sRGB: As PC’s became more popular though, people needed to come up with some kind of absolute colorspace for what RGB meant in a computer context too. Because our eyes respond to linear light with a nonlinear curve, and because color values had to be specified with limited precision (e.g. 8 bits per channel for “true color”), it was important to design the gamma curve such that a linear increase in the encoded RGB value corresponded roughly to a perceived linear increase (rather than an actual linear increase)…otherwise, there wouldn’t be nearly enough precision for decent gradation in the shadowy colors (if you store an image as 8-bit linear RGB of any colorimetry, there will inevitably be a huge amount of banding in dark areas).

As a result, the sRGB standard was devised in the 1990’s. This colorspace used the Rec. 709 RGB primaries, which are a bit different from the SMPTE “C” ones used by NTSC RGB (I think the NTSC shader handles this difference, or at least I hope it does). It also has a bit of a complicated gamma curve: It corresponds roughly to 2.2 gamma, except the actual curve is a bit more complex to use the available precision a bit more ideally. JPEG files, etc. are typically encoded in sRGB, and monitors usually assume (unless told otherwise) that the RGB values they’re given are sRGB. This allows them to convert them properly to the correct/specified linear light values for output.
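For reference, the exact sRGB decode (encoded value -> linear light) is a short linear segment near black plus a 2.4-power curve, which together track a plain 2.2 gamma curve fairly closely; in Cg it looks something like this:

```
// Exact sRGB -> linear conversion, applied per channel.
float3 srgb_to_linear(float3 c)
{
    float3 lo = c / 12.92;                          // linear segment near black
    float3 hi = pow((c + 0.055) / 1.055, 2.4);      // power-curve segment
    float3 use_hi = step(float3(0.04045, 0.04045, 0.04045), c);
    return lerp(lo, hi, use_hi);                    // pick hi where c > 0.04045
}
```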

Implications for Us: Fast forward to now: Emulators output color in NTSC RGB, shaders handle it as some generic unspecified RGB colorspace, and then they output it to the framebuffer. The RGB values in the framebuffer are generally assumed to be in sRGB, and they’re converted into light by LCD displays (which have a gamma around 2.2-2.4 by design, like CRT displays, but the actual curve for the liquid crystals is highly nonlinear and compensated for by circuitry).

The NTSC shader presumably encodes NTSC RGB to YIQ, simulates the signal artifacts of encoding YIQ into an electrical signal and decoding it (which may require temporary conversion to a linear signal for the blurring involved? not sure), then decodes YIQ back to NTSC RGB, gamma-decodes to linear RGB (still based on the NTSC primaries), performs a colorimetry conversion to the Rec. 709 primaries, then gamma-encodes from Rec. 709 linear RGB to sRGB (or just does a straight gamma encoding to get close). It might also apply a slight gamma correction from the presumed 2.4 CRT gamma to the standard 2.2 gamma. If it has an “RGB version” like blargg’s filter, I think this version converts from NTSC RGB to linear RGB with NTSC colorimetry (a matrix multiplication), converts to Rec. 709 colorimetry, then encodes to sRGB or encodes with 2.2 gamma.

Most shaders just ignore the colorimetry differences between NTSC RGB primaries and sRGB primaries. This isn’t perfect, so the colors are a bit off, but it’s subtle enough that nobody ever really thinks about it.

HOWEVER, if a shader does linear interpolation between input color values or otherwise mixes them in any way (such as with a resampling filter), the result is too dark. This is because they’re doing the mixing in a gamma-encoded space, when it SHOULD be done in terms of linear light values. This is why bilinear interpolation seems to smear/spread black lines more than it should and thin out white lines. (However, sometimes the bandwidth-limiting analog filters and softness/sharpness filters in CRT’s did the same thing to an extent, by blurring the encoded NTSC signal horizontally. A versatile CRT shader might offer an option to do a bit of horizontal blurring in nonlinear RGB, which is something cgwg looked into a bit.)

To avoid this improper linear blending of nonlinear light values, a lot of shader authors perform a rough conversion from gamma-encoded RGB to linear RGB (with assumed sRGB primaries, even though the input was actually using NTSC primaries) like this:

working_rgb = pow(tex2D(texture, uv_coords), 2.2);

Then they’ll work with the linear RGB values, mix them, etc., and gamma-encode them at the end for display:

output_rgb = pow(working_rgb, 1/2.2);

This ensures the color mixing is done in linear light, which avoids darkening and artifacts. This is kind of slow though, and it means you can’t do gamma-correct bilinear filtering. You also can’t just output linear RGB to an FBO (unless it’s a 32-bit floating point FBO), because there’s not enough precision, and it will decimate your shadowy areas.

What sRGB texturing does is this: You tell OpenGL the texture is sRGB, so when it reads the value into the shader, it automatically converts it to linear light with a precise sRGB curve. (This isn’t ideal if you’re reading NTSC RGB, but whatever.) Then, when you output the value back to an sRGB FBO, it will automatically do the conversion back to sRGB (again similar to the 2.2 gamma encoding, if a little different). This is a lot faster and allows for gamma-correct bilinear filtering. :slight_smile:

That’s the thing: The rest of the pipeline can’t remain linear unless you use 32-bit floating point FBO’s (slow), because there isn’t enough precision in 8-bit (per channel) FBO’s to encode linear RGB values without decimating the shadowy areas. That’s why shaders like cgwg’s multi-pass CRT shader do a gamma-decode at the beginning of each pass and a gamma-encode at the end of each pass.

It’s also the most important technical reason why sRGB needed to be a nonlinear colorspace in the first place: 8-bit components just aren’t enough to store linear RGB values, since the perceptual difference between dim values would be too great, resulting in awful gradient banding in the shadows. (The other reasons weren’t strict necessities but practical issues: They had to make sure sRGB would display properly on oblivious CRT monitors fed by legacy VGA cards, and they also had to make sure graphics artists could linearly increase the RGB value and get what human eyes would “perceive” as a linear increase in light output…actually a much larger increase in mathematically linear terms.)

Allowing users to specify sRGB textures for the input texture and output FBO’s as a .cgp option would allow shader authors to bypass these manual gamma conversions (and if an FBO is written to as sRGB, it must be read from in later passes as sRGB as well for consistency). This is faster, because the GPU hardware/drivers are optimized for it, and it would also enable fast gamma-correct bilinear filtering.

As things stand, you can only do gamma-correct bilinear filtering if you’re reading from a 32-bit FBO (which had linear RGB values written to it) or if you manually read 4 samples, gamma-decode them, and mix them. As a side note, an FBO can only be floating point 32-bit OR sRGB, not both simultaneously, because the people who designed the GL_EXT_texture_sRGB and GL_EXT_framebuffer_sRGB OpenGL extensions (and D3D support, etc.) determined there’s no real need for a 32-bit sRGB format: A 32-bit FBO has enough precision that you can just store linear RGB values directly if you want.
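For completeness, here’s roughly what that manual 4-sample approach looks like (just a sketch; it assumes a simple 2.2 gamma, a nearest-filtered input, and texel_size = 1.0 / texture size, and the caller would re-encode at the end):

```
// Gamma-correct bilinear: decode the four nearest texels to linear light,
// then blend them with the usual bilinear weights.
float3 bilinear_linear_light(sampler2D tex, float2 uv, float2 texel_size)
{
    float2 base = (floor(uv / texel_size - 0.5) + 0.5) * texel_size;  // center of the lowest-coordinate texel of the 2x2 block
    float2 f = (uv - base) / texel_size;                              // fractional position in [0, 1)
    float3 c00 = pow(tex2D(tex, base).rgb, 2.2);
    float3 c10 = pow(tex2D(tex, base + float2(texel_size.x, 0.0)).rgb, 2.2);
    float3 c01 = pow(tex2D(tex, base + float2(0.0, texel_size.y)).rgb, 2.2);
    float3 c11 = pow(tex2D(tex, base + texel_size).rgb, 2.2);
    return lerp(lerp(c00, c10, f.x), lerp(c01, c11, f.x), f.y);
}
```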

The option has to default to false to avoid affecting existing shaders though: The shader author has to be aware when sRGB textures/FBO’s are being used. If the author doesn’t realize sRGB textures are being used and does manual gamma-correction, it will result in double-decodes and double-encodes. (Conversely, if the author thinks sRGB textures are being used when they aren’t, they’re liable to omit gamma correction altogether.)

Ideally there should be an option to read an LUT as an sRGB texture too, but I don’t plan on implementing support for that: As far as I can tell it would require too much refactoring, because you have to detect support for OpenGL’s sRGB extensions at runtime. Retroarch’s extension-detecting support requires coders to use your gl struct, but the gl struct is not available in shader_cg.c or shader_glsl.c, where LUT’s are loaded from.

I suppose you have a point there. That does create a necessity to pass #defines to the shader which tell it whether sRGB reads and writes are supported or not.

One additional thing though is that you only have 8-bit sRGB, so RGB565 stuff needs yet another 16->32 bit conversion special path.

The gl extension checker just checks for core context or not (glGetStringi vs glGetString). That’s an easy workaround at least.

I just figured shader authors would set it in the .cgp file on a per-pass basis: srgb0 would affect the input texture, just like filter_linear0, and srgb1 would affect the FBO that pass0 is written to (and how that FBO is later read by any pass), just like filter_linear1. That gives shader authors the most flexibility, avoids affecting existing shaders, and avoids passing #defines. Also, it will hopefully help reinforce the right way to think of the linear option too: filter_linear1 affects how the output of pass0 is read, no matter whether it’s read from pass1, pass2, pass7, etc., and having another option that works the same might help authors remember that.
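In other words, a preset might look something like this (srgb0/srgb1 are just the proposed names here, not existing options, and the shader filenames are placeholders):

```
shaders = 2

# pass0 reads the input texture
shader0 = first-pass.cg
filter_linear0 = true
# proposed: decode the input texture from sRGB to linear when sampling
srgb0 = true
scale_type0 = source
scale0 = 1.0

# pass1 reads pass0's FBO
shader1 = second-pass.cg
filter_linear1 = true
# proposed: pass0's FBO is written as sRGB and read back as sRGB by any later pass
srgb1 = true
```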

The output of the final pass is a little trickier though, since there’s no subsequent pass option to specify how it’s read. I suppose there are two options, and I’d like your input about which you’d prefer:

a.) I could add a “final_output_srgb” option or something that controls whether the final pass writes to an sRGB texture or if the user must do the gamma correction themselves (like usual).

b.) Leave it out, and let shader authors know (or figure out ;)) that there’s no way to specify sRGB for the final output, so they’ll have to perform a final gamma correction themselves manually.

Oh, yuck. Haha…I’ll have to keep that in mind. That said, I wonder if sRGB extensions are even supported on the architectures where it’s necessary to use RGB565.

Yeah, but I felt like I would be dirtying up the codebase too much if I did that in shader_cg.c or shader_glsl.c. :stuck_out_tongue: It’s up to you and Squarepusher: I can implement it for LUT’s too if you guys want.

> I just figured shader authors would set it in the .cgp file on a per-pass basis: pass0_srgb would affect the input texture, just like pass0_linear, and pass1_srgb would affect the FBO that pass0 is written to (and how that FBO is later read by any pass).

You still don’t know if sRGB read/writes are really supported in the shader (#if !LINEAR_TO_SRGB col = gamma_correction(col) #endif), and you need a way to specify if the final pass to framebuffer should do linear -> sRGB. It’s kinda hairy :stuck_out_tongue: sRGB framebuffers work quite differently on GLES as well.

True, but there are a lot of other important things you don’t know from inside the shader too. For instance, I can’t determine from inside the shader whether ddx() and ddy() are even supported, because the Cg profile depends on driver capabilities: If they aren’t supported and I try to use them, the shader just fails to load and emits an error. In this case, if sRGB isn’t supported, the C program can just emit an error message mentioning it’s falling back to regular textures and FBO’s, and the output will look like crap (because of all the wrong gamma assumptions), but it’ll still run. :wink: Alternately, the C program could just decide to do the same thing as when ddx() isn’t supported: Fail to load the shader.

Of course, passing a #define WOULD allow shader authors to write fallback code themselves. This would be ideal, but…really, if we’re going down that road, there’s a LOT of extra information I wish was passed to shaders, like (time for an off-topic tangent):

- Profile name or some corresponding value: See if ddx/ddy/tex2Dlod, etc. are supported.
- The final viewport resolution.
- .cgp scale parameters: I’m not sure if there’s ever a time when a pass would need to know this value, per se, but I have an early pass in my shader that requires the final viewport resolution (see above ;)), and it would be insane to hardcode that as a user setting. To compensate, I set the scale type of this pass to “viewport” and copy the scale value (since the shader doesn’t know the true scale set in the .cgp), and then I calculate the final viewport resolution based on that. The setting can still get out of sync with the .cgp when the .cgp scale is changed (there’s a performance-tweaking reason to do this, because the minimum required scale depends on other user settings, and it’s very slow per-fragment), but it’s a lot more stable than the final viewport resolution itself at least. As a side note, actually setting the size of the pass to the final viewport resolution isn’t an option, because it downsizes an LUT to a much smaller scale with up-to-126-tap sinc or Lanczos scaling in just one direction (and there are two of these passes…one horizontal, and one vertical). Tiling it to the whole viewport size would be way too slow, but it needs to know the viewport size to know what scale to resize by. ;) These two passes are optional depending on user settings, which leads into the longest bullet point on this list.
- .cgp filter parameters: A reusable shader pass might be used by multiple .cgp files, and it might require different behavior depending on the filter mode…and if so, it’s the difference between being able to reuse one file and having to make a duplicate with minor adjustments.
- Some way of selecting between multiple preset settings: I’d like to reuse the same configurable shader passes from multiple .cgp files, each specifying a different #include file containing relevant user configuration settings (so I can supply presets). This could be done by supplying a simple integer to distinguish which preset is intended, or by #define-ing the name of the .cgp file without the extension, or by #define-ing the value of some string contained within the .cgp file (the name of the intended #include file, for instance).
- Passes requiring multiple input FBO’s can be very brittle and hard to reuse, because the numbering of those passes is tied to the structure of the .cgp file, but each pass is held responsible for specifying its inputs. I wish there was a more “absolute” way to read from the outputs of previous passes by specifying a pass name or unique identifier instead of an absolute pass number or a relative offset (2 passes ago). For instance, the shader I’m working on has a lot of different versions depending on hardware capabilities. One version has a 1-pass Gaussian blur (which I referred to earlier), but there’s another version that uses a 2-pass blur in case high-quality derivatives aren’t supported. After blurring the input, there are also two optional LUT pre-processing passes (the aforementioned 126-tap Lanczos filter passes) that go before the next pass. So…is the next pass pass1, pass2, pass3, or pass4? How do we specify “I want to read the output of the blur” from the later shader pass itself? If the pass is used by multiple .cgp files that arrange passes differently, the output of the blur might be the output of pass0 or the output of pass1, and it might be the output of the previous pass or the output three passes ago…so both absolute and relative numbering break down. As a result, passes requiring multiple input FBO’s become less reusable. This could be fixed in a number of ways. The simplest is by the last bullet point: If the .cgp file can specify an #include file by name (or some unique identifying number known to the shader author), the settings in that file could include “static const int blur_pass = 1;” or something, which lets all the other passes know to use PASS1 for the blur output. However, this does run into the same problem I have with the .cgp scale parameter: If the settings file gets out of sync with the .cgp, it’ll specify the wrong pass. A more robust way might be to not only define PASS0, PASS1, etc. but also define aliases for the same structs using the actual names of the shader passes. The real problem is that the responsibility for identifying the inputs of each pass should not rest with the pass itself but with the .cgp file that includes it: A .cg file should just be able to identify “INPUT0, INPUT1,” etc., and the per-pass settings in the .cgp file would determine what FBO to feed into INPUT1.

ARGH, I can’t get the lists to work right, so I added in manual bolded asterisks…is the board broken, or is it just gimping posters with a low post count?