Hey Hyllian, I’m sorry to say I won’t be able to help optimize your Jinc shader after all. I really tried, but it turns out that distributing texture fetches across a 2x2 pixel quad probably won’t help for resize shaders. Sorry.

The technique should be faster for texture-fetch-bound Gaussian blurs at the same resolution, because the weights can be computed statically. Resize shaders are another matter entirely though, because the weights have to be computed at runtime. A 5x5 filter (which doesn’t make sense for anything with negative lobes) requires computing all 36 weights over a 6x6 window before zeroing out the 11 that fall outside each destination fragment’s 5x5 window. A 4x4 filter (Lanczos2) unfortunately requires the same number of samples (11 are duplicates), 25 weight calculations, and zeroing out 9 of them. I had enough morbid curiosity to finally get them to work after a few days of debugging (ugh), but all the extra ALU operations actually make them slower than the regular version. So it’s slower, more complicated, and only works with derivatives.
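
In case the weight counts above look arbitrary, here is a quick sanity check of the arithmetic. It is only a sketch, not the shader itself: it assumes the four fragments in a quad have windows offset by at most one texel, so an NxN filter shares an (N+1)x(N+1) window. The function name `quad_weight_counts` is made up for illustration.

```python
def quad_weight_counts(n):
    """Weight bookkeeping for an n x n filter shared across a 2x2 pixel quad.

    Assumption: the four fragments' n x n windows are offset by at most one
    texel, so their union is an (n + 1) x (n + 1) window.
    """
    window = n + 1
    computed = window * window   # weights each fragment evaluates over the shared window
    used = n * n                 # weights inside the fragment's own n x n window
    zeroed = computed - used     # weights that have to be zeroed out afterwards
    return computed, used, zeroed


for n in (5, 4):
    computed, used, zeroed = quad_weight_counts(n)
    print(f"{n}x{n}: compute {computed}, keep {used}, zero out {zeroed}")
# 5x5: compute 36, keep 25, zero out 11
# 4x4: compute 25, keep 16, zero out 9
```

Roughly a third of the computed weights get thrown away in both cases, which is where the extra ALU cost comes from.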