The shaders themselves vary a lot in cost at runtime, which makes them quite hard to profile, but I like to use the Imageviewer core in RetroArch with a test image. It's a bit more realistic, since real-life performance is measured. I also used GPU ShaderAnalyzer, but it didn't distinguish between normal and special functions.
Color calcs gave me decent results with deblur, although I had a luma-deblur running quite a while ago. I think it's indeed worth considering for a performance version.
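To illustrate the color vs. luma trade-off, here is a minimal Python sketch. It is not the actual deblur shader: a plain unsharp-style step stands in for it, and the Rec. 709 luma weights and the function names (`luma`, `sharpen_color`, `sharpen_luma`) are assumptions made up for this example.

```python
# Rec. 709 luma weights (assumption; a shader may use other coefficients).
LUMA_WEIGHTS = (0.2126, 0.7152, 0.0722)

def luma(rgb):
    """Weighted sum of the three channels (a dot product in shader terms)."""
    return sum(w * c for w, c in zip(LUMA_WEIGHTS, rgb))

def sharpen_color(center, neighbors, strength):
    """Per-channel variant: the filter math runs once for each of R, G, B."""
    out = []
    for ch in range(3):
        avg = sum(n[ch] for n in neighbors) / len(neighbors)
        out.append(center[ch] + strength * (center[ch] - avg))
    return tuple(out)

def sharpen_luma(center, neighbors, strength):
    """Luma variant: the filter math runs on one scalar per sample,
    and the resulting correction is added equally to all channels."""
    avg = sum(luma(n) for n in neighbors) / len(neighbors)
    delta = strength * (luma(center) - avg)
    return tuple(c + delta for c in center)
```

On gray input both variants agree; on colored edges the luma variant ignores chroma differences. In this sketch the luma path pays one dot product per sample and then works on scalars, which is where a performance version could save work.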
Otherwise the new deblur shader is quite fast if you use small loops. It could be optimized, as could the presets, but I try to avoid too much version entropy.