One thing I’d like to share here is related to this subject, because maybe the number of shaders in the chain isn’t really what’s at fault.
I noticed some strange behavior while I was studying crt-royale to come up with a faster version of it, crt-royale-fast. Comparing the cg and glsl versions of that shader with the slang one, I noticed that both loaded much faster than slang. The cg code was the same as the slang code, while the glsl code was different (it was completely unrolled, because glsl can’t use #include directives). So I set out to unroll slang royale, and when I finished and tested it, to my surprise, it was even slower than the default one! Yes, unrolled or not, it was much slower than the cg and glsl versions. My conclusion: there’s something very inefficient in the way the slang code parses the shader. It’s as if cg and glsl use an O(n) algorithm while slang uses an O(n^2) one, so the load time grows much faster than the amount of code to parse.
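For anyone who wants to test the O(n) versus O(n^2) guess instead of eyeballing it, here is a minimal sketch in Python. It assumes you have timed the load of a few shaders of different sizes yourself; the function name `scaling_exponent` and the numbers below are made up for illustration, they are not real RetroArch measurements. It fits a straight line to log(time) against log(size): a slope near 1 suggests linear parsing, a slope near 2 suggests quadratic.

```python
import math

def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size).

    If time ~ c * size**k, the slope is k (1 = linear, 2 = quadratic).
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic data only: shader sizes in lines of code and invented load times.
sizes = [1000, 2000, 4000, 8000]
linear_times = [0.001 * n for n in sizes]         # behaves like O(n)
quadratic_times = [1e-6 * n * n for n in sizes]   # behaves like O(n^2)

print(round(scaling_exponent(sizes, linear_times), 2))     # 1.0
print(round(scaling_exponent(sizes, quadratic_times), 2))  # 2.0
```

With real timings the slope will be noisy, so it only tells you roughly which regime the loader is in, not where the time is actually spent.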
The only way I found to make royale load faster was to get rid of some complex code whose benefit I didn’t find worth the cost.
I’m bringing this up here so that maybe someone can look into how RA’s slang backend could be optimized, using its glsl counterpart for comparison.