Great work @rafan, and thanks for this investigation, this is really fantastic stuff! I did have a feeling it was to do with this bit of the code, as in the colour space mapping, and looking at your snippet of code I think I can see the bug already:
const vec3 dcip3_colour = (scanline_colour * k709_to_XYZ) * kXYZ_to_DCIP3;
gamma_out = LinearToDCIP3(dcip3_colour);
As in, this kXYZ_to_DCIP3 transform is doing a LinearToDCIP3 conversion, so it looks to me like we’re effectively doing it twice. I’ll look at the code and confirm though. However, this doesn’t explain the issue I was seeing, as I wasn’t using DCI-P3 (I think! Hmm). I will look at this as soon as I can; I’m just in the middle of a large software project.
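To give a feel for why a doubled conversion would be visible, here is a rough Python sketch (not the shader code). It assumes, purely for illustration, that the doubled step is a plain 2.6 power-law encode; `DCI_GAMMA` and `linear_to_encoded` are made-up names and I haven’t checked them against what LinearToDCIP3 actually does.

```python
# Illustration only: a hypothetical power-law encode standing in for the
# shader's conversion. The 2.6 gamma is an assumption, not taken from the shader.
DCI_GAMMA = 2.6


def linear_to_encoded(value, gamma=DCI_GAMMA):
    """Encode a linear-light value in [0, 1] with a simple power law."""
    return value ** (1.0 / gamma)


linear_grey = 0.18  # a typical mid-grey in linear light

encoded_once = linear_to_encoded(linear_grey)    # the intended result
encoded_twice = linear_to_encoded(encoded_once)  # what a doubled conversion would give

print(f"linear: {linear_grey:.3f}")
print(f"once:   {encoded_once:.3f}")
print(f"twice:  {encoded_twice:.3f}")
```

If the conversion really is being applied twice, I’d expect the mid-tones to come out noticeably lifted and washed out compared with a single pass, which is the sort of thing to look for when confirming.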
As for Dogway’s and Guest’s code, I certainly based my first attempt on Dogway’s (stripping out all the stuff I didn’t want/need), but I kind of ditched most of it when I came across the Khronos Group’s transforms and ran into various issues, some related to what you’ve said and some to other things. It may structurally bear some resemblance, but the important stuff is all based on Khronos Group examples. I did look at Guest’s as well and have taken some ideas where they made sense, along with ideas from other shaders I’ve looked at. We all stand on the shoulders of giants at the end of the day and add our own twist.
As for grouping stuff in the same pass, this is all down to performance: the first passes run at 240p (fast) and the main pass at 4K (slow), so you just have to do things in a certain order rather than all up front. If you have any specific suggestions I can certainly take a look at them though.
Thanks again for this and hopefully I’ll get some time to fix these issues in the next week or so.