Cg shader profiles different on Linux and Windows?

I’ve been working on a Cg shader, and I noticed the compilation profiles are different on my Linux and Windows machines. For instance, I can get away with calling ddx, ddy, tex2D with explicit derivatives, and tex2Dlod on my Windows/NVIDIA setup, but not on my Linux/ATI setup with open source drivers. Is this an OS difference, an OpenGL/Direct3D difference, or a fallback issue where RetroArch attempts a more advanced profile and falls back when the hardware or driver doesn’t support it?

More specifically, what Cg compilation profiles are used and in what circumstances?

I believe I found the answer to my own question. At least on the PC, profile selection appears to be done in the following two places:

OpenGL: The gl_cg_init function in gfx/shader_cg.c calls cgGLGetLatestProfile(CG_GL_VERTEX) and cgGLGetLatestProfile(CG_GL_FRAGMENT).

D3D: The RenderChain::compile_shaders method in gfx/d3d9/render_chain.cpp calls cgD3D9GetLatestVertexProfile() and cgD3D9GetLatestPixelProfile().

The above functions are part of the Cg toolkit, so it’s clear now that RetroArch imposes no limits of its own and simply uses the most advanced profile the Cg toolkit reports for your hardware and driver. Now that I know I can target any profile I want, that makes things both much simpler and much more complicated, all at the same time. :wink:

Yeah, things can definitely get complicated, particularly when AMD/Radeon cards are involved. Likewise, the closed vs open source driver issue complicates things even further.

The ddx/ddy functions in fes’ ‘pixellate’ shader choke on my Radeon card in both Windows and Linux, but they seem to work fine for people with Nvidia cards.

Thanks! It’s good to know to be extra cautious with ddx/ddy. I wasn’t even able to run ddx/ddy on my ATI GPU (probably due to the open source drivers), but they didn’t seem to impact performance on my 8800 GTS at all. tex2D calls that use the derivative arguments are extremely slow though, because the GPU doesn’t know whether the derivatives will be the same across each 4-fragment block, so it serializes the texture accesses and slows them down by a factor of 4.

Thankfully I don’t have to use any of those functions. They’re just in one codepath, depending on a user option for how to deal with artifacts caused by anisotropic filtering with manually tiled texture coords (frac/fmod causes a derivative discontinuity at tile boundaries). tex2Dlod is another option (since it can disable anisotropic filtering), but it’s also too advanced for my “baseline” profile to handle (based on what my open source radeon driver can handle ;)). There are other higher-level solutions involving rearranging the work done in each pass though. All this, and it doesn’t even matter for the ATI machine I actually play on, because it apparently isn’t using AF at all (with no way I know of to enable it).

The good thing about options is they’re there…the bad thing is I’m tempted to code in every single one.