This seems like an actual Metal shader compilation limitation.
The compiler note says "use -fbracket-depth=N to increase maximum nesting level", so this looks like a shader compiler issue.
This offers us a hint as to how we could overcome the bracket nesting limit (apparently 256 by default), but there will likely be more issues up ahead.
Thanks, this helps. If you have other logs, please send them our way; some of these things we could fix in the shader code, like unused variables and possibly type errors.
But I think the big roadblock is what twinaphex mentioned about the bracket nesting compile problem. There may also be problems beyond that.
For this particular issue, yes, that's one thing we need to try. But someone needs to get in there, figure out where in the Metal driver/compiler you would set it, then build and test it on a Mac.
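For anyone who wants to dig in, here is a minimal sketch (untested, and based only on my assumptions about how the Metal side is wired up) of what runtime shader compilation looks like. As far as I can tell, MTLCompileOptions has no field for clang-style flags like -fbracket-depth=N, so the fix would probably have to happen in how the shader source is generated, or in an offline compile step:

```swift
import Metal

// Minimal sketch: compile translated MSL source at runtime.
// `source` would hold the Metal code emitted from the slang shader.
func compileShader(device: MTLDevice, source: String) throws -> MTLLibrary {
    let options = MTLCompileOptions()
    options.languageVersion = .version2_4   // pick a Metal language version
    // Preprocessor macros can be injected via options, but there is no public
    // field for arbitrary compiler flags, so the 256-level bracket limit
    // cannot (to my knowledge) be raised from this API at runtime.
    return try device.makeLibrary(source: source, options: options)
}
```

If anyone knows a supported way to raise that limit at runtime, that would obviously be the simpler path.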
If you know anyone who does programming on the Mac and would like to help us, please let us know.
Unfortunately, the C/UNIX world has a bad history with ill-defined scopes of responsibility. Infamously, ANSI C even fails to make clear whose responsibility it is to define how structs are laid out, which led to incompatibilities (most famously the differences between Win32 Delphi and Visual C).
I can confirm they all appear to work now, although really, really slowly, at least in the nightly build I'm using (on my 8-core M1). Perhaps it's due to a setting I haven't found yet; the app has so many settings that these things can be difficult to nail down.
It's the M1 Apple Silicon Mac with its integrated CPU/GPU design. Performance-wise it's a beast and quite capable, but these shaders bring it to a crawl. A new M2 Pro Mini is in order; we'll see whether that makes a difference. Then again, the M1s are not exactly underpowered… if you have any ideas, I'm very happy to test!
I also have an M1 and am experiencing the same issues.
Edit: It can run Shadow of the Tomb Raider handily even though it's running through Rosetta (x86 translation).
Hi, here are the test results on an M1 Mac Mini with 16 GB RAM under macOS Ventura:
FinalBurn Neo/1942 - HSM MBZ_3_STD.slangp -> 35 fps
Nestopia/Super Mario Bros World - HSM MBZ_3_STD.slangp -> 35 fps
Snes9x/Super Mario World - HSM MBZ_3_STD.slangp -> 30 fps
For most Mega Bezel shaders I'm getting unplayable FPS, with broken-up, jagged audio, using an M1 on a 4K monitor. Some of the default CRT shaders work fine, however.
Thanks for the tests, this helps a lot in understanding what you are experiencing. I really would not have expected such low performance at that viewport resolution (2392x1792), which is roughly double the pixels of full HD.
Since the Mega Bezel uses a LOT of RAM compared to other shaders, because of so many passes at viewport resolution, I wonder if it has something to do with RAM bandwidth going in and out of the graphics processor. Does the GPU part of the processor share RAM with the rest of the processor?
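As a rough back-of-envelope (the pass count and buffer format below are just assumptions for illustration, not the actual preset configuration), the traffic adds up quickly at that resolution:

```swift
// Back-of-envelope estimate of shader memory traffic (assumed numbers).
let width = 2392.0, height = 1792.0   // reported viewport resolution
let bytesPerPixel = 4.0               // assuming RGBA8 intermediate buffers
let fullResPasses = 12.0              // assumption: passes running at full viewport size
let fps = 60.0

// Each pass roughly reads and writes one full-resolution buffer.
let bytesPerFrame = width * height * bytesPerPixel * fullResPasses * 2.0
let gbPerSecond = bytesPerFrame * fps / 1_000_000_000
print("≈ \(Int(gbPerSecond)) GB/s of raw read+write traffic")   // ≈ 24 GB/s with these numbers
```

That ignores filtered texture sampling and any wider intermediate formats, so the real figure could be a fair bit higher, and on a chip where the CPU and GPU share one memory system it would not take much of that to start to hurt.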
How do the Screen-Only presets and the Potato presets perform? (I'm not suggesting these are solutions to your issue, just wondering how they compare.)
Apple Silicon uses "unified memory", which isn't quite the same as shared memory per se. As far as I know, nobody else uses this scheme: basically anything on the SoC can use the same memory without paging between the different SoC components. In other words, the GPU and CPU can edit the same block without copying between them, using the same pointer (different from NVIDIA's old "unified memory" from years back). The RAM on this system is soldered to the board via micro BGA.
This doesn't seem much different from what any modern UMA APU/IGP has been able to do for the last couple of decades. Once either the CPU or the GPU needs to talk to anything outside the SoC, they're using the same shared data bus and bandwidth; unless there are separate pools of RAM connected to dedicated buses and memory controllers, this is how it works. If there were separate pools, things would no longer be shared, and copies would have to be made whenever the CPU and GPU need to work on the same data. So for all intents and purposes, "shared" means they share the same memory and bus, and "unified" means the same thing.
The fact that they can use pointers to avoid some copying and improve efficiency doesn't change the fact that it's a shared memory architecture, and it doesn't gain the advantages of a dedicated memory architecture.
One of the main constraints of these shared/unified memory architectures is the type of RAM used. For dedicated graphics, GDDRn memory and HBM are what you use to achieve the high bandwidths required by real-time rendering. For the lowest-cost, slowest, worst-bandwidth applications, DDRn is usually what gets implemented.
Which does Apple Silicon use?
DDRn usually has lower bandwidth but also lower latencies than its GDDRn counterparts, because they're optimized for different tasks. When you have to share either one between CPU and GPU use, you end up with a situation that can be less than optimal for at least one of them: either you tolerate lower total memory bandwidth, or higher latencies.
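For what it's worth, the base M1 reportedly uses LPDDR4X on a 128-bit bus, so its theoretical peak sits in a very different class from a typical GDDR6 card. A quick sketch of the arithmetic (the transfer rates below are assumed typical figures, not measured ones):

```swift
// Theoretical peak bandwidth = bus width (bytes) × transfer rate (transfers/s).
// The figures below are assumed/typical parts, not measurements.
func peakGBps(busBits: Double, megaTransfersPerSec: Double) -> Double {
    (busBits / 8.0) * megaTransfersPerSec * 1_000_000 / 1_000_000_000
}

let m1Unified = peakGBps(busBits: 128, megaTransfersPerSec: 4266)    // LPDDR4X ≈ 68 GB/s
let gddr6Card = peakGBps(busBits: 192, megaTransfersPerSec: 14000)   // midrange GDDR6 ≈ 336 GB/s
print(m1Unified, gddr6Card)
```

And that peak has to be shared with everything else on the SoC, which is exactly the trade-off described above.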