Mega Bezel on M1 Macs

This seems like an actual Metal shader compilation limitation.

note: use -fbracket-depth=N to increase maximum nesting level. Looks like a shader compiler issue.

This offers us a hint as to how we could overcome the bracket nesting level limitation (which is apparently 256 by default), but there will likely be more issues ahead.
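For anyone curious, here's a minimal sketch that reproduces the limit outside of RetroArch entirely. It's plain C, has nothing to do with the actual Mega Bezel presets, and the file name and nesting depth are made up purely for illustration:

```c
/* bracket_depth_demo.c
 * Minimal sketch: generate a source file nested deeper than clang's
 * default bracket limit (256), then compile it with and without
 * -fbracket-depth. The depth and file name are arbitrary choices.
 */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const int depth = 300;  /* deeper than the default limit of 256 */
    FILE *f = fopen("deep.c", "w");
    if (!f) return 1;

    /* Emit: int x = (((( ... 1 ... )))); with `depth` nested parentheses. */
    fprintf(f, "int x = ");
    for (int i = 0; i < depth; i++) fputc('(', f);
    fputc('1', f);
    for (int i = 0; i < depth; i++) fputc(')', f);
    fprintf(f, ";\n");
    fclose(f);

    /* Expected to fail with a bracket-nesting error and the
     * "use -fbracket-depth=N" note quoted above. */
    system("clang -c deep.c -o /dev/null");

    /* Raising the limit lets the same file compile. */
    system("clang -fbracket-depth=512 -c deep.c -o /dev/null");
    return 0;
}
```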

1 Like

Thanks, this helps. If you have other logs, please send them our way; there are some of these things that we could fix in the shader code, like unused variables and possibly type errors.

But I think the big roadblock is what twinaphex mentioned about the bracket nesting level compile problem. There might also be some problems beyond that.

1 Like

Can’t you just add the compiler flag with an increased value?

For this particular issue, yes, that's one thing we need to try. But someone needs to get in there, figure out where in the Metal driver/compiler you would set it, then build and test it on a Mac.

If you know anyone who does programming on the Mac and would like to help us, please let us know.

1 Like

It seems to be a clang thing, and it's a fairly recent change that's bothering more people than just us: https://github.com/llvm/llvm-project/issues/48973

2 Likes

I can code, but I've been trying to retrieve my old account because Apple doesn't allow me to do it via email (I need Xcode).

2 Likes

Good catch, hunterk.

Unfortunately, the C/UNIX world has a bad history with ill-defined scopes of responsibility. Infamously, ANSI C even fails to make clear whose responsibility it is to define how structs are laid out, which led to incompatibilities (most famously the difference between Win32 Delphi and Visual C).

1 Like

Seems like it’s working now @Ball, @2V3EvG4LMJFdRe, @reypacino, @bloodtype, @wakka:

Mega Bezel Reflection Shader! - Feedback and Updates

1 Like

I can confirm they all appear to work now, although really, really slowly, at least in the nightly build I'm using (on my 8-core M1). Perhaps it's due to a setting I haven't set or have yet to find. The app has so many settings that these things can be difficult to nail down.

At any rate, real progress!

1 Like

Hi, I can confirm that Mega Bezel works with the new Vulkan driver, but it just crawls.

On Macs I highly recommend the koko-aio shader (also in the bezel category), which performs incredibly well, and in HDR!

1 Like

What GPUs are both of you using in your systems?

1 Like

It's the M1 Apple Silicon Mac with its integrated CPU/GPU design. Performance-wise it's a beast and quite capable, but these shaders make it crawl. A new M2 Pro Mac Mini is in order; we'll see whether that makes a difference. However, even the M1 is not really underpowered … if you have any ideas, I'm very happy to test!

3 Likes

Yeah, I've heard the M1 Apple Silicon is supposed to be quite good.

There is another user who has tried this and gotten good performance on their M1 Pro running macOS Ventura.

I’ll get more test results / numbers from them tonight that I can share with you.

2 Likes

I also have an M1 and am experiencing the same issues. Edit: it can run Shadow of the Tomb Raider handily even though it's running under Rosetta (CPU emulation).

1 Like

Can you check what the resolution is coming out of the core?

You can do this by changing the second parameter in the list, which is something like Resolution Debug.

Do you get the same performance when you run something like NES, SNES, or Genesis?

Hi, here are the test results on an M1 Mac Mini with 16 GB of RAM under macOS Ventura:
FinalBurn Neo/1942 - HSM MBZ_3_STD.slangp -> 35 fps
Nestopia/Super Mario Bros World - HSM MBZ_3_STD.slangp -> 35 fps
Snes9x/Super Mario World - HSM MBZ_3_STD.slangp -> 30 fps


Hope it helps :slight_smile:

3 Likes

For most Mega Bezel shaders I'm getting unplayable FPS with broken-up, jagged audio, using an M1 on a 4K monitor. Some of the default CRT shaders work fine, however.

2 Likes

Thanks for the tests; this helps a lot in understanding what you are experiencing. I really would not have expected such low performance at that viewport resolution (2392x1792), which is roughly double the pixel count of 1080p HD.

Since the Mega Bezel uses a LOT of RAM compared to other shaders, because of so many passes at viewport resolution, I wonder if it has something to do with RAM bandwidth going in and out of the graphics processor. Does the GPU part of the chip share RAM with the rest of the processor?
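As a rough back-of-envelope sketch (the pass count, pixel format, and target frame rate below are my guesses for illustration, not the actual preset chain), the traffic adds up quickly at that viewport size:

```c
/* bandwidth_estimate.c
 * Back-of-envelope sketch of why many full-viewport passes add up.
 * Pass count, pixel format, and fps are assumptions, not the real
 * Mega Bezel pipeline.
 */
#include <stdio.h>

int main(void) {
    const double width  = 2392.0;        /* viewport from the test above */
    const double height = 1792.0;
    const double bytes_per_pixel = 8.0;  /* e.g. RGBA16F intermediates (assumption) */
    const double passes = 30.0;          /* rough guess at full-viewport passes (assumption) */
    const double fps = 60.0;

    /* Each pass writes its render target once and the next pass reads it
     * at least once, so count roughly 2x the surface size per pass. */
    double bytes_per_frame = width * height * bytes_per_pixel * passes * 2.0;
    double gb_per_second   = bytes_per_frame * fps / 1e9;

    printf("~%.1f GB/s of texture traffic at %.0f fps\n", gb_per_second, fps);
    return 0;
}
```

If the real chain is anywhere near that, it would be a sizeable chunk of the base M1's quoted memory bandwidth (around 68 GB/s, if I remember right).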

How do the Screen-Only presets and the Potato presets perform? (I'm not suggesting these are solutions to your issue, just wondering how they compare.)

1 Like

Apple Silicon uses "unified memory," which isn't the same as shared per se. So far as I know, nobody else uses this scheme: basically anything on the SoC can use the same memory without paging between the different SoC components. In other words, the GPU and CPU can edit the same block without copying it between them, using the same pointer (different from NVIDIA's old "unified memory" from years back). The RAM is mounted on the same package as the SoC via micro-BGA.

I have 16GB

1 Like

Doesn't seem much different from what any modern UMA-architecture APU/IGP has been able to do for the last couple of decades. Once either the CPU or the GPU needs to talk to anything outside of the SoC, they're using the same shared data bus and bandwidth; unless there are separate pools of RAM connected to dedicated buses and memory controllers, this is how it works. If there were separate pools, then things would not be shared anymore, and copies would have to be made when the CPU and the GPU need to work on the same data. So for all intents and purposes, "shared" means they share the same memory and bus, and "unified" means the same thing.

The fact that they can use pointers to avoid some copying and improve efficiency doesn't mean it isn't a shared-memory architecture, and it doesn't gain the advantages of a dedicated-memory architecture.

One of the main constraints of these shared/unified memory architectures is the type of RAM used. For dedicated graphics there is GDDRn memory, and HBM if you want to achieve the high bandwidths required by real-time graphics rendering. For the lowest-cost applications, with the slowest and worst memory bandwidth, DDRn is often used.

Which does Apple Silicon use?

DDRn usually has lower bandwidth but also lower latencies than its GDDRn counterparts, because they're optimized for different tasks. When you have to share either one between the CPU and GPU, you end up with a situation that can be less than optimal for at least one of them: either you tolerate lower total available memory bandwidth, or higher latencies.
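To put rough numbers on that, peak bandwidth is just transfers per second times bus width divided by eight. The parts in the sketch below (LPDDR4X-4266 on a 128-bit bus versus GDDR6 at 14 Gbps on a 256-bit bus) are representative examples picked for illustration, not a claim about any specific Mac or graphics card:

```c
/* peak_bandwidth.c
 * Illustration of the standard peak-bandwidth formula:
 *   bandwidth = transfers_per_second * bus_width_bits / 8
 * The memory parts below are representative examples only.
 */
#include <stdio.h>

static double peak_gbs(double transfers_per_pin_mts, double bus_width_bits) {
    /* MT/s per pin times bus width, converted to GB/s */
    return transfers_per_pin_mts * 1e6 * bus_width_bits / 8.0 / 1e9;
}

int main(void) {
    printf("LPDDR4X-4266, 128-bit bus:  %.1f GB/s\n", peak_gbs(4266, 128));
    printf("GDDR6 14 Gbps, 256-bit bus: %.1f GB/s\n", peak_gbs(14000, 256));
    return 0;
}
```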

https://m.slashdot.org/story/185451