Mega Bezel on M1 Macs

Seems like it’s working now @Ball, @2V3EvG4LMJFdRe, @reypacino, @bloodtype, @wakka:

Mega Bezel Reflection Shader! - Feedback and Updates

1 Like

I can confirm they all appear to work now, although really, really slowly, at least in the nightly build I’m using (on my 8-core M1). Perhaps that’s due to a setting I haven’t set yet. The app has so many settings that these things can be difficult to nail down.

At any rate, real progress!

1 Like

Hi, I can confirm that Mega Bezel works with the new Vulkan driver, but just crawls.

On Macs I highly recommend the koko-aio shader (also under bezel), which performs incredibly well, and in HDR!

1 Like

What GPUs are both of you using in your systems?

1 Like

It’s the M1 Silicon Mac with this integrated CPU/GPU structure. Performance-wise it’s a beast and quite capable, but these shaders just make it crawl. A new M2 Mini Pro is on order; we’ll see whether this can make a difference. However, the M1s are not really underpowered either ... if you have any ideas, very happy to test!

3 Likes

Yeah, I’ve heard the M1 Silicon is supposed to be quite good.

There is another user who has tried this and gotten good performance on their M1 Pro running macOS Ventura.

I’ll get more test results / numbers from them tonight that I can share with you.

2 Likes

I also have an M1 and am experiencing the same issues. Edit: It can run Shadow of the Tomb Raider handily, even though that’s running under Rosetta (CPU emulation).

1 Like

Can you check what the resolution is coming out of the core?

You can do this by changing the second parameter in the list which is something like resolution debug.

Do you get the same performance when you run something like NES, SNES, or Genesis?

Hi, here are the test results on an M1 Mac Mini with 16 GB RAM under macOS Ventura:
FinalBurn Neo/1942 - HSM_MBZ_3_STD.slangp -> 35 fps
Nestopia/Super Mario Bros World - HSM_MBZ_3_STD.slangp -> 35 fps
Snes9x/Super Mario World - HSM_MBZ_3_STD.slangp -> 30 fps


Hope it helps :slight_smile:

3 Likes

For most Mega Bezel shaders I’m getting unplayable FPS with broken-up, jagged audio using an M1 on a 4K monitor. Some of the default CRT shaders work fine, however.

2 Likes

Thanks for the tests; this helps a lot in understanding what you are experiencing. I really would not have expected such low performance at that viewport resolution (2392x1792), which is roughly twice the pixel count of 1080p HD.
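
As a quick sanity check on that viewport size, here is a small sketch comparing raw pixel counts. It assumes "HD" means 1920x1080, which the post does not spell out.

```python
# Compare the reported viewport against 1080p by raw pixel count.
viewport_w, viewport_h = 2392, 1792  # viewport size quoted in the thread
hd_w, hd_h = 1920, 1080              # assumption: "HD" means 1080p

viewport_pixels = viewport_w * viewport_h
hd_pixels = hd_w * hd_h
ratio = viewport_pixels / hd_pixels

print(f"viewport: {viewport_pixels:,} px")
print(f"1080p:    {hd_pixels:,} px")
print(f"ratio:    {ratio:.2f}x")
```

By pixel count the viewport comes out at a bit more than twice 1080p, so every full-resolution shader pass has about double the work of a 1080p setup.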

Since the Mega Bezel uses a LOT of RAM compared to other shaders, because of its many passes at viewport resolution, I wonder if it has something to do with RAM bandwidth going in and out of the graphics processor. Does the GPU part of the processor share RAM with the rest of the processor?
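
To make the bandwidth question a bit more concrete, here is a very rough back-of-the-envelope sketch. The pass count and the bytes per pixel are hypothetical placeholders, not measured Mega Bezel values, and real GPUs cache and compress render targets, so treat it as an illustration of how the traffic scales rather than a prediction.

```python
# Rough estimate of render-target traffic for a multi-pass shader.
# Only the viewport size comes from the thread; everything else is assumed.
VIEWPORT_PIXELS = 2392 * 1792  # viewport size quoted in the thread
BYTES_PER_PIXEL = 8            # assumption: 16-bit float RGBA render targets
TARGET_FPS = 60                # assumption: 60 Hz target


def traffic_gb_per_s(pixels, bytes_per_pixel, passes, fps):
    """Naive model: each pass reads one full-res input and writes one full-res output."""
    bytes_per_frame = pixels * bytes_per_pixel * 2 * passes
    return bytes_per_frame * fps / 1e9


# Hypothetical pass counts, chosen only to show the scaling.
for passes in (5, 15, 30):
    gbs = traffic_gb_per_s(VIEWPORT_PIXELS, BYTES_PER_PIXEL, passes, TARGET_FPS)
    print(f"{passes:2d} full-res passes -> ~{gbs:.1f} GB/s")
```

The point is only that traffic grows linearly with the number of full-resolution passes, so a preset with many such passes could plausibly become bandwidth-bound where a short shader chain would not.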

How do the Screen-Only presets and the Potato presets perform? (I’m not suggesting that these are solutions to your issue, just wondering how they compare.)

1 Like

Apple silicon uses “unified memory”, which isn’t the same as shared per se. As far as I know, nobody else uses this system: basically anything on the SoC can use the same memory without paging between the different SoC components. In other words, the GPU and CPU can edit the same block without copying between them, using the same pointer instead (different from NVIDIA’s old “unified memory” from years back). The RAM on this system is soldered to the board via micro BGA.

I have 16GB

1 Like

Doesn’t seem much different from what any modern UMA architecture APU/IGP has been able to do for the last couple of decades. Once either the CPU or the GPU needs to talk to anything outside of the SoC, they’re using the same shared data bus and bandwidth. Unless there are separate pools of RAM connected to dedicated buses and memory controllers, this is how it works. If there were separate pools, then things would not be shared anymore, and copies would have to be made when the CPU and the GPU need to work on the same data. So for all intents and purposes, shared means they share the same memory and bus, and unified means the same thing.

The fact that they can use pointers to avoid some copying to improve efficiency doesn’t mean that it’s not a shared memory architecture, one that does not possess the same advantages as a dedicated memory architecture.

One of the main constraints of using these shared/unified memory architectures is the type of RAM used. For dedicated graphics we have GDDRn memory and HBM if you want to achieve the high bandwidths required by real-time graphics rendering. For the lowest cost, slowest and worst memory bandwidth applications, DDRn is often implemented.

Which does the Apple Silicon use?

DDRn usually has lower bandwidth but also lower latencies than its GDDRn counterparts because they’re optimized for different tasks. When you have to share either one for CPU use, you end up with a situation that can be less than optimal for at least one of them: either you have to tolerate lower total available memory bandwidth or higher latencies.

https://m.slashdot.org/story/185451

So here are some further insights into the HSM presets on an Apple M1 Mac Mini 16 GB (yes, unified memory), all with FinalBurn Neo/1942, as the other cores don’t behave that differently:

HSM_MBZ_5_POTATO.slangp -> 280 fps (ffwd); basically all POTATO presets I tested were at 280 fps
HSM_MBZ_4_SCREEN_ONLY_MEGATRON.slangp -> 280 fps (ffwd)
Same with HSM_MBZ_4_STD_NO-REFLECT.slangp
MBZ_4_STD-NO-REFLECT-EASYMODE.slangp -> a bit slower with 260 fps
MBZ_4_STD-NO-REFLECT-SUPER-XBR_GDV -> breaks down to 160 fps
MBZ_3_STD_MEGATRON.slangp -> crawls at 40 fps and 97 fps (ffwd, interestingly)
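
For anyone who prefers frame times over fps, here is a small sketch converting the numbers above. The 60 Hz target is an assumption (the thread does not state the display refresh rate), and note that most of these figures were taken with fast-forward on, so they show headroom rather than normal playback speed.

```python
# Convert the frame rates reported above into per-frame times.
REFRESH_HZ = 60  # assumption: a 60 Hz display target, not stated in the thread
FRAME_BUDGET_MS = 1000.0 / REFRESH_HZ

# Preset labels and fps values copied from the results above
# (40 fps is the non-fast-forward figure for the MEGATRON preset).
reported_fps = {
    "HSM_MBZ_5_POTATO": 280,
    "HSM_MBZ_4_SCREEN_ONLY_MEGATRON": 280,
    "MBZ_4_STD-NO-REFLECT-EASYMODE": 260,
    "MBZ_4_STD-NO-REFLECT-SUPER-XBR_GDV": 160,
    "MBZ_3_STD_MEGATRON": 40,
}


def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


for preset, fps in reported_fps.items():
    ms = frame_time_ms(fps)
    verdict = "fits" if ms <= FRAME_BUDGET_MS else "misses"
    print(f"{preset}: {fps} fps = {ms:.2f} ms/frame, {verdict} the {FRAME_BUDGET_MS:.2f} ms budget")
```

Under that assumption only the reflective MBZ_3 preset misses the budget, which matches the "crawls" description.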

Note that the koko-aio shader, which I started using now in Vulkan, also achieves the 280 fps, however with all the nice additional effects that the POTATO presets do not have.

1 Like

Whether the memory is shared or unified may not matter with respect to this application, but it definitely matters when modifying data using Apple’s APIs. As for the type of RAM used, there are two things traditional dedicated RAM has to do (or two bottlenecks to overcome): copy RAM via DMA transfers, and quickly modify (large chunks of) data through the GPU over and over again as it traverses the graphics pipeline. It’s the latter where the more it behaves like an L3 cache, the better.

FWIW according to Apple:

M1: LPDDR4X-4266, 128-bit, 68.3 GB/s
M1 Pro: LPDDR5-6400, 256-bit, 204.8 GB/s
M1 Max: LPDDR5-6400, 512-bit, 409.6 GB/s
M1 Ultra: LPDDR5-6400, 1024-bit, 819.2 GB/s
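
These figures follow from the usual peak-bandwidth arithmetic (transfer rate times bus width). A minimal sketch, assuming the number after the memory type is the transfer rate in MT/s:

```python
# Theoretical peak bandwidth = transfers per second * bytes per transfer.
def peak_bandwidth_gbs(transfer_rate_mts, bus_width_bits):
    """Peak bandwidth in GB/s from a rate in MT/s and a bus width in bits."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9


# (memory transfer rate in MT/s, bus width in bits) as listed above
chips = {
    "M1": (4266, 128),
    "M1 Pro": (6400, 256),
    "M1 Max": (6400, 512),
    "M1 Ultra": (6400, 1024),
}

for name, (rate, width) in chips.items():
    print(f"{name}: {peak_bandwidth_gbs(rate, width):.1f} GB/s")
```

These are theoretical peaks; what a shader chain actually achieves will be lower and depends on access patterns.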

3 Likes

You may want to try out the latest release of the Mega Bezel from GitHub; I made an improvement that increases performance somewhat :grin:.

3 Likes

Hi! First off thank you for your work and contribution on these shaders, they look fantastic. Allow me to add my feedback as a Mac M1 user:

Using RetroArch, I downloaded the slang shaders from the built-in updater and followed the settings recommendations in the readme. Presets without reflections work well; however, as soon as I select one with reflections, the game slows down heavily and becomes laggy. I thought it must be a spec issue, but in comparison the koko-aio presets with a similar kind of reflection run smoothly.

It’s a shame because the reflections look gorgeous. Any advice or help on fixing this would be appreciated.

Best, Thomas

1 Like

The Mega Bezel is just more computationally intensive. The main difference between koko-aio and Mega Bezel is that the Mega Bezel has a fully interactive, adjustable bezel and multiple adjustable image layers.

koko-aio is a great option if you don’t need that configurability; it still has lots of great options and some really unique CRT features.

2 Likes

Thanks for getting back. I’ll keep toying with both. Since I really want to go with the Mega Bezel shaders, I will try to find a specific parameter that reduces the computational overhead and allows games to run smoothly.

Meanwhile, if you have any pointers or starting points to suggest on which shader or parameter could be tweaked or turned off to reduce the system load that would be great!

1 Like

You can try the GDV-MINI, GDV-MINI-NTSC or Sony Megatron Color Video Monitor base presets. They have very low overhead. Also, make sure you’re using presets from the MBZ__3__STANDARD folder or lower. I think another Mac user said the Glass presets worked faster, but you can try all of the base presets until you find one that might magically work well with the platform.

1 Like