Performance differences with different scaling settings

Hi everyone! Got my Pi 5 a week ago and have been testing RetroArch on it. I'm running it via 64-bit RetroPie (installed manually on RPi OS Lite Bookworm via the RetroPie-Setup script). Things are working fine, but I did run into a peculiar thing when testing shaders. The zfast-crt shader runs very quickly, which is to be expected. My tests seem to indicate that it executes in around 1 ms. Since I like crt-aperture, I decided to give it a go, even though it’s much more demanding. To my disappointment, it ran awfully, despite the Pi 5’s ~200% faster GPU (compared to the Pi 4). By coincidence, I was playing around with the scaling/aspect ratio settings while crt-aperture was active and suddenly noticed that it was running at 60 FPS.

After testing the scaling settings, it turns out that enabling integer scaling and the 6:5 aspect ratio (not “Custom” + 6x and 5x, but the actual 6:5 setting in the list) did the trick. After some additional tests, it seems that some of the aspect ratio settings give a marked performance increase over others. Here are some tests run with lr-snes9x on the title screen of Yoshi’s Island:

  • Integer scale=off, Aspect ratio: Core provided, no shader: 442 FPS
  • Integer scale=off, Aspect ratio: Core provided, crt-aperture: 47 FPS
  • Integer scale=on, Aspect ratio: Core provided, crt-aperture: 67 FPS
  • Integer scale=on, Aspect ratio: 4:3, crt-aperture: 67 FPS
  • Integer scale=on, Aspect ratio: 6:5, crt-aperture: 74 FPS
  • Integer scale=on, Aspect ratio: 1:1 PAR, crt-aperture: 77 FPS
  • Integer scale=on, Aspect ratio: Custom, 6x, 5x, crt-aperture: 44 FPS
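For anyone comparing these numbers, it can help to convert FPS into frame time, since shader cost adds up in milliseconds rather than in FPS. Here is a minimal Python sketch using the readings above. The `frame_time_ms` helper is made up for illustration, and using the no-shader run as the baseline for every row is an assumption (it was only measured with integer scaling off), so the "shader cost" column is a rough estimate at best:

```python
def frame_time_ms(fps: float) -> float:
    """Convert a frames-per-second reading into milliseconds per frame."""
    return 1000.0 / fps


# FPS readings copied from the list above (lr-snes9x, Yoshi's Island title screen).
NO_SHADER_FPS = 442
crt_aperture_fps = {
    "integer off, core provided": 47,
    "integer on, core provided": 67,
    "integer on, 4:3": 67,
    "integer on, 6:5": 74,
    "integer on, 1:1 PAR": 77,
    "integer on, custom 6x/5x": 44,
}

# Assumption: everything above the no-shader frame time is shader cost.
baseline_ms = frame_time_ms(NO_SHADER_FPS)

for label, fps in crt_aperture_fps.items():
    total_ms = frame_time_ms(fps)
    shader_ms = total_ms - baseline_ms
    print(f"{label:28s} {total_ms:6.2f} ms/frame, ~{shader_ms:5.2f} ms shader")
```

A 60 FPS target leaves about 16.7 ms per frame, so any row whose total frame time is above that will not hold full speed.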

[As a bonus, I tried crt-easymode as well. It seems to execute at around twice the speed of crt-aperture, making it quite reasonable to use on the Pi 5, at least for some emulators.]

While I expect that there might not be much to do about this, it would be great if anyone in the know would comment. Is it expected behavior? I am guessing it could be memory bandwidth related. In my initial testing, it seems the performance is only affected if shaders are also active. Without shaders, I couldn’t see any difference with the different scaling settings.

All testing carried out with the glcore driver, slang shaders and 1920x1080 resolution.

You are lucky it runs at 67 FPS; that’s one of the heaviest shaders around, and slang probably helps it perform better. From what I’ve read, the Pi 5 only has a 51 GFLOPS GPU, which is slow as hell for running a demanding shader. When you turn integer scaling on, the image is smaller, so it’s normal for it to run faster (the shader calculates fewer pixels). There are other shaders around that look equally good and perform better. I wrote “crt-sines” in GLSL for a GPU around the same speed as the Pi 5 (~50 GFLOPS).


Ugh, now I feel stupid. Should have brewed that coffee this morning. Yeah, you’re of course right that the smaller output renders faster because there are fewer pixels to calculate. Everything is as it should be here then. :sweat_smile:
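The smaller-output explanation can be checked roughly by counting output pixels. The sketch below is a simplified model, not RetroArch's actual viewport code: it assumes a 224-line core output, a 1920x1080 screen, a 4:3 core-provided aspect, and that integer scaling picks the largest whole multiple that fits. All function names are hypothetical.

```python
def full_height_viewport(screen_w: int, screen_h: int, aspect: float) -> tuple[int, int]:
    """Non-integer scaling: fill the screen height, derive the width from the aspect."""
    height = screen_h
    width = round(height * aspect)
    if width > screen_w:  # too wide for the screen, so fit to width instead
        width = screen_w
        height = round(width / aspect)
    return width, height


def integer_viewport(screen_w: int, screen_h: int, base_h: int, aspect: float) -> tuple[int, int]:
    """Integer scaling (simplified): largest whole multiple of the base image that fits."""
    base_w = round(base_h * aspect)
    scale = min(screen_w // base_w, screen_h // base_h)
    return base_w * scale, base_h * scale


SCREEN_W, SCREEN_H = 1920, 1080
BASE_H = 224  # assumed core output height for this game

full_w, full_h = full_height_viewport(SCREEN_W, SCREEN_H, 4 / 3)
full_pixels = full_w * full_h
print(f"integer off, 4:3: {full_w}x{full_h} = {full_pixels} pixels")

# Aspect values are assumptions: 4:3, 6:5, and 8:7 (256x224 with square pixels).
for label, aspect in [("4:3", 4 / 3), ("6:5", 6 / 5), ("1:1 PAR", 8 / 7)]:
    w, h = integer_viewport(SCREEN_W, SCREEN_H, BASE_H, aspect)
    share = w * h / full_pixels
    print(f"integer on, {label}: {w}x{h} = {w * h} pixels ({share:.0%} of full height)")
```

Under these assumptions the integer-scaled 4:3 image covers about 69% of the pixels of the full-height one, which is in the same ballpark as the measured change from 47 to 67 FPS (47/67 ≈ 0.70). The narrower 6:5 and 1:1 PAR images shade fewer pixels still, which fits them being a bit faster again.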

Regarding the Pi 5’s computational performance, I think the 51 GFLOPS number is wrong. I saw it quoted on some weird site right after the Pi 5 was launched, but I think it’s just someone taking the Pi 4’s 32 GFLOPS value and multiplying it by the supposed frequency increase (800 MHz vs 500 MHz). That results in 51.2 GFLOPS. There seem to be two issues with this:

The V3D (3D core) of the Pi 5 does not run at 800 MHz as first reported. The correct frequency is 960 MHz. The documentation at raspberrypi.com incorrectly states that the V3D runs at 910 MHz, but it’s only the rest of the core (including ISP and decoder) that runs at 910 MHz.

There’s also this post which dumps some VC7 info and compares it to VC6. It appears the VC7 doubles up on the QPU count from 8 in the Pi 4 to 16 in the Pi 5: https://forums.raspberrypi.com/viewtopic.php?p=2139329#p2139329

So, unless other changes have been made to the core arrangement, the correct theoretical GFLOPS value for the Pi 5 would seem to be 32 * (960/500) * (16/8) = 122.9 GFLOPS.
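Both estimates can be reproduced with a few lines of Python. This only scales the Pi 4's quoted 32 GFLOPS by clock and QPU count, as in the reasoning above. It assumes throughput scales linearly with both, and the function name is just for illustration:

```python
PI4_GFLOPS = 32.0   # Pi 4 figure quoted in this thread
PI4_MHZ = 500.0     # Pi 4 V3D clock
PI4_QPUS = 8        # QPU count reported for VC6


def scaled_gflops(new_mhz: float, new_qpus: int) -> float:
    """Scale the Pi 4 figure linearly by V3D clock and QPU count (an assumption)."""
    return PI4_GFLOPS * (new_mhz / PI4_MHZ) * (new_qpus / PI4_QPUS)


# The widely quoted number: 800 MHz and no change in QPU count.
naive = scaled_gflops(800.0, 8)

# The revised estimate: 960 MHz and 16 QPUs.
revised = scaled_gflops(960.0, 16)

print(f"naive estimate:   {naive:.1f} GFLOPS")
print(f"revised estimate: {revised:.1f} GFLOPS")
```

This is a theoretical peak only. Real shader performance also depends on memory bandwidth and on any other architectural changes between VC6 and VC7.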

In my own tests of the Pi 4 vs Pi 5 GPU, the Pi 5 averages a 200% performance increase, with peaks in the 250-300% range. It’s still not a fast GPU, but the uplift from the Pi 4 is pretty substantial, thankfully.

EDIT: BTW, I just changed to the “gl” driver and tested the glsl version of crt-aperture. It ran at 66 FPS (vs 67 FPS for slang), so no real difference.

crt-sines looks nice! It runs at 148 FPS vs 66 FPS for crt-aperture (integer scaling=on, core provided aspect). I’ll look into using it for the Pi 5.


Somebody did a shader benchmark 3 years ago here. According to this, you may also be able to run Easymode-Halation, which offers more options (masks) compared to standard Easymode. Maybe Hyllian as well, although there has been a lot of development since then and some shaders may have gotten slower. For fast shaders, I recommend trying CRT-Consumer, GDV-Mini, and Guest Advanced Fastest.


You’ll probably not find another shader that has a hermite filter, curvature, gamma, cut corners, deconvergence, glow, and “crt colors” and still runs faster than crt-sines. I tweaked the hell out of it. Now I am thinking of ways to put back the slotmask I removed to fit glow (and still run at full speed on a 2013 HTC One M7).


Guess you are right about 120 GFLOPS: the HD 630 is ~350 GFLOPS and runs crt-aperture at ~150 FPS.


Yeah, that checks out pretty nicely.
