I’ve not posted an update for a long time, so here’s the current progress.
The hardware rendering core works on Raspberry Pis, but there are issues with lower-end ones because of the way Libretro supports hardware rendering. The list below assumes a 1920x1080 screen unless otherwise noted and assumes no overclocking (Pi1s clock the same as Pi Zeros).
- Pi4 - Works OK.
- Pi3 - Can’t quite render at 1080 at full speed. Rendering at 1024 and upscaling with hardware linear filtering is fast enough, and it’s almost impossible to tell the difference between that and rendering at 1080.
- Pi2 - No Pi2 available for testing.
- Pi1/Zero - Can’t support 1080 at anything like full speed (low 40s FPS instead of 50). Can do full speed on a 1280x1024 screen.
Rendering at a lower resolution and upscaling is, in many cases, an effective way of gaining enough extra FPS to run at full speed.
The next steps are to define what should be changeable via options and to merge the current GPU-rendering-specific makefiles with the original ones.
Techie discussion of issues found follows:-
Most of the effort I’ve put in recently has been focused on figuring out what Retroarch/Libretro does and how it affects performance. Some of these things have a significant impact on tile-based GPUs and memory-bandwidth-constrained systems.
Libretro supports hardware rendering by having the core render to a texture which the front end then copies to the frame buffer. This obviously has implications for memory-bandwidth-constrained systems: the frame is rendered to the texture, read back, and written to the frame buffer, instead of just being rendered straight to the frame buffer.
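For reference, here’s a minimal sketch of that handshake as a core sees it, trimmed to the relevant calls (a real core implements many more entry points; the GLES2 context choice and the render_width/render_height values are illustrative):

```c
#include <GLES2/gl2.h>
#include "libretro.h"

static struct retro_hw_render_callback hw_render;
static retro_video_refresh_t video_cb;
static retro_environment_t environ_cb;
static unsigned render_width = 869, render_height = 1080; /* example size */

void retro_set_video_refresh(retro_video_refresh_t cb) { video_cb = cb; }
void retro_set_environment(retro_environment_t cb)     { environ_cb = cb; }

static void context_reset(void)   { /* (re)create GL objects here */ }
static void context_destroy(void) { /* release GL objects here */ }

bool retro_load_game(const struct retro_game_info *info)
{
   (void)info;
   hw_render.context_type       = RETRO_HW_CONTEXT_OPENGLES2;
   hw_render.context_reset      = context_reset;
   hw_render.context_destroy    = context_destroy;
   hw_render.bottom_left_origin = true;
   /* Ask the front end for a hardware rendering context. */
   return environ_cb(RETRO_ENVIRONMENT_SET_HW_RENDER, &hw_render);
}

void retro_run(void)
{
   /* Draw into the FBO the front end provides; its colour attachment is
      the texture the front end later copies to the frame buffer. */
   glBindFramebuffer(GL_FRAMEBUFFER,
                     (GLuint)hw_render.get_current_framebuffer());
   glViewport(0, 0, render_width, render_height);
   /* ... render the frame here ... */

   /* Tell the front end the texture now holds a frame of this size. */
   video_cb(RETRO_HW_FRAME_BUFFER_VALID, render_width, render_height, 0);
}
```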
Retroarch always creates a texture that is square, with power-of-two dimensions (e.g. 1024x1024), and always big enough to contain the maximum render resolution you support, rather than one that’s just big enough for your current screen. So the best case for a 1080 screen is a 2048x2048 texture, even though to emulate a Vectrex screen we only use 869x1080 - less than 1/4 of it. If your maximum render resolution is for a 4K screen (i.e. 2160 tall) it will always create a 4096x4096 texture, even though you may have a small screen and use only a tiny fraction of it.
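The arithmetic works out like this (a sketch of the rounding, not Retroarch’s actual code):

```c
/* Round up to the next power of two, as the front end does when sizing
   the square render texture. */
static unsigned next_pow2(unsigned v)
{
   unsigned p = 1;
   while (p < v)
      p <<= 1;
   return p;
}

/* next_pow2(1080) == 2048 -> 2048x2048 texture for a 1080-tall maximum;
   next_pow2(2160) == 4096 -> 4096x4096 for a 4K-capable maximum.
   A 869x1080 Vectrex frame covers under a quarter of the 2048x2048 case. */
```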
Clearing this in the normal manner, just using glClear(), would use a lot of memory bandwidth clearing memory that does not form part of the final image. Fortunately a core can set up a scissor box to limit the clear to just the area required, which provides a useful performance boost.
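In GL terms the trick looks like this (a sketch; the width/height parameters stand for whatever area the frame actually occupies):

```c
#include <GLES2/gl2.h>

/* Clear only the sub-rectangle of the oversized texture that the frame
   occupies; glClear() honours the scissor box, so the rest of the
   (e.g. 2048x2048) surface is never touched. */
static void clear_render_area(GLint width, GLint height)
{
   glEnable(GL_SCISSOR_TEST);
   glScissor(0, 0, width, height);
   glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
   glClear(GL_COLOR_BUFFER_BIT);
   glDisable(GL_SCISSOR_TEST);
}
```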
There’s still a problem with these oversized textures - the Retroarch front end does something with the texture that needlessly uses memory bandwidth. I found this out by rendering for a 1024-tall screen and testing with the maximum supported resolution set to 1024 and then to 2048: the two gave different frame rates, even though the scissor box trick meant the difference shouldn’t have mattered from the core’s point of view. So something must be happening in the front end to make the difference. This issue is likely why a Pi3 needs the 1024-and-upscale trick rather than being able to render at 1080.
Since limiting the maximum render resolution has such an impact on performance, and trying to find optimal configurations for every GPU is an impossible task, I’m going to take the following approach:- if using OpenGLES with a version less than 3, use a maximum supported render resolution of 1024; otherwise use 2048.
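As a sketch of how that rule might be wired up (checking the requested context type is an assumption about the implementation, and the timing values are illustrative; hw_render is the struct from the earlier sketch):

```c
#include "libretro.h"

/* Cap the maximum render resolution at 1024 on GLES < 3, 2048 otherwise. */
static unsigned max_render_res(void)
{
   return (hw_render.context_type == RETRO_HW_CONTEXT_OPENGLES2) ? 1024 : 2048;
}

void retro_get_system_av_info(struct retro_system_av_info *info)
{
   unsigned res = max_render_res();
   info->geometry.base_width   = res * 869 / 1080; /* Vectrex proportions */
   info->geometry.base_height  = res;
   info->geometry.max_width    = res;   /* this drives the texture size */
   info->geometry.max_height   = res;
   info->geometry.aspect_ratio = 869.0f / 1080.0f;
   info->timing.fps            = 50.0;     /* Vectrex refresh rate */
   info->timing.sample_rate    = 44100.0;  /* illustrative */
}
```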
There’s no easy way for a core to find out how big the screen is; having that available would be helpful when setting the maximum render resolution.
It would also be worth someone who understands how the front ends work checking whether the scissor box trick could be used to improve performance for all hardware rendering cores. An alternative would be to support non-power-of-two textures - there can be few GPUs that don’t support them now.
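On that last point, full non-power-of-two support can be detected on GLES2 like this (GLES3 has it in core, so only the GLES2 case needs the extension check; requires a current GL context):

```c
#include <string.h>
#include <GLES2/gl2.h>

/* GLES2 only guarantees limited NPOT support (CLAMP_TO_EDGE, no mipmaps);
   full support is advertised via the GL_OES_texture_npot extension. */
static int npot_textures_supported(void)
{
   const char *ext = (const char *)glGetString(GL_EXTENSIONS);
   return ext && strstr(ext, "GL_OES_texture_npot") != NULL;
}
```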