Rendering vector games on low end GPUs

I’ve been playing about with the standalone version of the Vectrex emulator used by Libretro to see what can be done using the GPU to render the screen, with a view to porting the results to the lr-vecx core.

Anti-aliasing and transparency effects were pretty straightforward, although the Vectrex’s habit of usually, but not always, putting bright dots at the ends of lines caused a bit of a problem.

Given that low-end GPUs, particularly tile-based ones, aren’t capable of running the usual multi-pass shader glow implementations, I’ve added a cut-price version of that too.

Performance-wise, it works well on my 256MB Pi1B.

Some screenshots:

Mine Storm without glow.

Mine Storm with glow.

Pole Position with glow.

What do people think? Any thoughts on the practicalities of implementing it in the libretro core?


It looks awesome! How’s the performance impact?

It seems fine. It’s a bit difficult to judge exactly on my Pi1 since SDL seems to insist on a frame buffer that covers the whole screen rather than just the part I’m using for the game, so on a 16:9 screen it will be writing to nearly twice as much memory as it needs to. Running Pole Position, with its countdown timer, on my main PC at the same time as on my Pi1 shows that once up and running (after loading ROMs, etc.) they both take the same time to finish a race.

The shaders are very simple. The fragment shader just does

vec3 colour = texture2D(texture, fragTexCoords).rgb;
gl_FragColor = vec4(colour, colour.r);

The glow bit is implemented by drawing the geometry twice: once with wide lines at low brightness for the glow, then again with narrow lines at full brightness. There’s a bit more to the details but that’s basically it.
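A minimal sketch of that draw order, where draw_lines is a hypothetical stand-in for the real GL line drawing and the width/brightness numbers are just illustrative:

```python
# Two-pass glow sketch: widths and brightness values are made up for
# illustration; draw_lines stands in for the actual GL draw calls.
calls = []

def draw_lines(width, brightness):
    """Record a draw pass (a real version would issue GL draw calls)."""
    calls.append((width, brightness))

def render_frame():
    draw_lines(width=8.0, brightness=0.25)  # pass 1: wide, dim -> soft halo
    draw_lines(width=2.0, brightness=1.0)   # pass 2: narrow, full -> crisp line

render_frame()
```

The ordering matters: the dim halo goes down first so the full-brightness core is drawn on top of it.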

whoa, that’s a very convincing glow for such a simple process. kudos

It took a while to figure out something so simple. :smiley:

The texture is a circle with blurry edges to get the anti-aliasing; there are different ones for the glow and normal lines. The trick with the glow is to stop it getting too bright when there are lots of overwrites. Using the frame buffer alpha and the GL blend function means additional values have less and less impact until you reach a limit.
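To illustrate the saturation idea: the post doesn’t say exactly which blend mode is used, but one plausible configuration is glBlendFunc(GL_ONE_MINUS_DST_ALPHA, GL_ONE), simulated below in Python with made-up source values:

```python
# Simulates destination-alpha blending: each successive glow line only
# contributes src * (1 - dst_alpha), so overlapping lines add less and
# less. The blend mode and the 0.25/0.5 values are assumptions.
def blend(dst_colour, dst_alpha, src_colour, src_alpha):
    factor = 1.0 - dst_alpha            # frame buffer alpha limits new input
    return (dst_colour + src_colour * factor,
            dst_alpha + src_alpha * factor)

colour, alpha = 0.0, 0.0
steps = []
for _ in range(10):                      # ten overlapping glow lines
    colour, alpha = blend(colour, alpha, 0.25, 0.5)
    steps.append(colour)
# colour rises quickly at first, then each extra line adds almost nothing
```

Each overdraw contributes a smaller increment than the last, so the glow brightness approaches a limit instead of blowing out to white.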


Actually, I missed a line out - it’s mixed up with some #defined conditional code to help with debugging, which made me miss it. It should be

vec3 colour = texture2D(texture, fragTexCoords).rgb;
colour *= fragColour;
gl_FragColor = vec4(colour, colour.r);

A quick update on this in case anyone is wondering if I’ve got anywhere with it. I’ve not been able to spend much time on it but have managed to make some progress. Today I’ve got the GPU rendering working on my main PC (i.e. not a Raspberry Pi). It’s not yet in a state to release, even as an experimental version, and there’s still a lot of work to do, but getting it rendering properly is a good start. I’ll post updates when there are any more significant developments.


Heh, i was thinking just this morning “I wonder how that clever vector glow is coming along…” No hurry, I’m glad it’s still in the works :slight_smile:

I’ve not posted an update for a long time so here’s the current progress.

The hardware rendering core works on Raspberry Pis, but there are issues with the lower-end ones because of the way Libretro supports hardware rendering. The list below assumes a 1920x1080 screen unless otherwise noted, and assumes no overclocking (Pi1s clock as do Pi Zeros).

  • Pi4 - Works OK.
  • Pi3 - Can’t quite support rendering at 1080 at full speed. Rendering at 1024 and upscaling with hardware linear filtering works fast enough and it’s almost impossible to tell the difference between that and rendering at 1080.
  • Pi2 - No Pi2 available for testing.
  • Pi1/Zero - Can’t support 1080 at anything like full speed (low 40s FPS instead of 50). Can do full speed on a 1280x1024 screen.

The render to a lower resolution and upscale approach is an effective way of getting enough extra FPS to run at full speed in many cases.
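Rough arithmetic behind that, assuming RGBA8 (4 bytes per pixel), the 869x1080 Vectrex viewport figure from the techie section below, and the square power-of-two backing texture behaviour also discussed there:

```python
# Back-of-envelope figures for rendering at 1024 tall and upscaling,
# versus rendering at 1080. Assumes RGBA8 and that the backing texture
# must be a power-of-two square, as described further down the thread.
def next_pow2(n):
    p = 1
    while p < n:
        p *= 2
    return p

full = (869, 1080)                       # Vectrex viewport on a 1080 screen
low = (round(869 * 1024 / 1080), 1024)   # same aspect at 1024 tall (~824x1024)

# ~10% fewer pixels actually drawn...
pixel_saving = 1 - (low[0] * low[1]) / (full[0] * full[1])
# ...and the square power-of-two texture drops from 2048^2 to 1024^2 (4x).
tex_full_bytes = next_pow2(max(full)) ** 2 * 4   # 16 MiB
tex_low_bytes = next_pow2(max(low)) ** 2 * 4     # 4 MiB
```

The modest fill saving plus the 4x smaller backing texture is where the extra FPS comes from; the linear-filtered upscale itself is nearly free on the GPU.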

The next steps are to define what should be changeable via options and to merge the current GPU-rendering-specific makefiles with the original ones.

Techie discussion of issues found follows:-

Most of the effort I’ve put in recently has been focused on trying to figure out what Retroarch/Libretro does and how it affects performance. Some of these things have a significant impact on tile based GPUs/memory bandwidth constrained systems.

Libretro supports hardware rendering by getting the core to render to a texture, which the front end then copies to the frame buffer. This obviously has implications for memory-bandwidth-constrained systems, as it has to render to the texture, read it back, and write it to the frame buffer instead of just rendering to the frame buffer directly.

Retroarch always creates a square texture with power-of-two dimensions (e.g. 1024x1024), always big enough to contain the maximum render resolution you support rather than one that’s just big enough for your current screen. So the best case for a 1080 screen is a 2048x2048 texture, even though to emulate a Vectrex screen we only use 869x1080, less than 1/4 of it. If your maximum render resolution is for a 4K screen (i.e. 2160 tall) it will always create a 4096x4096 texture, even though you may have a small screen and use only a tiny fraction of it.

Clearing this in the normal manner, just using glClear(), would waste a lot of memory bandwidth clearing memory that does not form part of the final image. Fortunately a core can set up a scissor box to limit the clear to just the area required, which provides a useful performance boost.
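Putting numbers on that, assuming an RGBA8 texture (4 bytes per pixel) and the 869x1080 viewport figure from the previous paragraph:

```python
# How much clear bandwidth the scissor box saves on a 2048x2048 RGBA8
# texture when only the 869x1080 Vectrex viewport is actually used.
BYTES_PER_PIXEL = 4                          # RGBA8 assumed
full_clear = 2048 * 2048 * BYTES_PER_PIXEL   # plain glClear(): 16 MiB
scissored = 869 * 1080 * BYTES_PER_PIXEL     # clear limited by glScissor()
saving = 1 - scissored / full_clear          # roughly three quarters saved
```

So scissoring the clear avoids touching roughly three quarters of the texture every frame, which matters a lot on a bandwidth-starved Pi.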

There’s still a problem with these oversized textures - the Retroarch front end does something with the texture that needlessly uses memory bandwidth. I found this out by setting the render resolution for a 1024-tall screen: testing with the maximum supported resolution set to 1024 and then to 2048 gave different frame rates, even though the scissor box trick meant the difference should have been insignificant from the core’s point of view. Obviously something must be happening in the front end to make this difference. This issue is likely why a Pi3 needs the 1024-and-upscale trick rather than being able to render at 1080.

Since limiting the maximum render resolution has such an impact on performance, and trying to find optimal configurations for every GPU is an impossible task, I’m going to take the following approach:- if using OpenGL ES with a version less than 3, use a maximum supported render resolution of 1024; otherwise use 2048.
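As a sketch, that rule boils down to something like this (the function name is illustrative, not the core’s actual API):

```python
# Pick the maximum supported render resolution from the GL flavour and
# version reported at init time. Hypothetical helper, not the core's API.
def max_render_resolution(is_gles, major_version):
    if is_gles and major_version < 3:
        return 1024   # GLES2-class GPUs: cap at 1024 and upscale
    return 2048       # desktop GL or GLES3+: allow up to 2048
```

GLES major version is a crude proxy for GPU class, but it’s cheap to query and errs on the safe side for older tile-based parts.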

There’s no easy way for a core to find out how big the screen is. Having that available would be helpful when setting the maximum render resolution.

It would also be worth someone who understands the front ends investigating whether the scissor box trick can be used to improve performance for all hardware rendering cores. Another alternative would be to support non-power-of-two textures; there can be few GPUs that don’t support them now.