Rendering vector games on low end GPUs

I’ve been playing about with the standalone version of the Vectrex emulator used by Libretro to see what can be done using the GPU to render the screen, with a view to porting the results to the lr-vecx core.

Anti-aliasing and transparency effects were pretty straightforward, although the Vectrex’s habit of usually, but not always, putting bright dots at the ends of lines caused a bit of a problem.

Given that low-end GPUs, particularly tile-based ones, aren’t capable of running the usual multi-pass shader glow implementations, I’ve added a cut-price version of that too.

Performance-wise, it works well on my 256MB Pi1B.

Some screenshots:

Mine Storm without glow.

Mine Storm with glow.

Pole Position with glow.

What do people think? Any thoughts on the practicalities of implementing it in the libretro core?


It looks awesome! How’s the performance impact?

It seems fine. It’s a bit difficult to judge exactly on my Pi1 since SDL seems to insist on a frame buffer that covers the whole screen rather than just the part I’m using for the game, so on a 16:9 screen it will be writing to nearly twice as much memory as it needs to. Running Pole Position, with its countdown timer, on my main PC at the same time as on my Pi1 shows that once up and running (after loading ROMs, etc.) they both take the same time to finish a race.

The shaders are very simple. The fragment shader just does

vec3 colour = texture2D(texture, fragTexCoords).rgb;
gl_FragColor = vec4(colour, colour.r);

The glow bit is implemented by drawing the geometry twice: once with wide lines at low brightness for the glow, then again with narrow lines at full brightness. There’s a bit more to the details but that’s basically it.
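A minimal sketch of that draw order, where draw_lines is a hypothetical stand-in for the real GL line drawing and the width/brightness numbers are just illustrative:

```python
# Two-pass glow sketch: widths and brightness values are made up for
# illustration; draw_lines stands in for the actual GL draw calls.
calls = []

def draw_lines(width, brightness):
    """Record a draw pass (a real version would issue GL draw calls)."""
    calls.append((width, brightness))

def render_frame():
    draw_lines(width=8.0, brightness=0.25)  # pass 1: wide, dim -> soft halo
    draw_lines(width=2.0, brightness=1.0)   # pass 2: narrow, full -> crisp line

render_frame()
```

The ordering matters: the dim halo goes down first so the full-brightness core is drawn on top of it.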

whoa, that’s a very convincing glow for such a simple process. kudos

It took a while to figure out something so simple. :smiley:

The texture is a circle with blurry edges to get the anti-aliasing; there are different ones for the glow and normal lines. The trick with the glow is to stop it getting too bright when there are lots of overwrites. Using the frame buffer alpha and the GL blend function means additional values have less and less impact until you reach a limit.
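To illustrate the saturation idea: the post doesn’t say exactly which blend mode is used, but one plausible configuration is glBlendFunc(GL_ONE_MINUS_DST_ALPHA, GL_ONE), simulated below in Python with made-up source values:

```python
# Simulates destination-alpha blending: each successive glow line only
# contributes src * (1 - dst_alpha), so overlapping lines add less and
# less. The blend mode and the 0.25/0.5 values are assumptions.
def blend(dst_colour, dst_alpha, src_colour, src_alpha):
    factor = 1.0 - dst_alpha            # frame buffer alpha limits new input
    return (dst_colour + src_colour * factor,
            dst_alpha + src_alpha * factor)

colour, alpha = 0.0, 0.0
steps = []
for _ in range(10):                      # ten overlapping glow lines
    colour, alpha = blend(colour, alpha, 0.25, 0.5)
    steps.append(colour)
# colour rises quickly at first, then each extra line adds almost nothing
```

Each overdraw contributes a smaller increment than the last, so the glow brightness approaches a limit instead of blowing out to white.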


Actually, I missed a line out - it’s mixed up with some #defined conditional code to help with debugging, which made me miss it. It should be

vec3 colour = texture2D(texture, fragTexCoords).rgb;
colour *= fragColour;
gl_FragColor = vec4(colour, colour.r);

A quick update on this in case anyone is wondering if I’ve got anywhere with it. I’ve not been able to spend much time on it but have managed to make some progress. Today I’ve got the GPU rendering working on my main PC (i.e. not a Raspberry Pi). It’s not yet in a state to release, even as an experimental version, and there’s still a lot of work to do, but getting it rendering properly is a good start. I’ll post updates when there are any more significant developments.


Heh, i was thinking just this morning “I wonder how that clever vector glow is coming along…” No hurry, I’m glad it’s still in the works :slight_smile:

I’ve not posted an update for a long time so here’s the current progress.

The hardware rendering core works on Raspberry Pis, but there are issues with the lower-end ones because of the way Libretro supports hardware rendering. The list below assumes a 1920x1080 screen unless otherwise noted, and assumes no overclocking (Pi1s clock as do Pi Zeros).

  • Pi4 - Works OK.
  • Pi3 - Can’t quite support rendering at 1080 at full speed. Rendering at 1024 and upscaling with hardware linear filtering works fast enough and it’s almost impossible to tell the difference between that and rendering at 1080.
  • Pi2 - No Pi2 available for testing.
  • Pi1/Zero - Can’t support 1080 at anything like full speed (low 40s FPS instead of 50). Can do full speed on a 1280x1024 screen.

The render to a lower resolution and upscale approach is an effective way of getting enough extra FPS to run at full speed in many cases.
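Rough arithmetic behind that, assuming RGBA8 (4 bytes per pixel), the 869x1080 Vectrex viewport figure from the techie section below, and the square power-of-two backing texture behaviour also discussed there:

```python
# Back-of-envelope figures for rendering at 1024 tall and upscaling,
# versus rendering at 1080. Assumes RGBA8 and that the backing texture
# must be a power-of-two square, as described further down the thread.
def next_pow2(n):
    p = 1
    while p < n:
        p *= 2
    return p

full = (869, 1080)                       # Vectrex viewport on a 1080 screen
low = (round(869 * 1024 / 1080), 1024)   # same aspect at 1024 tall (~824x1024)

# ~10% fewer pixels actually drawn...
pixel_saving = 1 - (low[0] * low[1]) / (full[0] * full[1])
# ...and the square power-of-two texture drops from 2048^2 to 1024^2 (4x).
tex_full_bytes = next_pow2(max(full)) ** 2 * 4   # 16 MiB
tex_low_bytes = next_pow2(max(low)) ** 2 * 4     # 4 MiB
```

The modest fill saving plus the 4x smaller backing texture is where the extra FPS comes from; the linear-filtered upscale itself is nearly free on the GPU.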

The next steps are to define what should be changeable via options and to merge the current GPU-rendering-specific makefiles with the original ones.

Techie discussion of issues found follows:-

Most of the effort I’ve put in recently has been focused on trying to figure out what Retroarch/Libretro does and how it affects performance. Some of these things have a significant impact on tile based GPUs/memory bandwidth constrained systems.

Libretro supports hardware rendering by getting the core to render to a texture, which the front end then copies to the frame buffer. This obviously has implications for memory-bandwidth-constrained systems, as it has to render to the texture, read it back, and write it to the frame buffer instead of just rendering to the frame buffer directly.

Retroarch always creates a square texture with power-of-two dimensions (e.g. 1024x1024), always big enough to contain the maximum render resolution you support rather than one that’s just big enough for your current screen. So the best case for a 1080 screen is a 2048x2048 texture, even though to emulate a Vectrex screen we only use 869x1080, less than 1/4 of it. If your maximum render resolution is for a 4K screen (i.e. 2160 tall) it will always create a 4096x4096 texture, even though you may have a small screen and use only a tiny fraction of it.

Clearing this in the normal manner, just using glClear(), would waste a lot of memory bandwidth clearing memory that does not form part of the final image. Fortunately a core can set up a scissor box to limit the clear to just the area required, which provides a useful performance boost.
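Putting numbers on that, assuming an RGBA8 texture (4 bytes per pixel) and the 869x1080 viewport figure from the previous paragraph:

```python
# How much clear bandwidth the scissor box saves on a 2048x2048 RGBA8
# texture when only the 869x1080 Vectrex viewport is actually used.
BYTES_PER_PIXEL = 4                          # RGBA8 assumed
full_clear = 2048 * 2048 * BYTES_PER_PIXEL   # plain glClear(): 16 MiB
scissored = 869 * 1080 * BYTES_PER_PIXEL     # clear limited by glScissor()
saving = 1 - scissored / full_clear          # roughly three quarters saved
```

So scissoring the clear avoids touching roughly three quarters of the texture every frame, which matters a lot on a bandwidth-starved Pi.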

There’s still a problem with these oversized textures - the Retroarch front end does something with the texture that needlessly uses memory bandwidth. I found this out by setting the render resolution for a 1024-tall screen: testing with the maximum supported resolution set to 1024 and then to 2048 gave different frame rates, even though the scissor box trick meant the difference should have been insignificant from the core’s point of view. Obviously something must be happening in the front end to make this difference. This issue is likely why a Pi3 needs the 1024-and-upscale trick rather than being able to render at 1080.

Since limiting the maximum render resolution has such an impact on performance, and trying to find optimal configurations for every GPU is an impossible task, I’m going to take the following approach:- if using OpenGL ES with a version less than 3, use a maximum supported render resolution of 1024; otherwise use 2048.
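As a sketch, that rule boils down to something like this (the function name is illustrative, not the core’s actual API):

```python
# Pick the maximum supported render resolution from the GL flavour and
# version reported at init time. Hypothetical helper, not the core's API.
def max_render_resolution(is_gles, major_version):
    if is_gles and major_version < 3:
        return 1024   # GLES2-class GPUs: cap at 1024 and upscale
    return 2048       # desktop GL or GLES3+: allow up to 2048
```

GLES major version is a crude proxy for GPU class, but it’s cheap to query and errs on the safe side for older tile-based parts.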

There’s no easy way for a core to find out how big the screen is. Having that available would be helpful when setting the maximum render resolution.

It would also be worth someone who understands the front ends investigating whether the scissor box trick can be used to improve performance for all hardware rendering cores. Another alternative would be to support non-power-of-two textures; there can be few GPUs that don’t support them now.