Fantastic work - thank you so much!
So many shaders, so little time…
Greetings @kokoko3k, I’m not sure if this has been asked already but seeing that this is such an efficient shader when it comes to quality/performance cost, is it possible to further push the efficiency envelope by doing away with the Bezel and ambient lighting if a user wanted to do so?
I’m not that awake rn, but have you looked through the documentation? IIRC there’s something about commenting out parameters to hardcode the values for a performance boost. (I’m unsure how relevant it is to this tho; I do understand this is different from outright removing the code completely, but a speed boost is a speed boost.)
It would surely go up, but not that much.
By pure gut feeling, it would go from about 100fps to 110-120 on my Haswell at 1080p with 240p content.
This is because I targeted about 100fps with a certain set of features while developing, so the shader gives its best performance/cost ratio with those features enabled.
If you disable everything, it will not fly; it will instead reveal a fairly high baseline cost.
To simplify: making the params static will lower the cost, because the shader no longer has to check, for every single pixel, whether it has to run this or that feature, but the baseline cost of the multiple passes (which pay off when they work in synergy to provide all the features) will remain.
However, it is possible to reuse the existing code to create a new shader without all the bells and whistles (just mask, glow, bloom), and that would be much faster.
Can I ask you why you’re asking?
Just wondering how fast it can go because not everyone needs the Bezel, reflections and lighting effects.
The faster it can go, the lower the performance requirements and the wider the range of hardware it can run on.
Wondering if anyone has realistic FPS targets for content greater than 240p on Haswell? I’m running an i5-4200m in Windows, so using GLCore, at 1080p. I’ve been testing against Championship Sprint, which is 512x384 @60.09Hz, using the MAME core. I’m having a hard time maintaining 60FPS using the Monitor-Balanced preset, even when taking advantage of config-user.txt. It seemed like using config-user.txt only gave me a boost of a few FPS.
Considering that Final Fight (384x224) reaches about 100fps with that preset here (i5-4590), yeah, it seems we are 20% under for Championship Sprint, at least on paper. Roughly.
However, I think I tried playing hires games that don’t use hardware renderers (so MAME) and they were ok; I’ll try asap.
Btw, there are several options you can mix. I’ve read that you already tried config-user.txt, but still:
- config-user.txt turns parameters static. Used fully, it gave me a 25% boost; that should suffice, but it is not that handy.
- Again in config.txt, have you already used:
  #define HALVE_BORDER_UPDATE 1.0
  #define SKIP_RANDOM 1.0
- Still in config.txt:
  #define DELTA_RENDER 1.0
  then tune the related parameters (I use the following on my s10 to save battery, limit throttling and gain fps):
  #define DELTA_RENDER_FORCE_REFRESH 7.0
  #define DELTA_RENDER_CHECK_AREA 3.0
  Delta render works very well for static content like Championship Sprint.
- This one would easily double your fps; it allows me to play Flycast at 640x480 (which taxes the GPU on its own, being hardware rendered) at 1080p with the shader on:
  In the shader menu, turn the pass named flick_and_noise from 2x to 1x scaling and apply the changes.
  After that some effects will cease to work, like FXAA, and others will behave differently (the glow blur pass may need a sharpness adjustment), but that is not that important for hires content.
  This is what Batocera uses for hires content. Maybe I should release some 1x presets.
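To illustrate the delta render idea from the list above, here is a rough Python sketch. It is not koko-aio's shader code: the function name pixels_to_redraw is hypothetical, and the meanings given to DELTA_RENDER_FORCE_REFRESH (a full redraw every N frames) and DELTA_RENDER_CHECK_AREA (a radius of neighbours invalidated around each changed pixel) are assumptions read off the parameter names, so check the shader documentation for the real behaviour.

```python
from typing import List, Set, Tuple

Frame = List[List[int]]  # rows of pixel values


def pixels_to_redraw(prev: Frame, curr: Frame, frame_index: int,
                     force_refresh: int = 7,
                     check_area: int = 3) -> Set[Tuple[int, int]]:
    """Return the (row, col) positions a delta renderer would recompute."""
    height, width = len(curr), len(curr[0])

    # Assumption: FORCE_REFRESH means "redraw everything every N frames".
    if force_refresh > 0 and frame_index % force_refresh == 0:
        return {(r, c) for r in range(height) for c in range(width)}

    dirty: Set[Tuple[int, int]] = set()
    for r in range(height):
        for c in range(width):
            if prev[r][c] == curr[r][c]:
                continue  # unchanged pixel: the previous result can be reused
            # Assumption: CHECK_AREA is a radius. A changed pixel also
            # invalidates its neighbours, because blur-like effects spread
            # a change beyond the pixel itself.
            for rr in range(max(0, r - check_area), min(height, r + check_area + 1)):
                for cc in range(max(0, c - check_area), min(width, c + check_area + 1)):
                    dirty.add((rr, cc))
    return dirty
```

Under these assumptions, a fully static frame costs nothing between forced refreshes, which would match the observation that mostly static content benefits the most.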
I’ll update this post as soon as I test CS on my Intel rig.
EDIT:
Just checked: without any mod or special setting on my side, I can reach 70+ fps.
Don’t pay attention to the scanline size; it looks that way because the shader is interlacing it:
This is on Arch Linux with GLCore, with an Intel® Core™ i5-4590 CPU @ 3.30GHz
Hmm, thank you for the ideas and for testing! I am pretty new to going any deeper with shaders than switching presets, so I will try your ideas out. I’ve got another machine running Batocera on an i5-4670s, and I see the same low FPS there. I’m using Vulkan on the Batocera machine, though, and was trying the various Duimon presets it includes. I see the FPS drop for other games with the same res, like Super Sprint and APB. I don’t know what the right term is, but the screen shakes and hops on those games, too. Arch Rivals is another game that shakes and hops really badly, but it’s at a different res and runs at 30Hz. Seems like maybe there are some RA settings I need to tweak rather than tweaking the shader settings.
Your cpu seems slightly faster than mine.
It has to be something specific to your system (to start with, I never tried running koko-aio on Windows on my Haswell).
But the fact that you are testing Batocera too, which is Linux, points to something other than the operating system.
Batocera uses Vulkan (which is not well supported on Haswell and gives lower performance for me), so I’d try switching to GLCore there (remember to restart RetroArch after the change).
As for the “shake”, if you mean continuous line flickering up and down, then it is because the shader is emulating the interlacing seen in old CRTs.
The effect does not look good when the core refresh is just 30fps, and maybe you would not like it on 60fps games (Super Sprint) either.
You can turn it off; set flicker power to 0.0.
Optionally, you can set Hi-res scanlines type to 2.0 to clear the rest of the flickering.
You could seek help with Batocera on Discord; there is a thread specific to koko-aio, and user Duglim adapted some “tweaked for speed” shader sets based on koko-aio.
Thanks again, I will start trying some of this stuff out later today. All of my issues seem to be tied to any game with a res higher than about 240p or so. It’s those higher res games that have the really annoying super flicker in addition to poor FPS while the 240p and under games look fine.
As of yesterday, the development repo (and the next official release) requires RetroArch 1.16 or later, because koko-aio uses some features not available in earlier versions.
Older RetroArch versions will simply refuse to load the shader.
To overcome slowness when applying the shader to hires content, such as the Flycast core which outputs at 640x480, I made a copy of the presets that work with halved internal resolution.
(Unfortunately RetroArch won’t allow doing that at runtime, hence the need to duplicate them.)
To make management simpler for me, the new directory structure will be:
koko-aio
  L Presets_Handhelds-ng
      L [..]
  L Presets-4.1
      L [..]*1
  L Presets-ng
      L [..]*2
  L Presets_Halfres
      L Presets-4.1
          L [..]*1
      L Presets-ng
          L [..]*2
So, Presets_Halfres is the new folder and will contain Presets-4.1 and Presets-ng which, in turn, will contain exactly the same presets found in the other Presets-4.1 and Presets-ng folders.
This would allow running hires 60Hz games at the same speed as lowres ones (on iGPUs) and with the same quality.
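As a back-of-the-envelope check of why halving the internal resolution helps, the small Python sketch below counts the pixels an intermediate pass would have to shade. It assumes the cost of such a pass scales roughly with its pixel count, which is a simplification; the helper internal_pixels and the 320x240 lowres example are hypothetical and only there for comparison.

```python
def internal_pixels(core_width: int, core_height: int, scale: float) -> int:
    """Pixels an intermediate pass shades when run at `scale` x the core size."""
    return int(core_width * scale) * int(core_height * scale)


hires_2x = internal_pixels(640, 480, 2.0)   # hires core, default 2x scaling
hires_1x = internal_pixels(640, 480, 1.0)   # same core with halved internal res
lowres_2x = internal_pixels(320, 240, 2.0)  # hypothetical lowres core at 2x

print(hires_2x, hires_1x, lowres_2x)  # 1228800 307200 307200
print(hires_2x / hires_1x)            # 4.0
```

So under that assumption a 640x480 game at 1x shades as many pixels per intermediate pass as a 320x240 game at 2x, and a quarter of what it would shade at 2x. Passes that run at the final output resolution are not covered by this estimate.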
I’m here to ask for advice on a descriptive name for the new folder.
I don’t think Presets_Halfres is enough to describe the intent of running hires games with the same GPU usage/speed as low-resolution ones.
So, do you have any advice? I’m considering:
- Presets_HiresCore_HalfResGPU
- Presets_HiresCoresOptimized
- Presets_OptimizedForHiresCores
- Presets_HiResOptimized
- Presets_HiRes_Perf
- Presets_FastHiRes
- Presets_HiResPerformance
- Presets_HiResLowGPU
- Presets_HiResSpeed
- Presets_HiResAccel
- Presets_HiresCores_Fast
- Anything else!
Thanks
I think Presets_HiresCores_Fast is the most accurately descriptive of the bunch.
I’m new to a lot of this, but what are you recommending to someone like myself focused on arcade/mame? It’s not necessarily a hires core, it’s a hires game. And what do you consider hires for an old i5 iGPU - for me I guess it’s roms running > 240p.
Like we discussed on Discord, I ended up getting the performance I wanted for the > 240p games by dropping scale from 2.0 to 1.0 and setting as many params as I could in config-user.
On the flip side, on my non-VRR monitor, I still got crazy jitter for games like Arch Rivals that run at 512x480@30 Hz. I ended up hacking together something from Duglim’s N100 Batocera presets to solve that.
Hires is a vague term indeed; I’d say an ideal square of at least 400x400px.
As for the recommendation, I’m not sure I’ve got it. What’s the question?
So for mame core, use the standard presets for under 400x400 roms and these new hires ones for 400x400 and over roms?
Prefer the standard ones if your system can handle them, and fall back to the new halfres presets when it struggles, keeping in mind that the quality of the halfres presets depends on the core resolution.
That’s the rule.
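Written out as a tiny Python sketch, the rule could look like this. The function names and the returned labels are hypothetical. "Can handle them" is taken here to mean that the measured fps reaches the content's refresh rate, and the 400x400 area threshold is the rough definition of hires given earlier in the thread; both are assumptions, not something the shader checks for you.

```python
HIRES_MIN_AREA = 400 * 400  # "ideal square of at least 400x400px"


def is_hires(width: int, height: int) -> bool:
    """Rough hires test based on pixel area rather than exact dimensions."""
    return width * height >= HIRES_MIN_AREA


def pick_presets(measured_fps: float, content_hz: float,
                 width: int, height: int) -> str:
    """Pick a preset family for one game, following the rule in the thread."""
    # Standard presets are preferred whenever the system keeps up.
    if measured_fps >= content_hz:
        return "standard"
    # Otherwise fall back to halfres; quality depends on the core resolution.
    if is_hires(width, height):
        return "halfres"
    return "halfres (reduced quality likely)"
```

For example, pick_presets(70, 60.09, 512, 384) returns "standard", while a machine that only reaches 50 fps on the same 512x384 game would get "halfres".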
I think I’ll pick the suggested name of Presets_HiresCores_Fast and will update the dev repo later today.
At the risk of not really understanding the big picture with RA and cores, I’d suggest swapping “cores” with “content” in the naming convention because if you’re a newb like me, you’d think it’s one-preset-fits-all for a core and that’s not always the case.
Lol, Agreed!
Indeed, I pushed just before reading your message.
I decided to go for Presets_HiresGames_Fast.
I also added a short readme.txt in that folder, explaining as follows:
These presets match the upper folder’s options, but without internal upscaling.
This lowers GPU load and maintains similar quality,
assuming the game’s native resolution is sufficient.