What would it take to get a dynarec into beetle-saturn?

Beetle Saturn works well on my 2019 laptop, but that’s hardly a budget machine: it’s an i7-7700 paired with a GTX 1060, probably still mid to high range for a laptop today. The same core runs at a rock-solid 60 fps on an SD865 phone, but that chip is a beast for emulation, probably even faster than the laptop’s i7. It even runs Switch emulators at almost full speed. For the H700/A133P handhelds I would go with the Yabasanshiro/Yabause cores and try to fix them up to be at least usable. I don’t know whether injecting the Yabause dynarec into Beetle Saturn would be less work. Standalone Yabasanshiro even runs moderately “ok” on an RK3326 handheld I have, and very acceptably on the A133P (TrimUI Brick). Standalone Yabasanshiro performance on the Brick is almost identical to an SD650 Android device (2x 1.8 GHz Cortex-A72 and 4x 1.4 GHz Cortex-A53, I think): it needs a frame skip of 1 to hold full speed.

You’ve had better luck with laptops than I have. Usually the fans fail after a few years, especially with such a high power CPU. I’m amazed that lasted 6 years. I often end up using my old Celeron N4020 laptop because it doesn’t overheat.

Yabause and Kronos fix some of the problems with Yabasanshiro, but break other things. Looking at the code, there are actually quite a lot of differences, and I’m not sure which parts are worth keeping. Does anyone want to test a bunch of games and make a list of everything that’s broken or fixed in each of these emulators?

Beetle-saturn seems like an easier starting point as it mostly works, even if rather slow in some cases.

Burning Rangers loads on the Yabasanshiro core on SD650 Android, and I played some of the training. It’s too slow: around 28-30 fps in 3D and 57-60 fps in 2D. Frame skip isn’t working, which is the most important issue imo.

Yabause loads but with abysmal performance, around 18 fps in 2D, so I guess it’s not using the dynarec at all. Both were using OpenGL.

Burning Rangers crashes the dynarec, so you must have been testing without dynarec. The crash can be fixed with the patch I posted above, if you are able to build from source.

Frame skip in Yabause is broken. It works in some games, but causes Burning Rangers to glitch majorly.

I’m not sure where to start trying to fix that. Fixing the dynarec was easier, in a way. It was clear what it was supposed to be doing, I just needed to fix the cases that were broken. I don’t know how the frameskip is supposed to work, or more specifically what parts of the VDP1/VDP2 rendering can be safely skipped.

Some testing on a Celeron N4020 @ 2.6 GHz, which is faster than a Snapdragon 650 or Rockchip RK3566, but still not fast enough for this.

Burning Rangers in Yabause with SH2 interpreter, disable frameskip and start the training, 43 FPS
Burning Rangers in Yabause with dynamic recompiler, disable frameskip and start the training, 47 FPS
Burning Rangers in Mednafen, start the training, 46 FPS, but only 16 displayed, rest skipped
Burning Rangers in Mednafen, start the training, frameskip off (video.frameskip 0), 42 FPS
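
To put those numbers in perspective, here is a rough sketch that converts the measured frame rates into time per frame and the speedup still needed for full speed. The FPS values are the measurements above; the 60 FPS target and the helper names are my own assumptions for illustration.

```python
# Convert measured FPS into per-frame time and the remaining speedup needed.
# TARGET_FPS = 60 is an assumption (NTSC full speed), not a measured value.
TARGET_FPS = 60.0

measurements = {
    "Yabause, SH2 interpreter": 43,
    "Yabause, dynarec": 47,
    "Mednafen, frameskip on": 46,
    "Mednafen, frameskip off": 42,
}

def frame_time_ms(fps):
    """Milliseconds spent per emulated frame at a given frame rate."""
    return 1000.0 / fps

def required_speedup(fps, target=TARGET_FPS):
    """How many times faster the emulator must run to hit the target."""
    return target / fps

for name, fps in measurements.items():
    print(f"{name}: {frame_time_ms(fps):.1f} ms/frame, "
          f"needs {required_speedup(fps):.2f}x to reach {TARGET_FPS:.0f} FPS")
```

Even the best case here (Yabause with the dynarec at 47 FPS) still needs a bit under a 1.3x speedup, which is more than the dynarec alone bought.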

I’m also seeing a weird input lag bug in mednafen standalone, where keypresses cause it to slow down and drop frames. This doesn’t happen in Yabause.

Mednafen frameskip actually doesn’t save much CPU time. I assume it’s still doing most of the rendering internally, just not displaying it. Yabause is skipping something important, and it glitches the game, making the frameskip unusable. For games where it can get away with this ‘unsafe’ frameskip, it does better, but I suspect this can never really work for every game.

SH2 emulation is around 20% to 30% of the CPU time, and using dynamic recompilation saves about half of that. This is the ‘easy’ and most obvious optimization, but it only saves so much.
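
As a back-of-the-envelope check, this is just Amdahl’s law. The sketch below plugs in the 20% to 30% SH2 share and treats “saves about half” as a 2x speedup of that part only. The function name is mine, and the 43 FPS baseline is the interpreter figure from the Celeron test.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: speedup of the whole program when only `fraction`
    of the run time is accelerated by `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# SH2 share of CPU time from the estimate above; a dynarec that saves
# "about half" of that share is modelled as a 2x local speedup.
for sh2_share in (0.20, 0.30):
    s = overall_speedup(sh2_share, 2.0)
    print(f"SH2 share {sh2_share:.0%}: overall {s:.2f}x, "
          f"43 FPS -> {43 * s:.1f} FPS")

# Upper bound: even an infinitely fast SH2 core cannot beat 1 / (1 - share).
print(f"limit at 30% share: {overall_speedup(0.30, float('inf')):.2f}x")
```

That predicts roughly 1.11x to 1.18x overall, which is in the same ballpark as the measured 43 to 47 FPS (about 1.09x). It also shows the ceiling: with a 30% share, no SH2 optimization can give more than about 1.43x.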

As mentioned before, a big chunk of the time is spent on 68K and sound emulation. It might be possible to optimize that by doing some of it in a separate thread.

The Yabasanshiro dynarec works (or seems to) on SD650 Android. Radiant Silvergun runs at 30-35 fps with the dynarec and 20 without. The core seems to be broken, though: the actual game starts at 60 fps for a few seconds, then slows down and drops to 30-35. Or it could be that it skips 30 frames, but not the way it should: instead of 30/60 it runs 30/30.

Radiant Silvergun seems to work fine in Yabause 0.9.15, with dynarec and frameskip enabled.

The issue with frameskip seems to be that it skips some of the VDP1 drawing. For some games this works, but if the game waits on the VDP1 then it hangs. Radiant Silvergun seems to tolerate this, and Burning Rangers doesn’t.

I’d have to look into it more to see exactly what’s going on, but I don’t know if there’s a way to fix this 100% and still retain the speedup from skipping that drawing.
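
Here is a toy model of the failure mode I suspect, not the actual Yabause code. Everything in it is hypothetical: `ToyVdp1`, the `draw_end` flag and the fixed command list are stand-ins. It only illustrates why skipping the whole draw call can hang a game that polls for completion, while skipping just the rasterization would not.

```python
class ToyVdp1:
    """Hypothetical stand-in for VDP1: processing a command list sets a
    'draw end' flag that the game may poll before continuing."""

    def __init__(self):
        self.draw_end = False
        self.pixels_drawn = 0

    def start_frame(self):
        self.draw_end = False

    def draw(self, commands, rasterize=True):
        if rasterize:
            self.pixels_drawn += sum(commands)  # the expensive part
        self.draw_end = True                    # completion is still signalled


def run_frames(n_frames, skip_every_other, unsafe_skip, game_waits):
    """Return how many frames the toy game completes before it hangs."""
    vdp1 = ToyVdp1()
    completed = 0
    for frame in range(n_frames):
        vdp1.start_frame()
        skipped = skip_every_other and frame % 2 == 1
        if not skipped:
            vdp1.draw([100, 200])
        elif not unsafe_skip:
            vdp1.draw([100, 200], rasterize=False)
        # With an unsafe skip, draw() is never called and draw_end stays False.
        if game_waits and not vdp1.draw_end:
            break  # the game would spin here forever
        completed += 1
    return completed
```

In this model a game that doesn’t wait survives the unsafe skip, and a game that waits stops on the first skipped frame. The open question is how much of the real speedup would survive if the command list still had to be walked on skipped frames.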

BTW the dynarec crash in Burning Rangers was also the result of aggressive optimization. Specifically this code in Burning Rangers:

6030ef4: MOV.L @(0,r1),r0
6030ef6: MOV.L @(4,r1),r0
6030ef8: SUB r5,r0
6030efa: CMP/PZ r0
6030efc: BT  6030f00
6030efe: MOV #0,r0

The first load is a no-op since the second load overwrites r0. The register allocator ignores values that aren’t used, so it didn’t allocate a register, which then caused the assembler to fail. Simple fix is to just allocate a register.
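
To illustrate the idea, here is a minimal sketch of the dead-value problem, not the real dynarec code. The tuple format, the `host_` register names and the `always_allocate_loads` switch are all made up for this example. It covers only the straight-line part before the branch, and assumes r0 is still needed afterwards.

```python
# Each instruction: (mnemonic, destination register or None, source registers).
BLOCK = [
    ("MOV.L @(0,r1),r0", "r0", ["r1"]),
    ("MOV.L @(4,r1),r0", "r0", ["r1"]),
    ("SUB r5,r0",        "r0", ["r5", "r0"]),
    ("CMP/PZ r0",        None, ["r0"]),
]

def dead_destinations(block, live_out=("r0",)):
    """Backward liveness scan: indices whose result is never read."""
    live = set(live_out)
    dead = []
    for i in range(len(block) - 1, -1, -1):
        _, dest, srcs = block[i]
        if dest is not None:
            if dest not in live:
                dead.append(i)
            live.discard(dest)
        live.update(srcs)
    return sorted(dead)

def allocate(block, always_allocate_loads):
    """Map instruction index -> host register name, or None if unallocated."""
    dead = set(dead_destinations(block))
    alloc = {}
    for i, (mnemonic, dest, _) in enumerate(block):
        if dest is None:
            continue
        is_load = "@" in mnemonic
        if i in dead and not (always_allocate_loads and is_load):
            alloc[i] = None  # a load emitted here would have no target register
        else:
            alloc[i] = "host_" + dest
    return alloc

print(dead_destinations(BLOCK))
print(allocate(BLOCK, always_allocate_loads=False))
print(allocate(BLOCK, always_allocate_loads=True))
```

The scan flags only the first load as dead. With the switch off, that load gets no register, which corresponds to the case that broke the assembler. With it on, the load still gets one, which mirrors the simple fix.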

Yabause is fast, but it’s definitely not 100% accurate.

Burning Rangers and Radiant Silvergun never crash on my devices.

Performance on sd865

Beetle Saturn 60 fps, Yabasanshiro 180 fps, Yabause 120 fps.

Divide these numbers by 6 to get SD650 performance, e.g. Yabause around 20. On the SD865 (which doesn’t need any frame skip anyway) games look perfectly playable, but slower chips get absolutely butchered.
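
Spelled out as a quick calculation (the factor of 6 is only my rule of thumb, not a benchmark, and the 60 FPS target is assumed):

```python
# SD865 figures from above; SCALE = 6 is a rule-of-thumb assumption.
sd865_fps = {"Beetle Saturn": 60, "Yabasanshiro": 180, "Yabause": 120}
SCALE = 6

sd650_estimate = {core: fps / SCALE for core, fps in sd865_fps.items()}

for core, fps in sd650_estimate.items():
    verdict = "full speed" if fps >= 60 else f"{60 / fps:.1f}x short of 60 FPS"
    print(f"{core}: ~{fps:.0f} FPS on SD650 ({verdict})")
```

The estimates of about 30 for Yabasanshiro and 20 for Yabause line up with the 28-30 fps and 18 fps reported earlier on the SD650.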

Perhaps fixing frame skip, at least partly, and offloading the 68K and sound to another thread would improve things a lot. For example, Saturn Bomberman runs at around 40 fps. With that little push it could reach full speed, so the “light” games would be a first step forward.

Yabasanshiro seems to be the least compatible and Beetle Saturn the most. The Saturn Bomberman CHD won’t load in Yabasanshiro, loads in Yabause but with no music in menus, and loads with music in Beetle Saturn.

IIRC there are some shenanigans about this game sending more cmds to vdp1’s cmd list queue than it can contain. I have a feeling your issue might be related.

What seems to happen in Burning Rangers with frameskip enabled is that the sound continues but the drawing lags behind. This happens in Yabause and Yabasanshiro. Maybe it’s just not processing the queue. I haven’t investigated it because there is a long list of other issues:

  • Input lag issue in Mednafen 1.32.1 where holding down a keypress causes the whole emulator to slow down and drop frames
  • Kronos 2.7.0 fails to start, with ‘Cannot initialize Glew’
  • Random lockups in VIDSoftVdp2DrawEnd() in Yabause 0.9.15
  • Save/load state sometimes doesn’t work in Yabause. It seems to help if you go into the game, then load state. Probably something is not getting initialized properly.
  • Lots of sound glitches in Yabause and Yabasanshiro
  • Nights into Dreams doesn’t work in Yabause 0.9.15. Seems mostly okay in Yabasanshiro, with minor graphical glitches.
  • Sprite/layer priority is wrong in Arcana Strikes in Yabasanshiro 1.9.0. This seems to be fixed in Yabause 0.9.15, but the world map is not visible.
  • Magic Knight Rayearth intro video is glitched in Yabasanshiro 1.9.0, crash when starting a game. Seems okay in Yabause 0.9.15
  • Crash during intro video in Panzer Dragoon in Yabasanshiro 1.9.0. Seems okay in Yabause 0.9.15
  • Dynamic recompiler in Yabause doesn’t work on ARM64. Not sure if what Yabasanshiro has would be usable.
  • Dynamic recompiler doesn’t emulate SH2 cache (minor issue, most games don’t need it)
  • No CHD support in Yabause

This is way more issues than I can look into right now. Maybe we can start a fund for development. Saturn emulation is in a pretty bad state in general.

Not sure why you are trying to use GLEW, this core doesn’t want it. Anyway, Kronos requires OpenGL 4.3 and won’t work with GLES. It should be possible to fix Kronos’s shaders for GLES, but I highly doubt the accurate VDP1 hardware rendering would be friendly to GPUs in low-end devices (on a desktop computer a GTX 1650 or better is recommended), so it feels like a waste of time for your use case. There should be benefits from porting its cached interpreter to other cores, though.

Edit: I’m not even sure how you managed to re-enable the GLEW codepath. There used to be a makefile parameter to force this instead of libretro’s glsym, but I removed it years ago.

I built Kronos standalone because the instructions for building the libretro core seem to be out of date (setting KRONOS_LIBRETRO_CORE doesn’t actually do anything).

Anyway, testing OpenGL stuff seems like a waste of time right now, when there are much more basic compatibility problems. OpenGL renderer would be great, but first the games need to actually be playable.

I don’t know what that’s supposed to mean, but to build a libretro core you usually just need to locate its libretro makefile and run make on it. Kronos’s libretro makefile is located here. When in doubt, you can also look at the .gitlab-ci.yml file at the root of the repository, which includes the location of the makefile.

So one way to build it would be to run this:

# clone kronos branch into a folder named Kronos
git clone https://github.com/libretro/yabause.git -b kronos Kronos
# change directory
cd Kronos/yabause/src/libretro/
# run make with 4 parallel jobs
make -j4

I believe that to get a core running “ok” on mid-range handhelds it would be better to start from the ground up, or else pick one “light” core and start fixing things. Messing with all the cores probably won’t be very productive, as some are much heavier than others. You could end up with a dynarec for Beetle Saturn that still runs worse than Yabasanshiro, as I think 3D emulation eats a lot of resources too.

I downloaded Kronos 2.7.0_official_release from github and tried to build that, but the instructions for building the libretro core were outdated or incorrect.

Building the source from libretro git did work. I really don’t have the right GPU for this. It runs, but slowly. There are some graphical glitches, although overall it seems a bit better than the old yabause.

Saturn emulation in general is in a pretty bad state, and I don’t know what’s really worth fixing.

One path would be to try to reintegrate the various changes from Yabause, YabaSanshiro, and Kronos. Each of these seems to fix some things, and break others.

The other path would be to try to improve beetle-saturn. The compatibility with games is actually pretty good here, it’s just slow and missing some features.

There are a bunch of things that I would need to understand better before I could really work on this. One is how the frameskip in Yabause actually works. Mednafen/beetle doesn’t speed up much with frameskip, whereas dropping frames in Yabause actually saves a bunch of time, but glitches some games.

The HLE BIOS in Yabause seems to work, so I’m wondering why other emulators dropped this since it seems like it could speed up some things.

The dynarec also seems to work pretty well, so I’m wondering what obstacles there are to using this. Maintaining a dynarec does take some extra work since there is the risk that the assembly code breaks due to changes in APIs or operating systems, but emulators for pretty much everything newer than Saturn (DC, PS2, GC, etc) use dynamic recompilation, so this seems manageable.

The SCSP (sound processor) emulation takes a lot of CPU time, so I’m wondering what exactly is going on there. Can this be done in a separate thread?
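
To make the question concrete, here is the general shape I have in mind, as a sketch only. `render_samples` is a hypothetical stand-in for the SCSP/68K work. The 512 cycles per sample and 735 samples per frame (44100 Hz at 60 FPS) are illustrative figures, and syncing once per frame is an assumption that real games may well break, since the SH2 side can touch sound RAM and registers mid-frame.

```python
import queue
import threading

def render_samples(cycles):
    """Hypothetical stand-in for SCSP/68K work: one sample per 512 cycles."""
    return [0] * (cycles // 512)

def sound_worker(work, out):
    """Consume cycle counts from the main thread until a None sentinel arrives."""
    while True:
        cycles = work.get()
        if cycles is None:
            break
        out.put(render_samples(cycles))

def run(frames, cycles_per_frame=512 * 735):
    """Drive the worker for `frames` frames and return total samples produced."""
    work, out = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=sound_worker, args=(work, out))
    worker.start()
    total = 0
    for _ in range(frames):
        work.put(cycles_per_frame)  # hand the sound work off
        # The main thread would emulate SH2/VDP1/VDP2 here in the meantime.
        total += len(out.get())     # sync once per frame to collect the audio
    work.put(None)
    worker.join()
    return total

print(run(3))
```

This only shows the handoff structure. Python threads won’t actually run the two halves in parallel, and the hard part in a real emulator is deciding where the sync points have to be.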

Mednafen has some major input lag issues and I don’t see this with PS1 emulation, so this seems specific to the Saturn core. I don’t see a similar slowdown in Yabause.

I’m really not sure where to start with a lot of this.

Because there are countless bugs with it, and it doesn’t even allow swapping discs with m3u.

It had 3 frames of input lag on average the last time I checked, which is pretty low for Saturn emulation, as previously discussed. Why are you comparing apples and melons?

That’s not the issue. The problem is that, at least with keyboard input, when you press a key it stalls the entire emulator. Even if it has only 3 frames of input lag, it takes much longer to process those frames. With frameskip enabled, it will drop some frames.

I don’t know what’s going on with that. It’s one of many issues to investigate.

I looked at the Yabause core source, and it looks crazy complex lol: 8,000 lines of code for VDP1 alone iirc, so probably 100,000+ lines in total.

Did some game tests on the i7-7700 Linux laptop. Hexen looks like it runs more smoothly on Yabause than on the other two cores (Beetle Saturn, Yabasanshiro), as if it had a higher internal framerate or something, with some minor slowdowns. Improving things would take serious determination and elite expertise in coding and Saturn internals. You’d probably have to drop everything else you do in your free time for the next 6 months (and that’s assuming the elite level I mentioned).

I would need to understand the VDP1 code better before I could make a lot of progress on this. Unfortunately, I probably won’t have a lot of free time to work on this over the next 6 months. This is why I suggested sponsoring another developer.

I saw the dynarec as a straightforward way to fix some of the performance problems, so that was the first thing I tried. It does work, and gives some speed boost, but it’s clear that this alone isn’t going to be enough to achieve the desired performance goals. The VDP1 and SCSP need a lot of work, and there are still quite a lot of other bugs.

Saturn emulation is currently somewhat acceptable with the right hardware, but there really isn’t a good solution for handheld or mobile, or even many laptops. Moreover, some games don’t work in certain emulators. This likely isn’t going to change soon, unless we can find someone who really understands this stuff and is able to work on it.

Mednafen has some major input lag issues and I don’t see this with PS1 emulation, so this seems specific to the Saturn core. I don’t see a similar slowdown in Yabause.

(I’m not an emu dev. Just my experience and what I read)

Saturn games often (though not always) had more input lag than their PS1 counterpart. This apparently happened on hardware as well.

Resident Evil, for example, lags quite noticeably more on Saturn than on PS1, while Street Fighter Alpha (1) seems on par with PS1.

Likewise, some arcade ports lag quite a bit more on Saturn than in the arcade (the Saturn King of Fighters ports, for example), while other ports feel pretty much the same, e.g. the Saturn port of Super Gem Fighter Mini Mix, with maybe one additional frame. Usually, I found Capcom arcade ports are quite good in this regard while SNK’s ports…not so much.

There are also Saturn exclusives such as SteamGear Mash that feel extremely responsive, almost next-frame, but most likely 1 frame.

So basically it seems it was very dependent on how the games were programmed and whether or not the devs optimized the processing chain. If they didn’t, then yeah, it could lag quite a bit more than PS1.

All this to say I don’t believe Beetle Saturn adds any significant lag. It can be pretty much on par with hardware, provided the user uses lag reduction options like hard sync on their end, of course. (There’s also an option in Beetle Saturn called “Mid-frame Input Synchronization” which might help decrease it further, at an additional CPU cost.)

The slowdown when keys are pressed seems to be specific to mednafen standalone, I don’t notice it with beetle-saturn (mednafen_saturn_libretro.so, built from github.com/libretro/beetle-saturn-libretro.git)

This seems to be an issue with how the emulator handles keyboard input, which is different than the issues with the Saturn hardware.

I’m not quite sure what the Mid-frame Input Synchronization setting does. There doesn’t seem to be an equivalent in Mednafen 1.32.1. Unlike mednafen standalone, beetle-saturn doesn’t seem to have any frameskip, it just slows down and the audio stutters.

Also, save states work in mednafen, but not in beetle-saturn. Beetle-saturn isn’t quite the same as the mednafen it’s based on.