What would it take to get a dynarec into beetle-saturn?

Yabasanshiro inherited yabause’s GPL-2 license and is not a closed source project, but the guy was twisting the license by releasing his source code weeks/months after releasing his binaries.

YabaSanshiro is really the only thing that runs at a playable speed on rk3566 or similar ARM devices, but it is far too buggy to be usable. Many games do not work, and even for the games that do work, it crashes frequently.

Yabause 0.9.15 (the latest version, from 2016) seems to fix some of the bugs in Yabasanshiro. I noticed that the VDP2 layer priority in Arcana Strikes is fixed. It looks like yabasanshiro was based on an older version of yabause and didn’t get those fixes from upstream.

Even with those fixes, Yabause still has some major issues. There are random crashes, and sound emulation is glitchy. The frameskip is also a bit wacky, where the VDP1 and VDP2 seem to be out of sync such that things are drawn in the wrong place and not aligned. It might be somewhat usable if the crashes could be fixed, but I’m hesitant to put a lot of development effort into something so old and unmaintained.

On PC, mednafen/beetle isn’t too bad. It runs the VDP2 in a separate thread which gives some performance improvement, but in many cases it is slower than Yabause. The frameskip works better, so the slowdown isn’t quite as noticeable, but in games like Burning Rangers and Panzer Dragoon Saga, the lag is noticeable unless you have a fast CPU. This is where a dynamic recompiler would help, and enabling this in Yabause actually does speed up both of those games.

Kronos and Ymir, I couldn’t get working. I assume I’m missing some dependencies, but I don’t know what. Kronos actually builds from source, but doesn’t run.

There really seem to be no good options for Saturn emulation on the go. Mednafen is mostly usable if you have a newish laptop (not something 6-7 years old). Tiger Lake i7 from a few years ago seems to be adequate, but a Celeron definitely isn’t. Otherwise, if I’m going to have to use a desktop, I might as well get an FPGA. I might get a Mister Pi or something similar anyway, but that’s not very portable.

If I want something usable on ARM, what are the options? Try to fix yabause or yabasanshiro, which seems like it would take a lot of work and maybe not worth the effort. Or try to put the dynarec from yabause into beetle-saturn, which is possibly doable, but there is the issue that 32-bit is officially unsupported in mednafen. I haven’t looked into what exactly the issues are there. Since nothing in the Saturn is 64-bit, building for 64-bit just tends to slow things down, so you probably don’t want to build for 64-bit ARM if 32-bit is usable.

When I first saw yabasanshiro running on an arm handheld, I thought this would be great if it didn’t crash so much, but fixing this turns out to be much more difficult than I ever expected.

Also nothing for RPi5:

Beetle Saturn works well on my 2019 laptop, but that certainly isn’t a budget laptop: it’s an i7-7700 paired with a gtx1060, probably still mid to high range for a laptop today. The same core runs at a rock-solid 60 fps on an sd865 phone, but that chip is a beast for emulation, probably even faster than the laptop’s i7. It even runs Switch emulators at almost full speed. For those h700/a133p handhelds I would go with the yabasanshiro/yabause core and try to fix it enough to be usable at least. I don’t know if injecting the yabause dynarec into beetle saturn would be less work. Yabasanshiro standalone even runs moderately “ok” on an rk3326 handheld I have, and very acceptably on the a133p (Trimui Brick). Standalone Yabasanshiro performance on the Brick is almost identical to the sd650 on Android (2x1.8ghz A72 and 4x1.4ghz A53, I think), with a frame skip of 1 needed for full speed.

You’ve had better luck with laptops than I have. Usually the fans fail after a few years, especially with such a high power CPU. I’m amazed that lasted 6 years. I often end up using my old Celeron N4020 laptop because it doesn’t overheat.

Yabause and Kronos fix some of the problems with Yabasanshiro, but break other things. Looking at the code, there are actually quite a lot of differences. I’m not sure what parts are worth keeping. Does anyone want to test a bunch of games and make a list of everything that’s broken or fixed in each of these emulators?

Beetle-saturn seems like an easier starting point as it mostly works, even if rather slow in some cases.

Burning Rangers loads in the yabasanshiro core on sd650 Android, and I played some of the training. It is too slow: about 28-30 fps in 3D and 57-60 fps in 2D. Frame skip is not working, which is the most important issue imo.

Yabause loads with abysmal performance, about 18 fps in 2D; I guess it is not using the dynarec at all. Both were using OpenGL.

Burning Rangers crashes the dynarec, so you must have been testing without dynarec. The crash can be fixed with the patch I posted above, if you are able to build from source.

Frame skip in Yabause is broken. It works in some games, but causes Burning Rangers to glitch majorly.

I’m not sure where to start trying to fix that. Fixing the dynarec was easier, in a way. It was clear what it was supposed to be doing, I just needed to fix the cases that were broken. I don’t know how the frameskip is supposed to work, or more specifically what parts of the VDP1/VDP2 rendering can be safely skipped.

Some testing on a Celeron N4020 @ 2.6 GHz, which is faster than a Snapdragon 650 or Rockchip RK3566, but still not fast enough for this.

  • Burning Rangers in Yabause with SH2 interpreter, frameskip disabled, start the training: 43 FPS
  • Burning Rangers in Yabause with dynamic recompiler, frameskip disabled, start the training: 47 FPS
  • Burning Rangers in Mednafen, start the training: 46 FPS, but only 16 displayed, the rest skipped
  • Burning Rangers in Mednafen, start the training, frameskip off (video.frameskip 0): 42 FPS

I’m also seeing a weird input lag bug in mednafen standalone, where keypresses cause it to slow down and drop frames. This doesn’t happen in Yabause.

Mednafen frameskip actually doesn’t save much CPU time. I assume it’s still doing most of the rendering internally, just not displaying it. Yabause is skipping something important, and it glitches the game, making the frameskip unusable. For games where it can get away with this ‘unsafe’ frameskip, it does better, but I suspect this can never really work for every game.

SH2 emulation is around 20% to 30% of the CPU time, and using dynamic recompilation saves about half of that. This is the ‘easy’ and most obvious optimization, but it only saves so much.
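
To put rough numbers on that, here is a back-of-the-envelope calculation in Python using Amdahl's law. It only uses the estimates above (SH2 at 20% to 30% of CPU time, dynarec roughly halving it) plus the Celeron measurements; the function name is made up for illustration.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: overall speedup when only `fraction` of the
    run time is accelerated by a factor of `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Estimates from this thread: SH2 is 20-30% of CPU time, dynarec halves it.
for sh2_share in (0.20, 0.30):
    s = overall_speedup(sh2_share, 2.0)
    print(f"SH2 share {sh2_share:.0%}: overall speedup {s:.2f}x")

# Measured on the Celeron N4020 (Burning Rangers in Yabause, frameskip off).
measured = 47 / 43
print(f"measured: {measured:.2f}x")
```

This predicts roughly 1.11x to 1.18x, and the measured 43 to 47 FPS works out to about 1.09x, so the estimate is in the right range. Even if SH2 emulation cost nothing at all, a 30% share would cap the gain at about 1.43x.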

As mentioned before, a big chunk of the time is spent on 68K and sound emulation. It might be possible to optimize that by doing some of it in a separate thread.

The Yabasanshiro dynarec works (or seems to) on sd650 Android. Radiant Silvergun runs at 30-35 fps with the dynarec and 20 without. The core seems to be broken, though: the actual game starts at 60 fps for a few seconds, then slows down and drops to 30-35. Or it could be that it skips 30 frames but not the way it should, so instead of 30/60 it runs 30/30.

Radiant Silvergun seems to work fine in Yabause 0.9.15, with dynarec and frameskip enabled.

The issue with frameskip seems to be that it skips some of the VDP1 drawing. For some games this works, but if the game waits on the VDP1 then it hangs. Radiant Silvergun seems to tolerate this, and Burning Rangers doesn’t.
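
As a rough illustration of why one game tolerates this and another hangs, here is a toy model in Python. Everything in it is an assumption made for illustration (the `run_game` function, the `draw_end` flag, the polling loop); it is not how Yabause is actually structured.

```python
def run_game(waits_on_vdp1, skip_every, frames=6, max_polls=100):
    """Return how many frames the game logic completes.
    skip_every=2 means every second frame's VDP1 drawing is skipped;
    skip_every=0 disables frameskip."""
    completed = 0
    for frame in range(frames):
        skipped = bool(skip_every) and frame % skip_every == 1
        # Hypothetical 'unsafe' frameskip: nothing is drawn, and the
        # draw-end status the game could poll is never raised.
        draw_end = not skipped
        if waits_on_vdp1:
            polls = 0
            while not draw_end and polls < max_polls:
                polls += 1          # game spins waiting for VDP1
            if not draw_end:
                return completed    # stuck: treat this as a hang
        completed += 1
    return completed

print(run_game(waits_on_vdp1=False, skip_every=2))  # 6: tolerant game keeps going
print(run_game(waits_on_vdp1=True, skip_every=2))   # 1: stalls on the first skipped frame
print(run_game(waits_on_vdp1=True, skip_every=0))   # 6: fine without frameskip
```

In this model the tolerant game behaves like Radiant Silvergun and the waiting game like Burning Rangers. Whether the real hang comes from a status flag, an interrupt, or something else is exactly the part that still needs investigating.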

I’d have to look into it more to see exactly what’s going on, but I don’t know if there’s a way to fix this 100% and still retain the speedup from skipping that drawing.

BTW the dynarec crash in Burning Rangers was also the result of aggressive optimization. Specifically this code in Burning Rangers:

6030ef4: MOV.L @(0,r1),r0
6030ef6: MOV.L @(4,r1),r0
6030ef8: SUB r5,r0
6030efa: CMP/PZ r0
6030efc: BT  6030f00
6030efe: MOV #0,r0

The first load is a no-op since the second load overwrites r0. The register allocator ignores values that aren’t used, so it didn’t allocate a register, which then caused the assembler to fail. Simple fix is to just allocate a register.
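
A toy model of the problem, in Python rather than the dynarec's actual code: a backward liveness scan marks the first load as dead, an allocator that skips dead values leaves that instruction with no host register, and the fix is to hand it one anyway. The data layout and the function names are invented for this sketch, and the T bit and branch are ignored.

```python
# Each instruction is (text, destination register, source registers).
BLOCK = [
    ("MOV.L @(0,r1),r0", "r0", ["r1"]),
    ("MOV.L @(4,r1),r0", "r0", ["r1"]),
    ("SUB r5,r0",        "r0", ["r5", "r0"]),
    ("CMP/PZ r0",        None, ["r0"]),
]

def dead_writes(block, live_out=()):
    """Backward liveness scan: return indices whose result is never read.
    live_out lists registers still needed after the block."""
    live = set(live_out)
    dead = []
    for i in range(len(block) - 1, -1, -1):
        _, dest, srcs = block[i]
        if dest is not None:
            if dest not in live:
                dead.append(i)
            live.discard(dest)   # the write kills the old value
        live.update(srcs)        # the reads make the sources live
    return sorted(dead)

def allocate(block, skip_dead):
    """Map each writing instruction to a host register name,
    or None when the allocator decided the value is not needed."""
    dead = set(dead_writes(block)) if skip_dead else set()
    alloc = {}
    for i, (_, dest, _) in enumerate(block):
        if dest is not None:
            alloc[i] = None if i in dead else "host_" + dest
    return alloc

print(dead_writes(BLOCK))                    # [0]
print(allocate(BLOCK, skip_dead=True)[0])    # None: nothing to assemble into
print(allocate(BLOCK, skip_dead=False)[0])   # host_r0
```

The load itself still has to be emitted, so the instruction with `None` is the one that would trip the assembler in this model.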

Yabause is fast, but it’s definitely not 100% accurate.

Neither Burning Rangers nor Radiant Silvergun ever crashes on my devices.

Performance on sd865

Beetle Saturn 60 fps, Yabasanshiro 180 fps, Yabause 120 fps.

Divide these numbers by 6 to get sd650 performance, e.g. Yabause around 20. On sd865 (which doesn’t need any frame skip anyway) games look perfectly playable, but slower chips get absolutely butchered.
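
Spelled out as a quick Python calculation, using only the sd865 figures above and the rough factor of 6. The 60 fps full-speed target is an assumption (NTSC games).

```python
sd865_fps = {"Beetle Saturn": 60, "Yabasanshiro": 180, "Yabause": 120}
SCALE = 6          # rough sd865-to-sd650 ratio suggested above
FULL_SPEED = 60    # assumed target frame rate

sd650_fps = {core: fps / SCALE for core, fps in sd865_fps.items()}
for core, fps in sd650_fps.items():
    print(f"{core}: about {fps:.0f} fps on sd650, {fps / FULL_SPEED:.0%} of full speed")
```

That gives about 10, 30 and 20 fps, which lines up with the 28-30 fps (Yabasanshiro, 3D) and 18 fps (Yabause) reported for Burning Rangers earlier in the thread.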

Perhaps fixing frame skip somehow and off-loading 68k and sound to another thread would improve things a lot. For example, Saturn Bomberman runs at around 40 fps. With that little push it could reach full speed, so the “light” games could be a first step forward.

Yabasanshiro seems to be the least compatible and beetle Saturn the most. The Saturn Bomberman chd won’t load in yabasanshiro, loads in yabause but with no music in the menus, and loads with music in beetle Saturn.

IIRC there are some shenanigans about this game sending more cmds to vdp1’s cmd list queue than it can contain. I have a feeling your issue might be related.

What seems to happen in Burning Rangers with frameskip enabled is that the sound continues but the drawing lags behind. This happens in Yabause and Yabasanshiro. Maybe it’s just not processing the queue. I haven’t investigated it because there is a long list of other issues:

  • Input lag issue in Mednafen 1.32.1 where holding down a keypress causes the whole emulator to slow down and drop frames
  • Kronos 2.7.0 fails to start, with ‘Cannot initialize Glew’
  • Random lockups in VIDSoftVdp2DrawEnd() in Yabause 0.9.15
  • Save/load state sometimes doesn’t work in Yabause. It seems to help if you go into the game, then load state. Probably something is not getting initialized properly.
  • Lots of sound glitches in Yabause and Yabasanshiro
  • Nights into Dreams doesn’t work in Yabause 0.9.15. Seems mostly okay in Yabasanshiro, with minor graphical glitches.
  • Sprite/layer priority is wrong in Arcana Strikes in Yabasanshiro 1.9.0. This seems to be fixed in Yabause 0.9.15, but the world map is not visible.
  • Magic Knight Rayearth intro video is glitched in Yabasanshiro 1.9.0, crash when starting a game. Seems okay in Yabause 0.9.15
  • Crash during intro video in Panzer Dragoon in Yabasanshiro 1.9.0. Seems okay in Yabause 0.9.15
  • Dynamic recompiler in Yabause doesn’t work on ARM64. Not sure if what Yabasanshiro has would be usable.
  • Dynamic recompiler doesn’t emulate SH2 cache (minor issue, most games don’t need it)
  • No CHD support in Yabause

This is way more issues than I can look into right now. Maybe we can start a fund for development. Saturn emulation is in a pretty bad state in general.

Not sure why you are trying to use glew, this core doesn’t want it. Anyway, Kronos requires OpenGL 4.3 and won’t work with gles. It should be possible to fix Kronos’s shaders for gles usage, but I highly doubt the accurate vdp1 hardware rendering would be friendly to the gpus in low-end devices (on a desktop computer it is recommended to have a GTX 1650 or better), so it feels like a waste of time for your use case. There should be benefits from porting its cached interpreter to other cores, though.

Edit: i’m not even sure how you managed to re-enable the glew codepath, there used to be a parameter to force this instead of libretro’s glsym in the makefile but i removed it years ago.

I built Kronos standalone because the instructions for building the libretro core seem to be out of date (setting KRONOS_LIBRETRO_CORE doesn’t actually do anything).

Anyway, testing OpenGL stuff seems like a waste of time right now, when there are much more basic compatibility problems. OpenGL renderer would be great, but first the games need to actually be playable.

I don’t know what that’s supposed to mean, but to build a libretro core you usually just need to locate its libretro makefile and run make on it. Kronos’s libretro makefile is located here. When in doubt, you can also look at the .gitlab-ci.yml file at the root of the repository; it includes the location of the makefile.

So one way to build it would be to run this:

# clone kronos branch into a folder named Kronos
git clone https://github.com/libretro/yabause.git -b kronos Kronos
# change directory
cd Kronos/yabause/src/libretro/
# run make with 4 parallel jobs
make -j4

I believe that to get a core running “ok” on mid-range handhelds it would be better to start from the ground up, or else pick one “light” core and start fixing things. Messing with all the cores probably won’t be very productive, as some are much heavier than others. You could end up with a dynarec for beetle saturn that runs worse than yabasanshiro, as I think 3D emulation eats a lot of resources too.

I downloaded Kronos 2.7.0_official_release from github and tried to build that, but the instructions for building the libretro core were outdated or incorrect.

Building the source from libretro git did work. I really don’t have the right GPU for this. It runs, but slowly. There are some graphical glitches, although overall it seems a bit better than the old yabause.

Saturn emulation in general is in a pretty bad state, and I don’t know what’s really worth fixing.

One path would be to try to reintegrate the various changes from Yabause, YabaSanshiro, and Kronos. Each of these seems to fix some things, and break others.

The other path would be to try to improve beetle-saturn. The compatibility with games is actually pretty good here, it’s just slow and missing some features.

There are a bunch of things that I would need to understand better before I could really work on this. One is how does the frameskip in Yabause actually work. Mednafen/beetle doesn’t speed up much with frameskip, whereas dropping frames in Yabause actually saves a bunch of time, but glitches some games.

The HLE BIOS in Yabause seems to work, so I’m wondering why other emulators dropped this since it seems like it could speed up some things.

The dynarec also seems to work pretty well, so I’m wondering what obstacles there are to using this. Maintaining a dynarec does take some extra work since there is the risk that the assembly code breaks due to changes in APIs or operating systems, but emulators for pretty much everything newer than Saturn (DC, PS2, GC, etc) use dynamic recompilation, so this seems manageable.

The SCSP (sound processor) emulation takes a lot of CPU time, so I’m wondering what exactly is going on there. Can this be done in a separate thread?
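
The general shape of that idea, as a minimal Python sketch: the main loop hands each frame's sound work to a worker thread through a bounded queue. All names here are hypothetical, the "mixing" is a placeholder, and 735 samples per frame assumes 44.1 kHz output at 60 fps. Python is used only to show the structure; a real implementation would be in the emulator's own C or C++.

```python
import queue
import threading

def sound_worker(jobs, out):
    """Consume per-frame work items and 'render' audio for each."""
    while True:
        n_samples = jobs.get()
        if n_samples is None:          # sentinel: shut down
            break
        out.append([0] * n_samples)    # placeholder for real SCSP mixing

def run_frames(n_frames, samples_per_frame=735):
    # Bounded queue, so the main loop cannot run far ahead of the audio.
    jobs = queue.Queue(maxsize=2)
    out = []
    worker = threading.Thread(target=sound_worker, args=(jobs, out))
    worker.start()
    for _ in range(n_frames):
        # ... SH2 / VDP emulation for this frame would run here ...
        jobs.put(samples_per_frame)    # hand the sound work to the other thread
    jobs.put(None)
    worker.join()
    return out

buffers = run_frames(3)
print(len(buffers), len(buffers[0]))
```

The hard part this sketch leaves out is synchronization: the 68K and SCSP share state with the main CPUs, so how much of that work can safely run behind the main thread is an open question.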

Mednafen has some major input lag issues and I don’t see this with PS1 emulation, so this seems specific to the Saturn core. I don’t see a similar slowdown in Yabause.

I’m really not sure where to start with a lot of this.

Because there are countless bugs with it, and it doesn’t even allow swapping discs with m3u.

It had 3 frames of input lag on average the last time I checked, which is pretty low for saturn emulation as previously discussed. Why are you comparing apples and melons?

That’s not the issue. The problem is that, at least with keyboard input, when you press a key it stalls the entire emulator. Even if it has only 3 frames of input lag, it takes much longer to process those frames. With frameskip enabled, it will drop some frames.

I don’t know what’s going on with that. It’s one of many issues to investigate.

I looked at the yabause core source, and it looks crazy complex lol. 8,000 lines of code for vdp1 alone iirc. Probably talking about 100,000+ lines of code in total.

Did some game tests on an i7-7700 Linux laptop. Hexen looks like it runs more smoothly on Yabause than on the other 2 cores (beetle saturn, yabasanshiro), as if it had a higher internal framerate or something, with some minor slowdowns. It would take serious determination and elite expertise in coding and Saturn internals to improve things. You would probably have to drop everything you do in your free time for the next 6 months (if you have the elite level I mentioned).

I would need to understand the VDP1 code better before I could make a lot of progress on this. Unfortunately, I probably won’t have a lot of free time to work on this over the next 6 months. This is why I suggested sponsoring another developer.

I saw the dynarec as a straightforward way to fix some of the performance problems, so that was the first thing I tried. It does work, and gives some speed boost, but it’s clear that this alone isn’t going to be enough to achieve the desired performance goals. The VDP1 and SCSP need a lot of work, and there are still quite a lot of other bugs.

Saturn emulation is currently somewhat acceptable with the right hardware, but there really isn’t a good solution for handheld or mobile, or even many laptops. Moreover, some games don’t work in certain emulators. This likely isn’t going to change soon, unless we can find someone who really understands this stuff and is able to work on it.