What would it take to get a dynarec into beetle-saturn?

I talked about this on Kronos’s Discord, and it seems 2 frames of input lag is the theoretical hardware limit if the game is properly optimized for the Saturn’s SMPC. I only got vague answers when I asked about games that actually achieve those 2 frames, though :sweat_smile:. Also, it seems achievable only in 2D games; 3D games need an additional frame.

What about PS1/N64 games, say a 60 FPS 3D game such as Tekken 3 and maybe F-Zero 64? Has anyone tested them? Maybe Virtua Fighter and other 3D fighters on the Saturn would be good for measuring input lag?

Some time ago I overclocked Quake II on the PS1 via SwanStation. The game has an unlocked frame rate, so it can reach 60 FPS, and it’s transformative compared to how it plays originally. One very noticeable thing is that the input lag is vastly improved. I may still have footage of that.

With that, I believe that as PCs get faster (and they are already very fast today), we’ll be able to overcome these Saturn/N64 input lag issues too, like we can with the PS1.

Edit: Here it is:

https://youtu.be/pSCllF_9EqA

BTW, when I mentioned improving input lag in Quake II on PS1, I meant the lag inherent in the game’s own performance, not run-ahead, preemptive frames and such. That lag is mostly due to the console’s 3D capabilities with specific game engines.

If anyone wants to try this, this patch gets the dynarec working on x86-64 in Yabause 0.9.15 (tested on Ubuntu 24.10)

https://paste.debian.net/1378050/

Changes:

  1. Changed the asm code to use PIC (PLT and rip-relative)
  2. Moved the code generation buffer so that call/jmp can use 32-bit offsets
  3. Changed all the other pointers to 64 bits
  4. Fixed some stack alignment which is needed for SSE/SSE2
  5. Fixed the calls to MappedMemory* which changed in Yabause 0.9.15
  6. Added the call to new_scsp_exec which is needed since version 0.9.15
  7. Fixed the crash in Burning Rangers (register allocation in load_alloc)
  8. Applied the Qt5 compatibility patches from debian (HexValidator)
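
To illustrate change 2 above: on x86-64 the near `call`/`jmp` forms carry a signed 32-bit displacement, so generated code can only reach C helpers directly if the code buffer sits within about 2 GiB of them. The following is a minimal sketch of that range check, not code from the patch; the addresses are made up for illustration.

```python
# Sketch: can a 5-byte x86-64 `call rel32` / `jmp rel32` located at
# `insn_addr` reach `target`? The displacement is a signed 32-bit value
# measured from the end of the instruction.

REL32_INSN_LEN = 5  # one opcode byte (E8/E9) plus a 4-byte displacement


def rel32_displacement(insn_addr: int, target: int) -> int:
    """Displacement that would be encoded in the instruction."""
    return target - (insn_addr + REL32_INSN_LEN)


def fits_rel32(insn_addr: int, target: int) -> bool:
    """True if the displacement fits in a signed 32-bit field."""
    disp = rel32_displacement(insn_addr, target)
    return -(1 << 31) <= disp < (1 << 31)


# Hypothetical addresses, for illustration only.
code_buffer_near = 0x0000_5555_5600_0000  # buffer placed close to the executable
code_buffer_far = 0x0000_7FFF_F000_0000   # buffer somewhere far away in the address space
helper_function = 0x0000_5555_5555_1000   # a C helper inside the main binary

print(fits_rel32(code_buffer_near, helper_function))
print(fits_rel32(code_buffer_far, helper_function))
```

With the buffer far away, every call into the emulator would need a 64-bit absolute address loaded into a register first, which is why moving the buffer is the simpler fix.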

I tried a bunch of games and all seem to work fine with dynarec enabled.

Surprisingly, Linkle Liver Story seems to work, which is one game that is reported to have issues with accurate emulation of the CPU cache. So I’m not sure which games actually need that. Also, surprisingly, the HLE BIOS works.

Yabause 0.9.15 is outdated and definitely has some bugs, but the goal here was just to see if the dynarec would work, and it seems to work fine.

Speedup is around 5-10 FPS in some parts, but where the SH2 is not used much there is little difference.

Some games which crash YabaSanshiro 1.9.0 do work in Yabause 0.9.15, including Magic Knight Rayearth and Panzer Dragoon. However, I see some random crashes in both emulators. When I investigated the crashes in Yabause 0.9.15, most were in the VDP2 code.

Would it be more productive to try to fix YabaSanshiro, or work on another emulator?

The yabasanshiro core is abandoned (5 years behind upstream) and in a pretty horrible state (stability issues). The only reason anyone would ever want to use it is the lack of alternatives for ARM devices, and using standalone would still be highly recommended in that case. Fixing it for x86-64 seems like a waste of time since there are better options there (beetle-saturn, kronos).

The only thing I would want for that outdated core is to get it working with the Vulkan driver, just so I can use my slang shader presets with it on an Android device. Other than that, there’s probably no point if it’s not being updated to the current upstream version.

That’s what I’m thinking. The only reason I tested this on x86-64 is that it was much easier to debug than on ARM. Having demonstrated that the dynarec works and is compatible with the majority of games, I don’t see much reason to spend more effort on yabause or yabasanshiro.

My original plan was to put a dynarec into beetle-saturn for some speed improvement. While that still seems possible, it’s not a simple drop-in replacement for the existing SH2 core. The SH2 core needs a bunch of callbacks for handling memory mapping, timing, interrupts, etc, and all that would need to be reworked.
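
To make the callback point concrete, here is a rough sketch of the kind of interface a CPU core needs from the rest of the emulator. Every name in it (`CpuHostCallbacks`, `run_block`, the individual hooks) is hypothetical and is not taken from beetle-saturn or Yabause; it only shows why the two cores are not interchangeable unless these hooks line up.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CpuHostCallbacks:
    """Hypothetical hooks a CPU core needs from the surrounding emulator."""
    read32: Callable[[int], int]            # memory-mapped read
    write32: Callable[[int, int], None]     # memory-mapped write
    consume_cycles: Callable[[int], None]   # timing: report elapsed CPU cycles
    pending_irq_level: Callable[[], int]    # interrupts: highest pending level, 0 if none


def run_block(host: CpuHostCallbacks, addr: int, cycles: int) -> bool:
    """Stand-in for one translated block: load, modify, store, account time.

    Returns True if the core should leave generated code to service an IRQ.
    """
    value = host.read32(addr)
    host.write32(addr, (value + 1) & 0xFFFFFFFF)
    host.consume_cycles(cycles)
    return host.pending_irq_level() > 0


# Toy host: a dict as RAM and a list that records elapsed cycles.
ram = {0x06000000: 41}
elapsed = []
host = CpuHostCallbacks(
    read32=lambda a: ram.get(a, 0),
    write32=lambda a, v: ram.__setitem__(a, v),
    consume_cycles=elapsed.append,
    pending_irq_level=lambda: 0,
)

irq = run_block(host, 0x06000000, 4)
print(ram[0x06000000], sum(elapsed), irq)
```

A dynarec written against one emulator bakes its own versions of these hooks into the generated code, so porting it means rewriting every place where that code touches memory, time or interrupts.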

Dynarec in kronos on x86-64 would be doable as it’s based on yabause, but I don’t know how useful that would be since a major goal here would be to get something working on ARM or Android.

You can ask the author if he is interested at https://github.com/FCare/Kronos, however, as previously said, the cached sh2 interpreter in kronos is supposedly quite fast already.

I’m not sure, as FCare has indicated that he is prioritizing other projects.

I tried building Kronos, but I get “Cannot initialize Glew” when I run it. Maybe I’m missing something obvious, but libglew-dev is installed. I can try tracking down why gl is apparently not getting initialized properly, but even if I can get this working on x86, many of the ARM platforms that I would want to run this on have limited OpenGL support, so I’m wondering if I’m just wasting my time with this.

Basically, getting Saturn emulation to a playable speed on a wide range of platforms seems to run into two major issues:

  1. Most newer emulators don’t support a dynamic recompiler
  2. Most newer emulators have increased dependence on OpenGL or Vulkan which isn’t supported well on some platforms

So this leaves a bunch of outdated cores which are buggy and don’t support many games.

One possible path forward is to combine the software renderer from mednafen/beetle with the dynarec from yabause. Despite not having been updated in more than ten years, the dynarec seems to work for the most part, although there is the issue of missing support for arm64.

Maybe there is some other path to getting this working that I’m missing?

Right now the current status is:

Android/ARM handhelds: Yabasanshiro standalone looks like the best option. It is moderately fast on mid-range (for a handheld) devices like those with RK3566, A133P or H700 chips. The Yabasanshiro core works but with bugs (probably crashes, and frameskip doesn’t seem to work). The Yabause core works on Android too, at a somewhat decent speed (is it any more recent than yabasanshiro?). Beetle Saturn looks like it’s slow as hell (no arm64 dynarec?). It still runs at full speed on an SD865 phone I have, but that’s miles ahead of those H700s from Anbernic.

PCs: I wouldn’t worry much here; Beetle Saturn looks like it’s fast enough on any decent 6-7 year old PC.

As for ARM, one device that uses those Cortex-A53 cores is the Raspberry Pi, such as the Zero 2 W or the Pi 3B (I mean for easier debugging).

I rewrote the yabause port right after being done with porting yabasanshiro/kronos if that’s what you are wondering.

Yabause itself is older than yabasanshiro. Technically, yabasanshiro is a hard fork of yabause. I heard the author disrespected their license by distributing binaries while hoarding the changes he was doing to the source code, which is apparently what killed the yabause project. I stopped maintaining that core after he did something similar to me.

Basically, the guy was feeding open-source contributions into a closed-source project? Interesting. People normally out devs like that. Guess it didn’t pique their interest.

Yabasanshiro inherited yabause’s GPL-2 license and is not a closed source project, but the guy was twisting the license by releasing his source code weeks/months after releasing his binaries.

YabaSanshiro is really the only thing that runs at a playable speed on rk3566 or similar ARM devices, but it is far too buggy to be usable. Many games do not work, and even for the games that do work, it crashes frequently.

Yabause 0.9.15 (the latest version, from 2016) seems to fix some of the bugs in Yabasanshiro. I noticed that the VDP2 layer priority in Arcana Strikes is fixed. It looks like yabasanshiro was based on an older version of yabause and didn’t get those fixes from upstream.

Even with those fixes, Yabause still has some major issues. There are random crashes, and sound emulation is glitchy. The frameskip is also a bit wacky, where the VDP1 and VDP2 seem to be out of sync such that things are drawn in the wrong place and not aligned. It might be somewhat usable if the crashes could be fixed, but I’m hesitant to put a lot of development effort into something so old and unmaintained.

On PC, mednafen/beetle isn’t too bad. It runs the VDP2 in a separate thread which gives some performance improvement, but in many cases it is slower than Yabause. The frameskip works better, so the slowdown isn’t quite as noticeable, but in games like Burning Rangers and Panzer Dragoon Saga, the lag is noticeable unless you have a fast CPU. This is where a dynamic recompiler would help, and enabling this in Yabause actually does speed up both of those games.

Kronos and Ymir, I couldn’t get working. I assume I’m missing some dependencies, but I don’t know what. Kronos actually builds from source, but doesn’t run.

There really seem to be no good options for Saturn emulation on the go. Mednafen is mostly usable if you have a newish laptop (not something 6-7 years old). A Tiger Lake i7 from a few years ago seems to be adequate, but a Celeron definitely isn’t. Otherwise, if I’m going to have to use a desktop, I might as well get an FPGA. I might get a MiSTer Pi or something similar anyway, but that’s not very portable.

If I want something usable on ARM, what are the options? One is to try to fix yabause or yabasanshiro, which seems like it would take a lot of work and may not be worth the effort. The other is to try to put the dynarec from yabause into beetle-saturn, which is possibly doable, but there is the issue that 32-bit is officially unsupported in mednafen. I haven’t looked into what exactly the issues are there. Since nothing in the Saturn is 64-bit, building for 64-bit just tends to slow things down, so you probably don’t want to build for 64-bit ARM if 32-bit is usable.

When I first saw yabasanshiro running on an arm handheld, I thought this would be great if it didn’t crash so much, but fixing this turns out to be much more difficult than I ever expected.

Also nothing for RPi5:

Beetle Saturn works well on my 2019 laptop, but that certainly isn’t a budget laptop: it’s an i7-7700 paired with a GTX 1060, probably still mid to high range for a laptop today. The same core runs at a rock solid 60 frames on an SD865 phone, but that chip is a beast for emulation, probably even faster than the laptop’s i7. It even runs Switch emulators at almost full speed. For those H700/A133P handhelds I would go with the yabasanshiro/yabause cores and try to fix them to be at least usable. I don’t know if injecting the yabause dynarec into beetle-saturn would be less work. Yabasanshiro standalone even runs moderately “ok” on an RK3326 handheld I have, and very acceptably on the A133P (Trimui Brick). Standalone Yabasanshiro performance on the Brick is almost identical to that on SD650 Android (2x1.8 GHz A72 and 4x1.4 GHz A53, I think): 1 frame skip for full speed pace.

You’ve had better luck with laptops than I have. Usually the fans fail after a few years, especially with such a high power CPU. I’m amazed that lasted 6 years. I often end up using my old Celeron N4020 laptop because it doesn’t overheat.

Yabause and Kronos fix some of the problems with Yabasanshiro, but break other things. Looking at the code, there are actually quite a lot of differences. I’m not sure what parts are worth keeping. Does anyone want to test a bunch of games and make a list of everything that’s broken or fixed in each of these emulators?

Beetle-saturn seems like an easier starting point as it mostly works, even if rather slow in some cases.

Burning Rangers loads on the yabasanshiro core on SD650 Android; I played some of the training. It’s too slow: around 28-30 fps in 3D, 57-60 fps in 2D. Frame skip isn’t working, which is the most important issue imo.

Yabause loads with abysmal performance, around 18 fps in 2D; I guess it’s not using the dynarec at all. I’m using OpenGL in both.

Burning Rangers crashes the dynarec, so you must have been testing without dynarec. The crash can be fixed with the patch I posted above, if you are able to build from source.

Frame skip in Yabause is broken. It works in some games, but causes Burning Rangers to glitch majorly.

I’m not sure where to start trying to fix that. Fixing the dynarec was easier, in a way. It was clear what it was supposed to be doing; I just needed to fix the cases that were broken. I don’t know how the frameskip is supposed to work, or more specifically which parts of the VDP1/VDP2 rendering can be safely skipped.

Some testing on a Celeron N4020 @ 2.6 GHz, which is faster than a Snapdragon 650 or Rockchip RK3566, but still not fast enough for this.

Burning Rangers in Yabause with SH2 interpreter, disable frameskip and start the training, 43 FPS
Burning Rangers in Yabause with dynamic recompiler, disable frameskip and start the training, 47 FPS
Burning Rangers in Mednafen, start the training, 46 FPS, but only 16 displayed, rest skipped
Burning Rangers in Mednafen, start the training, frameskip off (video.frameskip 0) 42 FPS

I’m also seeing a weird input lag bug in mednafen standalone, where keypresses cause it to slow down and drop frames. This doesn’t happen in Yabause.

Mednafen frameskip actually doesn’t save much CPU time. I assume it’s still doing most of the rendering internally, just not displaying it. Yabause is skipping something important, and it glitches the game, making the frameskip unusable. For games where it can get away with this ‘unsafe’ frameskip, it does better, but I suspect this can never really work for every game.

SH2 emulation is around 20% to 30% of the CPU time, and using dynamic recompilation saves about half of that. This is the ‘easy’ and most obvious optimization, but it only saves so much.
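
As a rough sanity check of those numbers, Amdahl’s law gives the overall speedup when only one part of the work gets faster. The sketch below assumes the dynarec makes the SH2 portion exactly twice as fast (“saves about half”) and applies that to the 43 FPS interpreter figure from the Celeron test above; the helper name is made up for illustration.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: `fraction` of the time gets `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


INTERPRETER_FPS = 43  # Burning Rangers, Yabause, SH2 interpreter (figure from the thread)

for sh2_share in (0.20, 0.30):
    s = overall_speedup(sh2_share, 2.0)
    print(f"SH2 share {sh2_share:.0%}: {s:.3f}x overall, "
          f"{INTERPRETER_FPS} FPS -> {INTERPRETER_FPS * s:.1f} FPS")

# Upper bound: even an infinitely fast SH2 core cannot beat 1 / (1 - share).
for sh2_share in (0.20, 0.30):
    print(f"SH2 share {sh2_share:.0%}: at most {1.0 / (1.0 - sh2_share):.2f}x")
```

Under these assumptions the estimate lands in the high 40s to about 50 FPS, which is in the same ballpark as the measured 47 FPS, and the upper bound shows why the dynarec alone cannot close the gap to full speed.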

As mentioned before, a big chunk of the time is spent on 68K and sound emulation. It might be possible to optimize that by doing some of it in a separate thread.

Yabasanshiro dynarec works (or seems to) on SD650 Android. Radiant Silvergun runs at 30-35 fps with dynarec and 20 without. The core seems to be broken: the game starts at 60 fps for a few seconds, then slows down and drops to 30-35. Or it could be that it skips 30 frames, but not the way it should: instead of 30/60 it runs 30/30.

Radiant Silvergun seems to work fine in Yabause 0.9.15, with dynarec and frameskip enabled.

The issue with frameskip seems to be that it skips some of the VDP1 drawing. For some games this works, but if the game waits on the VDP1 then it hangs. Radiant Silvergun seems to tolerate this, and Burning Rangers doesn’t.
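
Here is a toy model of that failure mode, under the assumption that the game polls some “drawing finished” status before moving on. It is not Yabause’s code and the class and flag names are invented; it only shows why skipping the drawing together with its completion signal hangs a game that waits, while a game that never polls is unaffected.

```python
class ToyVdp1:
    """Invented stand-in for a VDP1 with a 'draw end' status the game can poll."""

    def __init__(self, signal_when_skipped: bool):
        self.signal_when_skipped = signal_when_skipped
        self.draw_end = False
        self.pixels_drawn = 0

    def run_frame(self, skip: bool) -> None:
        self.draw_end = False
        if not skip:
            self.pixels_drawn += 1000  # stand-in for the expensive rasterization
            self.draw_end = True
        elif self.signal_when_skipped:
            self.draw_end = True       # skip the pixels, keep the status update


def game_sees_draw_end(vdp1: ToyVdp1, max_polls: int = 100) -> bool:
    """A game that waits on the VDP1: False means it would spin forever."""
    for _ in range(max_polls):
        if vdp1.draw_end:
            return True
    return False


naive = ToyVdp1(signal_when_skipped=False)
naive.run_frame(skip=True)
print("naive skip, game continues:", game_sees_draw_end(naive))

careful = ToyVdp1(signal_when_skipped=True)
careful.run_frame(skip=True)
print("skip with status, game continues:", game_sees_draw_end(careful))
```

Whether the real fix is this simple is an open question: if a game reads back what the VDP1 drew, skipping the pixels is visible to it no matter what the status says.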

I’d have to look into it more to see exactly what’s going on, but I don’t know if there’s a way to fix this 100% and still retain the speedup from skipping that drawing.

BTW the dynarec crash in Burning Rangers was also the result of aggressive optimization. Specifically this code in Burning Rangers:

6030ef4: MOV.L @(0,r1),r0
6030ef6: MOV.L @(4,r1),r0
6030ef8: SUB r5,r0
6030efa: CMP/PZ r0
6030efc: BT  6030f00
6030efe: MOV #0,r0

The first load is a no-op since the second load overwrites r0. The register allocator ignores values that aren’t used, so it didn’t allocate a register, which then caused the assembler to fail. Simple fix is to just allocate a register.
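
The dead-load situation can be reproduced with a few lines of backward liveness analysis. This is a simplified sketch, not the dynarec’s actual allocator: the instruction tuples, `dead_destinations` and `alloc_load_target` are made up for illustration, and `"eax"` is just a placeholder host register.

```python
# Each entry: (text, destination register or None, source registers).
# Straight-line part of the Burning Rangers sequence quoted above.
block = [
    ("MOV.L @(0,r1),r0", "r0", ["r1"]),
    ("MOV.L @(4,r1),r0", "r0", ["r1"]),
    ("SUB r5,r0",        "r0", ["r5", "r0"]),
    ("CMP/PZ r0",        None, ["r0"]),
]


def dead_destinations(insns, live_out):
    """Backward liveness: indexes of instructions whose result is never read."""
    live = set(live_out)
    dead = []
    for i in range(len(insns) - 1, -1, -1):
        _, dst, srcs = insns[i]
        if dst is not None:
            if dst not in live:
                dead.append(i)
            live.discard(dst)  # this write kills any earlier value
        live.update(srcs)      # sources must be live before this instruction
    return sorted(dead)


def alloc_load_target(index, dead, always_allocate):
    """Pick a host register for a load's destination, or None if skipped."""
    if index in dead and not always_allocate:
        return None  # the emitter then has nothing to load into
    return "eax"     # placeholder host register


dead = dead_destinations(block, live_out={"r0"})
print("dead results at:", dead)
print("original behaviour:", alloc_load_target(0, dead, always_allocate=False))
print("with the fix:", alloc_load_target(0, dead, always_allocate=True))
```

Allocating a register for the dead load costs almost nothing and keeps the memory access itself in place, which is the safe choice when the address could be a hardware register.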

Yabause is fast, but it’s definitely not 100% accurate.

Burning Rangers and Radiant Silvergun never crash on my devices.

Performance on sd865

Beetle Saturn 60 fps, Yabasanshiro 180 fps, Yabause 120 fps.

Divide these numbers by 6 to get SD650 performance, e.g. Yabause at around 20. On the SD865 (which doesn’t need any frame skip anyway) games look perfectly playable, but slower chips get absolutely butchered.
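
Spelled out, the rule of thumb looks like this. The factor of 6 is the poster’s estimate, and the 60 FPS full-speed target is an assumption added here for the percentage column.

```python
# FPS reported on the SD865, from the post above.
sd865_fps = {"Beetle Saturn": 60, "Yabasanshiro": 180, "Yabause": 120}

SCALE = 6           # rough SD865 -> SD650 factor (the poster's estimate)
FULL_SPEED_FPS = 60  # assumed full-speed target

sd650_estimate = {core: fps / SCALE for core, fps in sd865_fps.items()}

for core, fps in sd650_estimate.items():
    print(f"{core}: ~{fps:.0f} FPS on SD650, {fps / FULL_SPEED_FPS:.0%} of full speed")
```

By this estimate even the fastest core only reaches about half speed on the SD650, which matches the 28-35 fps figures reported for it earlier in the thread.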

Perhaps fixing frame skip and offloading 68k and sound to another thread would improve things a lot. For example, Saturn Bomberman runs at around 40. With that little push it could reach full speed, at least for the “light” games as a first step forward.