Building with NEON support on armv8 (Rpi3)

Hi guys,

I am trying to build an armv8 (64bit kernel and libs) RetroArch with NEON support, so I do:

CFLAGS="-O3 -march=armv8-a+crc -mtune=cortex-a53" CXXFLAGS="-O3 -march=armv8-a+crc -mtune=cortex-a53" ./configure --enable-neon

But unfortunately I get this:

Checking presence of predefined macro __ARM_NEON__ ... no
Build assumed that __ARM_NEON__ is defined, but it’s not. Exiting ...

So, what is going on here? Is NEON inherent to armv8 or something like that? Or is it not available? I have seen that the Switch-specific Makefile has HAVE_NEON, for example… NEON is important on these ARM platforms because it saves a lot of CPU in the audio resampling code.

32-bit NEON and 64-bit NEON have slightly different, incompatible syntax. I don’t think we have any 64-bit NEON optimizations in RetroArch.
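
For anyone who wants to see which NEON macro their toolchain actually defines, here is a rough shell sketch. It assumes the macro that configure looks for is spelled __ARM_NEON__ (with the underscores), and it relies on the -dM -E options that GCC and Clang accept to dump predefined macros. has_macro is just a helper name made up for this example. As far as I know, 64-bit GCC defines __ARM_NEON but not the older __ARM_NEON__, which would explain the failed check, but treat that as an assumption and verify it on your own setup.

```shell
# has_macro NAME
# Reads "cc -dM -E" style output on stdin and succeeds if NAME is defined.
# (Hypothetical helper, not part of RetroArch's build scripts.)
has_macro() {
    grep -Eq "^#define $1( |\$)"
}

# Usage on the Pi (CC and CFLAGS are whatever you pass to ./configure):
#   ${CC:-cc} $CFLAGS -dM -E -x c /dev/null | has_macro __ARM_NEON__ && echo "old-style macro defined"
#   ${CC:-cc} $CFLAGS -dM -E -x c /dev/null | has_macro __ARM_NEON   && echo "newer macro defined"
```

If only the second command prints something, the compiler does support NEON and the configure check is simply looking for the 32-bit spelling.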

I think adding -mfpu=neon-fp-armv8 could fix --enable-neon, though I have only used it on 32-bit Raspbian.

I got this from RetroPie’s script [link].

# note the rpi3 currently uses the rpi2 binaries - for ease of maintenance - rebuilding from source
# could improve performance with the compiler options below but needs further testing
function platform_rpi3() {
    __default_cflags="-O2 -march=armv8-a+crc -mtune=cortex-a53 -mfpu=neon-fp-armv8 -mfloat-abi=hard -ftree-vectorize -funsafe-math-optimizations"
    __default_asflags=""
    __default_makeflags="-j2"
    __platform_flags="arm armv8 neon rpi gles"
}

EDIT: this reference specifies -mneon-for-64bits also.

@metchebe I have tried both CFLAGS and the configuration script still fails. But thanks for trying to help me! I believe these are for 32bit NEON.

@hunterk I had forgotten about 32-bit NEON vs 64-bit NEON… Yeah, that seems to be the problem. So, what about platforms like Switch? I have looked more closely and HAVE_NEON is only activated in Makefile.switch when RA is built for GRIFFIN, so no NEON for native Switch either. I understand that 64-bit-only registers compensate for the absence of NEON on Arm64 platforms like Switch, right?

I’m not sure about that, actually. Perhaps @m4xw can shed some light on it for us.

Hello @vanfanel, just wondering, were you successful?

If I may ask, is there a benefit to running 64-bit RA on a Pi 3? I was thinking about trying the arm64 builds of Ubuntu, but I would have to build the cores myself (they are not on the buildbot), and I also don’t know if it affects the dynamic recompilers some cores use.

Yes, there are VERY clear speed benefits. Try stage 2 of Contra 3 in the Snes9x core, for example (max_swapchain=2). It’s only full speed on a 64-bit system/build.

And yes, it affects dynarecs, which are 32-bit-specific AFAIK. I seem to recall there is some work going on for a 64-bit dynarec on certain cores, however.