Building on aarch64 with NEON resampler, memcpy, etc.

Hi there,

I am building RA on an aarch64 platform (an Rpi3 booting a 64-bit kernel and rootfs). RA cores really do perform better this way: with a max_swapchain=2 config, the Pi3 REQUIRES a 64-bit kernel, libs and RA to emulate the second stage of Contra 3 at full speed, so it’s THE way to run RA on a Pi3.

The thing is, I believe we can squeeze out a bit more performance with HAVE_NEON=1, so that the NEON-optimized memcpy and the NEON-optimized resampler are used. However, trying to do so results in this:

AS libretro-common/audio/resampler/drivers/sinc_resampler_neon.S
AS audio/drivers_resampler/cc_resampler_neon.S
AS memory/neon/memcpy-neon.S
memory/neon/memcpy-neon.S: Assembler messages:
memory/neon/memcpy-neon.S:11: Error: unknown pseudo-op: `.arm'
memory/neon/memcpy-neon.S:12: Error: unknown pseudo-op: `.fpu'
memory/neon/memcpy-neon.S:14: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:31: Error: unknown pseudo-op: `.fnstart'
memory/neon/memcpy-neon.S:32: Error: operand 1 must be an integer register -- `mov ip,r0'
memory/neon/memcpy-neon.S:33: Error: operand 1 must be an integer or stack pointer register -- `cmp r2,#16'
memory/neon/memcpy-neon.S:34: Error: unexpected characters following instruction at operand 1 -- `blt 4f@Have less than 16 bytes to copy'
memory/neon/memcpy-neon.S:36: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:37: Error: operand 1 must be an integer register -- `tst r0,#0xF'
memory/neon/memcpy-neon.S:39: Error: operand 1 must be an integer register -- `tst r0,#1'
memory/neon/memcpy-neon.S:40: Error: unknown mnemonic `ldrneb' -- `ldrneb r3,[r1],#1'
memory/neon/memcpy-neon.S:41: Error: unknown mnemonic `strneb' -- `strneb r3,[ip],#1'
memory/neon/memcpy-neon.S:42: Error: unknown mnemonic `subne' -- `subne r2,r2,#1'
memory/neon/memcpy-neon.S:43: Error: operand 1 must be an integer register -- `tst ip,#2'
memory/neon/memcpy-neon.S:45: Error: unknown mnemonic `ldrneh' -- `ldrneh r3,[r1],#2'
memory/neon/memcpy-neon.S:46: Error: unknown mnemonic `strneh' -- `strneh r3,[ip],#2'
memory/neon/memcpy-neon.S:53: Error: unknown mnemonic `subne' -- `subne r2,r2,#2'
memory/neon/memcpy-neon.S:55: Error: operand 1 must be an integer register -- `tst ip,#4'
memory/neon/memcpy-neon.S:57: Error: unknown mnemonic `vld4.8' -- `vld4.8 {d0[0],d1[0],d2[0],d3[0]},[r1]!'
memory/neon/memcpy-neon.S:58: Error: unknown mnemonic `vst4.8' -- `vst4.8 {d0[0],d1[0],d2[0],d3[0]},[ip,:32]!'
memory/neon/memcpy-neon.S:59: Error: operand 1 must be an integer or stack pointer register -- `sub r2,r2,#4'
memory/neon/memcpy-neon.S:61: Error: operand 1 must be an integer register -- `tst ip,#8'
memory/neon/memcpy-neon.S:63: Error: unknown mnemonic `vld1.8' -- `vld1.8 {d0},[r1]!'
memory/neon/memcpy-neon.S:64: Error: unknown mnemonic `vst1.8' -- `vst1.8 {d0},[ip,:64]!'
memory/neon/memcpy-neon.S:65: Error: operand 1 must be an integer or stack pointer register -- `sub r2,r2,#8'
memory/neon/memcpy-neon.S:67: Error: operand 1 must be an integer register -- `subs r2,r2,#32'
memory/neon/memcpy-neon.S:69: Error: operand 1 must be an integer register -- `mov r3,#32'
memory/neon/memcpy-neon.S:71: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:72: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:73: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:74: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:76: Error: unknown mnemonic `vld1.8' -- `vld1.8 {d0-d3},[r1]!'
memory/neon/memcpy-neon.S:77: Error: operand 1 must be an integer or stack pointer register -- `cmp r3,#(320-32)'
memory/neon/memcpy-neon.S:78: Error: unknown mnemonic `pld' -- `pld [r1,r3]'
memory/neon/memcpy-neon.S:79: Error: unknown mnemonic `addle' -- `addle r3,r3,#32'
memory/neon/memcpy-neon.S:80: Error: unknown mnemonic `vst1.8' -- `vst1.8 {d0-d3},[ip,:128]!'
memory/neon/memcpy-neon.S:81: Error: operand 1 must be an integer or stack pointer register -- `sub r2,r2,#32'
memory/neon/memcpy-neon.S:82: Error: operand 1 must be an integer or stack pointer register -- `cmp r2,r3'
memory/neon/memcpy-neon.S:84: Error: operand 1 must be an integer or stack pointer register -- `cmp r2,#0'
memory/neon/memcpy-neon.S:86: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:87: Error: unknown mnemonic `vld1.8' -- `vld1.8 {d0-d3},[r1]!'
memory/neon/memcpy-neon.S:88: Error: operand 1 must be an integer register -- `subs r2,r2,#32'
memory/neon/memcpy-neon.S:89: Error: unknown mnemonic `vst1.8' -- `vst1.8 {d0-d3},[ip,:128]!'
memory/neon/memcpy-neon.S:91: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:92: Error: operand 1 must be an integer register -- `tst r2,#16'
memory/neon/memcpy-neon.S:94: Error: unknown mnemonic `vld1.8' -- `vld1.8 {d0,d1},[r1]!'
memory/neon/memcpy-neon.S:95: Error: unknown mnemonic `vst1.8' -- `vst1.8 {d0,d1},[ip,:128]!'
memory/neon/memcpy-neon.S:97: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:98: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:99: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:100: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:101: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:102: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:105: Error: operand 1 must be an SVE predicate register -- `movs r3,r2,lsl#29'
memory/neon/memcpy-neon.S:106: Error: unknown mnemonic `ldrcs' -- `ldrcs r3,[r1],#4'
memory/neon/memcpy-neon.S:107: Error: unknown mnemonic `strcs' -- `strcs r3,[ip],#4'
memory/neon/memcpy-neon.S:108: Error: unknown mnemonic `ldrcs' -- `ldrcs r3,[r1],#4'
memory/neon/memcpy-neon.S:109: Error: unknown mnemonic `strcs' -- `strcs r3,[ip],#4'
memory/neon/memcpy-neon.S:110: Error: unknown mnemonic `ldrmi' -- `ldrmi r3,[r1],#4'
memory/neon/memcpy-neon.S:111: Error: unknown mnemonic `strmi' -- `strmi r3,[ip],#4'
memory/neon/memcpy-neon.S:112: Error: operand 1 must be an SVE predicate register -- `movs r2,r2,lsl#31'
memory/neon/memcpy-neon.S:113: Error: unknown mnemonic `ldrcsh' -- `ldrcsh r3,[r1],#2'
memory/neon/memcpy-neon.S:114: Error: unknown mnemonic `strcsh' -- `strcsh r3,[ip],#2'
memory/neon/memcpy-neon.S:115: Error: unknown mnemonic `ldrmib' -- `ldrmib r3,[r1],#1'
memory/neon/memcpy-neon.S:116: Error: unknown mnemonic `strmib' -- `strmib r3,[ip],#1'
memory/neon/memcpy-neon.S:139: Error: unknown mnemonic `bx' -- `bx lr'
memory/neon/memcpy-neon.S:140: Error: unknown pseudo-op: `.fnend'

My guess is that these NEON-optimized functions are armv7-only, whereas the flags I am passing are for a 64-bit build:

-march=armv8-a+crc -mtune=cortex-a53

Am I right? In that case, are there 64-bit NEON versions of those? For example, the Nintendo Switch is aarch64, if I am not mistaken. I have been looking at Makefile.switch for comparison, and it seems it does NOT use NEON after all. Can somebody please clear this up for me? Thanks!
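A quick way to check which instruction set the toolchain is actually targeting is to look at the compiler's predefined macros. The following is a minimal sketch, not part of RA: the function names `target_isa` and `has_neon` are made up for illustration, while `__aarch64__`, `__arm__` and `__ARM_NEON` are the standard ARM C Language Extensions macros.

```c
#include <stdio.h>

/* Hypothetical helper: names the instruction set the compiler targets. */
const char *target_isa(void)
{
#if defined(__aarch64__)
    return "AArch64 (A64 instruction set)";
#elif defined(__arm__)
    return "AArch32 (ARM/Thumb instruction set)";
#else
    return "non-ARM target";
#endif
}

/* Hypothetical helper: 1 if the compiler reports NEON / Advanced SIMD
 * support for this build, 0 otherwise. */
int has_neon(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}
```

Printing `target_isa()` and `has_neon()` from a build that uses the flags above should report an AArch64 target. That would be consistent with the log: the assembler is in A64 mode and is being fed a file written with 32-bit ARM mnemonics and directives such as `.arm` and `.fpu`.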

As far as I know, the NEON extensions in existing cores are all 32-bit NEON syntax, so they don’t work with 64-bit systems. I don’t think it’s a huge deal to port 32-bit NEON to 64-bit NEON, but none of our regular contributors are experts on that, AFAIK.
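One way to sidestep the 32-bit versus 64-bit assembly syntax problem is to write the routine with NEON intrinsics from `arm_neon.h`, which the compiler translates for either target. The following is only a sketch under that assumption, not RA's actual code: `copy_bytes` is a hypothetical name, and it falls back to plain `memcpy` when the compiler does not report NEON support.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: copies n bytes from src to dst.
 * The buffers must not overlap, as with memcpy. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Move 16 bytes per iteration through one 128-bit vector register.
     * vld1q_u8 and vst1q_u8 are available on both AArch32 and AArch64. */
    while (n >= 16) {
        vst1q_u8(d, vld1q_u8(s));
        d += 16;
        s += 16;
        n -= 16;
    }
#endif

    /* Copy the remaining tail, or the whole buffer on non-NEON builds. */
    if (n > 0)
        memcpy(d, s, n);

    return dst;
}
```

This only illustrates that the same source can build for both architectures. Whether it beats the C library's own `memcpy`, which is typically already tuned for the platform, would need to be measured.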

You can still run 32-bit RA in your 64-bit environment to take advantage of the 32-bit NEON and asm, but you won’t gain the benefit of the 64-bit registers, etc.

@hunterk: Thanks, that’s what I suspected! However, 64bit registers are WAY more important than NEON-optimized code, just look at the SNES9x core performance on armv7+NEON vs armv8 without NEON…

Having both working would give another good performance boost to armv8 platforms like the Switch and the Pi3 in 64-bit mode! But I don’t know ARM asm either, sadly :frowning: