Hi there,
I am building RA on an aarch64 platform (an Rpi3 booting a 64-bit kernel and rootfs). RA cores really do perform better this way: with a max_swapchain=2 config, the Pi3 REQUIRES a 64-bit kernel, libs and RA to emulate the second stage of Contra 3 at full speed, so it’s THE way to run RA on a Pi3.
The thing is, I believe we can squeeze out a bit more performance with HAVE_NEON=1, so that the NEON-optimized memcpy and the NEON-optimized resampler are used. However, trying to do so results in this:
AS libretro-common/audio/resampler/drivers/sinc_resampler_neon.S
AS audio/drivers_resampler/cc_resampler_neon.S
AS memory/neon/memcpy-neon.S
memory/neon/memcpy-neon.S: Assembler messages:
memory/neon/memcpy-neon.S:11: Error: unknown pseudo-op: `.arm'
memory/neon/memcpy-neon.S:12: Error: unknown pseudo-op: `.fpu'
memory/neon/memcpy-neon.S:14: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:31: Error: unknown pseudo-op: `.fnstart'
memory/neon/memcpy-neon.S:32: Error: operand 1 must be an integer register -- `mov ip,r0'
memory/neon/memcpy-neon.S:33: Error: operand 1 must be an integer or stack pointer register -- `cmp r2,#16'
memory/neon/memcpy-neon.S:34: Error: unexpected characters following instruction at operand 1 -- `blt 4f@Have less than 16 bytes to copy'
memory/neon/memcpy-neon.S:36: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:37: Error: operand 1 must be an integer register -- `tst r0,#0xF'
memory/neon/memcpy-neon.S:39: Error: operand 1 must be an integer register -- `tst r0,#1'
memory/neon/memcpy-neon.S:40: Error: unknown mnemonic `ldrneb' -- `ldrneb r3,[r1],#1'
memory/neon/memcpy-neon.S:41: Error: unknown mnemonic `strneb' -- `strneb r3,[ip],#1'
memory/neon/memcpy-neon.S:42: Error: unknown mnemonic `subne' -- `subne r2,r2,#1'
memory/neon/memcpy-neon.S:43: Error: operand 1 must be an integer register -- `tst ip,#2'
memory/neon/memcpy-neon.S:45: Error: unknown mnemonic `ldrneh' -- `ldrneh r3,[r1],#2'
memory/neon/memcpy-neon.S:46: Error: unknown mnemonic `strneh' -- `strneh r3,[ip],#2'
memory/neon/memcpy-neon.S:53: Error: unknown mnemonic `subne' -- `subne r2,r2,#2'
memory/neon/memcpy-neon.S:55: Error: operand 1 must be an integer register -- `tst ip,#4'
memory/neon/memcpy-neon.S:57: Error: unknown mnemonic `vld4.8' -- `vld4.8 {d0[0],d1[0],d2[0],d3[0]},[r1]!'
memory/neon/memcpy-neon.S:58: Error: unknown mnemonic `vst4.8' -- `vst4.8 {d0[0],d1[0],d2[0],d3[0]},[ip,:32]!'
memory/neon/memcpy-neon.S:59: Error: operand 1 must be an integer or stack pointer register -- `sub r2,r2,#4'
memory/neon/memcpy-neon.S:61: Error: operand 1 must be an integer register -- `tst ip,#8'
memory/neon/memcpy-neon.S:63: Error: unknown mnemonic `vld1.8' -- `vld1.8 {d0},[r1]!'
memory/neon/memcpy-neon.S:64: Error: unknown mnemonic `vst1.8' -- `vst1.8 {d0},[ip,:64]!'
memory/neon/memcpy-neon.S:65: Error: operand 1 must be an integer or stack pointer register -- `sub r2,r2,#8'
memory/neon/memcpy-neon.S:67: Error: operand 1 must be an integer register -- `subs r2,r2,#32'
memory/neon/memcpy-neon.S:69: Error: operand 1 must be an integer register -- `mov r3,#32'
memory/neon/memcpy-neon.S:71: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:72: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:73: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:74: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:76: Error: unknown mnemonic `vld1.8' -- `vld1.8 {d0-d3},[r1]!'
memory/neon/memcpy-neon.S:77: Error: operand 1 must be an integer or stack pointer register -- `cmp r3,#(320-32)'
memory/neon/memcpy-neon.S:78: Error: unknown mnemonic `pld' -- `pld [r1,r3]'
memory/neon/memcpy-neon.S:79: Error: unknown mnemonic `addle' -- `addle r3,r3,#32'
memory/neon/memcpy-neon.S:80: Error: unknown mnemonic `vst1.8' -- `vst1.8 {d0-d3},[ip,:128]!'
memory/neon/memcpy-neon.S:81: Error: operand 1 must be an integer or stack pointer register -- `sub r2,r2,#32'
memory/neon/memcpy-neon.S:82: Error: operand 1 must be an integer or stack pointer register -- `cmp r2,r3'
memory/neon/memcpy-neon.S:84: Error: operand 1 must be an integer or stack pointer register -- `cmp r2,#0'
memory/neon/memcpy-neon.S:86: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:87: Error: unknown mnemonic `vld1.8' -- `vld1.8 {d0-d3},[r1]!'
memory/neon/memcpy-neon.S:88: Error: operand 1 must be an integer register -- `subs r2,r2,#32'
memory/neon/memcpy-neon.S:89: Error: unknown mnemonic `vst1.8' -- `vst1.8 {d0-d3},[ip,:128]!'
memory/neon/memcpy-neon.S:91: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:92: Error: operand 1 must be an integer register -- `tst r2,#16'
memory/neon/memcpy-neon.S:94: Error: unknown mnemonic `vld1.8' -- `vld1.8 {d0,d1},[r1]!'
memory/neon/memcpy-neon.S:95: Error: unknown mnemonic `vst1.8' -- `vst1.8 {d0,d1},[ip,:128]!'
memory/neon/memcpy-neon.S:97: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:98: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:99: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:100: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:101: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:102: Error: junk at end of line, first unrecognized character is `@'
memory/neon/memcpy-neon.S:105: Error: operand 1 must be an SVE predicate register -- `movs r3,r2,lsl#29'
memory/neon/memcpy-neon.S:106: Error: unknown mnemonic `ldrcs' -- `ldrcs r3,[r1],#4'
memory/neon/memcpy-neon.S:107: Error: unknown mnemonic `strcs' -- `strcs r3,[ip],#4'
memory/neon/memcpy-neon.S:108: Error: unknown mnemonic `ldrcs' -- `ldrcs r3,[r1],#4'
memory/neon/memcpy-neon.S:109: Error: unknown mnemonic `strcs' -- `strcs r3,[ip],#4'
memory/neon/memcpy-neon.S:110: Error: unknown mnemonic `ldrmi' -- `ldrmi r3,[r1],#4'
memory/neon/memcpy-neon.S:111: Error: unknown mnemonic `strmi' -- `strmi r3,[ip],#4'
memory/neon/memcpy-neon.S:112: Error: operand 1 must be an SVE predicate register -- `movs r2,r2,lsl#31'
memory/neon/memcpy-neon.S:113: Error: unknown mnemonic `ldrcsh' -- `ldrcsh r3,[r1],#2'
memory/neon/memcpy-neon.S:114: Error: unknown mnemonic `strcsh' -- `strcsh r3,[ip],#2'
memory/neon/memcpy-neon.S:115: Error: unknown mnemonic `ldrmib' -- `ldrmib r3,[r1],#1'
memory/neon/memcpy-neon.S:116: Error: unknown mnemonic `strmib' -- `strmib r3,[ip],#1'
memory/neon/memcpy-neon.S:139: Error: unknown mnemonic `bx' -- `bx lr'
memory/neon/memcpy-neon.S:140: Error: unknown pseudo-op: `.fnend'
My guess is that these NEON-optimized functions are ARMv7 (32-bit ARM) only, whereas the flags I am passing for this 64-bit build are:
-march=armv8-a+crc -mtune=cortex-a53
Am I right? If so, are there 64-bit NEON versions of those? For example, the Nintendo Switch is aarch64, if I am not mistaken. I have been looking at Makefile.switch for comparison, and it seems it does NOT use NEON after all. Can somebody please clear this up for me? Thanks!