SUCCESS!! 
At last! I achieved the lowest possible latency on Wayland! Same as on KMS and Windows exclusive fullscreen.
It seems RADV was not playing nice: it supplied 4 swapchain images instead of the requested 2. I installed AMDVLK and now RetroArch gets the requested 2 images (I checked in the log). Direct scan-out also seems to be working, because now I get only 54ms of latency on Wayland, the theoretical minimum on my system!
So here is the updated table:
- 95ms (5.7 frames) on Composited X (RADV)
- 77ms (4.6 frames) on KWin Wayland (RADV)
- 75ms (4.5 frames) on Weston (RADV)
- 65ms (3.9 frames) on Uncomposited X (RADV)
- 54ms (3.3 frames) on KWin Wayland (AMDVLK)
- 53ms (3.2 frames) on KMS (RADV)
- 51ms (3.0 frames) on Windows exclusive-fullscreen
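To make the gaps easier to see, here is a rough Python sketch that expresses each result as extra latency over the KMS baseline. The millisecond values are copied from the table; the 60 Hz refresh rate is an assumption based on the "at 60fps" remark in this post, and the names `LATENCY_MS`, `FRAME_MS` and `extra_over` are made up for illustration.

```python
# Measured latencies in milliseconds, copied from the table in this post.
LATENCY_MS = {
    "Composited X (RADV)": 95,
    "KWin Wayland (RADV)": 77,
    "Weston (RADV)": 75,
    "Uncomposited X (RADV)": 65,
    "KWin Wayland (AMDVLK)": 54,
    "KMS (RADV)": 53,
    "Windows exclusive-fullscreen": 51,
}

REFRESH_HZ = 60                 # assumption: 60 Hz display
FRAME_MS = 1000 / REFRESH_HZ    # duration of one refresh cycle, about 16.7 ms


def extra_over(baseline: str) -> dict[str, int]:
    """Return each setup's latency minus the baseline setup's latency (ms)."""
    base = LATENCY_MS[baseline]
    return {name: ms - base for name, ms in LATENCY_MS.items()}


if __name__ == "__main__":
    for name, extra in extra_over("KMS (RADV)").items():
        # Show the difference both in ms and as a fraction of a frame.
        print(f"{name}: {extra:+d} ms ({extra / FRAME_MS:+.1f} frames)")
```

With these numbers, KWin Wayland with AMDVLK ends up 1ms above KMS, while KWin Wayland with RADV is 24ms above it, which is more than one full frame.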
It is quite possible X also benefits from the AMDVLK swapchain, as it was getting 3 images with RADV instead of the requested 2, but I’ll leave those tests for another day. I’m tired of counting literally thousands of frames by hand.
These values also mean direct scan-out is probably working correctly on Wayland, otherwise there would be an additional 16.6ms of latency (at 60fps).
EDIT: Direct scan-out has no impact here. See my next post.
Anyways, I’m happy that I can enjoy KMS levels of latency on my Wayland desktop now. Next I’ll tighten the Frame Delay setting as much as possible and I’ll call it a day.
Thanks a lot for helping me figure this out! @e-tank @RealNC
I hope this info reaches people trying to reduce latency as much as they can on their Linux PCs!