Unfortunately, I have not tested the difference between having threaded video enabled and disabled; I always have it disabled. I currently only run NES and SNES emulation. Without shaders and on a 1080p display, I can use max_swapchain_images=2 and a frame delay of ~6 ms. Using a higher resolution display, such as a 4K TV, will increase the system load noticeably, although I haven’t quantified it. If you’re using a 4K screen, I would suggest running RetroArch at 1920x1080 instead.
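For reference, here is roughly how those settings would look in retroarch.cfg. This is only a sketch: the key names are how I believe these options are spelled in the config file, and the two fullscreen resolution lines are my assumption about how to force 1920x1080 on a 4K screen, so check them against your own retroarch.cfg. A frame delay of 6 is simply what works on my system for NES/SNES, so treat it as a starting point.

```
# Threaded video off (I have not tested with it on)
video_threaded = "false"

# Settings that work for me with NES/SNES, no shaders, 1080p
video_max_swapchain_images = "2"
video_frame_delay = "6"

# Assumed way to run at 1920x1080 on a 4K display
video_fullscreen = "true"
video_fullscreen_x = "1920"
video_fullscreen_y = "1080"
```

If you get stuttering or audio crackling, lower video_frame_delay first before touching anything else.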
There’s one more thing: a while back I updated my Ubuntu 16.10 installation. This updated the Linux kernel (I don’t remember the version) and the Intel GPU driver. Afterwards, I noticed significantly worse GPU performance and had to downgrade to restore it. So, if all else fails, try a completely fresh Ubuntu 16.10 installation with no updates.