SOLVED.
I found the culprit: it was “Threaded optimization” in the NVIDIA Control Panel settings. Setting this to either ON or AUTO resulted in horribly high CPU usage in RetroArch. I created a custom setting for RA and set “Threaded optimization” to OFF, and now my CPU usage is much lower and RA is showing similar CPU usage to Nestopia standalone (if not better; I haven’t looked at it closely enough yet to tell).
The question remains: why did Nestopia standalone not suffer this problem?
Is threaded optimization a setting that is enabled in RetroArch by default, but not in Nestopia SA?
I’m thinking that my driver might not be playing nice with something, either my OS or my hardware. If I were having basic problems with threaded optimization in general, then high CPU usage in RetroArch would make sense, provided RetroArch is designed to use threaded optimization by default.
Or it could be that RA is not designed to play nice with threaded optimization, or there is a conflict there somewhere. What I need to do is find a program that I know uses threaded optimization and see if I get similar performance problems there: high CPU usage and CPU spikes.
Anyway, it’s still a mystery to me, but by creating a custom setting and forcing threaded optimization OFF for RetroArch, my CPU usage with the Nestopia core in RA now seems on par with Nestopia SA.
I will need to playtest this for a while to see if the A/V hiccup problem is also solved by this, as I think it is related to the high CPU usage.
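
For anyone wanting to make the same comparison with numbers rather than by eye, here is a rough sketch of one way to summarize CPU readings taken with threaded optimization ON/AUTO versus OFF. It assumes you have already logged CPU percentages yourself (from Task Manager or any monitoring tool); the `summarize` helper, the 20-point spike margin, and the sample readings are all made up for illustration and are not real measurements.

```python
from statistics import mean


def summarize(samples, spike_margin=20.0):
    """Return (average CPU %, number of spikes) for a list of readings.

    A reading counts as a spike when it exceeds the average by more than
    spike_margin percentage points. The margin is an arbitrary assumption;
    adjust it to whatever looks like a spike on your system.
    """
    avg = mean(samples)
    spikes = sum(1 for s in samples if s > avg + spike_margin)
    return avg, spikes


# Hypothetical readings in percent CPU, NOT real measurements.
threaded_on = [35, 40, 38, 90, 37, 41, 85, 39]
threaded_off = [12, 14, 13, 15, 12, 13, 14, 13]

for label, samples in (("ON/AUTO", threaded_on), ("OFF", threaded_off)):
    avg, spikes = summarize(samples)
    print(f"Threaded optimization {label}: avg {avg:.1f}% CPU, {spikes} spike(s)")
```

Running the same summary on readings from another program known to use threaded optimization would show whether the high usage and spikes are specific to RA or happen everywhere.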