Default audio_rate_control_delta value 0.005 is bad and quite audible

The default audio_rate_control_delta value is badly chosen. 0.005, despite being described as “inaudible”, is far too high and very much audible. If you keep a game playing a sustained tone, you can hear it warble a lot.

I had to set the value way down to something like “0.000400” to reduce such audible artifacts. I can still hear it warble slightly, but at that rate it avoids stutters and gaps in the audio. I might also have gotten 0.0003 to work without stutters if I raised the audio latency to 64.

This configuration option used to be visible in the UI, but is now buried deep in the configuration file, and that is bad.

As for the audio_max_timing_skew variable, what does this do? I’ve tried setting it to both huge values and small values and can’t figure out what it’s supposed to do. It seems to work best when left at 0.05.

It’s still in the UI; you probably have this set to OFF:

settings->user interface->show advanced settings

A 0.005 (0.5%) difference in pitch is in fact inaudible.
I had a problem on a secondary screen where the refresh rate was not stable; only then was the audio wobbling clearly audible, and the difference in refresh rate was far beyond 0.005 (0.5%).

Your problem is probably related to VSync more than to refresh rate inconsistencies.
I suggest using ONLY the primary monitor, in a single-monitor setup, on Windows 7, 8, or 10.

When maister was first working on DRC, we did a bunch of ABX testing with different pitch variations, and no one was even close to detecting 0.005. In fact, academic research suggests that the typical “just noticeable difference” threshold is 10x greater, at 0.05.

Being able to hear variations 2 orders of magnitude smaller than normal is impressive to say the least, but it’s very far from the norm and hence shouldn’t influence default settings.

It seems that my Vertical Refresh Rate setting was somehow set to something weird, like 59.940060, but after it gathered 2048 timing samples, it was flickering between 59.999 and 60.002 FPS.

Maybe the vertical refresh rate estimator should discard the first second worth of samples.

Anyway, I’m doing some math right now.

A NES generates video at an average of 60.0988138974405 FPS (or 60.0984775561123 FPS when rendering is disabled). This works out to 733.79152 samples per frame at 44100Hz.

If we want to play back at 60FPS instead of 60.0988FPS, we need to play back the samples we generate at a rate of 44027.4912Hz. This is a difference of 2.8488211801687 cents, which is acceptable.

So far so good.
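The arithmetic above can be checked with a few lines of Python (the FPS and sample-rate figures are the ones quoted in this post, not authoritative emulator constants):

```python
import math

# Figures quoted above; treat them as this post's numbers.
NES_FPS = 60.0988138974405
SAMPLE_RATE = 44100.0
TARGET_FPS = 60.0

# Samples generated per NES frame at 44100 Hz.
samples_per_frame = SAMPLE_RATE / NES_FPS
print(samples_per_frame)   # ≈ 733.79152

# To play at exactly 60 FPS, slow the sample clock proportionally.
playback_rate = SAMPLE_RATE * TARGET_FPS / NES_FPS
print(playback_rate)       # ≈ 44027.4912 Hz

# Resulting pitch drop in cents: 1200 * log2(f1 / f2).
cents = 1200.0 * math.log2(SAMPLE_RATE / playback_rate)
print(cents)               # ≈ 2.8488 cents
```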

The problem is that even after changing the vertical refresh rate to 60, if I use 0.005 as the audio_rate_control_delta, I get warbles. How do I see what the program is actually trying to do? Ideally, it should resample to 44027.4912Hz and stay there, and never warble.

If you read the Audio Video Synchronization guide, you would know that “Audio Maximum Timing Skew” stretches the native refresh rate of the console (60.0988 for NES) to the refresh rate you set in “Vertical Refresh Rate” when the core starts (and only then), and this includes both audio pitch and video speed.
If the difference between the two is larger than 0.05 (5%), the stretch will not occur.

Summary:

  • “Audio Maximum Timing Skew” will stretch the audio/video of the core to your monitor refresh rate ONLY when a core starts (not in real time).
    This option prevents dropping a single frame every few seconds.
  • “Dynamic Audio Rate Control” will fix, in real time, micro deviations of your monitor refresh rate caused by heat, etc.
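The skew rule described above can be sketched in a few lines (my own simplification with hypothetical names, not RetroArch’s actual code):

```python
def startup_rate(core_fps, monitor_fps, max_timing_skew=0.05):
    """Hypothetical sketch of the rule described above: at core
    start, stretch the core's timing (audio pitch and video speed
    together) to the monitor rate, unless the relative difference
    exceeds max_timing_skew."""
    skew = abs(monitor_fps - core_fps) / core_fps
    return monitor_fps if skew <= max_timing_skew else core_fps

# NES (~60.0988 FPS) on a 60 Hz monitor: ~0.16% skew, well under
# 5%, so the core is stretched to run at 60 FPS.
print(startup_rate(60.0988, 60.0))   # 60.0
# A 50 FPS PAL core on a 60 Hz monitor: 20% skew, no stretch.
print(startup_rate(50.0, 60.0))      # 50.0
```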

Please read the guide I’ve linked.

IIRC, it works on the buffer rather than the frequency itself. That is, it doesn’t look for the frequency difference and then just resample exactly to that; it runs at the normal pitch until the buffer is close to empty, then it resamples up by the DRC amount and runs until the buffer fills back up, then resamples back to normal.

An analogy would be a bucket with a hole in it that represents the slowly emptying buffer, being fed by a hose that runs at a given, slightly slower rate. As the bucket gets close to empty, RetroArch increases the flow from the hose until the bucket fills back up, then it drops the feed back to normal once it’s full again. Repeat as necessary.
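The bucket idea can be sketched as a tiny formula: scale the resampling ratio by how far the buffer is from half full. This is a toy model under my own assumptions (the names, the linear form, and the sign convention are mine), not RetroArch’s exact code:

```python
def drc_ratio(buffer_fill, buffer_capacity, delta=0.005):
    """Toy dynamic-rate-control factor. Returns a multiplier for
    the resampling ratio: below 1.0 when the buffer is draining,
    above 1.0 when it is overfull. The linear form and the sign
    convention here are assumptions for illustration."""
    # -1.0 when empty, 0.0 at half full, +1.0 when full.
    deviation = 2.0 * buffer_fill / buffer_capacity - 1.0
    return 1.0 + delta * deviation

print(drc_ratio(2048, 4096))   # 1.0   (half full: no adjustment)
print(drc_ratio(0, 4096))      # 0.995 (empty: maximum correction)
print(drc_ratio(4096, 4096))   # 1.005 (full: maximum correction)
```

This also illustrates the crackle/warble trade-off in the posts above: delta bounds the correction, so too small a delta can’t cover a real rate shortfall, while a large delta makes the periodic correction itself audible.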

So, if your DRC value isn’t high enough to cover the shortfall, it’ll still crackle, and it will always warble with a fairly predictable pattern.

I’m looking at ratecontrol.pdf right now.

So far, I see an issue with the algorithm: the adjustments are made in the wrong units.

When you’re dealing with humans hearing pitches, the units should be semitones and cents. A semitone is also called a half step, it’s the difference between C and C#. A cent is 1/100th of a semitone.

These units are logarithmic, and deal with the relative difference between two frequencies. Difference in semitones = 12 * (Log2(F1) - Log2(F2))

So if you have frequency 44100, and frequency 44027, that is a difference of about 0.028681 semitones, or 2.8681 cents.

Another example, if you wanted to add 5 cents to frequency 44100, you’d do this: F = 44100 * 2 ^ (0.05 / 12), and get about 44227.5499.
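Both examples above can be checked with a small Python helper (1200 cents per octave, so 2^(cents/1200) is the same as 2^(semitones/12)):

```python
import math

def cents_between(f1, f2):
    """Relative pitch difference in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f1 / f2)

def shift_by_cents(freq, cents):
    """Shift a frequency up (or down, if cents is negative)."""
    return freq * 2.0 ** (cents / 1200.0)

print(cents_between(44100.0, 44027.0))   # ≈ 2.8681 cents
print(shift_by_cents(44100.0, 5.0))      # ≈ 44227.55 Hz
```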

I think I will review the algorithm used and see if there are any problems from not doing adjustments in units of semitones and cents.


Not sure how well this correlates with what RetroArch is doing, but it should still be useful regardless: http://paste.debian.net/hidden/d061719d/ (compile on Linux like: g++ -Wall -O2 -o sine sine.cpp -lsndfile)

I can tell the difference fairly easily down to about 1.003, but my accuracy starts fleeing through the window in terror at 1.002.

At 1.005, I guessed all of them correctly, despite them all sounding the same to me on a conscious level (so I guess there’s a subconscious “hunch” thing going on). I was completely lost, though, jumping down to 1.003.

Interesting stuff.

Could someone possibly put up a Windows binary of this test tool somewhere?

Looks like it needs Cygwin; otherwise it won’t be able to open /dev/urandom.

Can Retroarch dump WAV files?

I think so…? I’ve never tried it, but the ffmpeg a/v dumping should work if you just use a null codec for the video.

Wow, I thought I had bad hearing, but I can hear all the way down to 1.001. 1.0005 is indistinguishable. I really didn’t think the variance with DRC was that significant, but I guess it can be.

Guys, can somebody tell what this test is exactly about / what it does?

Unfortunately I don’t have a linux machine to test it out myself :roll_eyes:

Do these results imply that it’s better to set the Retroarch audio skew to something like 0.001 instead of the default 0.005?

I’m probably too old now, as I don’t really hear any noticeable sound issues with the default during gameplay.

How are you guys testing this? By how the cores sounded? Or do we have a reference studio-quality audio file and sophisticated measuring equipment?

Check out my posts in this thread.

I think a better test would be to load a ROM that plays tones and we can test them right in RetroArch. I seem to recall such a thing existing but I can’t find any examples now.

I did. But your comment about 0.005 being inaudible doesn’t really agree with the three other people who say they can hear a difference when using the test tool posted by @Mednafen (here).