Default audio_rate_control_delta value 0.005 is bad and quite audible

When maister was first working on DRC, we did a bunch of ABX testing with different amounts of pitch variation, and no one was even close to detecting 0.005. In fact, academic research suggests that the typical “just noticeable difference” threshold is 10x greater, at 0.05.

Being able to hear variations 2 orders of magnitude smaller than normal is impressive to say the least, but it’s very far from the norm and hence shouldn’t influence default settings.

It seems that my Vertical Refresh Rate setting was somehow set to something weird, like 59.940060, but after the estimator gathered 2048 timing samples, its estimate was flickering between 59.999 and 60.002 FPS.

Maybe the vertical refresh rate estimator should discard the first second’s worth of samples.
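A rough sketch of what that suggestion could look like, in Python. This is not RetroArch's estimator; `estimate_refresh_rate` and `warmup_seconds` are made-up names, and the input is assumed to be a plain list of per-frame durations in seconds.

```python
def estimate_refresh_rate(frame_times, warmup_seconds=1.0):
    """Estimate refresh rate in Hz from per-frame durations (seconds, oldest first).

    Hypothetical helper: frames that start during the first `warmup_seconds`
    are discarded, and the rest are averaged.
    """
    elapsed = 0.0
    for i, dt in enumerate(frame_times):
        if elapsed >= warmup_seconds:
            kept = frame_times[i:]
            return len(kept) / sum(kept)
        elapsed += dt
    return None  # nothing left once the warm-up period is dropped


# One second of slow start-up frames followed by ten seconds at a steady 60 Hz.
frames = [1.0 / 30.0] * 30 + [1.0 / 60.0] * 600
naive = len(frames) / sum(frames)
print(f"naive average:     {naive:.3f} Hz")
print(f"warm-up discarded: {estimate_refresh_rate(frames):.3f} Hz")
```

With the slow start-up frames included, the plain average lands well below 60 Hz; with them dropped, the estimate sits at the steady-state rate.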

Anyway, I’m doing some math right now.

A NES generates video at an average of 60.0988138974405 FPS (or 60.0984775561123 FPS when rendering is disabled). This works out to 733.79152 samples per frame at 44100Hz.

If we want to play back at 60FPS instead of 60.0988FPS, we need to play back the samples we generate at a rate of 44027.4912Hz. This is a difference of 2.8488211801687 cents, which is acceptable.
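For anyone who wants to check those numbers, here is the same arithmetic as a small Python sketch. The constants are the ones quoted above; the function names are just for illustration.

```python
import math

NES_FPS = 60.0988138974405   # average NES frame rate quoted above
SAMPLE_RATE = 44100.0        # core output rate in Hz
TARGET_FPS = 60.0            # display refresh rate to sync to


def samples_per_frame(sample_rate, fps):
    """Audio samples generated per video frame."""
    return sample_rate / fps


def playback_rate(sample_rate, core_fps, target_fps):
    """Rate at which samples are consumed if frames are shown at target_fps."""
    return samples_per_frame(sample_rate, core_fps) * target_fps


def cents(f1, f2):
    """Pitch difference between two frequencies in cents (1200 per octave)."""
    return 1200.0 * math.log2(f1 / f2)


spf = samples_per_frame(SAMPLE_RATE, NES_FPS)
rate = playback_rate(SAMPLE_RATE, NES_FPS, TARGET_FPS)
flat = cents(SAMPLE_RATE, rate)

print(f"{spf:.5f} samples per frame")
print(f"{rate:.4f} Hz playback rate")
print(f"{flat:.4f} cents flat")
```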

So far so good.

The problem is that even after changing the vertical refresh rate to 60, if I use 0.005 as the audio_rate_control_delta, I get warbles. How do I see what the program is actually trying to do? Ideally, it should resample to 44027.4912Hz and stay there, and never warble.

If you read the Audio Video Synchronization guide, you would know that “Audio Maximum Timing Skew” stretches the native refresh rate of the console (roughly 60.0988 for the NES) to the refresh rate you set in “Vertical Refresh Rate” when the core starts (and only then). This includes the audio pitch and the video speed.
If the difference between the two is larger than 0.05 (5%), the stretch will not occur.
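As a sketch of the rule described here (not RetroArch's actual code; `apply_timing_skew` and its parameters are made-up names), the decision at core start could look like this in Python:

```python
def apply_timing_skew(core_fps, refresh_rate, sample_rate, max_skew=0.05):
    """Return (fps, sample_rate, applied) after the one-time stretch at core start.

    Assumption: the skew is the relative difference between the core's native
    frame rate and the configured refresh rate. If it exceeds max_skew,
    nothing is changed.
    """
    skew = abs(1.0 - core_fps / refresh_rate)
    if skew > max_skew:
        return core_fps, sample_rate, False
    scale = refresh_rate / core_fps
    # Video speed and audio pitch are stretched by the same factor.
    return refresh_rate, sample_rate * scale, True


nes = apply_timing_skew(60.0988138974405, 60.0, 44100.0)
pal = apply_timing_skew(50.0, 60.0, 44100.0)
print(nes)
print(pal)
```

The NES case falls well inside the 5% window, so it is stretched to 60. A 50 FPS core on a 60Hz display is about 17% off, so it is left alone.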

Summary:

  • “Audio Maximum Timing Skew” will stretch the audio/video of the core to your monitor refresh rate ONLY when a core starts (not real time).
    This option prevents dropping a single frame every few seconds.
  • “Dynamic Audio Rate Control” corrects, in real time, micro deviations in your monitor’s refresh rate caused by heat, etc.

Please read the guide I’ve linked.

IIRC, it works on the buffer rather than the frequency itself. That is, it doesn’t look for the freq difference and then just resample exactly to that, it runs at the normal pitch until the buffer is close to empty, then it resamples it up by the DRC amount and runs until the buffer fills back up, then resamples back to normal.

An analogy would be a bucket with a hole in it that represents the slowly emptying buffer, and it’s being fed by a hose that runs at a given, slightly slower rate. As the bucket gets close to empty, RetroArch increases the flow from the hose until the bucket fills back up, then it drops the feed back to normal once it’s full again. Repeat as necessary.

So, if your DRC value isn’t high enough to cover the shortfall, it’ll still crackle, and it will always warble with a fairly predictable pattern.
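Here is a toy Python simulation of the bucket as described in this post. It models the description only, not RetroArch's real rate control, and every name and threshold in it (`simulate_bucket`, `capacity`, `low_mark`) is invented for the example.

```python
def simulate_bucket(shortfall, delta, steps=20000, capacity=10.0, low_mark=1.0):
    """Toy model: one unit drains per step, the hose feeds (1 - shortfall).

    When the fill drops below low_mark the feed is boosted by (1 + delta)
    until the bucket is full again. Returns (underruns, mode_switches).
    """
    fill = capacity / 2.0
    boosted = False
    underruns = 0
    switches = 0
    for _ in range(steps):
        feed = 1.0 - shortfall
        if boosted:
            feed *= 1.0 + delta
        fill += feed - 1.0          # net change after draining one unit
        if fill <= 0.0:
            fill = 0.0
            underruns += 1          # empty bucket: this would be a crackle
        if not boosted and fill < low_mark:
            boosted = True
            switches += 1
        elif boosted and fill >= capacity:
            fill = capacity
            boosted = False
            switches += 1
    return underruns, switches


# Shortfall of about 0.165%, matching the NES at 60 FPS case discussed earlier.
ok = simulate_bucket(shortfall=0.00165, delta=0.005)
too_small = simulate_bucket(shortfall=0.00165, delta=0.001)
print("delta 0.005:", ok)
print("delta 0.001:", too_small)
```

In this model, a delta larger than the shortfall never underruns but keeps flipping between normal and boosted pitch, which would be the periodic warble. A delta smaller than the shortfall drains the bucket anyway and underruns.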

I’m looking at ratecontrol.pdf right now.

So far, I see an issue with the algorithm. Changes made are in the wrong units.

When you’re dealing with humans hearing pitches, the units should be semitones and cents. A semitone is also called a half step; it’s the difference between C and C#. A cent is 1/100th of a semitone.

These units are logarithmic, and deal with the relative difference between two frequencies. Difference in semitones = 12 * (Log2(F1) - Log2(F2))

So if you have frequency 44100, and frequency 44027, that is a difference of about 0.028681 semitones, or 2.8681 cents.

Another example, if you wanted to add 5 cents to frequency 44100, you’d do this: F = 44100 * 2 ^ (0.05 / 12), and get about 44227.5499.
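The two conversions above, as a short Python sketch (function names are just for illustration):

```python
import math


def semitones_between(f1, f2):
    """Difference in semitones = 12 * (log2(f1) - log2(f2))."""
    return 12.0 * (math.log2(f1) - math.log2(f2))


def cents_between(f1, f2):
    """Same difference in cents (100 cents per semitone)."""
    return 100.0 * semitones_between(f1, f2)


def shift_by_cents(freq, cents):
    """Frequency reached by moving `freq` up by `cents` (negative moves down)."""
    return freq * 2.0 ** (cents / 1200.0)


print(f"{semitones_between(44100.0, 44027.0):.6f} semitones")
print(f"{cents_between(44100.0, 44027.0):.4f} cents")
print(f"{shift_by_cents(44100.0, 5.0):.4f} Hz")
print(f"{cents_between(1.005, 1.0):.2f} cents for a ratio of 1.005")
```

The last line puts the default delta in the same units: a rate ratio of 1.005 is a bit under 9 cents.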

I think I will review the algorithm used and see if there are any problems from not doing adjustments in units of semitones and cents.


Not sure how well this correlates with what RetroArch is doing, but it should still be useful regardless: http://paste.debian.net/hidden/d061719d/ (compile on Linux like: g++ -Wall -O2 -o sine sine.cpp -lsndfile)

I can tell the difference fairly easily down to about 1.003, but my accuracy starts fleeing through the window in terror at 1.002.

At 1.005, I guessed all of them correctly, despite them all sounding the same to me on a conscious level (so I guess there’s a subconscious “hunch” thing going on). I was completely lost, though, jumping down to 1.003.
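For those who can't build the C++ tool: the pasted source isn't reproduced in this thread, so the following is only a guess at the general idea, assuming figures like 1.003 are frequency ratios between two otherwise identical tones. It is a Python sketch using only the standard library, and `tone_pair_wav` is a made-up name. It does not randomize the order, so it is not a proper ABX test.

```python
import io
import math
import struct
import wave


def tone_pair_wav(base_hz=440.0, ratio=1.003, seconds=1.0, rate=44100):
    """Return WAV bytes: a sine at base_hz followed by one at base_hz * ratio."""
    n = int(seconds * rate)
    frames = bytearray()
    for hz in (base_hz, base_hz * ratio):
        for i in range(n):
            sample = 0.5 * math.sin(2.0 * math.pi * hz * i / rate)
            frames += struct.pack("<h", int(sample * 32767))  # 16-bit little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)       # mono
        w.setsampwidth(2)       # 2 bytes per sample
        w.setframerate(rate)
        w.writeframes(bytes(frames))
    return buf.getvalue()
```

Writing the result to disk is one line, e.g. `open("pair.wav", "wb").write(tone_pair_wav(ratio=1.003))`, and the file should play in any audio player.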

Interesting stuff.

Could someone possibly up a Windows binary of this test tool somewhere?

Looks like it needs cygwin, otherwise it won’t be able to open /dev/urandom

Can Retroarch dump WAV files?

I think so…? I’ve never tried it, but the ffmpeg a/v dumping should work if you just use a null codec for the video.

Wow, I thought I had bad hearing, but I can hear all the way down to 1.001. 1.0005 is indistinguishable. I really didn’t think the variance with DRC was that significant, but I guess it can be.

Guys, can somebody tell what this test is exactly about / what it does?

Unfortunately I don’t have a linux machine to test it out myself :roll_eyes:

Do these results imply that it’s better to set the Retroarch audio skew to something like 0.001 instead of the default 0.005?

I’m probably too old now, as I don’t really hear any noticeable sound issues with the default during gameplay.

How are you guys testing this? By how the cores sound? Or do we have a reference studio-quality audio file and sophisticated measuring equipment?

Check out my posts in this thread.

I think a better test would be to load a ROM that plays tones and we can test them right in RetroArch. I seem to recall such a thing existing but I can’t find any examples now.

I did. But your comment about 0.005 being inaudible doesn’t really agree with the three other people who say they can hear a difference when using the test tool posted by @Mednafen (here).

@hunterk Sounds like the best route. It would be great if you could retrieve that ROM somewhere.

Would love to test this after reading these comments :grinning:

If you can clearly hear “Dynamic Audio Rate Control” working, then there is something wrong with the stability of your display’s refresh rate, or you are getting vsync dropouts.

Let’s say your display is exactly 60.00Hz, so +0.5% is 60.3Hz and -0.5% is 59.7Hz.
No display fluctuates like that!
Usually the display refresh rate fluctuates at the second or third decimal place from heat, so 60.000 ± 0.01Hz.
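To put those two figures side by side, a quick Python sketch (helper names are made up for the example):

```python
def refresh_window(nominal_hz, delta):
    """Lowest and highest rate reachable with a relative adjustment of +/- delta."""
    return nominal_hz * (1.0 - delta), nominal_hz * (1.0 + delta)


def relative_deviation(nominal_hz, drift_hz):
    """Drift in Hz expressed as a fraction of the nominal rate."""
    return drift_hz / nominal_hz


low, high = refresh_window(60.0, 0.005)
drift = relative_deviation(60.0, 0.01)
print(f"window for 0.005: {low:.2f} to {high:.2f} Hz")
print(f"0.01 Hz drift is {drift:.6f} of 60 Hz, versus a delta of 0.005")
```

A 0.01Hz drift works out to roughly a thirtieth of what the default delta allows for.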

@James-F I don’t hear anything strange when using Retroarch, but I was/am referring to the tests that have been done with the tool posted by Mednafen.

Did you have a chance to test the tool by @Mednafen posted previously in the thread?

See his post here