ABX-testing dynamic rate control

I’d like to do more scientific testing of the dynamic rate control approach. The goal is for the method to be transparent, so we need to find the point at which the pitch distortion becomes inaudible to the human ear. And please, no trolling about “accuracy”. That is irrelevant to this test.

I have not found any papers on this, so we’ll have to start with the basics: ABX testing.

The main parameter of dynamic rate control is the delta factor d. It constrains the resampling ratio to the range [ratio * (1 - d), ratio * (1 + d)].
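As a worked example, assume ratio is defined as output rate over input rate (48 kHz / 44.1 kHz for the samples below), and write r_b for the ratio chosen for one block (a symbol introduced here only for illustration). The allowed range then works out as:

```latex
\begin{aligned}
r &= \frac{f_\text{out}}{f_\text{in}} = \frac{48000}{44100} \approx 1.088435, \qquad r\,(1 - d) \le r_b \le r\,(1 + d) \\
d = 0.002 &: \quad 1.086259 \le r_b \le 1.090612 \\
d = 0.020 &: \quad 1.066667 \le r_b \le 1.110204
\end{aligned}
```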

Creating test samples: check out RetroArch from Git, then build the test binaries in audio/test with make. You can then create samples with test-rate-control.sh, for example:


./test-rate-control.sh reference_music.flac reference.wav 0.000 # Assumes reference_music.flac is 44.1 kHz stereo. Resamples to 48 kHz with d = 0 (no rate control).
./test-rate-control.sh reference_music.flac drc_020.wav 0.020 # Resamples to 48 kHz with up to 2% pitch deviation (will be very audible).
./test-rate-control.sh reference_music.flac drc_002.wav 0.002 # Up to 0.2% pitch deviation (probably not audible).
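To narrow down the threshold between the 0.2% and 2% settings, a whole ladder of d values can be generated in one go. This is a minimal sketch, assuming it is run from audio/test and that the script takes the same three arguments as in the examples above; the extra d values and the drc_name helper are hypothetical choices, not part of the RetroArch tree.

```shell
#!/bin/sh
# Hypothetical helper: derive the output name from d, following the
# naming used above (0.020 -> drc_020.wav, 0.002 -> drc_002.wav).
drc_name() {
    printf 'drc_%s.wav\n' "${1#0.}"
}

# Assumed ladder of d values between "probably inaudible" and "very audible".
for d in 0.001 0.002 0.005 0.010 0.020; do
    out=$(drc_name "$d")
    if [ -x ./test-rate-control.sh ]; then
        ./test-rate-control.sh reference_music.flac "$out" "$d"
    else
        # Not in audio/test: just show what would be run.
        echo "would run: ./test-rate-control.sh reference_music.flac $out $d"
    fi
done
```

Each output can then be ABXed against reference.wav, stepping down the ladder until the difference disappears.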

The script resamples with test-sinc-highest (well over 100 dB SNR) and uses FFmpeg for the raw PCM conversions and similar steps.

For each input block of N frames, a resampling ratio is chosen uniformly at random between ratio * (1 - d) and ratio * (1 + d). Note that this does not really match the ratio RetroArch uses in practice, since the actual ratio variance there is generally far lower than the maximum allowed value. This test simulates the worst case.
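The per-block selection can be sketched as below. This is not the code in test-rate-control.sh or test-sinc-highest; it is a minimal illustration assuming ratio is output rate over input rate (48000/44100), with block_ratios a hypothetical helper and the block count and seed chosen arbitrarily.

```shell
#!/bin/sh
# Hypothetical sketch: pick one resampling ratio per input block,
# uniformly in [ratio*(1-d), ratio*(1+d)).
# Args: $1 = d, $2 = number of blocks, $3 = random seed.
block_ratios() {
    awk -v d="$1" -v blocks="$2" -v seed="$3" 'BEGIN {
        srand(seed)
        ratio = 48000 / 44100          # assumed: output rate / input rate
        for (b = 0; b < blocks; b++) {
            # rand() is uniform in [0, 1), so the factor is uniform in [1-d, 1+d)
            r = ratio * (1 - d + 2 * d * rand())
            printf "%d %.8f\n", b, r   # block index, chosen ratio
        }
    }'
}

block_ratios 0.002 8 42
```

Because the draw is uniform, blocks land near the extremes as often as near the nominal ratio, which is what makes this pessimistic compared with the much smaller variance RetroArch normally sees.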

ABX tools: on Windows, fb2k (duh); on *nix, Squishyball from xiph.org: http://svn.xiph.org/trunk/squishyball/ (Arch package: http://svn.xiph.org/trunk/squishyball/)

When ABX testing, remember to insert a “beep” or similar cue when switching samples; otherwise it is very easy to notice a difference from tiny timing deviations alone.

I found a .deb package of Squishyball for any other Debian/Ubuntu users who would like to try it out: https://launchpad.net/~cxl/+archive/misc/+build/4311505

I haven’t run the tests myself yet; I’ll probably give it a shot this evening.

EDIT: I made some test samples, if anyone wants to try this without making their own: http://www.mediafire.com/?ay2o3ig2c8e5emi

EDIT2: My wife and I both tried the 5 VGM samples linked above. Neither of us could distinguish the reference from the 002 tracks, but we could easily spot the 020s (listening through Audio-Technica ATH-M50 headphones).

We’ll try again with some non-VGM samples and some flat tones.

EDIT3: Again, neither of us could distinguish the reference from 002 with 2 regular songs and a 1 kHz reference tone. If I have time tomorrow, I’ll try to get some of the student workers to try it, too.
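For anyone wanting to repeat the tone test, a 44.1 kHz stereo tone suitable as input to test-rate-control.sh can also be generated with FFmpeg’s lavfi sine source instead of Audacity. A minimal sketch; make_tone, the 30-second duration, and the output file names are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: write a 30 s, 44.1 kHz stereo FLAC sine tone at $1 Hz.
# The sine source is mono, so -ac 2 duplicates it to two channels.
make_tone() {
    ffmpeg -hide_banner -loglevel error -y \
        -f lavfi -i "sine=frequency=$1:sample_rate=44100:duration=30" \
        -ac 2 "tone_$1.flac"
}

if command -v ffmpeg >/dev/null 2>&1; then
    make_tone 1000
    # Then run it through the script like any other sample (from audio/test):
    # ./test-rate-control.sh tone_1000.flac tone_reference.wav 0.000
    # ./test-rate-control.sh tone_1000.flac tone_drc_002.wav 0.002
else
    echo "ffmpeg not found" >&2
fi
```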

For what it’s worth, I can consciously tell that a 1003 Hz sine wave is higher than a 1000 Hz sine wave (both generated in Audacity) if I’m paying attention. I can barely tell which is higher when comparing 1002 Hz to 1000 Hz; for anything smaller I’d need blind testing, since the difference is too subtle.

A 3 Hz deviation at 1000 Hz was cited as the “human limit” in a paper I found, so your test sounds legit. In this case, however, the pitch varies constantly, so it might be less obvious. I have done some ABX testing, but haven’t found my own “limit” yet.
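To put the 3 Hz figure next to the test settings, a maximum deviation d can be converted to Hz at a given tone and to cents. This is a first-order sketch: a ratio change of (1 + d) scales pitch by 1/(1 + d), so the shift at f Hz is roughly f * d. The dev helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: express a maximum deviation d ($1) as Hz at tone f ($2)
# and as cents (1200 * log2(1 + d)); awk's log() is the natural log.
dev() {
    awk -v d="$1" -v f="$2" 'BEGIN {
        hz = f * d
        cents = 1200 * log(1 + d) / log(2)
        printf "+/-%.2f Hz at %d Hz, %.2f cents\n", hz, f, cents
    }'
}

dev 0.002 1000   # prints: +/-2.00 Hz at 1000 Hz, 3.46 cents  (the 002 samples)
dev 0.003 1000   # prints: +/-3.00 Hz at 1000 Hz, 5.19 cents  (the cited 3 Hz limit)
dev 0.020 1000   # prints: +/-20.00 Hz at 1000 Hz, 34.28 cents (the 020 samples)
```

So at 1 kHz the 002 samples stay within about 2 Hz, just under the cited 3 Hz limit, while the 020 samples swing up to about 20 Hz. Since the ratio changes randomly per block rather than holding steady, a threshold measured with static tones may not transfer directly.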