http://www.mainly.me.uk/resampling/index.html

Performance of audio resampling software

1 Introduction

I have a lot of 16-bit stereo audio files to convert from a 48 kHz sample rate to 44.1 kHz. There are many audio resampling programs to choose from, so I have been testing a few packages to see which to use. These are the candidates I found with a quick search:
By the way, I use Linux, so it is software available for Linux that I have been testing. I also use Audacity under Linux for audio editing, and that too has a resampler. However, I am currently testing only command-line resamplers.

2 Frequency response and simple aliasing

2.1 Method

The resamplers were all asked to process a 16-bit mono 48 kHz-sampled WAVE file (sweep.wav) containing a linear frequency sweep from 0 Hz to 23.999 kHz. A sweep rate of 1 kHz/s was used for initial evaluation and 100 Hz/s for more detailed testing. This allows the instantaneous frequency to be known from the time (provided the sweep time is long enough). The input files for this test and the others below were generated by a custom C program. The signal level for the sweep was set at -1 dBFS (so that any pass-band ripples will not clip).

The following resampling program commands were used:

$ sr-convert sweep.wav - sr-convert.wav 44100
$ sox sweep.wav -r 44100 sox.wav polyphase
$ sndfile-resample -to 44100 sweep.wav sndfile-resample.wav
$ ssrc --rate 44100 sweep.wav ssrc.wav
$ ResampAudio --srate 44100 sweep.wav ResampAudio.wav
$ resample -to 44100 sweep.wav resample.wav

The above commands all use default settings for resampling to 44.1 kHz. Note that SoX has three resamplers: "rate", "resample" and "polyphase", of which only the polyphase method is tested.

2.2 Results

Figure 1 (below) shows the "frequency responses" of all six resamplers (in the same order as above) with the 1 kHz/s sweep. The application used to display the sweep is Audacity. The horizontal scale is in seconds but may be read as kHz. The vertical scale is linear amplitude.

Figure 1. Frequency response of six resamplers (1 kHz/s sweep)

Figure 2 (below) shows the "frequency responses" of the three best resamplers (see the discussion below) with the 100 Hz/s sweep. The horizontal scale starts at 21.7 kHz on the left and the cursor is positioned at the Nyquist frequency, 22.05 kHz.
The vertical scale is logarithmic amplitude this time (dBFS).

Figure 2. Frequency response of the three best resamplers (100 Hz/s sweep)

The execution times on a 2.4 GHz AMD Athlon XP for resampling a 2,500 second stereo 16-bit WAVE file from 48 kHz to 44.1 kHz were:
2.3 Discussion

In theory the frequency responses in figure 1 should be flat up to a little less than 22.05 kHz and should then drop as rapidly as possible to zero at 22.05 kHz and above. In practice a frequency response flat to 20 kHz is quite sufficient for an audio resampler (or a little less than 20 kHz in my case, as I can no longer hear that high at my age). There really should be nothing significant above 22.05 kHz in the frequency responses, as this represents aliasing, which is a fault in a resampler. When resampling from 48 kHz to 44.1 kHz the aliasing products shown by this test cannot fall below 20.1 kHz (the top of the sweep, just under 24 kHz, folds back to 44.1 kHz - 24 kHz = 20.1 kHz), so they will not be audible. However, this test may not show all of the aliasing possible from a resampler.

First of all I can rule out resample, which has a -3 dB point at 17.9 kHz. resample also seems to have a default gain of -1.5 dB for this conversion (this is not seen in figure 1), rather than the 0 dB achieved by the rest. It seems to lack a frequency-dependent amplitude correction factor. It is possible to generate a better filter for resample to use, but I think I have easier alternatives. To be fair, resample is the quickest of the batch by a long way, although a better filter may slow it down.

On the grounds of their aliasing I can also rule out ResampAudio and sr-convert. (resample would also fall here had it not already been eliminated on its frequency response.) Their level of aliasing is sufficiently obvious from the fast frequency sweep that I did not need to look further.

That leaves ssrc, SoX (polyphase) and sndfile-resample, which I have examined using the slower 100 Hz/s sweep (figure 2). SoX (polyphase) shows a small amount of aliasing in this test, but it is acceptably small: -71.2 dBFS peak in the range 22.05 kHz to 22.16 kHz and better than -90 dBFS elsewhere. sndfile-resample and ssrc show no visible aliasing, just noise levels of better than -90 dBFS above 22.05 kHz. These three are perfect for most purposes.
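The figure of 20.1 kHz quoted in this discussion comes from simple frequency folding, and the same arithmetic applies to any input tone. The following is a minimal Python sketch, not part of the original tests. The function name alias_frequency is hypothetical, and the calculation assumes an idealised worst case in which the resampler applies no anti-aliasing filtering, so every component above the new Nyquist frequency folds back unattenuated.

```python
def alias_frequency(f_in_hz: float, fs_out_hz: float) -> float:
    """Frequency at which a tone at f_in_hz appears after sampling at
    fs_out_hz with no anti-aliasing filter (folded into 0 .. fs_out_hz / 2)."""
    f = f_in_hz % fs_out_hz          # wrap into one sampling period
    return min(f, fs_out_hz - f)     # fold about the Nyquist frequency


FS_OUT = 44100.0  # target sample rate in Hz

# Nyquist itself, the aliasing test tone of section 4, and the top of the sweep.
for f_in in (22050.0, 23000.0, 23999.0):
    f_alias = alias_frequency(f_in, FS_OUT)
    print(f"{f_in / 1000:.3f} kHz -> {f_alias / 1000:.3f} kHz")
```

The sweep stops at 23.999 kHz, so the lowest alias it can produce is 20.101 kHz, which is why the products seen in this test stay above the range I can hear.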
Of these three, ssrc looks like the best at this point. It has an exceptionally extended flat response, a very sharp cutoff very close to the Nyquist limit at 22.05 kHz, and the lowest RMS noise floor (light red is RMS, dark red is peak in the figures above). It also runs quickly (and it has an even quicker "--profile fast" option which, while not as flat as the default profile, is still quite good enough for this job). This makes it the most attractive option compared to the slower SoX (polyphase) or sndfile-resample. If I had to choose a runner-up at this stage it would be sndfile-resample.

3 Intermodulation Distortion

3.1 Method

The intermodulation test was performed on the three best resamplers from section 2. A 7.5-second long 48 kHz 16-bit WAVE file was generated containing two tones plus dither. The tones were SMPTE-standard IMD test tones of 60 Hz at -5 dBFS RMS and 7 kHz at -17 dBFS RMS. The dither was TPDF noise of 2 LSBs peak-to-peak (-96.3 dBFS RMS). The file was resampled to 44.1 kHz and an FFT-based spectrum analysis was performed using Octave.

3.2 Results
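For anyone wanting to repeat the test, the input signal described in section 3.1 could be generated along the following lines. This is a minimal Python sketch and not the custom C program actually used. All function names are hypothetical. It also assumes that "dBFS RMS" is referenced to a full-scale value of 1.0, so that a sine at L dBFS RMS has a peak of sqrt(2) * 10^(L/20); under that assumption the two tones together peak just below full scale.

```python
import math
import random
import struct
import wave

FS = 48000            # sample rate in Hz
DURATION_S = 7.5      # length of the test file in seconds
FULL_SCALE = 32767    # largest positive 16-bit sample value


def sine_peak(level_dbfs_rms: float) -> float:
    """Peak amplitude (1.0 = full scale) of a sine at the given RMS level.
    Assumption: dBFS RMS is referenced to a full-scale value of 1.0."""
    return math.sqrt(2.0) * 10.0 ** (level_dbfs_rms / 20.0)


def tpdf_dither_lsb(rng: random.Random) -> float:
    """Triangular-PDF noise spanning 2 LSBs peak-to-peak (-1 .. +1 LSB)."""
    return rng.random() - rng.random()


def imd_test_signal(seed: int = 0) -> list:
    """60 Hz at -5 dBFS RMS plus 7 kHz at -17 dBFS RMS, dithered to 16 bits."""
    rng = random.Random(seed)
    a_lo = sine_peak(-5.0)
    a_hi = sine_peak(-17.0)
    samples = []
    for n in range(int(FS * DURATION_S)):
        t = n / FS
        x = (a_lo * math.sin(2.0 * math.pi * 60.0 * t)
             + a_hi * math.sin(2.0 * math.pi * 7000.0 * t))
        # Add the dither before rounding to the nearest 16-bit value.
        q = round(x * FULL_SCALE + tpdf_dither_lsb(rng))
        samples.append(max(-32768, min(32767, q)))
    return samples


def write_wav(path: str, samples: list) -> None:
    """Write mono 16-bit little-endian PCM at FS Hz."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(FS)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

Calling write_wav("imd.wav", imd_test_signal()) would produce a file that can be fed to the resamplers in the same way as sweep.wav; the file name here is only an example.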
3.3 Discussion

Both ssrc and SoX (polyphase method) emerge cleanly from this test. That means there is nothing here to displace ssrc as the best resampler of the batch, nor anything to dent its superlative performance. However, we now know that sndfile-resample has a small defect in its intermodulation performance. This does not seem to be too significant for 16-bit resampling, but it matches the small and probably insignificant aliasing defect of SoX (polyphase method), so the choice of runner-up is no longer so clear.

4 Aliasing

4.1 Method

The aliasing test was also performed on the three best resamplers from section 2. A 7.5-second long 48 kHz 16-bit WAVE file was generated containing a 23 kHz tone at -4 dBFS RMS plus dither. The dither was TPDF noise of 2 LSBs peak-to-peak (-96.3 dBFS RMS). The file was resampled to 44.1 kHz. An FFT-based spectrum analysis was performed using Octave. The primary alias of 23 kHz will be at 21.1 kHz in this test (44.1 kHz - 23 kHz). Given that the frequency response varies in the aliasing region above 22.05 kHz, it might be better to use multiple tones or high levels of broadband noise to test aliasing. This may be possible in future.

4.2 Results
4.3 Discussion

With a TPDF dither noise floor at about -96 dBFS we might expect to still hear music a further 15 dB down or so, that is, at about -111 dBFS. None of these three resamplers has aliasing above that level (or spurious lines from other effects). However, a better test than a single tone should possibly be devised. Even the worst performer, sndfile-resample, achieved -113 dBFS, which is almost certainly not going to be audible. The performance of SoX (polyphase method) is slightly better and ssrc is several dB better still. Thus ssrc retains its place as the best of the batch; SoX (polyphase method) and sndfile-resample remain inseparable as runners-up.

5 Harmonic Distortion

5.1 Method

The harmonic distortion test was performed on the three best resamplers from section 2. A 7.5-second long 48 kHz 16-bit WAVE file was generated containing a SMPTE-standard 1 kHz THD test tone at -4 dBFS RMS plus dither. The dither was TPDF noise of 2 LSBs peak-to-peak (-96.3 dBFS RMS). The file was resampled to 44.1 kHz. An FFT-based spectrum analysis was performed using Octave.

5.2 Results
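As an illustration of how the level of a tone or one of its harmonics can be read from a resampled file, here is a minimal Python sketch. It is not the Octave analysis used for these tests: instead of a full FFT it evaluates a single DFT bin at the frequency of interest. The function name tone_level_dbfs_rms is hypothetical, samples are assumed to be normalised to the range -1.0 to +1.0, and "dBFS RMS" is again assumed to be referenced to a full-scale value of 1.0. No window is applied, so the record should hold a whole number of cycles of the frequency being measured.

```python
import math


def tone_level_dbfs_rms(samples, fs, freq_hz):
    """RMS level (dB re full scale = 1.0) of the component at freq_hz,
    estimated from a single DFT bin over the whole record."""
    n = len(samples)
    w = 2.0 * math.pi * freq_hz / fs
    re = sum(x * math.cos(w * k) for k, x in enumerate(samples))
    im = sum(x * math.sin(w * k) for k, x in enumerate(samples))
    peak = 2.0 * math.hypot(re, im) / n      # peak amplitude of the component
    rms = peak / math.sqrt(2.0)
    return 20.0 * math.log10(max(rms, 1e-12))  # clamp to avoid log10(0)


FS_OUT = 44100  # sample rate of the resampled file in Hz

# Stand-in for resampler output: one second of an ideal, undithered 1 kHz tone
# at -4 dBFS RMS (exactly 1000 cycles, so no window is needed).
peak = math.sqrt(2.0) * 10.0 ** (-4.0 / 20.0)
tone = [peak * math.sin(2.0 * math.pi * 1000.0 * k / FS_OUT) for k in range(FS_OUT)]

for harmonic in (1, 2, 3):
    f = 1000.0 * harmonic
    print(f"{f:.0f} Hz: {tone_level_dbfs_rms(tone, FS_OUT, f):.1f} dBFS RMS")
```

With real resampler output in place of the ideal tone, the readings at 2 kHz, 3 kHz and so on would give the levels of any harmonic products, limited by the dither noise that falls into each bin.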
5.3 Discussion

We have probably detected only very low levels of harmonic distortion in all of these resamplers. The spurious products at higher levels may be due to other mathematical effects such as clipping, rounding or truncation. The worst performer, whether its spurious outputs are harmonic distortion or not, is sndfile-resample. It achieved -100 dBFS which, although within the normal levels of audio noise, may possibly be above the lowest audible level for music. The performances of SoX (polyphase method) and ssrc are about the same, with their worst spurious products at about -130 dBFS and -127 dBFS respectively. Yet again ssrc retains its place as the best of the batch. Perhaps SoX (polyphase method) nudges ahead as runner-up because of sndfile-resample's relatively poor showing here.

John A. Phillips, 21st August 2005.