[LAD] Rubber Band v1.0 - an audio time-stretching and pitch-shifting library and utility

Chris Cannam cannam at all-day-breakfast.com
Fri Dec 14 13:10:24 UTC 2007

On Friday 14 December 2007 09:01, Tim Goetze wrote:
> Indeed, the chorus/aliasing effect is as good as gone with
> --no-peaklock, but the vocals begin to suffer from periodic amplitude
> modulation (tremolo).

I can believe that.  Crispness 2 or 3 (combined with --no-peaklock) might get 
rid of that -- you're essentially trading off tricks and artifacts that 
are "useful" in certain cases against others.  Let me expand a bit on that 
and on Rubber Band in general.

First off, there's nothing "new" about any of the techniques used, and Rubber 
Band doesn't claim to be the best sounding timestretcher out there.  Much as 
I would like it to be, if I'd set out with that intention I would never have 
got anything released.

What Rubber Band does aim to do is fill the gap in the free software ecology 
for a timestretching library that sounds good enough for general musical use, 
and that also meets the other requirements that make it useful in practical 
applications, such as the capability for sample-exact stretching, real-time 
safety, known latency, the ability to change ratios dynamically, support for 
any number of channels at any sample rate, and not blowing up too easily when 
faced with extreme ratios.  Though perhaps not all of those at once (e.g. it 
isn't quite sample-exact in real-time mode).

In terms of DSP, Rubber Band is a phase vocoder (standard STFT 
analysis/synthesis with phase-unwrapping) with a few additional techniques, 
all of which have their tradeoffs:

 * Phase locking to peak frequencies (after Laroche & Dolson 1999).  This 
reduces the vagueness and phasiness that the phase vocoder introduces at 
ratios relatively close to 1, but makes everything sound metallic at long 
stretches.  Rubber Band reduces the region of influence around each peak as 
the frequency decreases (otherwise the bass quickly goes out of tune) and 
cuts down gradually on the amount of locking it does as the stretch increases 
(though apparently not by enough).  This technique is the one you're switching off 
with --no-peaklock.  Long stretches tend to make their own demands on phase 
to produce satisfying (as opposed to strictly accurate) results -- cf. Paul 
Nasca's paulstretch, designed for very very long stretch factors, which 
randomizes phases altogether.

 * Phase resynchronisation -- resetting the synthesis phases from the raw 
analysis phases -- at noisy transients (a simplistic take on Duxbury et al. 
2002).  This is usually an effective technique when stretching things that 
have crisp transients, particularly drum loops and the like.  It doesn't work 
so well in some other cases.  The most serious cases are sounds with mostly 
stable frequencies (e.g. a smooth vocal) together with something that may not 
be loud but is perceived by the stretcher as having transient attacks (e.g. 
acoustic guitar): the stretcher will exaggerate the guitar onsets and leave a 
corresponding tremolo in the vocal.  This technique also sounds very bad if 
the transient locations are mis-identified (particularly if they're picked up 
one or two FFT frames too late, as can happen with some sorts of transient).  
To switch this off, use --no-transients.  Rubber Band also supports a band 
limited mode (--bl-transients) which resets phases only outside the most 
likely range for low order harmonics; this can sound better in some 
situations, but it will also lose you some ongoing phase coherence again.

 * Variable stretch factor -- reducing the amount of stretch around transients 
and increasing it in relatively "still" sections.  This can improve the 
transient sound over a plain phase vocoder, even with --no-transients, but it 
can also lead to mis-timing if Rubber Band fails to identify a transient or 
if the timing within a "still" section is unusually important.  You can 
disable this with --precise.
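
To make the first item in the list above concrete, here is a minimal sketch of textbook "identity" phase locking in the style of Laroche & Dolson. It is not taken from the Rubber Band source. The function names, the two-neighbour peak test, and the nearest-peak region of influence are all assumptions made for illustration; in particular, the narrowing of the region at low frequencies and the gradual reduction of locking at long stretches described above are not modelled.

```python
def find_peaks(magnitudes):
    """Return indices of bins strictly larger than up to two neighbours per side."""
    peaks = []
    n = len(magnitudes)
    for k in range(n):
        neighbours = [magnitudes[j]
                      for j in range(max(0, k - 2), min(n, k + 3))
                      if j != k]
        if all(magnitudes[k] > m for m in neighbours):
            peaks.append(k)
    return peaks


def lock_phases(analysis_phases, peak_synthesis_phases):
    """Assign every bin a synthesis phase tied to its nearest peak.

    analysis_phases: per-bin phases from the current analysis frame.
    peak_synthesis_phases: dict mapping each peak bin to the synthesis
        phase already computed for it by ordinary phase propagation.

    Each non-peak bin keeps the phase difference it had relative to its
    peak in the analysis frame, so the peak and its neighbours rotate
    together rather than drifting apart.
    """
    peaks = sorted(peak_synthesis_phases)
    if not peaks:
        # Nothing to lock to: leave the phases as they were analysed.
        return list(analysis_phases)
    locked = []
    for k, phase in enumerate(analysis_phases):
        # Ties go to the lower-frequency peak (first minimum found).
        nearest = min(peaks, key=lambda p: abs(p - k))
        rotation = peak_synthesis_phases[nearest] - analysis_phases[nearest]
        locked.append(phase + rotation)
    return locked
```

A caller would compute synthesis phases for the peak bins only, then pass them to `lock_phases` to fill in the rest of the frame.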

With --precise --no-peaklock --no-transients, you should have pretty much a 
classic phase vocoder.
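
For reference, the per-bin phase update at the heart of a classic phase vocoder can be sketched as follows. This is the standard textbook formulation, not Rubber Band's code; the function and parameter names are hypothetical, and phases are assumed to be in radians with hop sizes in samples.

```python
import math


def princarg(phase):
    """Wrap a phase value into the range [-pi, pi)."""
    return (phase + math.pi) % (2.0 * math.pi) - math.pi


def propagate_phase(prev_analysis, curr_analysis, prev_synthesis,
                    bin_index, fft_size, analysis_hop, synthesis_hop):
    """Return the synthesis phase for one bin of the current frame.

    The measured phase advance between two analysis frames is compared
    with the advance expected at the bin's centre frequency.  The wrapped
    difference gives an estimate of the true frequency in the bin, and
    the synthesis phase is advanced at that frequency over the (longer
    or shorter) synthesis hop.
    """
    # Centre frequency of the bin, in radians per sample.
    omega = 2.0 * math.pi * bin_index / fft_size
    expected_advance = omega * analysis_hop
    # Phase unwrapping: deviation from the expected advance, in [-pi, pi).
    deviation = princarg(curr_analysis - prev_analysis - expected_advance)
    true_freq = omega + deviation / analysis_hop
    return prev_synthesis + true_freq * synthesis_hop
```

The stretch ratio is `synthesis_hop / analysis_hop`. The frequency estimate is only unambiguous while the deviation stays within half a cycle per analysis hop, which is one reason the analysis hop is kept small relative to the FFT size.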

I can think of a small number of potential improvements to make, but tuning 
this stuff is quite hard -- push on one side and something pops out on the 
other.  Almost every change you make that improves one test case seems to 
result in a deterioration in another, and quality is more subjective than you 
might expect.  There are plenty of papers out there that claim improvements 
in performance but actually produce lousy results for most real music.

> At what stretching/compressing ratios have you run your tests, and
> with what kind of source material?

Some tests with "individual track" sources (individual instruments, drum loops 
etc) and some with "complex mixture" sources (folk songs, pop songs etc).  As 
you surmise, this is mostly with ratios in the "time correction" range, up to 
about 25% either way.  I did a few runs with longer ratios like 3x and 5x, 
but I've been less picky about the results.  Time stretching only -- I 
actually haven't run any listening tests at all on pitch shifting.

> While the code is certainly doing quite fine, I have a feeling that my 2x
> test runs could be a bit out of line with the intended kind of use.

That's not outside the intended range, but it may be a bit of a weak spot.  
The peaklock reduction doesn't strongly kick in until rather long ratios, so 
you will probably get less metallic (though fuzzier) results by default at 3x 
than at 2x.

