On Friday 14 December 2007 09:01, Tim Goetze wrote:
> Indeed, the chorus/aliasing effect is as good as gone with
> --no-peaklock, but the vocals begin to suffer from periodic amplitude
> modulation (tremolo).
I can believe that. Crispness 2 or 3 (combined with --no-peaklock) might get
rid of that -- you're essentially trading off tricks and artifacts that
are "useful" in certain cases against others. Let me expand a bit on that
and on Rubber Band in general.
First off, there's nothing "new" about any of the techniques used, and Rubber
Band doesn't claim to be the best-sounding timestretcher out there. Much as
I would like it to be, if I'd set out with that intention I would never have
got anything released.
What Rubber Band does aim to do is fill the gap in the free software ecology
for a timestretching library that sounds good enough for general musical use,
and that also meets the other requirements that make it useful in practical
applications: sample-exact stretching, real-time safety, known latency, the
ability to change ratios dynamically, support for any number of channels at
any sample rate, and not blowing up too easily when faced with extreme ratios.
Though perhaps not all of those at once (e.g. it isn't quite sample-exact in
real-time mode).
In terms of DSP, Rubber Band is a phase vocoder (standard STFT
analysis/synthesis with phase-unwrapping) with a few additional techniques,
all of which have their tradeoffs:
* Phase locking to peak frequencies (after Laroche & Dolson 1999). This
reduces the vagueness and phasiness that the phase vocoder introduces at
ratios relatively close to 1, but makes everything sound metallic at long
stretches. Rubber Band reduces the region of influence around each peak as
the frequency decreases (otherwise the bass quickly goes out of tune) and
cuts down gradually on the amount of locking it does as the stretch increases
(though it seems not enough). This technique is the one you're switching off
with --no-peaklock. Long stretches tend to make their own demands on phase
to produce satisfying (as opposed to strictly accurate) results -- cf. Paul
Nasca's paulstretch, designed for very, very long stretch factors, which
randomizes phases altogether.
* Phase resynchronisation -- resetting the synthesis phases from the raw
analysis phases -- at noisy transients (a simplistic take on Duxbury et al.
2002). This is usually an effective technique when stretching things that
have crisp transients, particularly drum loops and the like. It doesn't work
so well in some other cases. The most serious cases are sounds with mostly
stable frequencies (e.g. a smooth vocal) together with something that may not
be loud but is perceived by the stretcher as having transient attacks (e.g.
acoustic guitar): the stretcher will exaggerate the guitar onsets and leave a
corresponding tremolo in the vocal. This technique also sounds very bad if
the transient locations are mis-identified (particularly if they're picked up
one or two FFT frames too late, as can happen with some sorts of transient).
To switch this off, use --no-transients. Rubber Band also supports a
band-limited mode (--bl-transients) which resets phases only outside the most
likely range for low-order harmonics; this can sound better in some
situations, but it also costs you some ongoing phase coherence.
* Variable stretch factor -- reducing the amount of stretch around transients
and increasing it in relatively "still" sections. This can improve the
transient sound over a plain phase vocoder, even with --no-transients, but it
can also lead to mis-timing if Rubber Band fails to identify a transient or
if the timing within a "still" section is unusually important. You can
disable this with --precise.
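To make the first technique concrete, here is a minimal sketch of identity phase locking in the spirit of Laroche & Dolson, under stated assumptions: only peak bins get a propagated synthesis phase, and every other bin is rotated by the same amount as the peak it belongs to. The helper names find_peaks and lock_phases are hypothetical, not Rubber Band's code, and the "nearest peak wins" region of influence is a simplification (the paper bounds each region midway between adjacent peaks or at the quietest bin between them). Rubber Band's narrowing of the region at low frequencies and its reduction of locking at long stretches are not modelled here.

```python
def find_peaks(mags):
    """Bins whose magnitude exceeds both neighbours (simple local maxima)."""
    return [k for k in range(1, len(mags) - 1)
            if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]


def lock_phases(analysis_phases, peak_synthesis_phases):
    """
    Identity phase locking: each bin is rotated by the same amount as its
    peak, so it keeps the phase offset from that peak that it had in the
    analysis frame.

    peak_synthesis_phases maps peak bin -> synthesis phase already computed
    for that peak (e.g. by a standard phase-vocoder update).
    """
    peaks = sorted(peak_synthesis_phases)
    if not peaks:
        # Nothing to lock to: return the analysis phases (placeholder choice)
        return list(analysis_phases)
    locked = []
    for k, phase in enumerate(analysis_phases):
        # Region of influence (simplification): the nearest peak wins,
        # ties going to the lower-frequency peak
        p = min(peaks, key=lambda q: abs(q - k))
        rotation = peak_synthesis_phases[p] - analysis_phases[p]
        locked.append(phase + rotation)
    return locked
```

For example, find_peaks([0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 5.0, 2.0]) picks bins 2 and 6, and all neighbouring bins then follow the phase rotation of whichever of those two peaks is closer. Tying whole regions to one peak is what reduces phasiness near ratio 1, and it is also why long stretches start to sound metallic.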
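The second technique can be sketched in the same way. This is an assumed, generic version, not Rubber Band's detector: a frame counts as a transient when the fraction of bins whose magnitude rose by more than a few dB since the previous frame exceeds a threshold, and at that frame the propagated synthesis phases are replaced by the raw analysis phases. The optional keep_band argument mimics the band-limited idea (bins inside the band keep their propagated phases). The function names, the 3 dB rise, the 0.35 onset fraction and any band edges are hypothetical placeholders.

```python
def percussive_score(prev_mags, cur_mags, rise_db=3.0):
    """Fraction of bins whose magnitude rose by more than rise_db."""
    factor = 10.0 ** (rise_db / 20.0)  # dB rise -> magnitude ratio
    rising = sum(1 for p, c in zip(prev_mags, cur_mags) if c > p * factor)
    return rising / len(cur_mags)


def resync_phases(analysis_phases, synthesis_phases, bin_freqs_hz,
                  keep_band=None):
    """
    Phase resynchronisation: replace propagated synthesis phases with raw
    analysis phases. With keep_band=(lo_hz, hi_hz), bins inside that band
    keep their propagated phases (a band-limited variant).
    """
    out = []
    for a, s, f in zip(analysis_phases, synthesis_phases, bin_freqs_hz):
        inside = keep_band is not None and keep_band[0] <= f <= keep_band[1]
        out.append(s if inside else a)
    return out


def next_synthesis_phases(prev_mags, cur_mags, analysis_phases,
                          synthesis_phases, bin_freqs_hz,
                          onset_fraction=0.35, keep_band=None):
    """Return (phases, reset_happened) for the current frame."""
    if percussive_score(prev_mags, cur_mags) > onset_fraction:
        return resync_phases(analysis_phases, synthesis_phases,
                             bin_freqs_hz, keep_band), True
    # No transient: keep the phases produced by normal propagation
    return list(synthesis_phases), False
```

The failure mode described above falls straight out of this: if a quiet guitar pluck pushes the score over the threshold, every bin is reset, including the bins carrying a steady vocal, and the repeated resets are heard as tremolo in the vocal.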
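The third technique, a variable stretch factor, amounts to redistributing output time between frames. A minimal sketch, assuming transient frames are played at a local ratio of exactly 1 and the "still" frames share whatever output length is left so the overall ratio is preserved; allocate_hops is a hypothetical helper, and Rubber Band's own scheme is presumably smoother than this.

```python
def allocate_hops(is_transient, analysis_hop, ratio):
    """
    Hypothetical helper: pick an integer synthesis hop for each analysis frame.
    Frames flagged as transients keep hop == analysis_hop (local ratio 1);
    the "still" frames share the remaining output length, so the total
    output length still matches the overall stretch ratio.
    """
    n = len(is_transient)
    target_total = round(n * analysis_hop * ratio)
    n_trans = sum(1 for t in is_transient if t)
    n_still = n - n_trans
    still_total = target_total - n_trans * analysis_hop
    flags = list(is_transient)
    if n_still == 0 or still_total < 0:
        # No room to absorb the change: fall back to a uniform stretch
        flags = [False] * n
        n_still, still_total = n, target_total
    hops, allotted, k = [], 0, 0
    for transient in flags:
        if transient:
            hops.append(analysis_hop)
        else:
            k += 1
            # Cumulative rounding keeps the running total exact
            upto = round(still_total * k / n_still)
            hops.append(upto - allotted)
            allotted = upto
    return hops
```

With four frames, an analysis hop of 256 and a ratio of 1.5, allocate_hops([False, True, False, False], 256, 1.5) gives [427, 256, 426, 427]: the transient frame is untouched, the other three stretch by about 1.67 instead of 1.5, and the total is still 1536 samples. It also shows the mis-timing risk: if the detector misses a transient, that frame is stretched more than a uniform stretch would stretch it.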
With --precise --no-peaklock --no-transients, you should have pretty much a
classic phase vocoder.
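For reference, the core of that classic phase vocoder is the per-bin phase update sketched below. This is the textbook STFT analysis/synthesis step with phase unwrapping, written as an illustration (the names princarg and propagate_phase are hypothetical, not Rubber Band's API): the deviation of the measured phase advance from the bin's nominal advance gives the true frequency, and the synthesis phase is advanced at that frequency over the synthesis hop.

```python
import math


def princarg(phase):
    """Wrap a phase (radians) into the interval [-pi, pi)."""
    return (phase + math.pi) % (2.0 * math.pi) - math.pi


def propagate_phase(prev_analysis, cur_analysis, prev_synthesis,
                    k, fft_size, analysis_hop, synthesis_hop):
    """Standard phase-vocoder update for bin k between two frames."""
    # Phase advance expected for bin k's centre frequency over one analysis hop
    omega_k = 2.0 * math.pi * k / fft_size          # radians per sample
    expected = omega_k * analysis_hop
    # Unwrapped deviation from that expectation gives the true frequency
    deviation = princarg(cur_analysis - prev_analysis - expected)
    inst_freq = omega_k + deviation / analysis_hop  # radians per sample
    # Advance the synthesis phase at that frequency over the synthesis hop
    return prev_synthesis + inst_freq * synthesis_hop
```

The stretch ratio is synthesis_hop / analysis_hop. Because each bin is updated independently, the phase relationships between neighbouring bins drift apart, which is the "phasiness" that peak locking is there to reduce.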
I can think of a small number of potential improvements to make, but tuning
this stuff is quite hard -- push on one side and something pops out on the
other. Almost every change you make that improves one test case seems to
result in a deterioration in another, and quality is more subjective than you
might expect. There are plenty of papers out there that claim improvements
in performance but actually produce lousy results for most real music.
> At what stretching/compressing ratios have you run your tests, and
> with what kind of source material?
Some tests with "individual track" sources (individual instruments, drum loops
etc) and some with "complex mixture" sources (folk songs, pop songs etc). As
you surmise, this is mostly with ratios in the "time correction" range, up to
about 25% either way. I did a few runs with longer ratios like 3x and 5x,
but I've been less picky about the results. Time stretching only -- I
actually haven't run any listening tests at all on pitch shifting.
> While the code is certainly doing quite fine, I have a feeling that my 2x
> test runs could be a bit out of line with the intended kind of use.
That's not outside the intended range, but it may be a bit of a weak spot.
The peaklock reduction doesn't strongly kick in until rather long ratios, so
you will probably get less metallic results (though fuzzier) by default at 3x
than at 2x.
Chris