[linux-audio-dev] Pitchshift/Timestretch project..

Florian Schmidt mista.tapas at gmx.net
Tue Apr 6 11:16:14 UTC 2004

On Tue, 06 Apr 2004 11:51:34 +0200
Cournapeau David <cournape at enst.fr> wrote:

> Well, kind of. The idea of the phase vocoder, which more or less
> describes what you said, is to decompose each time-domain frame into
> N frequency bins, and to suppose that there is only one underlying
> stationary sinusoid in each frequency bin. If this is the case, you
> unwrap the phase to get the frequency of the sinusoid, and you
> resynthesize it with a longer or shorter time frame.
> The problem is that this demands short windows (for the hypothesis of
> one stationary sinusoid per bin to be valid), which means very poor
> frequency resolution.

Actually, it gives very poor frequency resolution at low frequencies
(where the period 1/F exceeds the frame length), if I remember my math
right.
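To make the quoted description concrete, here is a minimal sketch of such a
phase-vocoder time-stretcher in Python. The function names, frame size, and
hop sizes are my own choices, and the naive O(N^2) DFT is only there to keep
the sketch dependency-free; this is an illustration, not production code:

```python
import cmath
import math

def dft(frame):
    # naive O(n^2) DFT, just to keep the sketch self-contained
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

def princarg(p):
    # wrap a phase value into [-pi, pi)
    return (p + math.pi) % (2.0 * math.pi) - math.pi

def time_stretch(x, n=64, ha=16, hs=24):
    # analysis hop ha, synthesis hop hs -> stretch factor hs/ha
    win = [0.5 - 0.5 * math.cos(2.0 * math.pi * t / n) for t in range(n)]  # Hann
    starts = range(0, len(x) - n + 1, ha)
    out = [0.0] * ((len(starts) - 1) * hs + n)
    prev_phase = [0.0] * n   # analysis phase of the previous frame, per bin
    acc_phase = [0.0] * n    # accumulated synthesis phase, per bin
    for j, i in enumerate(starts):
        spec = dft([w * s for w, s in zip(win, x[i:i + n])])
        for k in range(n):
            mag, phase = abs(spec[k]), cmath.phase(spec[k])
            expected = 2.0 * math.pi * k * ha / n         # bin-centre phase advance
            dev = princarg(phase - prev_phase[k] - expected)
            true_freq = (expected + dev) / ha             # rad/sample, "unwrapped"
            prev_phase[k] = phase
            acc_phase[k] += true_freq * hs                # advance at the new hop
            spec[k] = mag * cmath.exp(1j * acc_phase[k])
        for t, v in enumerate(idft(spec)):
            out[j * hs + t] += win[t] * v                 # windowed overlap-add
    return out

# stretch a 440 Hz sine at 8 kHz to 1.5x its length (hs/ha = 24/16)
x = [math.sin(2.0 * math.pi * 440.0 * t / 8000.0) for t in range(512)]
y = time_stretch(x)
```

The phase unwrapping (the `princarg` step) is exactly the "one stationary
sinusoid per bin" assumption at work: the deviation from the bin-centre phase
advance is taken to be the sinusoid's true frequency offset.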

> Basically, you have to make a trade-off between time resolution and
> frequency resolution (nothing new here ;)). So the idea is to adapt
> the window size to the content of the signal, which means being able
> to detect the transients (which are better stretched with small
> windows)...
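For concreteness, the trade-off quoted above is just that an N-sample frame
at sample rate fs has a bin spacing of fs/N while spanning N/fs seconds, so
their product is fixed. A toy calculation (the sample rate and frame sizes
are my own assumed values):

```python
# Time/frequency trade-off for an N-point frame:
# bin spacing df = fs / N, frame duration dt = N / fs, so df * dt == 1.
fs = 44100.0
for n in (256, 1024, 4096):
    df = fs / n    # frequency resolution in Hz
    dt = n / fs    # time span of the frame in seconds
    print(f"N={n:5d}: df={df:7.2f} Hz, dt={dt * 1000.0:6.2f} ms")
```

A 256-point frame resolves transients to under 6 ms but smears frequencies
into ~172 Hz bins, which is hopeless for low-pitched material; a 4096-point
frame does the opposite.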

This trade-off is, IMHO, the preferred application domain for the wavelet
transform. By using its multiscale nature you get the best of both worlds:
good frequency resolution across the frequency range, and also good time
resolution for high-frequency material. I suppose lesser time resolution on
low-frequency components is OK, since the time resolution of the human ear
for low frequencies isn't as good as for high frequencies either. But my
math is a bit shaky on the subject, and my assumption about the ear's time
resolution for low-frequency material might also be wrong.
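A sketch of that multiscale idea, under my own assumed numbers and restricted
to dyadic bands: each level of a dyadic wavelet decomposition covers half the
frequency band of the previous one while doubling its time support, so high
bands get fine time resolution and low bands get fine frequency resolution.

```python
# Dyadic wavelet levels: level k covers roughly [fs/2**(k+1), fs/2**k] Hz
# with a time support of about 2**k samples, i.e. 2**k / fs seconds.
fs = 44100.0
for level in range(1, 8):
    f_hi = fs / 2 ** level
    f_lo = fs / 2 ** (level + 1)
    dt_ms = 1000.0 * 2 ** level / fs   # approx. time support at this level
    print(f"level {level}: {f_lo:7.1f}-{f_hi:7.1f} Hz, "
          f"~{dt_ms:5.3f} ms time support")
```

The top band keeps sub-millisecond time resolution for transients, while the
bottom bands trade that away for narrow (relative) frequency bands, matching
the ear-motivated argument above.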

> J Bonada wrote his PhD on this subject, if you are interested:
> http://www.kug.ac.at/iem/lehre/arbeiten/hammer.pdf

Thanks! I'll take a look.

