On Sat, Jun 07, 2014 at 10:15:41AM +0200, hermann meyer wrote:
> Thanks have to go to Stephan M. Bernsee from
> dspdimension as well.
> GxDetune is based on his work here:
> http://www.dspdimension.com/admin/pitch-shifting-using-the-ft/
This sort of works, but it's not what it claims to be.
The whole part that finds the exact frequency by comparing
phases is completely redundant. This information is never
really used. It just looks as if it is used.
For example, for one octave up, you could just as well take
the magnitude and phase of bin k, multiply the phase by 2 and
put the result in the input bin 2*k of the inverse FFT. The
result would be just the same. No frequency calculation is
ever made.
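A minimal sketch of that shortcut for a single frame, assuming NumPy
and a Hann window. The name octave_up_frame is hypothetical and is not
taken from GxDetune or from Bernsee's code:

```python
import numpy as np

def octave_up_frame(frame):
    """Move bin k to bin 2*k and double its phase; no frequency estimate."""
    n = len(frame)
    spec = np.fft.rfft(frame * np.hanning(n))
    out = np.zeros_like(spec)
    k = np.arange(len(spec) // 2)      # source bins whose target 2*k still fits
    out[2 * k] = np.abs(spec[k]) * np.exp(2j * np.angle(spec[k]))
    return np.fft.irfft(out, n)
```

Nothing here looks at the phase difference between successive frames,
which is the point being made above.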
The net result is also equivalent to:
- overlap
- windowing
(as in your code) but then:
- downsample by 2
- repeat the result so you get the original length
- add to output
Which doesn't even require an FFT.
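The listed steps could be sketched roughly as follows, assuming NumPy,
a Hann window and an overlap factor of 4. The function name and its
parameters are illustrative only; output gain is not normalized and no
anti-alias filter is applied before decimating:

```python
import numpy as np

def octave_up_time_domain(x, n=1024, overlap=4):
    """Overlap, window, downsample by 2, repeat, add to output. No FFT."""
    hop = n // overlap
    window = np.hanning(n)
    y = np.zeros(len(x))
    for start in range(0, len(x) - n + 1, hop):
        frame = x[start:start + n] * window      # overlapping, windowed frame
        half = frame[::2]                        # downsample by 2
        y[start:start + n] += np.tile(half, 2)   # repeat to length n, add to output
    return y
```

Feeding it a sine wave gives an output whose strongest spectral
component sits at twice the input frequency.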
The way to really use the computed frequencies would
be quite different.
If you have a signal at some frequency F there will
be significant energy in a number of bins close to F.
The correct value of F can be found by comparing the
phases as explained by Bernsee. Given this F you need
some way to determine which contiguous group of bins
is representative of that signal (one way would be to
look for minima in magnitude left and right).
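As a hedged sketch of these two steps, assuming NumPy: true_frequency
uses the usual phase-difference estimate between two frames that are
hop samples apart, and bin_group walks outward from a peak to the
nearest magnitude minimum on each side. Both names are hypothetical,
and the minima search is only the one option mentioned above:

```python
import numpy as np

def true_frequency(prev_spec, spec, k, n, hop, fs):
    """Estimate the frequency seen in bin k from its phase advance over one hop."""
    expected = 2 * np.pi * k * hop / n               # advance of a bin-centre tone
    delta = np.angle(spec[k]) - np.angle(prev_spec[k]) - expected
    delta = (delta + np.pi) % (2 * np.pi) - np.pi    # wrap deviation to [-pi, pi)
    return (k + delta * n / (2 * np.pi * hop)) * fs / n

def bin_group(mag, k):
    """Return (lo, hi): the nearest magnitude minima left and right of peak k."""
    lo = k
    while lo > 0 and mag[lo - 1] < mag[lo]:
        lo -= 1
    hi = k
    while hi < len(mag) - 1 and mag[hi + 1] < mag[hi]:
        hi += 1
    return lo, hi
```

The spectra passed in are assumed to come from np.fft.rfft of windowed
frames of length n.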
Now for correct frequency scaling, you need to move
that whole group up or down (as determined by the ratio,
e.g. 2 for one octave up) *** but without scaling the
group itself ***. In other words, if bin k moves to 2*k,
then bin k-1 moves to 2*k-1 etc.
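A small sketch of such a move, assuming NumPy and a group lo..hi
around a peak bin that has already been identified. shift_group is a
hypothetical name, and the phase bookkeeping between successive frames
that a working shifter would need is deliberately left out:

```python
import numpy as np

def shift_group(spec, lo, hi, peak, ratio):
    """Copy bins lo..hi so that peak lands on round(ratio * peak), spacing kept."""
    out = np.zeros_like(spec)
    offset = int(round(ratio * peak)) - peak   # one offset for the whole group
    for k in range(lo, hi + 1):
        if 0 <= k + offset < len(spec):
            out[k + offset] += spec[k]
    return out
```

With ratio 2 and the peak at bin 10, bins 9, 10, 11 end up in bins
19, 20, 21 rather than 18, 20, 22.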
This requires an *interpretation* of the signal: do bins
that are close together
1. represent a single frequency signal, or
2. multiple signals that are close together.
In case (1) the envelope of the signal is represented by
the relative magnitudes and phases of the adjacent bins.
To preserve this envelope (i.e. to correctly reproduce
transient signals), these bins need to remain adjacent.
Another way to state this is that any algorithm that does
frequency scaling (or time stretching) needs some way
to decide if certain features of the signal need to be
interpreted as significant in the time domain or in the
frequency domain. The correct decision depends on how
a human listener would interpret that feature.
It is not even possible to *define* a frequency scaling
or time stretching algorithm without at least implicitly
defining a way to decide on this.
The implicit assumption in the current algorithm is that
each bin is a separate feature in the frequency domain,
and thus needs to be scaled independently of all others.
Ciao,
--
FA
A world of exhaustive, reliable metadata would be an utopia.
It's also a pipe-dream, founded on self-delusion, nerd hubris
and hysterically inflated market opportunities. (Cory Doctorow)