[LAU] Subject: Albums under a label recorded and/or mixed with Linux

fons at kokkinizita.net fons at kokkinizita.net
Sun Sep 26 11:12:49 UTC 2010


On Sun, Sep 26, 2010 at 12:45:30AM -0700, Patrick Shirkey wrote:
 
> Just to clarify again, are you saying here that convolution will produce a
> cleaner result than fft?
> Or that the implementation of the fft algorithm in jamin is not the most
> efficient method of doing the transform?

The problem is not the use of an FFT nor the implementation
of that FFT. It's about *how* the FFT is used.

Let's look at a basic vocoder algorithm. We have two 
inputs signals wich we want to combine spectrally, 
using N-point FFT operations.

1. T1 = N samples from the first input.
2. F1 = FFT (T1)
3. T2 = N samples from the second input.
4. F2 = FFT (T2)
5. Multiply F1 and F2 point by point -> F3
6. T3 = inverse FFT (F3).
7. output T3.

Repeat for the next block.

Since we don't want clicks at block boundaries the blocks
of N samples processed in this way are made to overlap, and
the output is crossfaded between them.

This is usually done by windowing before steps (2) and (4)
and after step (6), and then just adding the output blocks
that overlap. For example a square-root-raised-cosine window
applied at all three places will have the net result of a
raised-cosine window on the output, and this provides 
perfect crossfading between blocks that overlap by N/2
samples.

Jamin's algorithm is just the same, except that F2 is
not the result of an FFT but constructed from the data
provided by the GUI. As long as the user does not change
the frequency response, F2 is constant.

A common mistake is to think that in that case this
algorithm is just a filter, but it isn't.

The reason is that step (5), multiplication in the 
frequency domain, is equivalent to convolution in the
time domain. And the convolution of two signals of N
samples (T1 and T2) has lenght 2*N-1. This means that
this result is wrapped around in step (6) since the
inverse FFT produces only N samples. 

The windowing on the inputs will reduce the errors
resulting from this, and anyway for a vocoder or other
spectral processor it doesn't matter so much since this
is an effect anyway. Jamin reduces the errors by using
a much larger overlap (IRCC the step is N/8 or N/16
instead of the usual N/2).

To get the correct result, T1 and T2 must be limited
to N/2 samples followed by as many zeros. Then each
block of N/2 input samples produces N output samples
without wrapping around, and they can just be added
with an overlap of N/2. This is how linear convolution
engines such as BruteFir or Jconvolver work. There is
no windowing involved at all.

For Jamin that would also mean that the required
frequency response, F2, must the the N-point FFT of
an impulse response that is not longer than N/2
samples. The data from the GUI must be preprocessed
to ensure that this condition is satisfied.

Now let's look at the CPU load involved. Assuming a
step size of N/8, Jamin performs 16 N-point FFTs per
channel per N samples.

Using the exact linear convolution algo it would 
perform two FFTs of twice the size. This is roughly
the same as 4 N-point FFTs, or 1/4 of the work.

Ciao,

-- 
FA

There are three of them, and Alleline.



More information about the Linux-audio-user mailing list