[LAU] Subject: Albums under a label recorded and/or mixed with Linux

Patrick Shirkey pshirkey at boosthardware.com
Sun Sep 26 12:50:24 UTC 2010


On Sun, September 26, 2010 4:12 am, fons at kokkinizita.net wrote:
> On Sun, Sep 26, 2010 at 12:45:30AM -0700, Patrick Shirkey wrote:
>
>> Just to clarify again, are you saying here that convolution will produce
>> a
>> cleaner result than fft?
>> Or that the implementation of the fft algorithm in jamin is not the most
>> efficient method of doing the transform?
>
> The problem is not the use of an FFT nor the implementation
> of that FFT. It's about *how* the FFT is used.
>
> Let's look at a basic vocoder algorithm. We have two
> inputs signals wich we want to combine spectrally,
> using N-point FFT operations.
>
> 1. T1 = N samples from the first input.
> 2. F1 = FFT (T1)
> 3. T2 = N samples from the second input.
> 4. F2 = FFT (T2)
> 5. Multiply F1 and F2 point by point -> F3
> 6. T3 = inverse FFT (F3).
> 7. output T3.
>
> Repeat for the next block.
>
> Since we don't want clicks at block boundaries the blocks
> of N samples processed in this way are made to overlap, and
> the output is crossfaded between them.
>
> This is usually done by windowing before steps (2) and (4)
> and after step (6), and then just adding the output blocks
> that overlap. For example a square-root-raised-cosine window
> applied at all three places will have the net result of a
> raised-cosine window on the output, and this provides
> perfect crossfading between blocks that overlap by N/2
> samples.
>
> Jamin's algorithm is just the same, except that F2 is
> not the result of an FFT but constructed from the data
> provided by the GUI. As long as the user does not change
> the frequency response, F2 is constant.
>
> A common mistake is to think that in that case this
> algorithm is just a filter, but it isn't.
>
> The reason is that step (5), multiplication in the
> frequency domain, is equivalent to convolution in the
> time domain. And the convolution of two signals of N
> samples (T1 and T2) has lenght 2*N-1. This means that
> this result is wrapped around in step (6) since the
> inverse FFT produces only N samples.
>
> The windowing on the inputs will reduce the errors
> resulting from this, and anyway for a vocoder or other
> spectral processor it doesn't matter so much since this
> is an effect anyway. Jamin reduces the errors by using
> a much larger overlap (IRCC the step is N/8 or N/16
> instead of the usual N/2).
>
> To get the correct result, T1 and T2 must be limited
> to N/2 samples followed by as many zeros. Then each
> block of N/2 input samples produces N output samples
> without wrapping around, and they can just be added
> with an overlap of N/2. This is how linear convolution
> engines such as BruteFir or Jconvolver work. There is
> no windowing involved at all.
>
> For Jamin that would also mean that the required
> frequency response, F2, must the the N-point FFT of
> an impulse response that is not longer than N/2
> samples. The data from the GUI must be preprocessed
> to ensure that this condition is satisfied.
>
> Now let's look at the CPU load involved. Assuming a
> step size of N/8, Jamin performs 16 N-point FFTs per
> channel per N samples.
>
> Using the exact linear convolution algo it would
> perform two FFTs of twice the size. This is roughly
> the same as 4 N-point FFTs, or 1/4 of the work.
>


So, given that the idea was to produce a commercial music friendly
mastering sound using either a 30 band or parametric eq feed into a
multiband compressor and the decision was made to use fft for the eq
filter instead of convolution it seems to me that jamin *is* designed to
make sure that the sound has as few artifacts as possible for the signal
process.

Most people have the spare resources to handle the load required. It's
also a traditional technique for mastering stereo tracks that people from
traditional studio environments can easily understand and the interface is
designed to enable that workflow.

I'm not sure why jamin should be considered defective or a kludge.

Can you give me a little more idea why you feel using FFT in this method
is more likely to return atrifacts than the equivalent convolution? Is it
simply because there is more data being crunched to get to the end result?



-- 
Patrick Shirkey
Boost Hardware Ltd.



More information about the Linux-audio-user mailing list