[linux-audio-dev] Realtime convolution - a threading problem?

Sune Mai sunemai at hotmail.com
Wed Apr 6 16:34:28 UTC 2005


Hello! I just joined the LAD-list to get some feedback on the following 
topic:

Convolution is really nice - I use it all the time to get perfect reverb, 
nice guitar amp-sound etc. The only serious drawback is the high 
input-output latency of simple FFT based algorithms. I'd like to see if it 
is possible to improve on this situation without sacrificing to much 
efficiency.

If the impulse response (the FIR) is N samples long, the complexity of a 
naive (non FFT) implementation scales as N, i.e. it uses N operations per 
sample of input. A simple (single block overlap-add)  scales as ln_2(N), 
since FFT has complexity N ln_2(N).

If one wants both computational efficiency and low latency, one can do (at 
least...?) two things:
1: Split the FIR in smaller blocks. This is fairly simple, but has a serious 
drawback: As the latency, L, goes down, the efficiency of the algorithm 
approaches the naive non-FFT implementation. So the computational cost 
scales as N for "zero" (L<<N) latency.
2: Another approach is to split the FIR in blocks of different sizes, as can 
be seen in the bottom figure of this page:
http://www.music.miami.edu/programs/mue/Research/jvandekieft/jvchapter2.htm
This approach has an obvious advantage in that it allows "zero" latency, 
while still retaining some of the computational efficiency of the 
high-latency FFT based method. As far as I can see, the complexity of this 
algorithm scales as (ln_2(N))^2 for zero-latency. This is not as good as the 
high latency efficiency of ln_2(N) - but still far better than using method 
1. with low latency.

As far as I know, the programs I've come across so far (most notably 
BruteFIR and Jack_convolve) uses method 1. to get low latency. This leads to 
my first question:
Has anybody implemented method 2. in a jack application, or, more generally, 
under GPL?. If not, are there any reasons that I'm not aware of, why one 
should not choose algorithm 2 instead of 1?

The only difficulty I can see with method 2 is the scheduling of the 
different sub tasks. One way to get smooth processor load, is to do the FFT 
of the different block-sizes in different threads. This involves that the 
computation of the smallest blocks (triggered by the jack callback 
mechanism), should somehow preemt the thread doing the double sized blocks - 
and this thread should in turn preemt the thread doing the quadruple size 
FFT-blocks etc. According to the FFTW documentation the FFT algorithm is 
thread-safe, and can be used in this way.

I don't have much experience with threads - and none of the resources I've 
found seems to deal with this kind of realtime problem. Is the above a 
stupid way to deal with the scheduling problem? And are there any resources 
you know of, that describes this kind of problem in detail?

\Sune Mai

_________________________________________________________________
Is your PC infected? Get a FREE online computer virus scan from McAfee® 
Security. http://clinic.mcafee.com/clinic/ibuy/campaign.asp?cid=3963




More information about the Linux-audio-dev mailing list