On Wed, 06 Apr 2005 18:34:28 +0200
"Sune Mai" <sunemai(a)hotmail.com> wrote:
> [snip]
> 2: Another approach is to split the FIR into blocks of different sizes,
> as can be seen in the bottom figure of this page:
> http://www.music.miami.edu/programs/mue/Research/jvandekieft/jvchapter2.htm
> This approach has the obvious advantage that it allows "zero" latency,
> while still retaining some of the computational efficiency of the
> high-latency FFT-based method. As far as I can see, the complexity of
> this algorithm scales as (log2(N))^2 for zero latency. That is not as
> good as the log2(N) of the high-latency method, but still far better
> than using method 1 with low latency.
> As far as I know, the programs I've come across so far (most notably
> BruteFIR and jack_convolve) use method 1 to get low latency. This leads
> to my first question:
> Has anybody implemented method 2 in a JACK application, or, more
> generally, under the GPL? If not, are there any reasons, that I'm not
> aware of, why one should not choose algorithm 2 instead of 1?
I _heard_ that in the US algorithm 2 is patented. I haven't really
looked into it, since my time is limited ATM. If it isn't, I might have
a shot at it at some unspecified time in the future.
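For concreteness, the kind of block layout method 2 uses can be sketched in a few lines (Python; the "two blocks of each size, doubling" scheme is just one common variant, and `partition_layout` is a name I made up):

```python
def partition_layout(n_taps, base):
    """Split an n_taps-long FIR into partitions: two blocks of size
    `base`, then two of 2*base, two of 4*base, ... (one common
    non-uniform scheme; real implementations vary)."""
    blocks = []  # list of (offset_into_ir, block_size)
    offset, size = 0, base
    while offset < n_taps:
        for _ in range(2):
            if offset >= n_taps:
                break
            blocks.append((offset, size))
            offset += size
        size *= 2
    return blocks

layout = partition_layout(n_taps=1 << 16, base=64)
# Only O(log2(N)) distinct block sizes are needed to cover the IR:
sizes = sorted({s for _, s in layout})
```

The point being that a 64k-tap FIR ends up with only about ten distinct block sizes, which is roughly where the (log2(N))^2 scaling in the quoted text comes from.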
The only difficulty I can see with method 2 is the scheduling of the
different sub-tasks. One way to get a smooth processor load is to do the
FFTs of the different block sizes in different threads. This means that
the computation of the smallest blocks (triggered by the JACK callback
mechanism) should somehow preempt the thread doing the double-sized
blocks, and this thread should in turn preempt the thread doing the
quadruple-sized FFT blocks, etc. According to the FFTW documentation,
the FFT routines are thread-safe and can be used in this way.
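To make the slack explicit: with a JACK period of P frames, a partition of size m*P only has to be ready m callbacks after its input is complete, so the bigger blocks can live in progressively lower-priority threads. A trivial sketch (names are mine):

```python
def scheduling_slack(period, block_sizes):
    """For each partition size (a multiple of the JACK period), how many
    callbacks elapse between its input being complete and its result
    being needed -- i.e. how much time its worker thread has."""
    return {size: size // period for size in block_sizes}

slack = scheduling_slack(period=64, block_sizes=[64, 128, 256, 512])
# slack[64] == 1: must finish within the current callback.
# slack[512] == 8: eight callbacks of headroom, so that work can run at
# a lower real-time priority and be preempted by the small-block work.
```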
Since convolution is simply multiplication in the frequency domain, it
should be possible to break up the computation for the bigger chunks
into smaller pieces (except for the FFT (and IFFT) of the input and
the result). This would eliminate the need for threads. And since the
FFTs aren't the bottleneck in partitioned convolution (the memory
access is), this might still work out. I'm not sure there's much to
gain, though (maybe I haven't quite grokked algorithm 2 yet; will need
to do some more thinking).
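To illustrate what I mean by breaking up the multiply-accumulate (a toy sketch, with Python complex numbers standing in for FFT bins; `spectral_mac_chunked` is just a name I made up):

```python
def spectral_mac_chunked(X, H, Y, chunk):
    """Accumulate Y += X * H pointwise (the frequency-domain part of a
    partitioned convolution) in pieces of `chunk` bins. Each piece is
    independent, so the pieces could be spread across several process()
    calls instead of handed to a separate thread."""
    for start in range(0, len(X), chunk):
        for i in range(start, min(start + chunk, len(X))):
            Y[i] += X[i] * H[i]

X = [complex(i, -i) for i in range(16)]  # toy input spectrum
H = [complex(2, 1)] * 16                 # toy filter spectrum
Y = [0j] * 16                            # accumulator
spectral_mac_chunked(X, H, Y, chunk=4)
```

Chunking only the multiply-accumulate leaves the FFT/IFFT of each big block whole, which is what the parenthesis above is about.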
Other possibly worthwhile ways of reducing the CPU load would be
different kinds of "tail extension". The two techniques I can see there
are:
- Using convolution only for the "early reflections" and a classic
DSP-based reverb for the tail (I don't know much about how those work,
though).
- Exploiting the fact that the tail typically doesn't vary much apart
from its exponentially dropping volume. One could then reuse a
relatively short (and relatively early) chunk of the tail over and
over, fitted to the right volume with a gain factor. This would not
reduce the number of multiplications, but it would help with memory
access.
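A toy version of that looping idea, assuming a perfectly exponential tail (the function name and all the constants here are made up for illustration):

```python
import math

def looped_tail_ir(ir, head, chunk, gain):
    """Approximate a long impulse response: keep the first `head`
    samples verbatim, then rebuild the tail by repeating one
    `chunk`-long slice of it, scaled down by `gain` on each repeat
    (modelling the roughly exponential decay)."""
    out = list(ir[:head])
    piece = ir[head:head + chunk]
    g = 1.0
    while len(out) < len(ir):
        out.extend(x * g for x in piece)
        g *= gain
    return out[:len(ir)]

# Synthetic exponentially decaying IR, so the approximation is exact:
ir = [math.exp(-0.001 * n) for n in range(8000)]
gain = math.exp(-0.001 * 500)  # decay over one chunk length
approx = looped_tail_ir(ir, head=1000, chunk=500, gain=gain)
```

For a real room response the decay is only roughly exponential, so the chunk length and gain would need fitting; but only head + chunk samples of the IR have to stay hot in memory, which is where the memory-access win would come from.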
I'm not sure if the latter method can be automated and hidden behind a
simple UI (it's tough to get any simpler than jack_convolve's current
interface :)), or if it would require heavy parameter tweaking by the
end users.
Flo
--
Palimm Palimm!
http://affenbande.org/~tapas/