[LAD] vectorization

Jens M Andreasen jens.andreasen at comhem.se
Tue May 27 07:59:04 UTC 2008

On Wed, 2008-05-07 at 10:50 +0200, Fons Adriaensen wrote:
[about zita convolver]
> It's quite complex. I'll try to build up a picture in four steps.

-<fast FWD>-

> 4. The scheme above is FFT-based partitioned convolution with a single
>    partition size P. For efficiency you want large P, but this also
>    introduces processing delay as you can only start the computation
>    when P new input samples are available. To avoid this delay in a
>    real-time application zita-convolver uses multiple partition sizes,
>    small at the start of an IR, and larger ones for the later parts.
>    There can be up to five sizes. Calculations for P == period size
>    are performed directly in the JACK callback, the longer ones are
>    performed by lower priority threads. A final optimisation is that
>    a sparse matrix representation is used in all three dimensions,
>    so no time or memory is wasted on zero-valued data.
>    
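
For anyone skipping the earlier steps: the single-partition-size scheme that
step 4 starts from can be sketched roughly as below. This is only an
illustration under my own assumptions, not zita-convolver's code. It uses one
partition size p (a power of two), overlap-add, and a naive recursive FFT; the
names fft, block_spectrum and partitioned_convolve are made up for the
example, and none of the multi-size, threading or sparse-matrix parts are
shown.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <deque>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// In-place recursive radix-2 FFT; a.size() must be a power of two.
// inverse == true computes the unscaled inverse transform.
static void fft(cvec& a, bool inverse)
{
    const std::size_t n = a.size();
    if (n < 2) return;
    cvec even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even, inverse);
    fft(odd, inverse);
    const double pi = std::acos(-1.0);
    const double sign = inverse ? 1.0 : -1.0;
    for (std::size_t k = 0; k < n / 2; ++k) {
        const std::complex<double> w =
            std::polar(1.0, sign * 2.0 * pi * double(k) / double(n));
        a[k] = even[k] + w * odd[k];
        a[k + n / 2] = even[k] - w * odd[k];
    }
}

// Take up to p samples starting at src[offset], zero-pad to 2*p, transform.
static cvec block_spectrum(const std::vector<double>& src,
                           std::size_t offset, std::size_t p)
{
    cvec buf(2 * p);
    for (std::size_t i = 0; i < p && offset + i < src.size(); ++i)
        buf[i] = src[offset + i];
    fft(buf, false);
    return buf;
}

// Uniformly partitioned overlap-add convolution of x with the IR h.
std::vector<double> partitioned_convolve(const std::vector<double>& x,
                                         const std::vector<double>& h,
                                         std::size_t p)
{
    assert(p > 0 && (p & (p - 1)) == 0);   // power of two
    assert(!x.empty() && !h.empty());

    // Split the IR into partitions of p samples and transform each once.
    const std::size_t nparts = (h.size() + p - 1) / p;
    std::vector<cvec> H;
    for (std::size_t k = 0; k < nparts; ++k)
        H.push_back(block_spectrum(h, k * p, p));

    const std::size_t out_len = x.size() + h.size() - 1;
    const std::size_t nblocks = (out_len + p - 1) / p;
    std::vector<double> y(nblocks * p, 0.0), overlap(p, 0.0);
    std::deque<cvec> X;                    // input spectra, newest first

    for (std::size_t b = 0; b < nblocks; ++b) {
        // Work can only start once p new input samples are available.
        X.push_front(block_spectrum(x, b * p, p));
        if (X.size() > nparts) X.pop_back();

        // Partition k is multiplied with the input block from k blocks ago.
        cvec acc(2 * p);
        for (std::size_t k = 0; k < X.size(); ++k)
            for (std::size_t i = 0; i < 2 * p; ++i)
                acc[i] += X[k][i] * H[k][i];
        fft(acc, true);

        const double scale = 1.0 / double(2 * p);
        for (std::size_t i = 0; i < p; ++i) {
            y[b * p + i] = acc[i].real() * scale + overlap[i];
            overlap[i] = acc[p + i].real() * scale;   // tail for next block
        }
    }
    y.resize(out_len);
    return y;
}
```

Calling partitioned_convolve(x, h, p) should give the same result as a plain
time-domain convolution of x and h, just computed p samples at a time. The
inner multiply-accumulate loop over all partitions is the part that streams
through memory on every block.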

Since you are bound by bandwidth to main memory, it would be nice to get
you off the precious level 2 cache. There are hints available to bypass
the cache, but a better solution might be to look into nVidia's C-like
CUDA language. Here are some measurements of their FFT library on a
€200 card with a 256-bit GDDR3 memory interface:

 http://www.cv.nrao.edu/~pdemores/gpu/

... and here is an introduction where you can get an idea of how
involved that might be:

 http://www.ddj.com/hpc-high-performance-computing/207200659
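
As for the cache-bypass hints mentioned above: on x86, one such hint is the
SSE non-temporal store. The sketch below is only meant to show what that looks
like; copy_streaming is a made-up helper, and whether it actually helps a
convolver depends on the access pattern and would have to be measured. It
falls back to a plain copy when SSE is not available or the destination is
not 16-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Copy n floats from src to dst, writing with non-temporal stores where
// possible so the written data is not pulled into the cache.
void copy_streaming(float* dst, const float* src, std::size_t n)
{
    assert(dst != nullptr && src != nullptr);
    std::size_t i = 0;
#if defined(__SSE__)
    // _mm_stream_ps requires a 16-byte aligned destination.
    if (reinterpret_cast<std::uintptr_t>(dst) % 16 == 0) {
        for (; i + 4 <= n; i += 4)
            _mm_stream_ps(dst + i, _mm_loadu_ps(src + i));
        // Make the streamed stores visible before any later stores.
        _mm_sfence();
    }
#endif
    // Remaining samples (or everything, in the fallback case).
    for (; i < n; ++i)
        dst[i] = src[i];
}
```

This only makes sense for data that will not be read again soon, for example
output buffers that are written once per period.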

Silent cards with passive cooling are available at about €50+ but have
"only" ordinary 128-bit DDR2 memory and considerably less computational
power. 4 Gflops (real world) might be enough for the application at hand,
though?
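
To put a rough number on "enough", here is a back-of-envelope estimate for
one channel of uniformly partitioned FFT convolution. Everything in it is my
assumption rather than something measured: the usual 5 N log2(N) flop
convention for a complex FFT of size N, one forward and one inverse FFT per
block, and 8 flops per complex multiply-add. estimate_gflops is a
hypothetical helper, and real-valued FFTs or the multi-size scheme described
above should need less.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Rough Gflops needed for one channel, single partition size p.
double estimate_gflops(double sample_rate, std::size_t ir_len, std::size_t p)
{
    assert(p > 0 && ir_len > 0 && sample_rate > 0.0);
    const double n = 2.0 * double(p);                 // FFT size
    const double nparts = std::ceil(double(ir_len) / double(p));
    const double fft_flops = 2.0 * 5.0 * n * std::log2(n);  // fwd + inverse
    const double mac_flops = 8.0 * n * nparts;        // per-bin multiply-add
    const double blocks_per_second = sample_rate / double(p);
    return (fft_flops + mac_flops) * blocks_per_second * 1e-9;
}
```

Under these assumptions, estimate_gflops(48000.0, 96000, 64), i.e. a 2 second
IR at 48 kHz with 64-sample partitions, comes out at a bit over 1 Gflops per
channel, almost all of it in the multiply-add loop.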

Apparently nVidia has implemented some kind of hardware permute in their
later designs (from 8400 and up), opening up their GPUs to a much wider
range of algorithms than previous generations. Real world performance in
Gflops appears to be about 1/10 of the peak shader throughput mentioned in
this table on Wikipedia:

 http://en.wikipedia.org/wiki/GeForce_8_Series#Technical_summary

As an added bonus, should somebody look into and implement any of this,
we would all get the perfect excuse for achieving insane framerates in
Quake/OpenArena :-D

Disclaimer: nVidia's product codes have become a disgustingly confusing
alphabet soup. I might have misread some comparison table.
--
