On Wed, 2010-10-13 at 07:09 +0200, Stéphane Letz wrote:
Examples of what?
Example of CUDA used for audio.
If we ignore getting data in and out for now - which I understand you
are reading up on - there are two main use cases:
a) Vertical signal flow: The signal flows from the top of each thread,
starting at a location in shared memory chosen by indexing, and ends up
in another, fixed and known location, also in shared memory. Shared
memory works like a switchboard with 4096 plugs, where any output can be
the input of any other. In this case the code is identical to what
you'll find in textbooks or floating around in DSP forums, including
this one. The difference is that you get 128 instances of each
function rather than just one.
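
A minimal sketch of case a), under my own assumptions rather than
anything taken from a real project: one block of 128 threads, each
running a textbook one-pole low-pass. The names (onePoleBank, patch,
coeff, runOnePoleBank) and the interleaved buffer layout are
hypothetical, and the 4096-float shared array assumes a GPU that allows
a full 16 KB static allocation. Error checking is left out.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 128;   // instances per block, as in the text
constexpr int kPlugs   = 4096;  // 4096 floats = 16 KB of shared memory

// Each thread runs a one-pole low-pass: y += a * (x - y).
// Assumed layout: sample n of instance t sits at [n * kThreads + t].
__global__ void onePoleBank(const float* in, const int* patch,
                            const float* coeff, float* out, int numSamples)
{
    __shared__ float plugs[kPlugs];          // the "switchboard"
    const int t = threadIdx.x;
    for (int i = t; i < kPlugs; i += kThreads) plugs[i] = 0.0f;
    __syncthreads();

    const int   src = patch[t];              // input plug, chosen by indexing
    const int   dst = kThreads + t;          // fixed, known output plug
    const float a   = coeff[t];
    float y = 0.0f;

    for (int n = 0; n < numSamples; ++n) {
        plugs[t] = in[n * kThreads + t];     // external inputs on plugs 0..127
        __syncthreads();
        const float x = plugs[src];          // any plug, even another output
        y += a * (x - y);
        __syncthreads();                     // all reads done before outputs change
        plugs[dst] = y;
        out[n * kThreads + t] = y;
    }
}

// Host helper: copies data over, runs one block, copies the result back.
std::vector<float> runOnePoleBank(const std::vector<float>& in,
                                  const std::vector<int>& patch,
                                  const std::vector<float>& coeff)
{
    const int numSamples = static_cast<int>(in.size()) / kThreads;
    const size_t bytes = in.size() * sizeof(float);
    float *dIn, *dOut, *dCoeff;
    int* dPatch;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMalloc(&dCoeff, kThreads * sizeof(float));
    cudaMalloc(&dPatch, kThreads * sizeof(int));
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dCoeff, coeff.data(), kThreads * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dPatch, patch.data(), kThreads * sizeof(int), cudaMemcpyHostToDevice);

    onePoleBank<<<1, kThreads>>>(dIn, dPatch, dCoeff, dOut, numSamples);

    std::vector<float> out(in.size());
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn); cudaFree(dOut); cudaFree(dCoeff); cudaFree(dPatch);
    return out;
}
```

With patch[t] pointing at plugs 0..127 each filter listens to one of
the external inputs; pointing it at plug 128 + k instead feeds it the
output of filter k, one sample late.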
b) Horizontal signal flow: The above approach does not work for things
like, say, delay lines (longer than a few samples) where you want each
instance to have its delay time individually parameterized. In this
case you will therefore first transpose the outputs so that a collection
of vertical signals like:
  a A 1 $                           a b c d..
  b B 2 #                           A B C D..
  c C 3 !    becomes horizontal:    1 2 3 4..
  d D 4 ?                           $ # ! ?..
  : : : :
You will want to do this in chunks of 16 elements because this is how
the memory controller works. A codepath driving 128 threads can then
address 8 unrelated global memory locations in parallel to store as well
as load the right chunk(s) back into shared memory - after which the
individual threads can look left and right for fine-tuning, interpolation
and/or FIR filtering.
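
A rough sketch of case b), again under assumptions of my own: 128 delay
lines in global memory, each with its own delay in whole samples. The
128 threads work as 8 groups of 16, so each pass touches 8 different
delay lines with one 16-sample chunk each. The names (delayBank,
runDelayBank, kLen) and the buffer layout are hypothetical, delays are
assumed to lie in 0..kLen-16, and for brevity the read goes straight to
the delayed position instead of loading aligned chunks and fine-tuning
in shared memory. Error checking is left out.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kInst  = 128;   // instances = threads per block
constexpr int kChunk = 16;    // samples per chunk
constexpr int kLen   = 1024;  // hypothetical delay-line length per instance
static_assert(kLen % kChunk == 0, "chunks must not wrap mid-chunk");

// Assumed layout of in/out: sample n of instance t at [n * kInst + t].
__global__ void delayBank(const float* in, const int* delay, float* lines,
                          float* out, int numChunks)
{
    // +1 padding reduces bank conflicts on the transposed accesses.
    __shared__ float tile[kChunk][kInst + 1];
    const int t     = threadIdx.x;
    const int lane  = t % kChunk;   // position inside a chunk
    const int group = t / kChunk;   // 0..7, one delay line per group per pass

    for (int c = 0; c < numChunks; ++c) {
        const int base = c * kChunk;

        // Vertical: thread t collects 16 samples of its own instance.
        for (int n = 0; n < kChunk; ++n)
            tile[n][t] = in[(base + n) * kInst + t];
        __syncthreads();

        // Horizontal store: 16 neighbouring threads write one contiguous chunk.
        for (int p = 0; p < kInst / 8; ++p) {
            const int inst = p * 8 + group;
            lines[inst * kLen + (base + lane) % kLen] = tile[lane][inst];
        }
        __syncthreads();

        // Horizontal load, shifted by each instance's own delay.
        for (int p = 0; p < kInst / 8; ++p) {
            const int inst = p * 8 + group;
            const int pos  = base + lane - delay[inst];
            tile[lane][inst] = (pos >= 0) ? lines[inst * kLen + pos % kLen] : 0.0f;
        }
        __syncthreads();

        // Vertical again: each thread picks up its own delayed samples.
        for (int n = 0; n < kChunk; ++n)
            out[(base + n) * kInst + t] = tile[n][t];
        __syncthreads();
    }
}

// Host helper: runs one block over the whole input and returns the output.
std::vector<float> runDelayBank(const std::vector<float>& in,
                                const std::vector<int>& delay)
{
    const int numChunks = static_cast<int>(in.size()) / (kInst * kChunk);
    const size_t bytes = in.size() * sizeof(float);
    const size_t lineBytes = static_cast<size_t>(kInst) * kLen * sizeof(float);
    float *dIn, *dOut, *dLines;
    int* dDelay;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMalloc(&dLines, lineBytes);
    cudaMalloc(&dDelay, kInst * sizeof(int));
    cudaMemset(dLines, 0, lineBytes);
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dDelay, delay.data(), kInst * sizeof(int), cudaMemcpyHostToDevice);

    delayBank<<<1, kInst>>>(dIn, dDelay, dLines, dOut, numChunks);

    std::vector<float> out(in.size());
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn); cudaFree(dOut); cudaFree(dLines); cudaFree(dDelay);
    return out;
}
```

Fractional delays or a short FIR would go in the last vertical step,
where each thread can read the neighbouring samples of its own column.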
--
eins, zwei, drei ... tekno tekno??
http://www.youtube.com/watch?v=ZEgbW1FxR78