[LAD] CUDA

Jens M Andreasen jens.andreasen at comhem.se
Thu Aug 28 20:49:11 UTC 2008


Kryz, your mail server seems to be down, and when I checked whether you
had a running WWW site, I got this message:

"Chwilowo nic tu nie ma" [We are not here yet]



So, back to the list ... This time corrected as per second mail:



On Tue, 2008-08-26 at 11:12 +0100, Krzysztof Foltman wrote:
> Jens M Andreasen wrote:
> > I am doing some preliminary testing of CUDA for audio, Version 2 (final)
> > has been out for a couple of days, and this is also what I am using.
> 
> Does it require the proprietary drivers and/or Nvidia kernel module?
> 

Yes, and not only that. The proprietary drivers distributed with, say,
Mandrake, Ubuntu et al. won't work either. Uninstall those, change your
X setup to vesa (to stop the recursive nvidia-installer madness) and
then get your CUDA driver and compiler from:

 http://www.nvidia.com/object/cuda_get.html

> What kind of things is the gfx card processor potentially capable of
> doing? Anything like multipoint interpolation for audio resampling
> purposes? Multiple delay lines in parallel? Biquads? Multichannel
> recording to VRAM?


Multichannel recording by itself would be a waste of perfectly good
floating-point clock cycles, but anything that you can map to a wide
vector (64 to 196 elements) is up for grabs: a 196-voice multi-timbral
synthesizer perhaps, or 64 channel strips with basic filters and a
compressor/noise gate for remixing. The five muladds needed for a
single biquad filter, times the number of bands you need to equalize,
fit the optimal programming model quite well.

The linear 2D interpolator is also available, and even cached. Perhaps
not the world's most useful toy for audio resampling, but it could find
its way into some variation of wavetable synthesis. It can be set up to
wrap around at the edges, which I find kind of interesting.

Random access to main (device) memory is - generally speaking - a bitch
and a no-go if you cannot wrap your head around ways to load and use
very wide vectors. 

There are some 8192 fp registers to load into though, so all is not
lost. Communication, permutation and exchange of data between vector
elements OTOH is then fairly straightforward and cheap, by means of a
smallish shared memory on chip.

The more you can make your algorithm(s) look like infinitely brain-dead
parallel iterations of multiply/add, the better they will make use of
the hardware. The way I see it, the overall feel of your strategy should
be something like "The Marching Hammers" animation (from Pink Floyd: The
Wall.)

> Is it possible to confine all the audio stream transfer between gfx
> and  audio cards to kernel layer and only implement control in user
> space? (to potentially reduce xruns, won't help for control latency 
> but at least it's some improvement)
>
 
You mean something like DMA? Yes, I would have thought so, but this is
apparently not always the case, especially not on this very card that I
have here. :-/

The CUDA program running on the device has priority over X, though, so
there will be no blinking lights (nor printfs) before your calculation
is done. For real-time work, I reckon this is a GoodFeature (tm)!

Potentially this can also hang the system if you happen to implement an
infinite loop (so don't do that ...)

> Would it be possible to use a high level language like FAUST to
> generate CUDA code? (by adding CUDA-specific backend)
> 

The problem would be giving Faust a good understanding of the memory
model, and of how to keep individual vector elements from collectively
falling over each other.

But I must admit that I am not too familiar with what Faust actually
does. Would a library of common higher-level functionality like FFT and
BLAS be of any help to you?

---8<---------------------------------------------

- "CUBLAS is an implementation of BLAS (Basic Linear Algebra
Subprograms) on top of the NVIDIA(r) CUDA(tm) (compute unified
device architecture) driver. It allows access to the computational
resources of NVIDIA GPUs. The library is self-contained at the API
level, that is, no direct interaction with the CUDA driver is
necessary."

------8<................................

But observe that:

... - "Currently, only a subset of the CUBLAS core functions is
implemented."


/j

> Krzysztof
> 




More information about the Linux-audio-dev mailing list