On Mon, 2009-08-10 at 21:43 -0400, David Robillard wrote:
> > With
> > buffer-size 3 × 1.3 ms @ 96 kHz I have clock cycles to spare and can
> > easily display a stream of video (320×200) simultaneously, for doing a
> > soundtrack to some movie or something. And more ...
>
> Interesting... I am surprised you can crunch DSP on these things with
> this kind of latency at 96 kHz. 48 kHz should be a breeze then...

Most certainly. The driver very predictably processes each batch to
completion, so if there are no X events queued up, you 'own' the
GPU. Anything OpenGL and you're fried though - not even glxgears in a
small window, at least not with buffer sizes around 1 ms. Increasing the
buffers/jitter to around 5 ms does away with that problem, so that
glxgears can run at 256×256 (along with the previous smallish video
stream and Atari ST emulation [running a MIDI sequencer]), and
apparently also GLX in full screen. Larger buffers also do away
with artifacts coming from certain Firefox redraws (where this site
http://carlbildt.wordpress.com/ is mysteriously affected while this site
http://arstechnica.com/ is not).
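
As a rough sanity check on the buffer figures, here is a small Python sketch
of the arithmetic. It assumes (my guess, not something measured) that
"1.3 ms" means a 128-frame period, since 128 / 96000 s is about 1.33 ms, and
that "around 5 ms" would be a 512-frame period. The function names are made
up for illustration.

```python
def period_ms(frames, sample_rate_hz):
    """Duration of one period of `frames` samples, in milliseconds."""
    return 1000.0 * frames / sample_rate_hz


def buffer_latency_ms(frames, periods, sample_rate_hz):
    """Total buffering for `periods` periods of `frames` samples each."""
    return periods * period_ms(frames, sample_rate_hz)


if __name__ == "__main__":
    # Assumed settings: 128 frames/period, 3 periods, 96 kHz.
    print(f"{period_ms(128, 96000):.2f} ms per period")
    print(f"{buffer_latency_ms(128, 3, 96000):.2f} ms total buffer")
    # The same 128 frames at 48 kHz last twice as long per period.
    print(f"{period_ms(128, 48000):.2f} ms per period at 48 kHz")
    # Assumed "around 5 ms" setting: 512 frames/period at 96 kHz.
    print(f"{period_ms(512, 96000):.2f} ms per period")
```

Under those assumptions the GPU gets roughly 1.33 ms per batch in the tight
setup and about four times that in the relaxed one, which is the headroom
that lets glxgears squeeze in.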
This is with just about the tiniest, most outdated device still available,
which does not have newer features like overlapping processing/transfers
for alternating streams. The motherboard chipset on, say, the ION platform
would have nearly twice the processing power, and you could then also do
away with any transfers and just read from normal memory, leaving the
Intel companion chip to do interrupt processing and not much else.
If anybody is interested in a concerted effort to optimize
PCIe transfers in jackd for cross-platform CUDA audio processing, then -
well - I am here, as well as over at the CUDA forums. I imagine that, as
seen from say qjackctl, this should look just the same as any other
hardware you may have - like a sound card - with 32 or perhaps 64
channels in/out. What happens on the other side, what those channels
connect to, would be up to the user/programmer. Running several kernels
in succession, one piped into the next, has very little overhead,
although it might be a better strategy to do different parts of the
scene in parallel on neighboring processors. Very few people who do
not work at TU-Berlin actually need more than 640 channel strips ;)
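
To make the "one kernel piped into the next" idea concrete, here is a
CPU-side Python sketch of the data flow only. It is not CUDA and not how
jackd or any driver does it; the stage names (gain, hard_clip) and
process_block are hypothetical. Each stage is applied to every channel of a
block before the next stage starts, which is roughly what launching kernels
in succession over a channels × frames buffer would look like.

```python
def gain(factor):
    """Return a stage that scales every sample of a channel by `factor`."""
    def stage(channel):
        return [sample * factor for sample in channel]
    return stage


def hard_clip(limit):
    """Return a stage that clamps every sample to [-limit, +limit]."""
    def stage(channel):
        return [max(-limit, min(limit, sample)) for sample in channel]
    return stage


def process_block(block, stages):
    """Run each stage over all channels, feeding its output to the next.

    `block` is a list of channels; each channel is a list of samples.
    On a GPU, each pass over the channels would be one kernel launch.
    """
    out = block
    for stage in stages:
        out = [stage(channel) for channel in out]
    return out


if __name__ == "__main__":
    # Two channels of three frames each, standing in for 32 or 64 channels.
    block = [[0.1, 0.6, -0.9], [0.0, 0.5, 1.0]]
    print(process_block(block, [gain(2.0), hard_clip(1.0)]))
```

The alternative mentioned above, doing different parts of the scene in
parallel, would instead hand different channel groups to different
processors rather than chaining whole-block passes.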
The reward would be having huge arrays of GHz processors for about one
dollar a pop! Memory bandwidth in the triple-digit range to go with
that. Or, like me, just enjoy the bliss of silent computing on something
a little less ambitious.
/j
8<--------------------------------------
Update: The performance increase for GTK pixmaps I experienced earlier
came about because the X config defaulted to 8-bit after the Nvidia card
was disabled ... darned!
I'll have to redo that experiment with the Intel driver some other
day, and for now just remember that switching to 8-bit might allow for
more 'blinkenlights' while still processing near RT on the GPU.