> On Wed, 2003-11-19 at 00:56, Paul Davis wrote:
>> if you used poll(2) or select(2), you could do simultaneous waits on
>> each direction, regardless of whether they use different devices or
>> not. you'd then reduce the context switches and the overhead of
>> synchronize().
> Context switches are not a problem when a thread's main function is to
> sleep on a blocking descriptor. And synchronize() has minimal overhead
> on systems which have futexes, and very small overhead in any case.
> Sleeping in a system call (on a waitqueue) doesn't take any noticeable
> amount of CPU time. This works especially well on SMP machines.
OK, your choice. From my perspective, 1 context switch is on the order
of 2-6% of the available CPU cycles for a 64 frames-per-interrupt
configuration. And sleeping in two threads on what in just about every
case will be one piece of hardware with synchronized playback and
capture streams seems like a waste of resources to me. 99 times out of
a hundred, or more, the "ready" bit on a task is set from the
interrupt handler, and it will set both the playback and capture
waitqueues. as a result, you now have to wake up two threads when one
would do. you're accomplishing nothing that i can see, other than
adding a certain kind of theoretical elegance in your code. and
theoretical elegance is for the API, not the implementation :)
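For illustration, the single-thread wait on both directions could look roughly like the sketch below. It is only a sketch under assumptions: the function names wait_both() and demo_wait_both() are hypothetical, the demo uses pipes as stand-ins for the capture and playback descriptors, and it presumes a driver that actually implements poll().

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait in ONE thread until the capture side is readable and/or the
 * playback side is writable. Returns the number of descriptors that
 * reported any event (error conditions included), 0 on timeout, or
 * -1 on error. Names are hypothetical, not taken from any driver. */
int wait_both(int capture_fd, int playback_fd, int timeout_ms,
              int *capture_ready, int *playback_ready)
{
    struct pollfd fds[2];
    int n;

    fds[0].fd = capture_fd;
    fds[0].events = POLLIN;    /* data available to read */
    fds[0].revents = 0;
    fds[1].fd = playback_fd;
    fds[1].events = POLLOUT;   /* room available to write */
    fds[1].revents = 0;

    do {
        n = poll(fds, 2, timeout_ms);
    } while (n < 0 && errno == EINTR);   /* retry after a signal */

    if (n < 0)
        return -1;

    *capture_ready = (fds[0].revents & POLLIN) != 0;
    *playback_ready = (fds[1].revents & POLLOUT) != 0;
    return n;
}

/* Stand-in demo: a pipe with one byte queued plays the capture side,
 * the write end of an empty pipe plays the playback side. Returns 2
 * when both directions were reported ready by a single poll() call. */
int demo_wait_both(void)
{
    int cap[2], play[2];
    int c = 0, p = 0, n = -1;

    if (pipe(cap) < 0)
        return -1;
    if (pipe(play) < 0) {
        close(cap[0]);
        close(cap[1]);
        return -1;
    }

    if (write(cap[1], "x", 1) == 1)
        n = wait_both(cap[0], play[1], 1000, &c, &p);

    close(cap[0]);
    close(cap[1]);
    close(play[0]);
    close(play[1]);

    return (n == 2 && c && p) ? 2 : -1;
}
```

One wakeup then services both directions, which is the saving argued for above. Whether a given OSS driver supports poll() has to be checked per driver.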
> Not all OSS Lite drivers support poll() or select().
:(
>> i also note that you're also not using mmap, resulting in extra
>> copying of significant quantities of data on every cycle for
>> multichannel cards. the cpu cycles for this can be significant when
>> you get down to very low latency on hammerfall cards, for example.
> Memory-mapping buffers of fast-running devices is problematic.
> (Soundcards are relatively slow, however.) Also, the pagefault handler
> takes some time.
there are no pagefaults in a realtime JACK system. all code is pinned
in physical RAM.
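As an illustration of pinning, one common way for a Linux process to lock its pages into RAM is mlockall(). The sketch below is not a claim about how JACK itself does it, and the helper name pin_process_memory() is hypothetical. The call usually needs privilege or a sufficiently large RLIMIT_MEMLOCK, so failure has to be handled.

```c
#include <stdio.h>
#include <sys/mman.h>

/* Try to lock all current and future pages of this process into RAM,
 * so they cannot be paged out later. Returns 0 on success and -1 on
 * failure (typically missing privilege or a low RLIMIT_MEMLOCK). */
int pin_process_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return -1;
    }
    return 0;
}
```

A realtime client would call this once at startup, before entering its audio loop, and decide for itself whether to continue when the lock fails.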
> Another point is that normal gcc memcpy() is significantly slower
> compared to copy_to/from_user() on modern hardware.
do you have a URL i can read about this?
> Third is that mapping DMA buffers of 32-bit cards to the memory space
> of 64-bit processors is another story. If the driver is mapping some
> secondary buffer allocated from vm then it's different.
ALSA seems to be handling it perfectly OK so far.
> I don't have any control over jack, so I'm not going to tie myself up
> on something which needs an unspecified amount of work.
I'll merge it into CVS after thanksgiving (nov 27th) here (first
time i'm likely to have the time).
--p