Joshua Haberman wrote:
At the moment I am favoring an approach that uses multiple discrete
buffers and passing pointers to them, rather than pushing data on a
ring buffer. One major reason is that it allows you to send the data
to multiple places. Say you want to send a captured buffer to both
the disk thread (for recording) and the GUI thread (for displaying a
VU meter). With ring buffers you would have to copy this data to two
separate ring buffers, one for each thread that will read it. With
multiple buffers, you can just send the same pointer to both threads.
Yes, you would need to copy buffers, but messing around with slow GUIs is
always a pain, so in that case I'd choose to copy.
Why? Let me give an example.
Assume I want to show a FFT or an oscilloscope that displays incoming
audio data (or from a disk track).
As long as we display only one FFT/oscilloscope the overhead caused by
the memory copy is low.
Of course, if we have to copy 100 stereo tracks you get about 17 MB/sec
(e.g. 16-bit samples at 44.1 kHz: 100 tracks x 2 channels x 2 bytes x
44100 samples/sec), which can add a measurable overhead.
But the question is: did you try to open 100 FFT meters at the same time?
I guess that would make the memory copy overhead look minuscule compared
to the graphics output.
True, but this isn't a reason to choose a ring buffer *over* the
multiple buffer approach. The multiple buffer approach is convenient to
me for other reasons also.
Also, I have a
strategy in mind for gracefully degrading when the GUI
thread can't pull data from the FIFO fast enough (again, for VU meters
or some other kind of visualization). Instead of queuing up more and
more unconsumed buffers, I will give the FIFO an upper limit, a number
of buffers past which it will discard the oldest buffers. It's not
immediately clear to me how you could implement this same strategy
with a ring buffer.
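For what it's worth, the drop-oldest policy itself is simple; here is a single-threaded C sketch of just the policy (all names are made up, and a real implementation would of course need the lock-free synchronization we've been discussing):

```c
#include <assert.h>

#define QCAP 4  /* hypothetical upper limit on queued buffers */

typedef struct {
    int items[QCAP];  /* stand-in for buffer pointers */
    int head, count;
} bounded_fifo;

/* Push a new item; if the FIFO is full, discard the oldest one.
 * Returns the discarded item (so it can go back to the free pool),
 * or -1 if nothing was discarded. */
int fifo_push(bounded_fifo *q, int item) {
    int discarded = -1;
    if (q->count == QCAP) {                 /* full: drop the oldest */
        discarded = q->items[q->head];
        q->head = (q->head + 1) % QCAP;
        q->count--;
    }
    q->items[(q->head + q->count) % QCAP] = item;
    q->count++;
    return discarded;
}

/* Consumer side: pop the oldest queued item. */
int fifo_pop(bounded_fifo *q) {
    int item = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return item;
}
```

The point is that the producer never blocks and never queues more than QCAP buffers; a slow consumer just sees the newest QCAP buffers when it wakes up.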
In the case of FFT/oscilloscope GUIs (or any kind of GUI that displays
audio data) I'd do the following:
RingBuffer audio_ringbuffer;
RingBuffer gui_ringbuffer;

disk thread:
while (1) {
    read data into audio_ringbuffer
    check that the free space in the GUI ringbuffer is >= the
        amount of data just read
    if yes, memcpy(data from audio ringbuffer to GUI ringbuffer)
    else don't write this block of data to the GUI ringbuffer
}
In that case you have a single memory copy for mirroring the data you've
put into the audio ringbuffer to the GUI ringbuffer.
And in case the GUI ringbuffer gets completely filled (because the GUI
thread is unable to process all the data it gets), you simply discard
the current data block.
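The drop-or-copy step above might look like this in C (a self-contained toy ring buffer with hypothetical names; a real version would use the actual RingBuffer class and its synchronization):

```c
#include <stddef.h>

#define GUI_RB_SIZE 4096  /* hypothetical GUI ring buffer capacity */

typedef struct {
    char   buf[GUI_RB_SIZE];
    size_t read_idx, write_idx;  /* monotonically increasing */
} ringbuf;

static size_t rb_write_space(const ringbuf *rb) {
    return GUI_RB_SIZE - (rb->write_idx - rb->read_idx);
}

/* Mirror the block just read from disk into the GUI ring buffer,
 * or drop it if the GUI thread has fallen behind and there is no
 * room. Returns 1 if the block was mirrored, 0 if it was dropped. */
int mirror_to_gui(ringbuf *rb, const char *block, size_t n) {
    if (rb_write_space(rb) < n)
        return 0;                           /* GUI is behind: drop */
    for (size_t i = 0; i < n; i++)          /* the one memory copy */
        rb->buf[(rb->write_idx + i) % GUI_RB_SIZE] = block[i];
    rb->write_idx += n;
    return 1;
}
```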
With your scheme, once the GUI thread finally wakes up it will be left
with the *oldest* data from the period it was asleep. My
scheme can give it the *newest* data. How important will this be in
practice? I don't know. But at least I will have the option.
With lists of buffer pointers, what happens if the display thread
stalls for a long time?
You have to keep track of many buffers which cannot be actively used by
the disk streaming thread / audio thread, so you waste a lot of memory.
Not true -- as I described before, I will give the queue an upper limit
on the number of buffers it can keep queued. Buffers older than this
will be returned to the free pool of buffers.
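Returning a shared buffer to the free pool only once every consumer is done with it is the one subtle part of the pointer-passing scheme; a common approach is a per-buffer reference count. A minimal C11 sketch (all names hypothetical):

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared audio buffer: each consumer thread holds one
 * reference; the buffer is reclaimed when the last reference drops. */
typedef struct {
    atomic_int refcount;
    float      samples[512];
} shared_buf;

shared_buf *buf_alloc(int nreaders) {
    shared_buf *b = malloc(sizeof *b);
    atomic_init(&b->refcount, nreaders);  /* one ref per consumer */
    return b;
}

/* Called by each consumer (disk thread, GUI thread) when done. */
void buf_release(shared_buf *b) {
    if (atomic_fetch_sub(&b->refcount, 1) == 1)
        free(b);  /* in a real app: push back onto the freelist */
}
```

With this, the audio thread can hand the same pointer to both the disk thread and the GUI thread, and whichever finishes last returns the buffer.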
Also, though
I'm not sure yet if this matters for what I'm doing, your
ring buffer can only handle one reader and one writer. The algorithms
we are implementing can handle multiple readers and writers (though we
are writing single-reader/single-writer versions also, since they are
simpler and more efficient).
The question is: in what cases do we need multiple-reader/multiple-writer
lock-free FIFOs when dealing with audio apps?
I'm not entirely sure yet, but here are some possibilities. Remember
that you're often passing more than just audio data around, you're
passing messages between threads.
- you definitely need a multiple-reader/multiple-writer stack to
implement a buffer freelist. A buffer freelist is a pool of unused
buffers, and you want to be able to allocate buffers from it and return
buffers to it from any thread. In your case, you may not need a buffer
freelist because you're using ring buffers to pass data between threads.
- there may be cases where you want both your GUI thread and your disk
thread to be able to post messages to the audio thread's message queue.
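For the freelist case, the classic lock-free structure is a stack maintained with compare-and-swap (a Treiber stack). A minimal C11 sketch with hypothetical names; note the ABA caveat in the comments:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical lock-free freelist: any thread may push (return a
 * buffer) or pop (allocate one).
 * Caveat: this minimal version is vulnerable to the ABA problem if a
 * popped node can be recycled while another pop is in flight;
 * production code needs tagged pointers or hazard pointers. */
typedef struct node {
    struct node *next;
    float        samples[512];
} node;

typedef struct {
    _Atomic(node *) top;
} freelist;

void fl_push(freelist *fl, node *n) {
    node *old = atomic_load(&fl->top);
    do {
        n->next = old;  /* old is refreshed by each failed CAS */
    } while (!atomic_compare_exchange_weak(&fl->top, &old, n));
}

node *fl_pop(freelist *fl) {
    node *old = atomic_load(&fl->top);
    while (old && !atomic_compare_exchange_weak(&fl->top, &old, old->next))
        ;               /* old is refreshed by each failed CAS */
    return old;         /* NULL if the pool is empty */
}
```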
One final
thing: I'm not sure if this is a problem in practice, but
most (if not all) processors do not guarantee that memory reads and
writes are performed in order, as observed by other processors and by
I/O devices. This could theoretically lead to a problem where the
stores that write data to the ring buffer are not performed until
*after* the store that updates your write pointer. Your reader thread
could get old data rather than the data you just wrote. To guarantee
that the loads and stores are observed in the required order, you have
to use memory barriers (again, in theory). See:
http://en.wikipedia.org/wiki/Memory_barrier
(Ross wrote that and we're both still learning about this stuff, so
don't take that article as the word of God).
Our RingBuffer simply updates a read_ptr and a write_ptr, so the worst
case that can happen is that the read_ptr still points to the old
location even if the writer already wrote the data (assume an SMP
system).
It is not harmful because the reader will get that data during the next
cycle.
See my reply to Tim. You are not recognizing the danger I am
describing. Assuming a sequential execution of loads and stores, what
you say is true. However, processors *do not guarantee* that loads and
stores are performed in program order!
Take the pseudo-assembly:
stw sample, 0(ringBuffer) ; copy last sample into the ringbuf
stw newptr, 0(write_ptr) ; signal that the write has completed
Because modern CPUs allow out-of-order execution of loads and stores (as
observed by other CPUs and I/O devices), the processor gives you no
guarantee that the first store will be visible to another CPU before the
second store is. If your consumer thread is running on another CPU and
sees the second store first, it will start reading from the ring buffer
before all the stores to that ring buffer have completed!
It may be that the chances of this error scenario happening for a
RingBuffer are small to none -- after all, your consumer thread won't
load the last sample of that buffer until it has performed loads for all
the other samples in that buffer, which should give the producer's
stores plenty of time to settle. It might be more dangerous if you were
transferring smaller amounts of data, like if you were reading and
writing one integer at a time.
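In modern C this ordering problem is addressed with acquire/release atomics, which pair the data stores with the index publication exactly as the pseudo-assembly above requires. A minimal sketch (hypothetical names):

```c
#include <stdatomic.h>
#include <stddef.h>

#define RB_SIZE 1024
float         rb_data[RB_SIZE];
atomic_size_t rb_write_idx;  /* statically zero-initialized */

/* Producer: the release store guarantees the sample store is visible
 * to any thread whose acquire load observes the new index. */
void rb_write(float sample) {
    size_t w = atomic_load_explicit(&rb_write_idx, memory_order_relaxed);
    rb_data[w % RB_SIZE] = sample;                 /* 1: store the data */
    atomic_store_explicit(&rb_write_idx, w + 1,
                          memory_order_release);   /* 2: then publish   */
}

/* Consumer: returns 1 and one sample if new data is available. */
int rb_read(size_t *read_idx, float *out) {
    size_t w = atomic_load_explicit(&rb_write_idx, memory_order_acquire);
    if (*read_idx == w)
        return 0;                        /* nothing published yet */
    *out = rb_data[*read_idx % RB_SIZE]; /* data guaranteed visible */
    (*read_idx)++;
    return 1;
}
```

On x86 the release/acquire pair usually compiles to plain loads and stores (it mainly constrains the compiler), while on weakly ordered CPUs like PowerPC it emits the barrier instructions discussed above.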
Nevertheless, it's something you should at least be aware of when you
are writing lock-free code. See
(http://citeseer.ist.psu.edu/adve95shared.html) for a general
introduction to issues of shared memory consistency (which is what
memory barriers address).
Josh