[linux-audio-dev] lock-free data structures

Joshua Haberman joshua at reverberate.org
Fri Jun 18 20:47:28 UTC 2004


> Joshua Haberman wrote:
>
>>
>> At the moment I am favoring an approach that uses multiple discrete
>> buffers and passing pointers to them, rather than pushing data on a
>> ring buffer.  One major reason is that it allows you to send the data
>> to multiple places.  Say you want to send a captured buffer to both
>> the disk thread (for recording) and the GUI thread (for displaying a
>> VU meter).  With ring buffers you would have to copy this data to two
>> separate ring buffers, one for each thread that will read it.  With
>> multiple buffers, you can just send the same pointer to both threads.
>
> Yes, you would need to copy buffers, but messing around with slow GUIs
> is always a pain, so in that case I'd choose to copy.
> Why?
> Let me make an example.
> Assume I want to show a FFT or an oscilloscope that displays incoming
> audio data (or from a disk track).
> As long as we display only one FFT/oscilloscope the overhead caused by
> the memory copy is low.
> Of course, if we have to copy 100 stereo tracks you get about 17 MB/sec,
> which can add a measurable overhead.
> But the question is: did you try to open 100 FFT meters at the same time?
> I guess that would make the memory copy overhead look minuscule compared
> to the graphics output.

True, but this isn't a reason to choose a ring buffer *over* the
multiple-buffer approach.  The multiple-buffer approach is also convenient
for other reasons.
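
To make that concrete, here is a minimal sketch of the fan-out idea.  The
FIFO type, its functions, and the refcounting are hypothetical names for
illustration, not code from either of our implementations:

  /* Hypothetical sketch: one captured buffer fans out to two consumers
     without copying the audio data itself. */
  typedef struct {
      float data[1024];
      int   refcount;     /* each consumer decrements this when finished  */
  } buffer_t;

  extern ptr_fifo_t to_disk_fifo;  /* single-reader/single-writer FIFOs   */
  extern ptr_fifo_t to_gui_fifo;   /* that carry buffer_t* (assumed API)  */

  void publish_captured_buffer(buffer_t *buf)
  {
      buf->refcount = 2;                  /* two consumers will see it    */
      ptr_fifo_write(&to_disk_fifo, buf); /* same pointer to both --      */
      ptr_fifo_write(&to_gui_fifo,  buf); /* no memcpy of the samples     */
  }

Whichever consumer drops the refcount to zero returns the buffer to the
free pool, which is also why the decrement itself has to be atomic.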

>> Also, I have a strategy in mind for gracefully degrading when the GUI
>> thread can't pull data from the FIFO fast enough (again, for VU meters
>> or some other kind of visualization).  Instead of queuing up more and
>> more unconsumed buffers, I will give the FIFO an upper limit, a number
>> of buffers past which it will discard the oldest buffers.  It's not
>> immediately clear to me how you could implement this same strategy
>> with a ring buffer.
>
> In the case of FFT/oscilloscope GUIs (or any kind of GUI that displays
> audio data) I'd do the following:
>
> RingBuffer audio_ringbuffer;
> RingBuffer gui_ringbuffer;
>
> disk thread:
> while(1) {
>  read data into audio_ringbuffer
>  check if there is enough free space in the GUI ringbuffer (it must be
>  at least as large as the data just read)
>  if yes then memcpy(data from audio ringbuffer to GUI ringbuffer)
>  else don't write this block of data to the GUI ringbuffer
> }
>
> In that case you have one single memory copy for mirroring the data
> you've put into the audio ringbuffer to the GUI ringbuffer.
> And in case the GUI ringbuffer gets completely filled (because the GUI
> thread is unable to process all the data it gets) you simply discard
> the current data block.

With your scheme, once the GUI thread finally wakes up it will be left
with the *oldest* data from the period it was asleep.  My scheme can give
it the *newest* data.  How important will this be in practice?  I don't
know.  But at least I will have the option.

> With lists of buffer pointers, what will happen if the display thread
> stalls for a long time?
> You have to keep track of many buffers which cannot be actively used by
> the disk streaming thread / audio thread, so you waste a lot of memory.

Not true -- as I described before, I will give the queue an upper limit
on the number of buffers it can keep queued.  Buffers older than this
will be returned to the free pool of buffers.
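
A rough sketch of that bound, again with hypothetical helper names.  Note
that discarding from the producer side makes the producer a second reader
of the GUI FIFO, so the queue either has to tolerate that or provide the
discard-oldest behaviour itself:

  #define GUI_FIFO_LIMIT 8  /* assumed cap on queued, unconsumed buffers  */

  void post_to_gui(ptr_fifo_t *gui_fifo, buffer_t *buf)
  {
      /* If the GUI has fallen behind, recycle the oldest buffers instead
         of letting the queue grow without bound. */
      while (ptr_fifo_count(gui_fifo) >= GUI_FIFO_LIMIT) {
          buffer_t *oldest = ptr_fifo_read(gui_fifo);
          freelist_push(oldest);          /* back to the free pool        */
      }
      ptr_fifo_write(gui_fifo, buf);      /* newest data always goes in   */
  }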

>> Also, though I'm not sure yet if this matters for what I'm doing, your
>> ring buffer can only handle one reader and one writer.  The algorithms
>> we are implementing can handle multiple readers and writers (though we
>> are writing single-reader/single-writer versions also, since they are
>> simpler and more efficient).
>
> The question is: in what cases do we need multiple-reader,
> multiple-writer lock-free FIFOs when dealing with audio apps?

I'm not entirely sure yet, but here are some possibilities.  Remember
that you're often passing more than just audio data around; you're
passing messages between threads.

- you definitely need a multiple-reader/multiple-writer stack to
implement a buffer freelist.  A buffer freelist is a pool of unused
buffers, and you want to be able to allocate buffers from it and return
buffers to it from any thread (see the sketch after this list).  In your
case, you may not need a buffer freelist because you're using ring
buffers to pass data between threads.

- there may be cases where you want both your GUI thread and your disk
thread to be able to post messages to the audio thread's message queue.
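
For the freelist case, here is a minimal sketch of the classic lock-free
stack (a Treiber stack), where any thread can push or pop with a single
compare-and-swap.  It is illustrative only -- it uses a GCC atomic builtin
as the CAS primitive and it ignores the ABA problem, which a real
implementation has to deal with (e.g. via tagged pointers):

  typedef struct free_node {
      struct free_node *next;   /* in practice the next pointer can live
                                   inside the buffer struct itself        */
  } free_node_t;

  static free_node_t *free_top = NULL;

  void freelist_push(free_node_t *n)
  {
      free_node_t *old;
      do {
          old = free_top;
          n->next = old;
      } while (!__sync_bool_compare_and_swap(&free_top, old, n));
  }

  free_node_t *freelist_pop(void)
  {
      free_node_t *old, *next;
      do {
          old = free_top;
          if (old == NULL)
              return NULL;                /* pool exhausted               */
          next = old->next;
      } while (!__sync_bool_compare_and_swap(&free_top, old, next));
      return old;
  }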

>> One final thing: I'm not sure if this is a problem in practice, but
>> most (if not all) processors do not guarantee that memory reads and
>> writes are performed in order, as observed by other processors and by
>> I/O devices.  This could theoretically lead to a problem where the
>> stores that write data to the ring buffer are not performed until
>> *after* the store that updates your write pointer.  Your reader thread
>> could get old data rather than the data you just wrote.  To guarantee
>> that the loads and stores are observed in the required order, you have
>> to use memory barriers (again, in theory).  See:
>>
>> http://en.wikipedia.org/wiki/Memory_barrier
>>
>> (Ross wrote that and we're both still learning about this stuff, so
>> don't take that article as the word of God).
>
>
> Our RingBuffer simply updates a read_ptr and a write_ptr, so the worst
> case that can happen is that the read_ptr still points to the old
> location even if the writer already wrote the data (assuming an SMP
> system).
> It is not harmful because the reader will get that data during the next
> cycle.

See my reply to Tim.  You are not recognizing the danger I am
describing.  Assuming a sequential execution of loads and stores, what
you say is true.  However, processors *do not guarantee* that loads and
stores are performed in program order!

Take the pseudo-assembly:
      stw     sample, 0(ringBuffer)  ; copy last sample into the ringbuf
      stw     newptr, 0(write_ptr)   ; signal that the write has completed

Because modern CPUs allow out-of-order execution of loads and stores (as
observed by other CPUs and I/O devices), the processor gives you no
guarantee that the first store will be visible to another CPU before the
second store is.  If your consumer thread is running on another CPU and
sees the second store first, it will start reading from the ring buffer
before all the stores to that ring buffer have completed!
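
In C-ish terms the fix is to order the two stores explicitly.  The sketch
below uses __sync_synchronize() (a full barrier) purely for illustration;
real code would typically use cheaper write and read barriers, and the
variable names here are made up:

  /* producer */
  ringbuffer[write_index] = sample;   /* store the data...                */
  __sync_synchronize();               /* ...force it to become visible... */
  write_ptr = new_write_ptr;          /* ...before publishing the pointer */

  /* consumer */
  observed_write_ptr = write_ptr;     /* read the published pointer       */
  __sync_synchronize();               /* and only then read the data,     */
  sample = ringbuffer[read_index];    /* which is now guaranteed current  */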

It may be that the chances of this error scenario happening for a
RingBuffer are small to none -- after all, your consumer thread won't
load the last sample of that buffer until it has performed loads for all
the other samples in that buffer, which should give the producer's
stores plenty of time to settle.  It might be more dangerous if you were
transferring smaller amounts of data, like if you were reading and
writing one integer at a time.

Nevertheless, it's something you should at least be aware of when you
are writing lock-free code.  See
(http://citeseer.ist.psu.edu/adve95shared.html) for a general
introduction to issues of shared memory consistency (which is what
memory barriers address).

Josh



