[linux-audio-dev] lock-free data structures

Benno Senoner sbenno at gardena.net
Thu Jun 17 17:50:42 UTC 2004
Paul Davis wrote:

>>One thing I am still looking to learn more about is how to adjust 
>>thread priorities and such to make sure that your threads are run often 
>>enough (especially the disk thread), and how to decide how big your 
>>disk buffers need to be.
>>    
>>
>
>4 years ago, Benno and I measured this and concluded that under some
>circumstances it was possible to have a small-multi-second delay in
>disk access. Ardour uses 5 second disk buffers. With buffers of this
>size, the scheduling priority of the disk thread is not really relevant.
>
>We also determined that 256kB seemed to be the optimal i/o block size
>for ext2. Whether this is true for ext3/reiserfs/xfs and others i do
>not know.
>  
>
I recently ran more tests (I'll release the benchmarking code once it's
cleaned up a bit). On decently fast disks, like 7200rpm IDE disks (my
Maxtor 80GB IDE disk does up to 40MB/sec sustained), if there is a lot
of seeking (which HDR apps like ardour and disk-based sampling apps like
LinuxSampler do), you need to read more at a time, like 512KB - 1MB,
because the disk seek time is relatively high (12-14msec) compared to
the transfer rate (40MB/sec). If the buffers are too small, you get lots
of disk seeks between reads and you lose performance.
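
To make the seek-vs-transfer argument concrete, here is a rough sketch
of the arithmetic. It is a simplified model, not the benchmark code: it
assumes exactly one seek before every chunk and a constant transfer
rate, and the function names are made up for illustration. With a
13 msec seek and 40 MB/sec it estimates about 7.8 MB/sec for 128 KB
chunks and about 26 MB/sec for 1024 KB chunks.

```cpp
#include <cassert>
#include <cstdio>

// Estimated throughput in MB/s when every read of `chunk_kb` kilobytes
// is preceded by one seek (simplified model, 1 MB = 1024 KB).
//   seek_ms      : average seek time in milliseconds
//   transfer_mbs : sequential transfer rate in MB/s
double seek_bound_throughput(double chunk_kb, double seek_ms, double transfer_mbs) {
    assert(chunk_kb > 0 && seek_ms >= 0 && transfer_mbs > 0);
    const double chunk_mb = chunk_kb / 1024.0;
    const double seconds_per_chunk = seek_ms / 1000.0 + chunk_mb / transfer_mbs;
    return chunk_mb / seconds_per_chunk;
}

// Prints the estimate for chunk sizes from 128 KB to 2048 KB.
void print_throughput_table(double seek_ms, double transfer_mbs) {
    for (int kb = 128; kb <= 2048; kb *= 2)
        std::printf("%5d KB chunks: %6.2f MB/s\n", kb,
                    seek_bound_throughput(kb, seek_ms, transfer_mbs));
}
```

Calling print_throughput_table(13.0, 40.0) prints the whole range. The
model only shows the trend: the larger the chunk, the smaller the share
of time lost to seeking.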

My benchmark does the following: it creates 4 files of 750MB each (3GB
total), then seeks around randomly in all 4 files simultaneously and
reads 2 bytes after each seek (just to ensure that the disk scheduling
algorithm will not fool us). Of course the file cache could still
inflate the numbers, but if the files are big enough (so that they
don't fit in RAM) and you do thousands of seeks, you get quite realistic
values that reflect those achieved by an HDR or disk sampling app.
For example, these are the numbers for my Maxtor IDE 40GB 7200rpm:

seeks/sec=75.5 average seek time=13.2 msec
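
A seek test along these lines could be sketched as follows. This is not
the actual benchmark code (which is not released yet), just a minimal
illustration; SeekResult, random_seek_benchmark and seek_dry_run are
hypothetical names. It works on any std::istream so the harness can be
checked on in-memory data before pointing it at real files.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <istream>
#include <random>
#include <sstream>
#include <string>
#include <vector>

struct SeekResult {
    std::size_t seeks;   // completed seek+read operations
    double seconds;      // wall-clock time for all of them
    double seeks_per_sec() const { return seconds > 0 ? seeks / seconds : 0.0; }
    double avg_seek_msec() const { return seeks ? 1000.0 * seconds / seeks : 0.0; }
};

// Seeks to random offsets in randomly chosen files and reads 2 bytes
// after each seek. All files are assumed to be `file_size` bytes long.
SeekResult random_seek_benchmark(const std::vector<std::istream*>& files,
                                 long long file_size,
                                 std::size_t num_seeks,
                                 unsigned seed = 1) {
    assert(!files.empty() && file_size > 2);
    std::mt19937_64 rng(seed);
    std::uniform_int_distribution<std::size_t> pick_file(0, files.size() - 1);
    std::uniform_int_distribution<long long> pick_offset(0, file_size - 2);

    char two_bytes[2];
    SeekResult result{0, 0.0};
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < num_seeks; ++i) {
        std::istream& f = *files[pick_file(rng)];
        f.clear();  // drop any error state left by a previous read
        f.seekg(static_cast<std::streamoff>(pick_offset(rng)), std::ios::beg);
        f.read(two_bytes, 2);
        if (f.gcount() == 2) ++result.seeks;
    }
    const auto stop = std::chrono::steady_clock::now();
    result.seconds = std::chrono::duration<double>(stop - start).count();
    return result;
}

// Dry run on two small in-memory streams; it exercises the code path
// only and says nothing about disk performance.
SeekResult seek_dry_run(std::size_t num_seeks) {
    std::istringstream a{std::string(4096, 'a')}, b{std::string(4096, 'b')};
    return random_seek_benchmark({&a, &b}, 4096, num_seeks);
}
```

For a real measurement the streams would be std::ifstream objects opened
in binary mode on files that are too big to fit in RAM, as described
above.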

In the streaming test I read a chunk of X KB from each of the 4 files:
while(1) {  file1.read() ; file2.read() ; file3.read() ; file4.read();  }

This causes the disk head to seek between the files after each read,
just like an HDR app does when reading tracks. These are the numbers I
get (read speed):
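
The loop above could be fleshed out roughly like this. It is a sketch
under assumptions, not the real benchmark: StreamResult,
round_robin_read and stream_dry_run are illustrative names, and the
function takes generic std::istream pointers so it can be tried on
in-memory data first.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct StreamResult {
    std::size_t bytes;   // total bytes read from all files
    double seconds;      // wall-clock time
    double mb_per_sec() const {
        return seconds > 0 ? bytes / (1024.0 * 1024.0) / seconds : 0.0;
    }
};

// Reads one chunk from each file in turn. Stops after the first round
// in which any file returns less than a full chunk.
StreamResult round_robin_read(const std::vector<std::istream*>& files,
                              std::size_t chunk_bytes) {
    assert(!files.empty() && chunk_bytes > 0);
    std::vector<char> chunk(chunk_bytes);
    const std::streamsize want = static_cast<std::streamsize>(chunk.size());
    StreamResult result{0, 0.0};
    const auto start = std::chrono::steady_clock::now();
    bool more = true;
    while (more) {
        for (std::istream* f : files) {
            f->read(chunk.data(), want);
            result.bytes += static_cast<std::size_t>(f->gcount());
            if (f->gcount() < want) more = false;  // short read: last round
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    result.seconds = std::chrono::duration<double>(stop - start).count();
    return result;
}

// Dry run on four 64 KB in-memory streams; it checks the loop logic
// only and says nothing about disk performance.
StreamResult stream_dry_run(std::size_t chunk_bytes) {
    const std::string blob(65536, 'x');
    std::istringstream a{blob}, b{blob}, c{blob}, d{blob};
    return round_robin_read({&a, &b, &c, &d}, chunk_bytes);
}
```

With std::ifstream objects opened on four large files, mb_per_sec()
gives the kind of read speed figure reported here.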

streaming with 128 KB buffers ....
performance: 8.50 MB/sec  stereo voices at 44.1kHz = 50.53
required memory for buffering: 6.32 MB

streaming with 256 KB buffers ....
performance: 14.43 MB/sec  stereo voices at 44.1kHz = 85.79
required memory for buffering: 21.45 MB

streaming with 512 KB buffers ....
performance: 21.10 MB/sec  stereo voices at 44.1kHz = 125.41
required memory for buffering: 62.71 MB

streaming with 1024 KB buffers ....
performance: 28.49 MB/sec  stereo voices at 44.1kHz = 169.38
required memory for buffering: 169.38 MB

streaming with 2048 KB buffers ....
performance: 32.88 MB/sec  stereo voices at 44.1kHz = 195.44
required memory for buffering: 390.87 MB
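
The voice counts and memory figures above appear to follow from simple
arithmetic: throughput divided by the data rate of one stereo voice, and
one buffer per voice. The sketch below reproduces them within rounding.
The 16-bit sample size and 1 MB = 1024*1024 bytes are assumptions
inferred from the numbers, not stated in the output, and the function
names are made up.

```cpp
#include <cassert>

// Assumed audio format (inferred from the figures, not stated above).
constexpr double kSampleRate = 44100.0;
constexpr double kChannels = 2.0;         // stereo
constexpr double kBytesPerSample = 2.0;   // 16-bit samples (assumption)
constexpr double kBytesPerMB = 1024.0 * 1024.0;

// Number of stereo voices a given read throughput can sustain.
double stereo_voices(double throughput_mbs) {
    assert(throughput_mbs >= 0);
    const double bytes_per_voice_sec = kSampleRate * kChannels * kBytesPerSample;
    return throughput_mbs * kBytesPerMB / bytes_per_voice_sec;
}

// Memory needed when every voice gets one buffer of `buffer_kb` kilobytes.
double buffer_memory_mb(double voices, double buffer_kb) {
    assert(voices >= 0 && buffer_kb > 0);
    return voices * buffer_kb / 1024.0;
}
```

For example, stereo_voices(8.50) is about 50.53, and
buffer_memory_mb(50.53, 128) is about 6.32, matching the 128 KB row.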

As you can see, the performance increase between 512KB and 1MB is still
very big, around 35%, so reading 256KB at a time is definitely too
little these days.

Paul, does ardour allow specifying the size of the per-track buffers you
use for disk streaming? If it doesn't, perhaps you should add this as an
option, since it's handy for the user to be able to increase the default
values to achieve the optimal track count.
For example, with large RAID arrays the gap between seek time and raw
disk transfer speed gets even bigger, so even bigger buffers are needed
to achieve the max track count.
(Keep in mind I'm not familiar with the ardour codebase or its advanced
settings, so my question might be redundant if ardour already supports
this.)

Joshua:
if you want an easy-to-use lock-free FIFO template in C++, look at
RingBuffer.h in the LinuxSampler CVS.
We use this template heavily.
For example, we set up a large ringbuffer for streaming the audio from
disk (one ringbuffer for each voice).
The disk thread reads directly into the ringbuffer and the audio thread
fetches the data in a lock-free way. This ensures zero-copy operation.
Plus we added a wrap space: a section from the beginning of the buffer
is replicated after the official upper bound, so that the audio thread
can read a bit past the end and still get the correct audio data (as if
the buffer were linear). This speeds up the audio interpolation, since
for the audio thread it's like reading from a linear segment, with no
nasty if() checks etc. The replication is done by the disk thread when
it writes the data to the ringbuffer (only from time to time, so there
is no practical CPU overhead).
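
To illustrate the wrap space idea, here is a minimal single-producer,
single-consumer sketch. It is not the RingBuffer.h from LinuxSampler:
the class name and methods are made up, it uses C++11 std::atomic
instead of atomic_*() macros, and write() copies from a source array for
brevity instead of letting the disk thread read straight into the
buffer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
class WrapRingBuffer {
public:
    // `size` logical slots plus `wrap` mirrored slots after the end.
    WrapRingBuffer(std::size_t size, std::size_t wrap)
        : size_(size), wrap_(wrap), data_(size + wrap), write_(0), read_(0) {
        assert(size > 1 && wrap < size);
    }

    // Producer side (disk thread). Returns how many elements were stored;
    // at most size-1 elements can be held at once.
    std::size_t write(const T* src, std::size_t n) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t r = read_.load(std::memory_order_acquire);
        const std::size_t free_space = (r + size_ - w - 1) % size_;
        if (n > free_space) n = free_space;
        std::size_t pos = w;
        for (std::size_t i = 0; i < n; ++i) {
            data_[pos] = src[i];
            if (pos < wrap_) data_[size_ + pos] = src[i];  // mirror into wrap space
            if (++pos == size_) pos = 0;
        }
        write_.store(pos, std::memory_order_release);
        return n;
    }

    // Consumer side (audio thread): number of elements ready to read.
    std::size_t readable() const {
        const std::size_t w = write_.load(std::memory_order_acquire);
        const std::size_t r = read_.load(std::memory_order_relaxed);
        return (w + size_ - r) % size_;
    }

    // Pointer to n contiguous readable elements, or nullptr if fewer are
    // available or the span would run past the wrap space. Any request of
    // up to `wrap` available elements is contiguous.
    const T* read_ptr(std::size_t n) const {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        if (n > readable() || r + n > size_ + wrap_) return nullptr;
        return &data_[r];
    }

    // Marks n elements as consumed.
    void advance(std::size_t n) {
        assert(n <= readable());
        const std::size_t r = read_.load(std::memory_order_relaxed);
        read_.store((r + n) % size_, std::memory_order_release);
    }

private:
    const std::size_t size_, wrap_;
    std::vector<T> data_;
    std::atomic<std::size_t> write_, read_;
};
```

The reader can interpolate over read_ptr(n)[0..n-1] without any
wrap-around check, because elements at the start of the buffer are also
present right after its end.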

But we don't use the lock-free ringbuffer only for audio. Since it's a
template, you can create ringbuffers of any kind of struct, so we also
use it to send commands between the MIDI thread (note on/off etc) and
the audio thread, and to send commands to the disk thread (start/stop
streams).
It works really well and the resulting code is clean too.
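
As a rough idea of what such a command queue can look like, here is a
small single-producer, single-consumer sketch. It is not LinuxSampler's
code: NoteCommand and SpscQueue are hypothetical, and C++11 std::atomic
stands in for the atomic_*() macros.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical command sent from the MIDI thread to the audio thread.
struct NoteCommand {
    enum Type { NoteOn, NoteOff } type;
    unsigned char key;
    unsigned char velocity;
};

// Fixed-size lock-free FIFO for exactly one producer thread and one
// consumer thread. One slot stays empty, so it holds up to N-1 items.
template <typename T, std::size_t N>
class SpscQueue {
    static_assert(N > 1, "need at least two slots");
public:
    bool push(const T& item) {   // producer thread only
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t next = (w + 1) % N;
        if (next == read_.load(std::memory_order_acquire)) return false;  // full
        slots_[w] = item;
        write_.store(next, std::memory_order_release);
        return true;
    }

    bool pop(T& out) {           // consumer thread only
        const std::size_t r = read_.load(std::memory_order_relaxed);
        if (r == write_.load(std::memory_order_acquire)) return false;    // empty
        out = slots_[r];
        read_.store((r + 1) % N, std::memory_order_release);
        return true;
    }

private:
    std::array<T, N> slots_{};
    std::atomic<std::size_t> write_{0};
    std::atomic<std::size_t> read_{0};
};
```

The MIDI thread would call push() for each event, and the audio thread
would drain the queue with pop() at the start of each audio cycle.
Neither call blocks; a full queue just makes push() return false.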

The RingBuffer class uses atomic_*() macros, so it is safe on any
architecture. On most architectures 32-bit word accesses are atomic
anyway, so the macros simply translate to load and store ops. AFAIK
SPARC SMP is one of the only archs that needs special care (and it can
access only 24 bits atomically, thus your ringbuffers are limited to
16 million elements).

PS: about disk streaming benchmarks ... I ported my benchmark to win32
too and made some tests there.
The irony is that with buffered I/O, when lots of disk seeks occur, you
get really sucky performance, as low as 30% of the normal sustained disk
transfer speed.
It seems that the read-ahead algorithm reads too much and gets the disk
head scheduling wrong (I used Win XP, so I assume it has the
best-performing file I/O in the Windows family).
Using direct I/O (without buffering) you get decent performance, but you
lose the benefits of the file cache.
For example, in a disk-based sampler where you often hit the same notes
(and thus the same audio files) over short periods of time, the cache
can save lots of disk accesses. The Linux file cache does an excellent
job here.
I guess those Windows-based disk samplers all implemented their own file
cache, while on Linux the OS does all the work for you :)

cheers,
Benno
http://www.linuxsampler.org