Jack O'Quin wrote:
On Sun, Oct 19, 2008 at 2:55 AM, Paul Davis
<paul(a)linuxaudiosystems.com> wrote:
i don't know whether to shoot myself or eat another couple of the oft-promised hats.
Don't beat yourself up too badly. Multiple threads accessing shared memory
is *very* tricky. Even smart people (like you) get it wrong sometimes.
I agree with Jack. I remember a friend of mine once describing multi-threading as "kernel quantization". It is complex, but also challenging :)
Although your understanding of these matters is far beyond mine, I think I have solid professional experience with coding methodology. And to me the problem here is indeed a methodological one: there is a lack of unit tests in JACK.
so the next question is how best to prevent it. as far as i can see we have a couple of proposals:
1) fons' design, which never actually wraps readptr or writeptr, but
masks the address used to access the data buffer
2) removing the intermediate state's visibility
i admit to preferring (2) even though i know that with a 64 bit index,
not wrapping the ptrs is not really a problem.
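
For reference, here is a minimal sketch of what (1) amounts to. This is not Fons' actual code, just an illustration: the indices are never wrapped, only masked when the buffer is addressed, so no wrapped-but-not-yet-stored intermediate value can ever be observed. It assumes a power-of-two buffer size, and the maskbuf_* names are made up for the sketch.

/* Illustration only, not Fons' actual code: design (1).
 * The indices grow monotonically and are never wrapped; the mask is
 * applied only when addressing the data buffer. */
#include <stddef.h>

typedef struct {
    char            *buf;
    size_t           mask;       /* size - 1, size is a power of two */
    volatile size_t  write_idx;  /* only ever incremented, never wrapped */
    volatile size_t  read_idx;   /* only ever incremented, never wrapped */
} maskbuf_t;

static size_t maskbuf_read_space(const maskbuf_t *rb)
{
    /* unsigned subtraction stays correct even across index overflow */
    return rb->write_idx - rb->read_idx;
}

static void maskbuf_put(maskbuf_t *rb, char c)
{
    rb->buf[rb->write_idx & rb->mask] = c;  /* mask at access time only */
    rb->write_idx = rb->write_idx + 1;      /* single monotonic store */
}
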
I also prefer (2).
however, it is not totally clear to me how to prevent an optimizing compiler from doing the wrong thing here. unlike the claims made by someone involved with portaudio, i believe that it is correct to declare the read_ptr and write_ptr volatile, so that the compiler knows that it cannot try to be clever about its accesses of the "other" ptr (i.e. read_ptr for the writing thread, and vice versa). maybe the comment on the use of volatile was based on the idea that i thought it made the variables thread safe, which is not, and never was, the case.
i suspect that the safest code looks like:
size_t tmp = (ptr + incr) & mask;
barrier();
ptr = tmp;
but i am not sure whether barrier needs to be read, write or both.
i think that the simpler code:
ptr = (ptr + incr) & mask;
is subject to potential compiler and/or processor "optimization" that
might reduce it back to the problem case of two ops without an
intermediate load/store location. the volatile declaration ought to
prevent the compiler from doing this, and i don't see why a processor
would do this, ever, but clearly i've already been deeply wrong about
this. does anybody know for certain?
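
For illustration, here is roughly what that could look like in context. This is a sketch of option (2), not JACK's actual source: the wrapped index is computed in a private temporary and published with a single store. barrier() is written here as a GCC-style compiler barrier, which is only an assumption about what the barrier() above would expand to; whether a CPU barrier is also required is exactly the open question.

/* Sketch of option (2); names are made up, not JACK's actual source. */
#include <stddef.h>

#define barrier() __asm__ __volatile__("" ::: "memory")

typedef struct {
    char            *buf;
    size_t           size_mask;   /* size - 1, size is a power of two */
    volatile size_t  write_ptr;
    volatile size_t  read_ptr;
} rb_t;

static void rb_write_advance(rb_t *rb, size_t incr)
{
    size_t tmp = (rb->write_ptr + incr) & rb->size_mask; /* wrap privately */
    barrier();            /* forbid the compiler from fusing the two steps */
    rb->write_ptr = tmp;  /* publish the result with a single word-sized store */
}

static void rb_read_advance(rb_t *rb, size_t incr)
{
    size_t tmp = (rb->read_ptr + incr) & rb->size_mask;
    barrier();
    rb->read_ptr = tmp;
}
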
It is best to avoid assumptions about what some future compiler may consider an "optimization". If register pressure is high at some point in the program, it may decide to store some value to memory just to free up register space for other variables.
The "volatile" declaration should remove any need for the compiler
barrier() statement, AFAICT. Note that barrier() is a compiler
directive, and has no effect on the CPU's ability to reorder cache
operations in an SMP memory hierarchy.
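
For concreteness, rough userspace equivalents of the two kinds of barrier (assuming GCC; both macro names are invented for this sketch):

/* constrains only the compiler, like the kernel's barrier() */
#define compiler_barrier()  __asm__ __volatile__("" ::: "memory")

/* emits a real fence instruction (e.g. mfence on x86), which also
 * constrains the ordering observed by other CPUs */
#define full_barrier()      __sync_synchronize()
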
Here is a fairly clear and complete description of the memory barrier
issues:
http://lxr.linux.no/linux/Documentation/memory-barriers.txt
As best I can tell, both ring buffer threads require "general" memory
barriers, because they both read and write (different) shared data. In
Linux kernel terms, they'd need to use smp_mb(), since the problems
they address are multiprocessor-only. But, since JACK is not compiled
differently for UP and SMP, the full mb() seems more appropriate.
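
As a sketch of where such "general" barriers might sit, loosely following the circular-buffer pattern described in memory-barriers.txt, with __sync_synchronize() standing in for mb() in userspace (the fbuf_* names are invented and the indices are kept monotonic, as in the sketch of (1) above, to keep this short; it is an illustration, not a proposed patch):

#include <stddef.h>

typedef struct {
    char            *buf;
    size_t           mask;       /* size - 1, size is a power of two */
    volatile size_t  write_idx;  /* monotonically increasing */
    volatile size_t  read_idx;   /* monotonically increasing */
} fbuf_t;

static void fbuf_put(fbuf_t *rb, char c)         /* producer thread */
{
    while (rb->write_idx - rb->read_idx > rb->mask)
        ;                                        /* wait for free space */
    rb->buf[rb->write_idx & rb->mask] = c;
    __sync_synchronize();   /* make the data visible before the new index */
    rb->write_idx = rb->write_idx + 1;
}

static char fbuf_get(fbuf_t *rb)                 /* consumer thread */
{
    char c;
    while (rb->write_idx - rb->read_idx == 0)
        ;                                        /* wait for data */
    __sync_synchronize();   /* read the index before reading the data */
    c = rb->buf[rb->read_idx & rb->mask];
    __sync_synchronize();   /* finish reading before releasing the slot */
    rb->read_idx = rb->read_idx + 1;
    return c;
}
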
That stuff makes my head hurt.
Maybe that is because your approach is too theoretical. I've read many theories about whether memory barriers are needed or not in lock-free ring buffers, and I think it leads nowhere without experimenting, that is, writing test cases.
So far, I haven't been able to write a test that fails because of a missing memory barrier on x86. However, I think I found another bug in JACK's ringbuffer by writing another test.
It's a bit of a weird test; I call it "bit circle": I start 16 "node" threads (a rough sketch of one node's loop follows the list below).
Each node:
- communicates with its adjacent node through 2 ring buffers
- is responsible for shifting an integer by one bit
- sends the shifted result to the next node through a ring buffer
- checks that the value read from the previous node is correct
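
This is not the actual rbtest source: the seeding and expected-value bookkeeping are guesses, thread setup is omitted, and rotl32/read_full/write_full are helpers written only for this sketch. The jack_ringbuffer_* calls are the real JACK API, used here as one of the interchangeable backends.

#include <stdint.h>
#include <stdio.h>
#include <jack/ringbuffer.h>

#define NNODES 16

/* rotate a 32-bit value left by n bits */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* busy-wait until exactly len bytes have been transferred */
static void read_full(jack_ringbuffer_t *rb, char *dst, size_t len)
{
    size_t done = 0;
    while (done < len)
        done += jack_ringbuffer_read(rb, dst + done, len - done);
}

static void write_full(jack_ringbuffer_t *rb, const char *src, size_t len)
{
    size_t done = 0;
    while (done < len)
        done += jack_ringbuffer_write(rb, src + done, len - done);
}

struct node {
    int                id;
    jack_ringbuffer_t *from_prev;  /* written by the previous node in the ring */
    jack_ringbuffer_t *to_next;    /* read by the next node in the ring */
    uint32_t           expected;   /* next value this node should receive */
};

static void *node_thread(void *arg)
{
    struct node *n = arg;
    for (;;) {
        uint32_t v;
        read_full(n->from_prev, (char *) &v, sizeof v);
        if (v != n->expected) {
            fprintf(stderr, "FAILURE: %u != %u\n", (unsigned) v, (unsigned) n->expected);
            return NULL;
        }
        n->expected = rotl32(n->expected, NNODES); /* one full lap == NNODES one-bit shifts */
        v = rotl32(v, 1);                          /* this node's one-bit shift */
        write_full(n->to_next, (const char *) &v, sizeof v);
    }
}
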
On my Debian Quad Core and Mac OS X Core Duo boxes, this test fails with Jack's
ringbuffer even with my patch applied. But it succeeds with Portaudio (both with
and without memory barriers), and also with Fons' implementation.
I wish someone could eventually run my test suite on a PowerPC box, especially an SMP one.
The usage is unchanged:
svn co http://svn.samalyse.com/misc/rbtest
cd rbtest
make test
The run time is now shorter, about 2 minutes. Below is the output. You can see that jack (test-bit-circle-jack) fails the bit circle test, even when patched (test-bit-circle-jack-fix1). "|" is printed when a node thread starts, and "-" when a node has read 1000 values, to ensure data is really flowing.
The test-int-array-* tests are the same as my original test. All *-lfq tests use Fons' Lfq ringbuffer (modified to use char instead of uint32_t) as the backend.
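
For readers who don't want to check out the repository, here is a guess at the shape of the int-array check, based only on the failure message below (this is not the rbtest source): a writer pushes chunks of consecutive integers and a reader verifies them, so a torn or prematurely visible index update shows up as a mismatch at some offset in some chunk.

#include <stdint.h>
#include <stdio.h>
#include <jack/ringbuffer.h>

#define CHUNK 8   /* integers per chunk, matching the 8/16 array/buffer size below */

static void *writer_thread(void *arg)        /* producer */
{
    jack_ringbuffer_t *rb = arg;
    uint32_t counter = 0, chunk[CHUNK];
    for (;;) {
        for (int i = 0; i < CHUNK; i++)
            chunk[i] = counter + i;
        if (jack_ringbuffer_write_space(rb) < sizeof chunk)
            continue;                        /* busy-wait for room */
        jack_ringbuffer_write(rb, (char *) chunk, sizeof chunk);
        counter += CHUNK;
    }
}

static void *reader_thread(void *arg)        /* consumer */
{
    jack_ringbuffer_t *rb = arg;
    uint32_t expected = 0, chunk[CHUNK];
    unsigned long nchunk = 0;
    for (;;) {
        if (jack_ringbuffer_read_space(rb) < sizeof chunk)
            continue;                        /* busy-wait for a full chunk */
        jack_ringbuffer_read(rb, (char *) chunk, sizeof chunk);
        for (int i = 0; i < CHUNK; i++)
            if (chunk[i] != expected + i) {
                fprintf(stderr, "%u != %u at offset %d\nFAILURE in chunk %lu\n",
                        (unsigned) chunk[i], (unsigned) (expected + i), i, nchunk);
                return NULL;
            }
        expected += CHUNK;
        nchunk++;
    }
}
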
./run-tests.sh bit-circle 512 jack jack-fix1 portaudio portaudio-nobarrier lfq
test-bit-circle-jack - starting (5s max) - buffer size: 512
|| FAILURE: 2 != 514
test-bit-circle-jack-fix1 - starting (5s max) - buffer size: 512
|| FAILURE: 2 != 514
test-bit-circle-portaudio - starting (5s max) - buffer size: 512
||-|-|-||-|-|-|-|-||--|||--|---- SUCCESS
test-bit-circle-portaudio-nobarrier - starting (5s max) - buffer size: 512
|||--|-|||---||-||-|-|--||--|--- SUCCESS
test-bit-circle-lfq - starting (5s max) - buffer size: 512
|||--|||-|||-|||--|||-------|--- SUCCESS
./run-tests.sh int-array 16 jack jack-fix1 portaudio portaudio-nobarrier lfq
test-int-array-jack - starting (20s max) - array/buffer size: 8/16
[reader started] [writer started] [flowing] 52572 != 52568 at offset 0
FAILURE in chunk 91822
test-int-array-jack-fix1 - starting (20s max) - array/buffer size: 8/16
[reader started] [writer started] [flowing] SUCCESS
test-int-array-portaudio - starting (20s max) - array/buffer size: 8/16
[reader started] [writer started] [flowing] SUCCESS
test-int-array-portaudio-nobarrier - starting (20s max) - array/buffer size: 8/16
[reader started] [writer started] [flowing] SUCCESS
test-int-array-lfq - starting (20s max) - array/buffer size: 8/16
[writer started] [reader started] [flowing] SUCCESS
--
Olivier Guilyardi / Samalyse