Hi there,
On Sat, Apr 05, 2003 at 06:08:51PM +0100, Steve Harris wrote:
On Sat, Apr 05, 2003 at 06:15:09 +0200, Ingo Oeser
wrote:
Now make that thread-safe and esp. thread-safe on
an architecture
with weak memory ordering and all the fun stuff.
Sure, it will only work on architectures where 32bit reads and writes are
atomic.
That is not even true on all ix86 machines. At least I've seen
special memory ordering barriers used in the kernel for newer
ix86 machines.
Smalled is not the issue, its branches that are
important in inner loops.
Right, but if your compiler unrolls your loop, uses indexes to
load the data in a random order and increments your pointer later
by a bigger chunk.
e.g.
a1 = buffer[read_ptr++ & (size - 1)];
a2 = buffer[read_ptr++ & (size - 1)];
a3 = buffer[read_ptr++ & (size - 1)];
a4 = buffer[read_ptr++ & (size - 1)];
a5 = buffer[read_ptr++ & (size - 1)];
a_ = (a1 + a2 + a3 + a4 + a5) / 5;
can become:
read_ptr += 5;
a5 = buffer[(read_ptr - 5) & (size - 1)];
a3 = buffer[(read_ptr - 3) & (size - 1)];
a4 = buffer[(read_ptr - 4) & (size - 1)];
a2 = buffer[(read_ptr - 2) & (size - 1)];
a1 = buffer[(read_ptr - 1) & (size - 1)];
a_ = (a1 + a2 + a3 + a4 + a5) / 5;
without any problem (compiler tries so parallize load/stores,
because the architecture has two load/store units).
These are mathematically equivalent transformations backed by the
C99 and C89 standard and now your write_ptr will overwrite 5
bytes of your buffer. This may not be important for sound output,
because it will only sound wrong, but for recording this is
really bad, since it will record wrong data.
I dont think compiler will optimise away the trhread
safeness, unless I've
missed something. A vectorising compiler might unroll the loops but it
will still keep the ordering, and the aligned vector operations are still
atomic AFAIK.
The ordering will only be kept, if the data access path requires
it or you call (non-inlined) functions in between or you use
volatile.
The kernel people have seen lots of thread safeness being
optimized away, so I never assume anything about atomicy in
C constructs, that is not backed by the standard.
But our solutions combined will give maximum effectiveness. Your
masked adressing in indexes removes most of the branches left in
my lock_free_fifo scheme.
It might be worth to code that all up to have a GPLed high
performance fifo scheme.
The glibc has internal support for atomic operations (at least
atomic_add() is there and allows a negative argument, so
atomic_sub() is there too and exchange_and_add(&lff->filled, 0)
will provide an atomic read), so there is no problem about
portability on Linux systems (and even other systems).
Regards
Ingo Oeser