From: clemens(a)ladisch.de
To: nickycopeland(a)hotmail.com
CC: d(a)drobilla.net; linux-audio-dev(a)lists.linuxaudio.org
Subject: Re: [LAD] Pipes vs. Message Queues
> Nick Copeland wrote:
> > I got curious, so I bashed out a quick program to benchmark pipes vs
> > POSIX message queues. It just pumps a bunch of messages through the
> > pipe/queue in a tight loop.
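(For anyone following along: the loop in question is roughly of this shape - a
sketch rather than the actual benchmark code; the message size and count are
made up, and the message-queue variant would swap write()/read() for
mq_send()/mq_receive().)

/* Rough sketch: the parent writes COUNT fixed-size messages into a pipe
 * as fast as it can, the child drains them in a tight loop.  64-byte
 * writes are atomic for a pipe (<= PIPE_BUF), so the strict read check
 * below is good enough for a sketch. */
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define COUNT    1000000
#define MSG_SIZE 64

int main(void)
{
    int fd[2];
    char msg[MSG_SIZE];
    struct timespec t0, t1;

    if (pipe(fd) < 0)
        return 1;

    if (fork() == 0) {                      /* child: reader */
        close(fd[1]);
        for (long i = 0; i < COUNT; i++)
            if (read(fd[0], msg, MSG_SIZE) != MSG_SIZE)
                _exit(1);
        _exit(0);
    }

    close(fd[0]);                           /* parent: writer */
    memset(msg, 'x', MSG_SIZE);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < COUNT; i++)
        if (write(fd[1], msg, MSG_SIZE) != MSG_SIZE)
            return 1;
    wait(NULL);                             /* reader has consumed everything */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("%d messages in %.3f s\n", COUNT,
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}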
> This benchmark measures data transfer bandwidth. If increasing that
> were your goal, you should use some zero-copy mechanism such as shared
> memory or (vm)splice.

> > You might be running into some basic scheduler weirdness here though
> > and not something inherently wrong with the POSIX queues.

> The difference between pipes and message queues is that the latter are
> typically used for synchronization, so it's possible that the kernel
> tries to optimize for this by doing some scheduling for the receiving
> process.
Not sure about that. The CPU time (95%) was all in the kernel, not in the process
itself, so any improvement in how the receiving process is scheduled would only
translate into a small percentage difference. Isn't it more likely that the pipe
code is using an inefficient kernel lock on the pipe to ensure it is thread-safe?
Please don't misunderstand my 'not sure about that': I am relieved to say I am not
a kernel programmer, but understanding these kinds of limitations is interesting
as it bears directly on application implementation (see below).
> > The results with 1M messages had wild variance with SCHED_FIFO,

> SCHED_FIFO is designed for latency, not for throughput. It's no
> surprise that it doesn't work well when you have two threads that both
> want to grab 100 % of the CPU.
Dave can comment on what he wanted to actually achieve; I was interested in
whether the results could be shown to be general. I take your points on the use
of SCHED_FIFO, but there is still some weirdness:
> It's no surprise that it doesn't work well
It does work very well, just not with piped messages.
> when you have two threads that both want to grab 100 % of the CPU
My system does have 200% available though: it was dual core, and the question
I raised was why there is a scheduling problem between the two separate threads
with pipes whilst it could be demonstrated that there was no real need for
such contention.
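For example, on a dual core box the writer and reader could each be pinned to
their own core and given SCHED_FIFO with something like the sketch below - the
priorities and core numbers are arbitrary, it needs root (or CAP_SYS_NICE), and
pthread_setaffinity_np() is a glibc extension; for separate processes the
sched_setaffinity()/sched_setscheduler() calls would be the equivalent.

/* Sketch: give a thread SCHED_FIFO and pin it to one core so the two
 * ends of the pipe never compete for the same CPU. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static int pin_fifo(pthread_t thread, int cpu, int priority)
{
    cpu_set_t set;
    struct sched_param sp = { .sched_priority = priority };

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (pthread_setaffinity_np(thread, sizeof(set), &set) != 0)
        return -1;
    return pthread_setschedparam(thread, SCHED_FIFO, &sp);
}

static void *writer(void *arg) { /* ... pump messages ... */ return NULL; }
static void *reader(void *arg) { /* ... drain messages ... */ return NULL; }

int main(void)
{
    pthread_t w, r;

    pthread_create(&w, NULL, writer, NULL);
    pthread_create(&r, NULL, reader, NULL);
    if (pin_fifo(w, 0, 10) != 0 || pin_fifo(r, 1, 10) != 0)
        fprintf(stderr, "could not set affinity/SCHED_FIFO\n");
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return 0;
}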
Perhaps I should revisit another project I was working on, which was syslog event
correlation: it used multiple threads to be scalable to >1M syslog messages per
second (big installation). I was testing it with socketpair()s and other stuff. I
would be interested to know if scheduler changes affect it too.
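(By socketpair()s I just mean the usual arrangement, roughly this - a sketch
only, with a placeholder message; SOCK_DGRAM keeps message boundaries intact so
each read() returns one whole event.)

/* Sketch: a datagram socketpair as a message channel between two threads
 * (or a parent and child): one end writes events, the other reads them. */
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int sv[2];
    char buf[64];
    const char msg[] = "one syslog event";

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return 1;
    write(sv[0], msg, sizeof(msg));            /* producer end */
    ssize_t n = read(sv[1], buf, sizeof(buf)); /* consumer end */
    printf("got %zd bytes: %s\n", n, buf);
    close(sv[0]);
    close(sv[1]);
    return 0;
}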
I actually quite like your idea of shared memory - dump a ringbuffer over that and
it could give interesting IPC. I am not going to test that as it would be a significant
change to Dave's code, but on the Intel platform it could give some very high
performance without the need for any recourse to the kernel (rough sketch below).
The event correlator would not benefit from the use of shmem since it was threaded,
not multiprocessed.
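The sketch I have in mind is along these lines - assumptions only: single
producer, single consumer, the sizes and the "/lad_ring" name are made up, C11
atomics, and shm_open() may need -lrt:

/* Sketch: a fixed-size single-producer/single-consumer ring buffer living
 * in POSIX shared memory.  head and tail only ever increase; the slot
 * index is taken modulo RING_SLOTS (a power of two). */
#include <fcntl.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define RING_SLOTS 1024
#define MSG_SIZE   64

struct ring {
    _Atomic uint32_t head;                 /* next slot the producer writes */
    _Atomic uint32_t tail;                 /* next slot the consumer reads  */
    char slots[RING_SLOTS][MSG_SIZE];
};

static struct ring *ring_attach(int create)
{
    int fd = shm_open("/lad_ring", O_RDWR | (create ? O_CREAT : 0), 0600);
    void *p;

    if (fd < 0)
        return NULL;
    if (create && ftruncate(fd, sizeof(struct ring)) < 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, sizeof(struct ring), PROT_READ | PROT_WRITE,
             MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}

static int ring_push(struct ring *r, const char *msg)   /* returns 0 if full */
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SLOTS)
        return 0;
    memcpy(r->slots[head % RING_SLOTS], msg, MSG_SIZE);
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

static int ring_pop(struct ring *r, char *msg)           /* returns 0 if empty */
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return 0;
    memcpy(msg, r->slots[tail % RING_SLOTS], MSG_SIZE);
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}

int main(void)                             /* tiny smoke test */
{
    struct ring *r = ring_attach(1);
    char in[MSG_SIZE] = "hello", out[MSG_SIZE];

    if (!r)
        return 1;
    if (fork() == 0) {                     /* consumer process */
        while (!ring_pop(r, out))
            ;                              /* busy-poll; real code would back off */
        _exit(0);
    }
    while (!ring_push(r, in))              /* producer process */
        ;
    wait(NULL);
    shm_unlink("/lad_ring");
    return 0;
}

The fast path is just a memcpy() plus a couple of atomic loads and stores, so
neither side enters the kernel once the region is mapped; the price is that the
consumer has to poll (or be woken by some separate mechanism) instead of
blocking in read().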
Kind regards, nick