* Ingo Molnar <mingo@elte.hu> wrote:
> i have solved the fundamental SMP latency problems in the -Q7 patch,
> by redesigning how SMP preemption is done. Here's the relevant
> changelog entry:

also this changelog explains the core changes that enable good
preemption latencies on SMP:
[...]
i took another look at SMP latencies, the last larger chunk of code that
produced millisec-category latencies. CONFIG_PREEMPT tries to solve some
of the SMP issues, but lots of problems remained: the main problem area
is spinlocks nested at multiple levels. If a piece of code (e.g. the MM
or ext3's journalling code) does the following:
	spin_lock(&spinlock_1);
	...			/* work done under the outer lock */
	spin_lock(&spinlock_2);
	...			/* long path under both locks */
then even with CONFIG_PREEMPT enabled, current kernels may spin on
spinlock_2 indefinitely. A number of critical sections break their long
paths by using cond_resched_lock(), but this does not break the path on
SMP, because need_resched() is not set in the above case.
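
to illustrate why need_resched() doesn't help here, a rough standalone
C sketch of the pre-change behaviour (all "demo_" names are invented
for this illustration and only model the idea, they are not the
kernel's implementation):

	/* minimal model of a spinlock and of the pre-change
	   cond_resched_lock(): the lock is only dropped when
	   need_resched() is set on the CPU holding it, and a remote CPU
	   merely spinning in spin_lock() never sets that flag. */
	struct demo_lock { volatile int locked; };

	static void demo_lock_acquire(struct demo_lock *l)
	{
		while (__sync_lock_test_and_set(&l->locked, 1))
			;			/* spin */
	}

	static void demo_unlock(struct demo_lock *l)
	{
		__sync_lock_release(&l->locked);
	}

	/* stand-in for the per-CPU need_resched test: a waiter spinning
	   on demo_lock on another CPU does not make this return true */
	static int demo_need_resched(void)
	{
		return 0;
	}

	/* simplified shape of cond_resched_lock() before this change */
	static int demo_cond_resched_lock(struct demo_lock *l)
	{
		if (!demo_need_resched())
			return 0;	/* long path keeps the lock => SMP latency */
		demo_unlock(l);
		/* schedule() would run here */
		demo_lock_acquire(l);
		return 1;
	}
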
(The -mm kernel introduced a couple of patches that try to drop
spinlocks unconditionally at a high frequency, but besides being a
kludge this is also a performance problem: we keep
dropping/waiting/retaking locks quite frequently. That solution also
doesn't solve the problem of cond_resched_lock() not working on SMP.)
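
for comparison, the -mm style workaround described above looks roughly
like this (again only a sketch, reusing the demo_lock helpers from the
snippet above; the 64-iteration interval is an arbitrary value picked
for illustration):

	/* drop and retake the lock at a fixed interval, whether or not
	   anyone is actually waiting for it */
	static void demo_long_path_with_unconditional_breaks(struct demo_lock *l)
	{
		int i;

		for (i = 0; i < 100000; i++) {
			/* ... a small chunk of the real work ... */
			if ((i % 64) == 0) {
				demo_unlock(l);		/* always drop ... */
				demo_lock_acquire(l);	/* ... and retake at once */
			}
		}
	}
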
to solve the problem i've introduced a new spinlock field,
lock->break_lock, which signals to the holding CPU that another CPU has
requested a spinlock-break. This field is only set if a CPU is spinning
in __preempt_spin_lock [at any locking depth], so the default overhead
is zero. I've extended cond_resched_lock() to check for this flag - in
this case we can also save a reschedule. I've added the
lock_need_resched(lock) and need_lockbreak(lock) methods to check for
the need to break out of a critical section.
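
the idea can be sketched in the same demo_ style (again invented names
and a simplification, reusing demo_need_resched() from the first
snippet; the real fields and helpers are lock->break_lock,
__preempt_spin_lock(), need_lockbreak() and lock_need_resched() in the
patch itself):

	struct demo_break_lock {
		volatile int locked;
		volatile int break_lock;	/* set by a spinning waiter */
	};

	/* contended acquire path: while we spin, ask the holder to break */
	static void demo_preempt_spin_lock(struct demo_break_lock *l)
	{
		while (__sync_lock_test_and_set(&l->locked, 1))
			l->break_lock = 1;	/* only set while actually spinning */
		l->break_lock = 0;		/* got the lock, clear the request */
	}

	/* holder-side check, modelled on need_lockbreak() */
	static int demo_need_lockbreak(struct demo_break_lock *l)
	{
		return l->break_lock;
	}

	/* cond_resched_lock() extended to honour a lock-break request too */
	static int demo_cond_resched_lock_v2(struct demo_break_lock *l)
	{
		if (!demo_need_resched() && !demo_need_lockbreak(l))
			return 0;
		__sync_lock_release(&l->locked);	/* drop the lock ... */
		/* schedule() would run here if need_resched() was set */
		demo_preempt_spin_lock(l);		/* ... then retake it */
		return 1;
	}

the win is that a long critical section now notices a remote waiter
even when need_resched() is false, which is exactly the case that
cond_resched_lock() used to miss on SMP.
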
preliminary results on a dual x86 box show a dramatic reduction in SMP
latencies - where there used to be 5-10 msec latencies, we now see
close-to-UP latencies. But it needs more testing.
[...]
Ingo