Hi Ingo,
* Rui Nuno Capela wrote:
I've been testing 2.6.9-rc1-mm1-S3 SMP on a P4 HT SUSE 9.1 Pro box. In
fact I've been struggling with the VP's ever since O3, while on
SMP/HT. It hasn't been a pleasant experience, to say the least.
[...]
i tried to reproduce your problems before with no luck. I tried your
.config on my dual P4/HT box. I am also testing the VP patchset on a
dual Celeron, an 8-way Xeon, and on an AMD64 box. So it is something
special to your setup.
the initial question is whether the VP patchset is stable if you apply
-S3 but disable all the following options:
CONFIG_PREEMPT_VOLUNTARY
CONFIG_PREEMPT_SOFTIRQS
CONFIG_PREEMPT_HARDIRQS
CONFIG_PREEMPT_BKL
CONFIG_PREEMPT_TIMING
CONFIG_LATENCY_TRACE
this should give you a close to vanilla kernel. Then once you have
established a 'known point of stability', could you enable the above
options one by one (and keeping the previous options enabled) and see
which one introduces the instability? My guess would be
PREEMPT_HARDIRQS.
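
The one-by-one procedure described above could be sketched as follows. This
is only an illustration, not part of the VP patchset: the helper name
cumulative_steps is hypothetical, the option order is simply the order in
which the options are listed above, and any dependencies between the options
are assumed to be resolved separately (e.g. by the kernel's own config
tools) before each build.

```python
# Options named in the message, in the order they are listed there.
OPTIONS = [
    "CONFIG_PREEMPT_VOLUNTARY",
    "CONFIG_PREEMPT_SOFTIRQS",
    "CONFIG_PREEMPT_HARDIRQS",
    "CONFIG_PREEMPT_BKL",
    "CONFIG_PREEMPT_TIMING",
    "CONFIG_LATENCY_TRACE",
]


def cumulative_steps(options):
    """Return one .config fragment per test step (hypothetical helper).

    Step 0 has every option disabled (the 'known point of stability').
    Step k enables the first k options and keeps the rest disabled, so
    previously enabled options stay enabled.
    """
    steps = []
    for k in range(len(options) + 1):
        lines = []
        for i, opt in enumerate(options):
            if i < k:
                lines.append(f"{opt}=y")
            else:
                # Usual .config notation for a disabled option.
                lines.append(f"# {opt} is not set")
        steps.append("\n".join(lines))
    return steps


if __name__ == "__main__":
    for k, fragment in enumerate(cumulative_steps(OPTIONS)):
        print(f"--- step {k} ---")
        print(fragment)
```

The first step whose kernel shows the instability points at the option
enabled in that step.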
I'll try that.
All my VP trials have been conducted with ALL those CONFIG_ options turned
ON. I've just been tweaking the softirq-preempt and hardirq-preempt boot
prompt arguments. However, tweaking these only affects the ability to
run jackd -R without an instant lockup. Again, the only viable
combination has been softirq-preempt=0 and hardirq-preempt=0.
Rest assured that when I unset CONFIG_SCHED_SMT things get much
better.
I've been testing 2.6.9-rc2-mm1 all along too, with SMP, SCHED_SMT and
PREEMPT. Apart from the CD/DVD burning annoyance, this is the most
recent, reliable and stable combination I could reach to date. This
is why I think it's the VP approach that is at stake, not my
specific hardware setup.
to debug the hard-lockups, the most reliable way is to use
nmi_watchdog=1 and a serial console to another box. It's quite painful
to debug lockups without a serial console.
I'm sure you may know it far better than me ;)
OK. I have to make some time to help you and myself on this. Is this
nmi_watchdog=1 a kernel boot prompt argument? I couldn't find any
kernel config option for it. Would you let me know how to do this, in a
consistent way?
I do know how to set up the serial console part, but I have never tried to
debug lockups before. Is there anything that I should be aware of before I
invest my time? Are there any special kernel debug configuration options
that you find most helpful? Anything else that comes to mind?
I'm willing to take the plunge, but probably only during this weekend.
Take care.
--
rncbc aka Rui Nuno Capela
rncbc(a)rncbc.org