[Freebob-devel] [linux-audio-dev] ieee1394 deadlock on RT kernels

Pieter Palmers pieterp at joow.be
Mon Jun 26 20:35:09 UTC 2006


Lee Revell wrote:
> On Mon, 2006-06-26 at 21:44 +0200, Pieter Palmers wrote:
>> Lee Revell wrote:
>>> On Mon, 2006-06-26 at 21:05 +0200, Pieter Palmers wrote:
>>>> Lee Revell wrote:
>>>>> On Mon, 2006-06-26 at 16:51 +0200, Pieter Palmers wrote:
>>>>>>  
>>>>>> Of course. My monday-morning bad temper is over by now, and I hope I 
>>>>>> didn't transfer it to any of you. I'll provide the panic, one way or 
>>>>>> another.
>>>>>>
>>>>> Can you reproduce the problem on a non-RT kernel?
>>>>>
>>>> No, it only occurs with RT kernels, and only with those configured for 
>>>> PREEMPT_RT. If I use PREEMPT_DESKTOP, there is no problem (with 
>>>> threaded IRQs etc.; I only switched the preemption level in the 
>>>> kernel config).
>>>>
>>>> I've uploaded the photos of the panic here:
>>>> http://freebob.sourceforge.net/old/img_3378.jpg (without flash)
>>>> http://freebob.sourceforge.net/old/img_3377.jpg (with flash)
>>>>
>>>> Both are of suboptimal quality, unfortunately, but all the info is 
>>>> readable on one or the other.
>>> Can you add debug printk's before and after tasklet_kill() in
>>> ohci1394_unregister_iso_tasklet to see where it locks up?
>>>
>> That's the first thing I did: the printk before tasklet_kill succeeds, 
>> the one right after the tasklet_kill doesn't.
> 
> OK that's what I suspected.
> 
> It seems that the -rt patch changes tasklet_kill:
> 
> Unpatched 2.6.17:
> 
> void tasklet_kill(struct tasklet_struct *t)
> {
>         if (in_interrupt())
>                 printk("Attempt to kill tasklet from interrupt\n");
> 
>         while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
>                 do
>                         yield();
>                 while (test_bit(TASKLET_STATE_SCHED, &t->state));
>         }
>         tasklet_unlock_wait(t);
>         clear_bit(TASKLET_STATE_SCHED, &t->state);
> }
> 
> 2.6.17-rt:
> 
> void tasklet_kill(struct tasklet_struct *t)
> {
>         if (in_interrupt())
>                 printk("Attempt to kill tasklet from interrupt\n");
> 
>         while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
>                 do      
>                         msleep(1);
>                 while (test_bit(TASKLET_STATE_SCHED, &t->state));
>         }
>         tasklet_unlock_wait(t);
>         clear_bit(TASKLET_STATE_SCHED, &t->state);
> }
> 
> You should ask Ingo & the other -rt developers what the intent of this
> change was.  Obviously it loops forever waiting for the state bit to
> change.
> 

Presumably because you are not allowed to yield() in an RT context?

I wish I had been more thorough in my initial mail, as it would have 
saved us some time and some communication trouble (on my part, that 
is). I had already spotted the msleep() change in the patch, and I had 
already tried reverting it. That gives you a nice new panic message, 
something like 'BUG: yield()'ing in ...'.

I'm wondering why a kernel that is patched, but not configured for 
'complete preemption', works fine. The msleep() change is present there 
too, so the difference probably has something to do with the msleep() 
implementation.

Another strange thing is: why doesn't the tasklet finish, so that it can 
be 'unscheduled'? I have my IRQ priorities set higher than those of any 
other RT threads, so I would expect the tasklet to be able to finish. Or 
is tasklet_kill non-preemptible? That would be very strange, as I would 
expect busy-waiting on something in a non-preemptible code path on a 
single-CPU system to always deadlock.
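
To make the single-CPU argument concrete, here is a toy user-space model 
in C. It is not kernel code: simulate(), the priorities and the one-tick 
sleep are all made up for illustration. It assumes strict priority 
scheduling on one CPU, where the "waiter" stands in for the caller of 
tasklet_kill() and the "worker" for whatever thread would run the 
tasklet and clear TASKLET_STATE_SCHED.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, nothing is kernel API. */
enum wait_style { WAIT_BUSY, WAIT_SLEEP };

/*
 * Simulate one CPU with strict priority scheduling for up to max_ticks.
 * Returns the tick at which the waiter sees the bit cleared, or -1 if
 * that never happens within max_ticks.
 */
int simulate(int waiter_prio, int worker_prio,
             enum wait_style style, int max_ticks)
{
        bool sched_bit = true;       /* tasklet scheduled, not yet run */
        bool waiter_asleep = false;

        assert(max_ticks >= 0);

        for (int tick = 0; tick < max_ticks; tick++) {
                bool worker_runnable = sched_bit;
                bool waiter_runnable = !waiter_asleep;
                bool run_waiter;

                /* The highest-priority runnable task gets the CPU. */
                if (waiter_runnable && worker_runnable)
                        run_waiter = waiter_prio >= worker_prio;
                else
                        run_waiter = waiter_runnable;

                if (run_waiter) {
                        if (!sched_bit)
                                return tick;          /* wait is over */
                        if (style == WAIT_SLEEP)
                                waiter_asleep = true; /* give up the CPU */
                        /* WAIT_BUSY: keep spinning, stay runnable */
                } else if (worker_runnable) {
                        sched_bit = false;            /* tasklet ran */
                }

                /* A sleep lasts one tick in which the waiter did not run. */
                if (!run_waiter)
                        waiter_asleep = false;
        }
        return -1;
}
```

Under these assumptions, simulate(90, 50, WAIT_BUSY, 1000) returns -1 
(the waiter spins and the worker never gets the CPU), while 
simulate(90, 50, WAIT_SLEEP, 1000) returns 2. So in this model a waiter 
that really sleeps does not deadlock, which matches the expectation 
above that the tasklet should be able to finish.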


Greets,

Pieter
