On 10/07/2011 08:07 PM, Robin Gareus wrote:
  On 10/08/2011 03:25 AM, Michael Ost wrote:
  Hi list,
 We are seeing unexpected interruptions of SCHED_RR audio processing
 threads, and are struggling to understand why they are happening. Does
 anyone have any good tips or tools to suggest to help figure out what is
 preempting or delaying realtime audio threads? 
 https://www.osadl.org/Realtime-Preempt-Kernel.kernel-rt.0.html#builtintools
 It's a bit dated but so is your 2.6.24.3 kernel. You will need to
 compile the kernel with CONFIG_PREEMPT_TRACER=y CONFIG_SCHED_TRACER=y
 under "Kernel hacking -> Tracers -> .." to get access to
 /sys/kernel/debug/tracing/
 The man page of cyclictest (8) includes some hints: see the '-b' option
 on how to produce traces from the scheduler. Also check out
 ..linux-rt-source/Documentation/trace/histograms.txt 
Excellent!
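
For anyone following along, here is a rough sketch of driving that tracing directory from a script. It assumes a kernel with the mainline ftrace layout (the files current_tracer, tracing_on, tracing_max_latency and trace) and the wakeup_rt tracer that CONFIG_SCHED_TRACER provides; an older -rt patched kernel such as 2.6.24 may expose a different interface. The function names are made up for illustration, and writing to these files needs root.

```python
import os

# Assumption: debugfs is mounted at the usual place and the kernel was built
# with the tracer options mentioned above.
TRACING_DIR = "/sys/kernel/debug/tracing"


def start_wakeup_trace(tracing_dir=TRACING_DIR, tracer="wakeup_rt"):
    """Select a latency tracer, reset the recorded maximum, start tracing."""
    def write(name, value):
        with open(os.path.join(tracing_dir, name), "w") as f:
            f.write(value)

    write("tracing_on", "0")           # stop while reconfiguring
    write("current_tracer", tracer)    # must be listed in available_tracers
    write("tracing_max_latency", "0")  # forget any earlier maximum
    write("tracing_on", "1")


def read_worst_case(tracing_dir=TRACING_DIR):
    """Return (max latency in microseconds, trace text of that worst case)."""
    with open(os.path.join(tracing_dir, "tracing_max_latency")) as f:
        max_us = int(f.read().strip())
    with open(os.path.join(tracing_dir, "trace")) as f:
        return max_us, f.read()
```

The idea would be to call start_wakeup_trace(), reproduce the spike, then look at what read_worst_case() returns to see which task was running while the highest priority realtime task waited to be scheduled.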
 The issues are coming up with Receptor [see (*) below for an intro]
 running its 2.6.24.3 CCRMA based kernel. The bug appears with Receptor
 in its "dual core mode", with three instances of Native Instruments'
 Kontakt 4 in DFD + multi-core mode. Either lots of held notes, or large
 patches are needed to get the spikes.
 With these settings there are 5 SCHED_RR threads processing audio (on a
 two-core system). 2 are from Receptor, and 1 from each instance of
 Kontakt. These Kontakt "helper threads" are released to do work as
 possible while the audio thread is processing.
 Kontakt/DFD is using mmap to bring its files into memory. This is done
 in a lower priority "DFD thread", and the mapped memory is used by the
 r/t audio threads.
 DFD is important because the problems don't happen without it. And the
 high SCHED_RR thread count is important, because the problems also don't
 happen if we reduce the count. 
 is the DFD thread calling mlock() on the mapped memory?
 maybe madvise/mlock fail because you're trying to lock more pages than
 RLIMIT_MEMLOCK permits? just a guess. 
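
A quick way to test that guess, sketched with Python's standard resource module. The helper name is hypothetical, and the check is only approximate: memory that is already locked counts against the limit, and a process with CAP_IPC_LOCK is not bound by it at all.

```python
import resource


def memlock_headroom(bytes_to_lock):
    """Return (soft_limit, fits) for locking bytes_to_lock bytes with mlock().

    Rough check only: it ignores pages that are already locked and the
    CAP_IPC_LOCK capability, which lifts the limit entirely.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    if soft == resource.RLIM_INFINITY:
        return soft, True
    return soft, bytes_to_lock <= soft
```

Calling memlock_headroom() with the total size of the mapped sample files, from inside the process that does the mapping, would show whether the limit is even in the right ballpark.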
 
Nope, I looked into that. Nothing quite that straightforward, unfortunately.
 Documentation/trace/mmiotrace.txt should help to find out if a process
 blocks due to memory mapped i/o. The debug interface is nifty and the
 time consuming part is compiling a kernel with CONFIG_MMIOTRACE=y :) 
Again, thanks for the tip. Excellent.
 When the spike happens there are no:
 * wine bottlenecks
 * system calls
 * threads blocking on each other
 * page faults during audio processing
 There do appear to be "involuntary context switches" (as reported by
 getrusage) when the spikes happen. This makes it seem like the scheduler
 is interrupting our threads. But how do you figure out why that is
 happening?
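
For reference, the getrusage counter in question is ru_nivcsw. A minimal sketch of measuring it around one block of work follows; it is in Python rather than the C our audio code would use, the helper name is made up, and RUSAGE_THREAD is Linux-specific, so the code falls back to the whole-process counter where it is missing.

```python
import resource

# Per-thread accounting where available (Linux); otherwise whole process.
WHO = getattr(resource, "RUSAGE_THREAD", resource.RUSAGE_SELF)


def involuntary_switches(work, who=WHO):
    """Run work() and return (its result, involuntary context switches seen)."""
    before = resource.getrusage(who).ru_nivcsw
    result = work()
    after = resource.getrusage(who).ru_nivcsw
    return result, after - before
```

A nonzero count for a processing cycle that overran its deadline is what points at preemption rather than at the thread blocking on its own.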
 There aren't many threads in the system with higher priority. All of the
 5 processing threads are SCHED_RR/76. The higher priority threads in the
 system are:
 * migration/0 - FIFO/99
 * migration/1 - FIFO/99
 * watchdog/0 - FIFO/99
 * watchdog/1 - FIFO/99
 * posix_cpu_timer (x2) - FIFO/99
 * IRQ8 (rtc) - FIFO/99
 * IRQ20 (our audio card) - FIFO/77
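
A list like the one above can be rebuilt from /proc. The sketch below assumes the field layout documented in proc(5), where field 40 of the stat file is rt_priority and field 41 is the scheduling policy; the function names are hypothetical.

```python
import glob

# Policy numbers as defined in the kernel's sched.h.
POLICY_NAMES = {0: "OTHER", 1: "FIFO", 2: "RR", 3: "BATCH", 5: "IDLE", 6: "DEADLINE"}


def parse_stat(line):
    """Return (comm, policy_name, rt_priority) from one task's stat line."""
    # comm may itself contain spaces or parentheses, so split on the last ')'.
    head, _, tail = line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()          # fields[0] is field 3 (state)
    rt_priority = int(fields[37])  # field 40
    policy = int(fields[38])       # field 41
    return comm, POLICY_NAMES.get(policy, str(policy)), rt_priority


def realtime_threads(min_priority=1):
    """List (rt_priority, policy, comm) for every thread at or above min_priority."""
    found = []
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                comm, policy, prio = parse_stat(f.read())
        except (OSError, IndexError, ValueError):
            continue  # thread exited meanwhile, or unexpected format
        if prio >= min_priority:
            found.append((prio, policy, comm))
    return sorted(found, reverse=True)
```

Calling realtime_threads(77) would list only the threads at or above the audio card's IRQ priority, which are the ones that can preempt the SCHED_RR/76 processing threads.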
 Could other kernel activity interrupt the audio threads? Are there
 issues with memory mapping, that can block other unrelated threads? Are
 there just too danged many SCHED_RR threads fighting for two cores?
 Anyone have any suggestions for how to trace the scheduler, and thread
 or process interruptions?
 Apologies for the lengthy post, but this is a tricky subject.
 Thanks for any tips or insights, 
 It would not surprise me if that is one of the many issues that got
 fixed since 2.6.24. That kernel still featured the BKL [BigKernelLock]
 and the problem you are describing is not too far out..
 3.0.6-rt17 requires a bit more work, but 2.6.39 is very stable. 
 
We have tried 2.6.33 with the same results. I'll see if we can try
something even newer.
Great info. Thanks a lot!
Michael Ost