On Sat, 25 Jul 2009, Brent Busby wrote:
In addition to the question I posted earlier about PREEMPT_RCU, I've
found still other kernel config options that are not covered in most of
the extant howtos for setting up low latency kernels, because they were
only recently added to the unpatched kernel. (Or at least, they're more
recent than most of the howtos...)
I've been researching these options, mostly by googling (and
googling...and googling) the Linux Kernel Mailing List archives, and
also by looking at the config used by 64Studio, since it seems the
prevalent opinion is that they are stable as a rock. With the latter
approach of checking against 64Studio's config, I've had little luck,
though. Their distro currently comes in two versions: a 2.1 "stable"
version based on Debian "etch", with a 2.6.21 kernel that predates some
of the config options I'm asking about; and a 3.0 "beta" version with a
newer kernel that should probably be taken with a grain of salt, both
because 64Studio is still working the bugs out of it and because it
turns on options that the LKML archives have warned strongly against.
So, most of the information I've gleaned has come from the LKML.
I tried out the 64Studio stable version last fall and had a lot of trouble
with it. The 2.6.21 based kernel wouldn't even boot on my box, so I ended up
booting up with the 2.6.24 based -rt kernel that I was using on Fedora 8 /
CCRMA. There were too many libraries and apps that were way too out of date
for my tastes, so I abandoned the whole experiment.
I might as well collate these questions here, for what good it may do.
The kernel I'm configuring is 2.6.29.6 with matching RT patches (-rt23),
though all of the options listed here are actually present even in the
unpatched 2.6.29.6 kernel.
PREEMPT_RCU :
This was the one my original question was about in the earlier post.
In the newer kernels (even mainline ones, without any RT patches), there
is an "RCU Subsystem" choice, with one option being "classic" and
another being "preemptible". The choice would seem obvious for low
latency, except that the help text warns that preemptible RCU is
likely to expose serious kernel bugs that may render the system
completely unstable. (This is *not* the same setting as the various
CONFIG_PREEMPT options that have been present in the mainline kernel for
a while now. This is something new, and it appears in menuconfig as a
separate setting.) From reading through the debugging history on the
LKML, I've mostly gathered that as of last year or so, it is now
considered mostly stable, after having been run through a utility called
'rcutorture' and having had many lockups fixed. I'd still be interested
in anything anyone else can contribute about it, though, especially from
people who are using it and also think it is stable.
There was a time when PREEMPT_RCU was unstable, and most of the -rt kernel
releases would fix one RCU bug while adding another. These times are now past,
however. PREEMPT_RCU has been completely stable for me on AMD and Intel (both
multicore) for about a year now.
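To check which RCU implementation an existing .config actually selects (for instance the distro config you start from), a small shell helper can read the file directly. This is a minimal sketch, not part of the setup described here: it assumes the choice symbols are CONFIG_CLASSIC_RCU, CONFIG_TREE_RCU and CONFIG_PREEMPT_RCU as in mainline 2.6.29 (an -rt tree may differ), and the function names kconfig_get and rcu_choice are hypothetical.

```sh
#!/bin/sh
# Sketch: report which RCU implementation a kernel .config selects.
# Assumption: the choice symbols are CONFIG_CLASSIC_RCU, CONFIG_TREE_RCU
# and CONFIG_PREEMPT_RCU (mainline 2.6.29 names); -rt trees may differ.

# kconfig_get FILE SYMBOL -> prints the value (y, m, a string...) or "n"
kconfig_get() {
    # A line like "CONFIG_FOO=y" carries the value; "# CONFIG_FOO is not set"
    # or a missing line both mean the option is off. Last occurrence wins.
    val=$(sed -n "s/^$2=//p" "$1" 2>/dev/null | tail -n 1)
    if [ -n "$val" ]; then
        printf '%s\n' "$val"
    else
        echo n
    fi
}

# rcu_choice FILE -> prints the selected RCU symbol, or "unknown"
rcu_choice() {
    for sym in CONFIG_PREEMPT_RCU CONFIG_TREE_RCU CONFIG_CLASSIC_RCU; do
        if [ "$(kconfig_get "$1" "$sym")" = y ]; then
            echo "$sym"
            return 0
        fi
    done
    echo unknown
    return 1
}
```

For example, `rcu_choice /boot/config-$(uname -r)` on a distro that installs its kernel config under /boot (an assumption; the location varies).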
GROUP_SCHED :
This one is interesting, and I don't know what to make of it, other than
that the LKML seems to have decided in the last two months or so that it
slows your system down and makes latencies worse.
The thing that's confusing about it, though, is that it's described as
a mechanism for grouping high-priority tasks. It's implied (though not
spelled out specifically) that the groups can even be POSIX user groups,
because one document I read says that enabling this will leave you
unable to get realtime priority as a non-root user unless you are set up
in a group specified in limits.conf. Hmm! That sounds an awful lot like
what we've just been calling pam_limits for years now. Are we doing this
with a kernel config option now? (One which apparently doesn't work?)
The 3.0 (beta) version of 64Studio turns this on. Then again,
looking at their kernel config, they seem to turn everything on, and
that might be why it's considered beta. (The 2.1 stable version of
64Studio seems to have a kernel old enough that it never had that
option as a choice.) Does anyone have any feedback about this? It's only
in the past two months that the kernel gurus have decided that it's bad,
but from what I've read, they're already considering marking it broken.
Here's another vote for completely broken! Every time I set up a box with the
-rt kernel, I start with the distro's .config and tweak from there, and every
time, I can't get reliable low-latency realtime audio until I disable
GROUP_SCHED.
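Since the workflow here is to start from the distro .config and switch GROUP_SCHED off, that edit can be scripted. A minimal sketch, assuming the option appears as CONFIG_GROUP_SCHED in .config; kconfig_disable is a hypothetical helper name. It only flips the one symbol, so make oldconfig still has to be run afterwards to let kconfig drop or re-ask about the options that depend on it.

```sh
#!/bin/sh
# Sketch: switch a symbol off in an existing kernel .config, using the
# same "# CONFIG_FOO is not set" form that menuconfig writes.

# kconfig_disable FILE SYMBOL
kconfig_disable() {
    tmp="$1.tmp.$$"
    # Drop any existing "CONFIG_FOO=..." or "# CONFIG_FOO is not set" line;
    # the "=" anchor keeps longer names like CONFIG_FOO_BAR untouched...
    grep -v -e "^$2=" -e "^# $2 is not set\$" "$1" > "$tmp"
    # ...then record the option as explicitly disabled at the end.
    echo "# $2 is not set" >> "$tmp"
    mv "$tmp" "$1"
}
```

Typical use from the top of the kernel tree: `kconfig_disable .config CONFIG_GROUP_SCHED && make oldconfig`.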
CGROUPS :
This sounds cool, but I'm reasonably sure it's not actually necessary.
Documentation I've read suggests that this allows for letting
applications (or users) define CPU and memory pool affinity for tasks,
so that one could arbitrarily tie down particular threads or tasks to a
given processor core, or region of memory (or something like that).
However, the thing that makes me somewhat sure I don't positively need
this is that the same documentation also says that if you want to use
it, you need to create a new subdirectory under /dev, mount a new
pseudofilesystem under it, and then this module will populate that space
with dynamic configuration data about these affinity groups for running
tasks. I have neither seen any distro (even ones made for musicians)
that has set any such thing up out of the box, nor have I ever seen a
realtime howto that tells people to do it themselves. It sounds like
there's a lot of infrastructure necessary for this that's not common in
distros yet. (Though I see that regular Debian "lenny" turns this on in
the kernel without actually providing the special /dev support for it,
which I presume they expect you to set up yourself if you're
interested.) Still, comments welcome...
As far as I can tell, CGROUPS doesn't affect realtime latencies enough to make
any sort of difference for realtime audio. It does, however, mess with the
scheduler, even without setting up the affinity groups. I've noted some
strange scheduling behavior: When running multiple instances of the same CPU
bound task (like burnP6, mprime, dd, or your favorite stress tester), the tasks
usually start off fairly evenly distributed between the cores, and then the
affinity seems to drift, lumping these tasks on one half of the cores while
leaving the other half mostly idle. On a Core 2 Quad with CGROUPS enabled, I
generally have to run 12 instances of burnP6 to keep all four cores pegged at
99-100% utilization. I wouldn't go as far as to say that CGROUPS is
broken, but I fail to see the usefulness on a production audio system with the
IRQ and other realtime priorities tuned properly, especially now that the core
scheduling features for -rt are stable.
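The core drift described above can be watched directly with procps ps, whose psr column shows the CPU a task last ran on. A minimal sketch, assuming procps ps (for the -C selector and the psr column); per_cpu_counts and snapshot are hypothetical names. psr is only a sample of where each task was at that instant, so several snapshots taken over time show the drift better than one.

```sh
#!/bin/sh
# Sketch: see how copies of one CPU-bound program are spread across cores.
# Assumption: procps ps, where "psr" is the processor a task last ran on.

# per_cpu_counts: read one CPU number per line on stdin and print
# "cpuN count" lines, sorted by CPU number.
per_cpu_counts() {
    awk 'NF { print $1 }' | sort -n | uniq -c |
        awk '{ printf "cpu%s %s\n", $2, $1 }'
}

# snapshot NAME: count running processes called NAME per CPU
snapshot() {
    ps -C "$1" -o psr= | per_cpu_counts
}
```

For example, `while sleep 5; do snapshot burnP6; echo; done` in another terminal while the burnP6 instances run; with the drift described above, the counts gradually pile up on half of the CPUs.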
--
+ Brent A. Busby + "We've all heard that a million monkeys
+ UNIX Systems Admin + banging on a million typewriters will
+ University of Chicago + eventually reproduce the entire works of
+ Physical Sciences Div. + Shakespeare. Now, thanks to the Internet,
+ James Franck Institute + we know this is not true." -Robert Wilensky
I've also run into a handful of config options that either help or hurt
realtime performance, depending on the specific hardware:
NO_HZ (seems to work properly on most hardware these days)
PCI_MSI (some PCI devices seem to add more latency with MSI)
RTC_DRV_CMOS (this depends on your particular RTC hardware)
HPET_EMULATE_RTC (this depends on your particular HPET hardware)
X86_ACPI_CPUFREQ (creates TSC drift between cores on Intel)
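To compare these hardware-dependent options between the distro config you start from and the one you end up with, a short report can help. A minimal sketch; hw_report is a hypothetical name, and it assumes the options appear in .config with the usual CONFIG_ prefix.

```sh
#!/bin/sh
# Sketch: print the state of the hardware-dependent options listed above,
# so two kernel configs can be compared quickly.
# Symbols carry the CONFIG_ prefix that .config files use.

HW_OPTS="CONFIG_NO_HZ CONFIG_PCI_MSI CONFIG_RTC_DRV_CMOS CONFIG_HPET_EMULATE_RTC CONFIG_X86_ACPI_CPUFREQ"

# hw_report FILE -> one "SYMBOL=value" line per option; "n" if unset/absent
hw_report() {
    for sym in $HW_OPTS; do
        # Require "=" right after the name so CONFIG_NO_HZ doesn't match
        # a longer symbol; the last occurrence wins, as in kconfig.
        val=$(sed -n "s/^$sym=//p" "$1" | tail -n 1)
        echo "$sym=${val:-n}"
    done
}
```

Running hw_report on both configs and diffing the two outputs shows at a glance which of these options changed between them.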
And of course, there are all kinds of profiling, stats, testing, and debugging
options that make any realtime load suffer. Occasionally, you'll find a device
driver that's just plain bad for -rt performance, but I've been running into
this problem a lot less these days.
Equally important for low latency are the scheduling priorities. After a lot
of googling, LKML reading, and trial and error, I've settled on the following
for rock-solid low-latency audio:
99 FF migration
99 FF posixcputmr
98 FF IRQ-8 (realtime clock)
97 FF audio IRQ (in some cases means ieee1394 or specific USB port)
80 RR JACK
(FF = SCHED_FIFO, RR = SCHED_RR.) All audio and MIDI apps should run at a
lower realtime priority than JACK, and all other IRQ and sirq threads should
be set to SCHED_OTHER. To set this up, I've added the following to my
/etc/rc.local:
---
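# set all IRQ threads to SCHED_OTHER
for irq in $(pgrep '^IRQ-'); do
    chrt -o -p 0 "$irq"
done
# set all softirq threads to SCHED_OTHER
for sirq in $(pgrep '^sirq-'); do
    chrt -o -p 0 "$sirq"
done
# set high-priority IRQ threads; IRQ numbers are machine-specific,
# so check /proc/interrupts for your RTC and sound card first.
# pgrep -x matches the exact name, so IRQ-8 doesn't also catch IRQ-80.
chrt -f -p 98 $(pgrep -x 'IRQ-8')    # rtc
chrt -f -p 97 $(pgrep -x 'IRQ-16')   # hda-intel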
---
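To confirm the priority layout after rc.local has run (and after JACK has started), procps ps can list every thread's scheduling class and realtime priority. A minimal sketch, assuming procps ps with the cls and rtprio output columns; rt_filter and rt_threads are hypothetical names. The result should match the table above: migration and posixcputmr at the top, then the RTC and audio IRQ threads, then JACK and the audio apps, and no other IRQ or sirq threads in the FF/RR list.

```sh
#!/bin/sh
# Sketch: list every thread running under SCHED_FIFO (FF) or SCHED_RR (RR),
# highest realtime priority first, to check it against the table.
# Assumption: procps ps, which reports "FF"/"RR" in the cls column.

# rt_filter: read "rtprio class name" lines on stdin, keep FF/RR only,
# squeeze the spacing, and sort by priority (highest first)
rt_filter() {
    awk '$2 == "FF" || $2 == "RR" { $1 = $1; print }' | sort -k1,1nr
}

# rt_threads: snapshot all threads on the system (-L includes threads,
# so JACK's realtime threads show up individually)
rt_threads() {
    ps -eLo rtprio=,cls=,comm= | rt_filter
}
```

Running `rt_threads | head -n 20` after boot shows the top of the realtime hierarchy.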
And of course, there are those pesky BIOS options. Some of the newer Intel CPU
features like to generate interrupts that the kernel can do absolutely nothing
about. I had to disable C1E, CPU TM function, the Execute Disable bit, and one
of the ACPI options (can't remember which one) to get my Core 2 Quad to work.
Disabling video interrupts (only an issue on older hardware) always seems to
help. AMD systems seem to require much less (if any) BIOS tweaking for -rt.
Please, anyone, correct me if I'm wrong on any of this. Most of this knowledge
was gained from four years of following -rt development and tuning -rt kernels
on a variety of hardware. I'm always adding to my bag of tricks when it comes
to tuning realtime machines ;-}
Cheers,
--ww