On Tue, 2009-06-23 at 15:27 -0400, Paul Davis wrote:
On Tue, Jun 23, 2009 at 2:50 PM, Fernando Lopez-Lezcano
<nando(a)ccrma.stanford.edu> wrote:
This is what Lennart wrote in his original announcement:
> Why not use cgroups for this? Because it's simply a horrible API, and
> using this for media applications has non-obvious consequences on
> using cgroups for their originally intended purpose -- which are
> containers.
The problem with this is that the mechanism described there is
actually not dependent on the use of cgroups at all. It will limit all
!SCHED_OTHER threads whether they are in a cgroup or not.
Not my reading of it. The default 100%/95% split will be allocated to
root only (the only entity - this is not a group or a cgroup - that is
there by default). My understanding is that non-root users will not be
able to run realtime unless that is changed from the default (even if
other mechanisms allow them to run rt). I don't know how you enable the
whole thing in the first place.
The document provides two ways of reallocating realtime real estate
(configurable at build time), one by uid and the other by using
cgroups. So, yes, you apparently depend on cgroups to get the
functionality.
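
For reference, the 95% figure can be checked on a running system. A minimal sketch, assuming the kernel exposes the global rt throttling knobs as /proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us; the helper names rt_share and read_rt_share are made up for illustration, and this only reads the global setting, not any per-uid or per-cgroup allocation:

```python
from pathlib import Path

# Assumed location of the global rt throttling knobs.
PROC = Path("/proc/sys/kernel")


def rt_share(runtime_us, period_us):
    """Fraction of each period that realtime tasks may consume."""
    if runtime_us < 0:
        # A runtime of -1 means rt throttling is disabled.
        return 1.0
    return runtime_us / period_us


def read_rt_share(proc=PROC):
    """Return the current global rt share, or None if it cannot be read."""
    try:
        runtime = int((proc / "sched_rt_runtime_us").read_text())
        period = int((proc / "sched_rt_period_us").read_text())
    except (OSError, ValueError):
        return None
    return rt_share(runtime, period)


if __name__ == "__main__":
    print(read_rt_share())
```

With a runtime of 950000 us and a period of 1000000 us, rt_share gives the 95% mentioned above.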
Moreover, it's not clear to me how we should take Lennart's assessment
"it's simply a horrible API" when it's already been merged into (at least)
the Fedora kernel and perhaps the mainline one (I have not checked).
Merged != Beautiful :-) I did not want to comment on that part
initially. He does not say "the API is not usable for the purpose at
hand" but rather "it is horrible", which is a subjective thing. It may
be horrible (and that is what it looks like from a brief look at
Documentation/cgroups.txt - try it), but usable. Or may be horrible and
not usable, we don't know.
It's already weird enough to have RLIMIT_RTPRIO and RTKit both capable
of providing access to !SCHED_OTHER, but when RTKit seems to be aiming at
another target already covered by a small part of the already merged
cgroup stuff, it gets even weirder.
I don't know enough to comment on this. It could be a "not invented
here" thing, or there could be sound technical reasons (the "non-obvious
consequences" he does not dwell on or explain).
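
To make the RLIMIT_RTPRIO side of this concrete, here is a rough sketch of how a process could check the rt priority ceiling it has been granted. It assumes Linux, where the limit is exposed as resource.RLIMIT_RTPRIO and rt priorities run from 1 to 99; the helper names are made up, and it says nothing about what RTKit itself does:

```python
import resource

# RLIMIT_RTPRIO is Linux-specific; fall back to None elsewhere.
RLIMIT_RTPRIO = getattr(resource, "RLIMIT_RTPRIO", None)

# Highest rt priority on Linux (assumption stated in the lead-in).
MAX_RT_PRIO = 99


def rtprio_ceiling(soft_limit):
    """Highest rt priority an unprivileged process may request.

    A result of 0 means no realtime scheduling is allowed by this limit.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return MAX_RT_PRIO
    return max(0, min(soft_limit, MAX_RT_PRIO))


def current_rtprio_ceiling():
    """Ceiling for the calling process, or None if the limit is unknown."""
    if RLIMIT_RTPRIO is None:
        return None
    soft, _hard = resource.getrlimit(RLIMIT_RTPRIO)
    return rtprio_ceiling(soft)


if __name__ == "__main__":
    print(current_rtprio_ceiling())
```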
Do you have any idea what containers are? No clue here...
It would seem to me that sched-rt-groups (with cgroup support) does
address the problem, but differently. SCHED_RESET_ON_FORK would prevent
the creation of a rt fork bomb, sched-rt-groups + cgroups would just
contain it within the boundaries of the maximum rt time allotted to that
group.
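
As an illustration of the first of those two approaches, this is roughly how a process would ask for SCHED_FIFO with the reset-on-fork flag set, so that children created with fork() drop back to SCHED_OTHER. A sketch only: it assumes a kernel that supports SCHED_RESET_ON_FORK, the function names are hypothetical, the fallback constants are the Linux values, and the request simply fails without the needed privilege or rlimit:

```python
import os

# Linux values, used as fallbacks where the os module lacks the names.
SCHED_FIFO = getattr(os, "SCHED_FIFO", 1)
RESET_ON_FORK = getattr(os, "SCHED_RESET_ON_FORK", 0x40000000)


def policy_with_reset(policy):
    """OR the reset-on-fork flag into a scheduling policy value."""
    return policy | RESET_ON_FORK


def describe(policy):
    """Split a policy value into (base policy, reset-on-fork set?)."""
    return policy & ~RESET_ON_FORK, bool(policy & RESET_ON_FORK)


def request_fifo_reset_on_fork(priority=1):
    """Try to switch the calling process to SCHED_FIFO|SCHED_RESET_ON_FORK.

    Returns True on success, False if the kernel or permissions refuse.
    """
    try:
        os.sched_setscheduler(0, policy_with_reset(SCHED_FIFO),
                              os.sched_param(priority))
    except (OSError, AttributeError):
        return False
    return True
```

Note that this does nothing about threads created inside the same process, which is the clone bomb case.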
right. but in both cases, the remaining problem child is a clone bomb,
which in the case of RTKit requires the watchdog,
Hmmm, did Lennart specifically answer the issue of the clone bomb? I
can't remember and the thread is looong (I had a couple of points that I
made that seemed to be valid and never got a confirmation reply)... If
the clone bomb is still an issue then the whole SCHED_RESET_ON_FORK
concept seems less than useful. Ingo was also involved so I imagine they
thought about this thoroughly? On the other hand it would not be the
first time Ingo just replaces a whole kernel subsystem from scratch
(usually with a nice outcome :-)
Argh. I should try to find the SCHED_RESET_ON_FORK thread(s) on lkml to
see what was argued.
whereas in the stuff discussed at that URL it is handled by the
scheduler. i have to be honest
- the watchdog is the more "linux" like choice, because it puts policy
out in user space. but one could argue that for technical and
reliability reasons, the scheduler is where this really ought to be.
Yes, it is debatable. I would put mechanisms for policy in the kernel
and the policy itself in userspace.
of course, the scheduler won't actually kill the offender, and
SCHED_RESET_ON_FORK won't either - in all likelihood, the
runaway/errant process will continue to use tons of CPU whether it's
SCHED_OTHER or not. is it easier to kill a process when:
a) it's SCHED_FIFO but limited to 95% of the CPU cycles
OR
b) SCHED_OTHER and competing with all other such processes
i really don't have any idea. seems as though 5% of any modern CPU
should be enough to run a shell :)
I don't know either. In either case the watchdog would probably be
running with the highest rt priority so it should be scheduled in as
fast in one as in the other. Probably faster when what needs to be
killed is SCHED_OTHER.
Looks to me like there are too many questions that only Lennart can
answer.
-- Fernando