> this will produce reams of output, but will provide you with the
> next hint.

reams of output collected...

> problem: the output will affect scheduling, which might make the
> problem go away. i recommend outputting it to a file if possible,
> and viewing after the fact.
I have big files for jack and ardour, and the problem did not go away. I
have pored over the huge files but I can't seem to find anything that
looks like a glitch. Will keep looking...
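For the record, capturing the trace to a file instead of a terminal is just
a matter of redirecting jack's output streams; a trivial wrapper along these
lines does it (a sketch only, and the command line below is a placeholder,
use whatever starts jackd with the tracing enabled on your setup):

#!/usr/bin/env python3
# Sketch only: run a command with stdout/stderr captured to a log file so
# printing the trace does not fight with the terminal for scheduling.
import subprocess

cmd = ["jackd", "-d", "alsa"]   # placeholder command line
with open("jack_trace.log", "w") as log:
    subprocess.run(cmd, stdout=log, stderr=log)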
I have some data, but I have no idea whether it is significant. This is
from a run with rt_monitor started as a safeguard, then jack, and finally
ardour. To speed up the death of the patient (this helps in my experience),
a tar cvf usr.tar /usr was also running in a separate terminal...
I looked for timing transients in the cycle counts of the listings. Here is
the tail of the result for ardour (absolute cycles followed by delta cycles):
25316620570960 151478724
25314593862618 170551720
25316905845228 174448260
25283295747490 1690268954
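These numbers are just the difference between the timestamps of consecutive
trace lines, sorted so the biggest gaps end up at the tail. Something along
the following lines reproduces that kind of listing (a sketch, assuming the
"jack:<pid>:<cycles>" prefix visible in the excerpts below, not necessarily
the exact processing I ran):

#!/usr/bin/env python3
# Sketch: pull the cycle-count timestamp out of each
# "jack:<pid>:<cycles> file.c:func:line: ..." trace line, compute deltas
# between consecutive lines, and print the biggest gaps last.
import re
import sys

STAMP = re.compile(r"^jack:\d+:(\d+)\s")   # assumed prefix format

def deltas(path):
    prev = None
    result = []
    with open(path) as f:
        for line in f:
            m = STAMP.match(line)
            if not m:
                continue                   # continuation lines, xrun messages, etc.
            now = int(m.group(1))
            if prev is not None:
                result.append((now, now - prev))
            prev = now
    return result

if __name__ == "__main__":
    for stamp, delta in sorted(deltas(sys.argv[1]), key=lambda t: t[1])[-10:]:
        print(stamp, delta)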
So the last delta in the listing above is much bigger than the others, a
good candidate. Finding that timestamp in the full trace gives me this:
jack:27813:25281569457264 client.c:jack_client_thread:556: client polling on event_fd and
graph_wait_fd...
jack:27813:25281605113512 client.c:jack_client_thread:664: client 27813 signalled at
25281605055172, awake for process at 25281605109372 (delay = 32.071006 usecs) (wakeup on
graph_wait_fd==30)
jack:27813:25281605240108 client.c:jack_client_thread:686: client finished processing at
25281605238028 (elapsed = 76.127811 usecs), writing on graph_next_fd==31
jack:27813:25281605450304 client.c:jack_client_thread:694: client sent message to next
stage by 25281605450388, client reading on graph_wait_fd==30
jack:27813:25281605461784 client.c:jack_client_thread:699: reading cleanup byte from pipe
jack:27813:25281605472500 client.c:jack_client_thread:710: process cycle fully complete
jack:27813:25281605478536 client.c:jack_client_thread:556: client polling on event_fd and
graph_wait_fd...
GLITCH! (this is not part of the listing :-)
jack:27813:25283295747490 client.c:jack_client_thread:556: client polling on event_fd and
graph_wait_fd...
jack:27813:25283359653444 client.c:jack_client_thread:664: client 27813 signalled at
25281641118776, awake for process at 25283359647196 (delay = 1016880.700592 usecs) (wakeup
on graph_wait_fd==30)
jack:27813:25283359829424 client.c:jack_client_thread:686: client finished processing at
25283359827480 (elapsed = 106.676923 usecs), writing on graph_next_fd==31
jack:27813:25283360445692 client.c:jack_client_thread:694: client sent message to next
stage by 25283360445776, client reading on graph_wait_fd==30
jack:27813:25283360457440 client.c:jack_client_thread:699: reading cleanup byte from pipe
jack:27813:25283360468308 client.c:jack_client_thread:710: process cycle fully complete
So there are two "client polling" messages in a row with nothing in between...
Presumably the second one happens when rt_monitor downgrades the process to
SCHED_OTHER? Or maybe that second poll is what is "hanging"?
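By "downgrades" I mean rt_monitor putting the runaway process back in the
normal scheduling class; the call involved is essentially sched_setscheduler()
with SCHED_OTHER. Just to illustrate that step (a sketch, not rt_monitor's
actual code), the same thing via Python's wrappers looks like:

#!/usr/bin/env python3
# Sketch only (not rt_monitor's code): demote a process from SCHED_FIFO back
# to the normal SCHED_OTHER policy so it can no longer monopolize the CPU.
# Needs sufficient privileges; the pid comes from the command line.
import os
import sys

pid = int(sys.argv[1])                     # pid of the runaway RT process
os.sched_setscheduler(pid, os.SCHED_OTHER, os.sched_param(0))
print("pid", pid, "policy is now", os.sched_getscheduler(pid))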
On the jack side of things:
25317681129070 143623260
25316620498912 151380488
25314593789218 170399340
25283359590208 1718469688
jack:27772:25281605266960 engine.c:jack_process:584: reading byte from
subgraph_wait_fd==13
jack:27772:25281641074368 engine.c:jack_process:465: considering client alsa_pcm for
processing
jack:27772:25281641107540 engine.c:jack_process:498: in-process client has no process()
function
jack:27772:25281641114096 engine.c:jack_process:465: considering client ardour for
processing
jack:27772:25281641120520 engine.c:jack_process:516:
**** alsa_pcm: xrun of at least 996.047 msecs
calling process() on an OOP subgraph, fd==7
GLITCH!
jack:27772:25283359590208 engine.c:jack_process:535: waiting on fd==13 for process()
subgraph to finish
jack:27772:25283359952396 engine.c:jack_process:584: reading byte from
subgraph_wait_fd==13
jack:27772:25283396656700 engine.c:jack_process:465: considering client alsa_pcm for
processing
jack:27772:25283396688968 engine.c:jack_process:498: in-process client has no process()
function
jack:27772:25283396695880 engine.c:jack_process:465: considering client ardour for
processing
I did another test with the sleep in rt_monitor changed from 2 seconds to
10 seconds (to keep the machine catatonic for a longer time), but the result
was not as interesting: when things recovered, jack timed out and killed
both itself and ardour.
In that test I found three back-to-back polls with big delays in between,
and two more somewhere else in the file.
-- Fernando