Thanks for all your input. I’ll try to summarize here.


You're running Mint :-)  Lots of background bells and whistles there, lots of things which will crop up and interfere, things you cannot disable or turn off with absolute certainty.  If you want smooth power, you'll have to choose more carefully.  My current SOP in more detail here:

http://lsn.ponderworthy.com/doku.php/choosing_a_linux_platform_for_live_synth



-- 
Jonathan E. Brickman   jeb@ponderworthy.com   (785)233-9977
Hear us at http://ponderworthy.com -- CDs and MP3 now available!
Music of compassion; fire, and life!!!


First of all, booting into console mode, rather than running the full-blown desktop, seemed to eliminate most of the problems, although it’s still not quite as stable as I’d like.
Also, I don’t quite understand how all of that could interfere with my RT thread.
I was going to try to install a more minimal system anyway, and I don’t need a graphical environment for this, but during development it’s kind of nice to have.

I would still like to see how far I can take this, and was really hoping I could continuously use 80-90% of all CPU cores without dropouts…
Is that realistic with a lowlatency kernel?


Do you lock all memory used by your RT threads?

If you don't, and the system is configured for high swappiness
[1], this sort of thing could happen.

I'm routinely running big real-time convolution matrices without
problems, so it's certainly possible. 


[1] <https://en.wikipedia.org/wiki/Swappiness>

-- 
FA

I am not currently locking memory. I thought I had plenty of RAM, so that nothing would get swapped, but I guess it’s good practice to lock (wire) memory, so I will give it a try.
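
A quick way to check both points from outside the audio program is to read the swappiness setting and the VmLck field of /proc/<pid>/status, which reports how much of a process's memory is currently locked. The sketch below is only a diagnostic; the function names are made up for illustration. In the C program itself the usual call is mlockall(MCL_CURRENT | MCL_FUTURE) early at startup, and that call is not shown here.

```python
from pathlib import Path


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the kernel's vm.swappiness setting as an int."""
    return int(Path(path).read_text().strip())


def locked_kib(status_text):
    """Return the VmLck value (kB of locked memory) from /proc/<pid>/status text.

    Returns None if the field is absent (kernel threads, for example).
    """
    for line in status_text.splitlines():
        if line.startswith("VmLck:"):
            # The line looks like: "VmLck:         0 kB"
            return int(line.split()[1])
    return None


def report(pid):
    """Print swappiness and locked memory for one process (pid is an int)."""
    status = Path(f"/proc/{pid}/status").read_text()
    print("vm.swappiness:", read_swappiness())
    print("VmLck (kB):   ", locked_kib(status))
```

Calling report() with the pid of the running synth should show a VmLck of 0 kB while nothing is locked, and a value close to the resident size once mlockall has succeeded.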


Bad kernel driver? WIFI drivers are known bad for things like this. An interrupt driver can block if it is designed badly. I found on one machine I had to unload the kernel module for my wifi, as it actually created more problems when I turned the power off to the tx than when it was on. (It seems to me that on my wifi, when it was turned on I got xruns every 5 seconds, but with it turned off it was every half second or so... sounds very close to 0.6. Unloading the kernel module fixed it.)

Cron should also be turned off, but that is probably not the problem here. Cron runs super "nice", but there seem to be some things it does, like package updates, that can cause problems too. I turn off cron while recording.

--
Len Ovens

I don’t have wireless on my machine, nor an NVIDIA card, just Intel built-in graphics. This is where my Linux knowledge falls short, but if I don’t have that hardware, can I assume no drivers for it are loaded?
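
Rather than assuming, the list of loaded modules can be read from /proc/modules (the same data lsmod prints). A minimal sketch follows. The function names are hypothetical, and the suspect names in the usage note are only examples of common WiFi and graphics modules, not a definitive list.

```python
from pathlib import Path


def loaded_modules(modules_text):
    """Return the set of module names listed in /proc/modules text."""
    names = set()
    for line in modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # the first column is the module name
    return names


def find_suspects(modules_text, suspects):
    """Return the suspect module names that are currently loaded, sorted."""
    return sorted(loaded_modules(modules_text) & set(suspects))


def check(suspects, path="/proc/modules"):
    """Read the live module list and report which suspects are loaded."""
    return find_suspects(Path(path).read_text(), suspects)
```

For example, check(["iwlwifi", "nvidia", "nouveau"]) returns an empty list when none of those modules is loaded.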


AFAIK, the important things are:

1. Use a properly configured realtime patched kernel.


So the lowlatency kernel is not going to cut it?

I wasn’t really able to find much info on the difference between the two, other than that the RT kernel is a “step up” and is hard realtime rather than soft.
But I found nothing on how this is technically achieved.


2. Set a high priority of the soundcard interrupt, something like 97 is
a good value.  (If using a USB soundcard, set the priority of the
interrupt servicing the USB hub instead).


Did that.

3. Run Jack with realtime and memlocking enabled and at a priority of
80.

I’m not running JACK, but rather using ALSA directly.

4. Make sure that you don't have any hardware/drivers that play havoc
with your kernel scheduling.  Some WiFi adapters, NVIDIA, etc. come to
mind.

5. Make sure that the system isn't suffering from SMI/NMIs which
preempt the kernel and can take a long time to execute.  This can be
checked with the hwlatdetect script in the rt-tests package.

6. Use cyclictest from rt-tests to confirm that there are no latency
spikes in how the kernel schedules threads.
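
For point 2 above, it can help to confirm that the priority actually took effect. The sketch below assumes a kernel with threaded interrupts (an RT kernel, or one booted with threadirqs), where each handler appears as a kernel thread named irq/<number>-<name>. The function names are made up for illustration.

```python
import os
from pathlib import Path


def find_irq_threads(proc_root="/proc"):
    """Map pid -> thread name for kernel IRQ threads (names start with 'irq/')."""
    threads = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            name = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited, or the entry is not readable
        if name.startswith("irq/"):
            threads[int(entry.name)] = name
    return threads


def rt_priority(pid):
    """Return (policy, priority) for pid, or None if it cannot be queried."""
    try:
        return os.sched_getscheduler(pid), os.sched_getparam(pid).sched_priority
    except OSError:
        return None


def main():
    """Print every IRQ thread with its scheduling policy and priority."""
    for pid, name in sorted(find_irq_threads().items()):
        print(pid, name, rt_priority(pid))
```

Running main() lists each IRQ thread. The soundcard's thread (or the USB host controller's, for a USB card) should show policy os.SCHED_FIFO and the priority that was set.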

Possibly hyperthreading, cpu power management, etc could cause
problems, and I don't have experience with all hardware out there, but
IME on modern Intel hardware this isn't a problem.

I did actually find that hyperthreading had an impact; turning it off made everything much more predictable.

JACK2 also has a very nice profiling tool that can give a good idea
about what is going on with the soundcard interrupt, clients, etc.

-- 

  Joakim


Keep an eye on the interrupts while it's all running, particularly
non-maskable interrupts. Try to correlate them with the 0.6 sec
interval of the glitches if possible:

watch -n 0.1 cat /proc/interrupts
 
I've written up some of the checks I generally do, perhaps browse
that to see if there's anything there that you could check?
http://openavproductions.com/real-time-latency-tuning/

That's all I can think of at the moment, -Harry


Here’s the output of cat /proc/interrupts:


           CPU0       CPU1       CPU2       CPU3
   0:         57          0          0          0   IO-APIC-edge      timer
   1:          3          0          0          0   IO-APIC-edge      i8042
   7:         44          0          0          0   IO-APIC-edge
   8:          1          0          0          0   IO-APIC-edge      rtc0
   9:          3          0          0          0   IO-APIC-fasteoi   acpi
  12:          4          0          0          0   IO-APIC-edge      i8042
  16:          0          0          0          0   IO-APIC   16-fasteoi   madifx
 121:       7074          0          0        341   PCI-MSI-edge      xhci_hcd
 122:      13001      25946          0        342   PCI-MSI-edge      0000:00:17.0
 123:       3409          0          0          0   PCI-MSI-edge      eth0
 124:     171029          0          0          0   PCI-MSI-edge      i915_bpo
 125:       4805          0          0          0   PCI-MSI-edge      snd_hda_intel
 NMI:         17         12         13         14   Non-maskable interrupts
 LOC:     544121     436328     444080     462821   Local timer interrupts
 SPU:          0          0          0          0   Spurious interrupts
 PMI:         17         12         13         14   Performance monitoring interrupts
 IWI:          0          0          0          0   IRQ work interrupts
 RTR:          3          0          0          0   APIC ICR read retries
 RES:      13051      11975      11216       8004   Rescheduling interrupts
 CAL:        613        547        560        526   Function call interrupts
 TLB:        640        767        676        535   TLB shootdowns
 TRM:          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0   Machine check exceptions
 MCP:         31         31         31         31   Machine check polls
 HYP:          0          0          0          0   Hypervisor callback interrupts
 ERR:         44
 MIS:          0

The local timer interrupts are getting fired all the time, but I guess they should.
IRQ 123 (eth0) is also updated rather often. But the one that’s closest to 0.6 s seems to be:

 122:      13001      26147          0        342   PCI-MSI-edge      0000:00:17.0

But is there anything I can do about that?
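
Spotting which counter moves is easier with two snapshots and a diff than by eye. The sketch below sums each row of /proc/interrupts over all CPUs and reports the rows that grew between snapshots. The function names are hypothetical. The label 0000:00:17.0 looks like a PCI address, so lspci -s 00:17.0 should name the device behind IRQ 122.

```python
import time
from pathlib import Path


def parse_interrupts(text):
    """Map each IRQ label (e.g. '122' or 'NMI') to its total count over all CPUs."""
    lines = text.strip().splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        total = 0
        # At most ncpu leading fields are counters; the rest is description.
        for field in rest.split()[:ncpu]:
            if not field.isdigit():
                break
            total += int(field)
        totals[label.strip()] = total
    return totals


def busiest(before, after, top=5):
    """Return [(label, delta)] for the IRQs that grew most between snapshots."""
    deltas = {k: after[k] - before.get(k, 0) for k in after}
    grown = [(k, d) for k, d in deltas.items() if d > 0]
    return sorted(grown, key=lambda kd: kd[1], reverse=True)[:top]


def sample(interval=0.6, path="/proc/interrupts"):
    """Take two snapshots `interval` seconds apart and return the busiest IRQs."""
    before = parse_interrupts(Path(path).read_text())
    time.sleep(interval)
    after = parse_interrupts(Path(path).read_text())
    return busiest(before, after)
```

Applied to the two readings of IRQ 122 quoted above, the totals are 39289 and 39490, a difference of 201 interrupts, all of them on CPU1.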



cheers,
Fokke