What is happening right now is that I have seven synth+filter chains,
all run through a single JACK server, all eventually feeding into the
one sound card.  I have more than enough CPU to run them all, but as you
and others have explained, one JACK server is reaching its limit in
handling them all, because the whole graph is processed synchronously.
So what I intend to do is run each chain independently and
asynchronously on its own JACK server, and then combine them all into a
separate final server, which will connect to the sound card.  This is
already being done with as many motherboards as desired, but I would
like to do it within one very powerful box.
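For illustration only, the per-chain servers could be started along
these lines.  This is a minimal Python sketch, assuming jackd with its
dummy backend for each chain server; the server names chain1..chain7,
the 48000 Hz rate and the 256-frame period are placeholder choices, not
taken from my setup.  Clients choose their server through the
JACK_DEFAULT_SERVER environment variable.  Getting audio from the chain
servers into the final server that drives the sound card is a separate
bridging problem and is not shown.

```python
import shlex


def jackd_commands(num_chains, rate=48000, period=256):
    """Return one jackd command line per chain server.

    Assumption: each chain gets its own named server ("chainN", a
    hypothetical naming scheme) on the dummy backend, so only the final
    server touches the sound card.
    """
    commands = []
    for i in range(1, num_chains + 1):
        argv = [
            "jackd",
            "-n", f"chain{i}",   # hypothetical server name
            "-d", "dummy",       # timer-driven backend, no sound card
            "-r", str(rate),     # sample rate in Hz
            "-p", str(period),   # frames per period
        ]
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    for cmd in jackd_commands(7):
        print(cmd)
```

Each printed line could be launched on its own (for example with
subprocess.Popen), and each chain's synth and filter clients started
with JACK_DEFAULT_SERVER set to the matching name.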

> Maybe some visualisation of your JACK graph could help; I think
> patchage can export its structure to a dot/graphviz file, which you
> could attach. Information about the strain each of these filters puts
> on the CPU would be a helpful hint too: not the total at the top of
> htop, but the figure next to the process of each of these filters.

The DOT is attached.  At maximum load, the only CPU core loaded above
5% is the one running a single Yoshimi process, the one playing the high
range of patch SRO; that core holds steady at 14% when SRO is sounding
with the maximum number of notes.  There is no significant CPU stress
anywhere, just JACK's DSP load maxing out.
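
A rough way to see how CPU can look nearly idle while DSP load is
maxed: JACK must finish the whole graph within one period, so what
counts is the wall-clock time of each cycle compared with the period
length, not average CPU use across cores.  The sketch below is a
simplification (JACK's own figure is smoothed over several cycles), and
the 128-frame period, 48000 Hz rate and 2.0 ms cycle time are made-up
example numbers, not measurements from this setup.

```python
def period_ms(frames, rate):
    """Length of one JACK period in milliseconds."""
    return 1000.0 * frames / rate


def approx_dsp_load(cycle_ms, frames, rate):
    """Approximate DSP load (%) as the share of the period spent processing.

    Simplification: JACK reports a smoothed value, not this single-cycle ratio.
    """
    return 100.0 * cycle_ms / period_ms(frames, rate)


if __name__ == "__main__":
    # Hypothetical numbers: 128 frames at 48 kHz is about 2.67 ms per period.
    budget = period_ms(128, 48000)
    load = approx_dsp_load(2.0, 128, 48000)
    print(f"period {budget:.2f} ms, DSP load {load:.0f}%")
```

With these example numbers a 2.0 ms cycle uses about 75% of the period,
even if that work is spread thinly over many cores.  Giving each chain
its own server gives each chain its own period budget.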

--
Jonathan E. Brickman   jeb@ponderworthy.com   (785)233-9977
Hear us at http://ponderworthy.com -- CDs and MP3 now available!
Music of compassion; fire, and life!!!