[Jack-Devel] How does --hwmon work?

Tobias Hoffmann lfile-list at thax.hardliners.org
Sat May 6 15:53:21 CEST 2017


On 06/05/17 13:08, David Kastrup wrote:
>> 1. Different audio interfaces provide completely different routing
>> capabilities, so what you want to get isn't that easy to do, if at
>> all.
> Sorry, but that's just handwaving.  An audio interface will either be
> able to route some input to some output with a given volume or not.  The
> API to jack (or something doing that kind of job for the connection
> management on its behalf) would just request the connection and would
> not need to know how it was going to be established.  Many many
> soundcards already provide amixer controls that can be used for this
> functionality if you know how, and the "if you know how" part can be
> described in a database.
>
> As the database grows, more cards will transparently support hardware
> monitoring.
>
> It most definitely _is_ easy to do since the hard work is in the ALSA
> controls of individual drivers (or even other kinds of command line
> utilities used for accessing mixers) and those are already there for a
> whole lot of cards.

This thread has been going on for much too long already.

Here is an actual data point with regard to current hardware:
Take the Focusrite Scarlett class of USB-based audio devices (for which 
I happened to write the ALSA mixer driver [based on previous work by 
Robin Gareus]).
These devices use the standardized USB Audio Class endpoints for the 
audio, but use custom endpoints for accessing the mixer. Audio I/O 
therefore worked out of the box with the standard Linux driver, but 
mixer control needs a hardware-specific driver, for reasons that might 
become clear in the following exposition.

For the Scarlett 18i8 there are 18 inputs and 8 outputs. Internally 
there is an 18x8 mixing matrix (the same(!) matrix size is actually used 
for variants with less physical I/O ...).
The I/O numbers above refer to the physical connectors. On the USB side 
there are also 18 capture "endpoints" and 8 playback "endpoints".
Now, the thing is, these capture/playback endpoints aren't tied to a 
specific electrical I/O. Instead, for each matrix input, physical output 
and capture input there is a software-configurable multiplexer - exposed 
as an ALSA enum-type mixer control - that selects from any of the 
available matrix outputs, physical inputs and playback outputs (except 
that you can't directly connect a matrix output to a matrix input).
That's the - quite flexible - topology of the device, which can and 
obviously should allow for hardware monitoring...
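
To make that topology concrete, here is a minimal sketch in C of the 
routing state just described. All names, and the idea of holding the 
per-crosspoint gains as plain integers, are my own assumptions for 
illustration; the real driver exposes all of this as ALSA controls.

enum source_kind {
    SRC_OFF,              /* mux deselected                       */
    SRC_PHYSICAL_INPUT,   /* one of the 18 physical inputs        */
    SRC_PLAYBACK_OUTPUT,  /* one of the 8 USB playback endpoints  */
    SRC_MATRIX_OUTPUT     /* one of the 8 matrix outputs          */
};

struct source {
    enum source_kind kind;
    int index;
};

/* Each sink carries its own multiplexer, selecting exactly one source. */
struct device_state {
    struct source matrix_input_mux[18];   /* feeds the 18x8 matrix; must
                                             not select SRC_MATRIX_OUTPUT */
    struct source capture_mux[18];        /* feeds the USB capture
                                             endpoints                    */
    struct source physical_output_mux[8]; /* feeds the physical
                                             connectors                   */
    int matrix_gain[18][8];               /* per-crosspoint gain          */
};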

Let's now assume that JACK provides some kind of hardware-monitoring 
support by detecting direct connections between capture inputs and 
playback outputs and, instead of copying audio data on the software 
side, chooses to modify the mixer (I don't know what --hwmon really does).
Now how could it do that?

First, it needs some kind of mapping from physical I/O to mixer 
controls. As you can see, this depends entirely on the device's 
topology. So, for a solution to be any good, it has to support different 
topologies. At least "summing" nodes, "gain control" nodes, 
"multiplexer" nodes, "switch" nodes (-> mute) and "duplicate" nodes are 
required to model current hardware. These nodes can be more or less 
arbitrarily connected (source -> sink), forming a DAG (directed 
acyclic graph).
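
Such a generic node might look like the following sketch (again with 
hypothetical names; how real controls map onto nodes is entirely 
device-specific):

enum node_kind {
    NODE_SUM,     /* N inputs summed into one output      */
    NODE_GAIN,    /* one input, one output, variable gain */
    NODE_MUX,     /* selects exactly one of its N inputs  */
    NODE_SWITCH,  /* one input, on/off (-> mute)          */
    NODE_DUP      /* one input duplicated to N outputs    */
};

struct node {
    enum node_kind kind;
    int n_inputs;
    struct node **inputs;  /* edges of the DAG: source -> sink */
    int ctl_numid;         /* the ALSA control driving this
                              node, if any                     */
};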

Second, we have to find a path through this graph from a certain 
hardware input to a certain hardware output. But JACK actually does not 
deal with hardware inputs and hardware outputs; it deals with capture 
inputs and playback outputs. For the Scarlett, we know that we have to 
look at the capture mux to find the hardware input it is connected to 
(but it might even be connected to something else). We also have to look 
at the playback output and find the hardware output. There are actually 
two ways a hardware output can be connected to a hardware input: either 
the hardware-output mux directly selects a hardware input, or it selects 
a matrix output that is fed - via the matrix-input mux - by some matrix 
input connected to the given hardware input. Notice how the device is 
perfectly capable of sending the monitoring signal to a hardware output 
which is not connected to a capture input. So the whole idea that JACK 
works with capture inputs instead of physical hardware inputs is 
detrimental to what we actually want to achieve.
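
Reusing the hypothetical device_state from the sketch above, checking 
whether a given hardware input is audible on a given hardware output 
comes down to testing both paths - a sketch, not anything JACK or the 
driver actually implements:

/* Is physical input `in` audible on physical output `out`? */
static int monitors(const struct device_state *s, int in, int out)
{
    const struct source *osrc = &s->physical_output_mux[out];

    /* Path 1: the output mux selects the physical input directly. */
    if (osrc->kind == SRC_PHYSICAL_INPUT && osrc->index == in)
        return 1;

    /* Path 2: the output mux selects a matrix output, and some matrix
       input routed to that physical input feeds it with nonzero gain. */
    if (osrc->kind == SRC_MATRIX_OUTPUT) {
        for (int m = 0; m < 18; m++) {
            const struct source *msrc = &s->matrix_input_mux[m];
            if (msrc->kind == SRC_PHYSICAL_INPUT && msrc->index == in &&
                s->matrix_gain[m][osrc->index] > 0)
                return 1;
        }
    }
    return 0;
}
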
But it does not stop there: to establish a new monitoring connection, 
there are now also *two* possible paths to choose from: either via the 
mixer (which might still need some mux reconfiguration!), or via the 
output mux. Except that you can't use the output-mux method if you want 
to route more than one input to the same output (i.e. sum them). To add 
another twist, the matrix hardware does not allow mute/unmute (unlike 
the output masters, which can be muted). A hardware-specific mixer 
application can implement this by storing the old value and setting the 
gain to "0" (which is actually labeled -128 dB on the Scarlett). As JACK 
was (intentionally, AFAIK) designed not to provide any gain control by 
itself, some "magic" gain value has to exist somewhere to "unmute" a 
given matrix connection. While "0 dB" might seem a reasonable choice, it 
clearly falls short of the actual capabilities of the device, and using 
a fixed value would certainly be an annoying limitation in practice.
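
For illustration, emulating mute on one matrix crosspoint via ALSA's 
simple mixer API might look roughly like this. The element name is made 
up, whether a crosspoint appears as a simple playback-volume element at 
all is an assumption, and a real application would keep one saved value 
per crosspoint:

#include <alsa/asoundlib.h>

static long saved_volume = -1;

static int crosspoint_mute(snd_mixer_t *mixer, const char *name, int mute)
{
    snd_mixer_selem_id_t *sid;
    snd_mixer_elem_t *elem;

    snd_mixer_selem_id_alloca(&sid);
    snd_mixer_selem_id_set_index(sid, 0);
    snd_mixer_selem_id_set_name(sid, name);  /* e.g. "Matrix 01 Mix A" */

    elem = snd_mixer_find_selem(mixer, sid);
    if (!elem)
        return -ENOENT;

    if (mute) {
        /* remember the current gain, then drop to the minimum (-128 dB) */
        snd_mixer_selem_get_playback_volume(elem, SND_MIXER_SCHN_MONO,
                                            &saved_volume);
        return snd_mixer_selem_set_playback_volume_all(elem, 0);
    }
    /* "unmute": restore the saved gain - or fall back to some arbitrary
       default, which is exactly the "magic value" problem described above */
    return snd_mixer_selem_set_playback_volume_all(
        elem, saved_volume >= 0 ? saved_volume : 0);
}
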
It basically boils down to: "Who controls the mixer?"
E.g. for mux settings there has to be one single master that decides how 
the device is going to be configured and that is free to make any 
changes as required.
Any software that places itself as that "master" had better work with 
whatever flexibility a given interface has, instead of arbitrarily 
limiting it. In other words, it has to, more or less, support all the 
features a device has. JACK is not - and will most certainly never be - 
able to handle such things. And IMHO it shouldn't. From a 
software-engineering standpoint, a dedicated mixer application as 
"master" is the simplest solution, and - by far - the most flexible.

A side note: the ALSA mixer stuff is quite horrible. E.g. the ordering 
of the controls is determined by some heuristics in the user-space ALSA 
libs (which take the snd_hctl_* (hardware control) elements from the 
kernel and provide the snd_ctl_* elements an application actually works 
with). To get some kind of sensible ordering for the roughly 200 
controls the Scarlett driver exposes, I had to choose the names "just 
right" to avoid any special cases and fall through to the "lexical 
sorting" case.
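
The resulting order can be inspected with a few lines of C; "hw:0" is an 
assumption for the card name, and the program links against -lasound:

#include <alsa/asoundlib.h>

int main(void)
{
    snd_hctl_t *hctl;
    snd_hctl_elem_t *elem;

    if (snd_hctl_open(&hctl, "hw:0", 0) < 0)
        return 1;
    snd_hctl_load(hctl);

    /* walk the snd_hctl_* elements in the order alsa-lib presents them */
    for (elem = snd_hctl_first_elem(hctl); elem;
         elem = snd_hctl_elem_next(elem)) {
        snd_ctl_elem_id_t *id;
        snd_ctl_elem_id_alloca(&id);
        snd_hctl_elem_get_id(elem, id);
        printf("%u: %s\n", snd_ctl_elem_id_get_numid(id),
               snd_ctl_elem_id_get_name(id));
    }
    snd_hctl_close(hctl);
    return 0;
}
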
And, while there are ALSA control types that could be used for metering 
purposes, *no* mixer application supports them; furthermore, the device 
actually returns the metering information as a "block" (e.g. an "all 
captures" block, an "all outputs" block, ...) and not "per channel", as 
required by ALSA. There is no simple way to associate the meter with the 
actual gain control (IIRC, if you choose the names "correctly", ALSA 
will combine several snd_hctl_* into one snd_ctl_* - at least that's 
what happens with mute). This is the point where some drivers fall back 
to providing snd_hwdep_* (hardware-dependent API) access, which requires 
a corresponding user-space program. Because of the severe lack of 
documentation on this part of ALSA, the current Scarlett driver simply 
does not support metering.
Also, during the process of upstreaming the driver, the "save mixer 
settings to hardware (NVRAM)" feature (for standalone operation) had to 
go, because ALSA simply has no appropriate control element type to 
expose that capability of the device.

> That kind of "let the hardware do the job on its own whenever it can" 
> mind frame was what made Firewire the choice for professional work for 
> a really long time. So the user notices that monitoring with effects 
> leads to larger time lag and larger CPU load and higher likelihood of 
> dropouts. He has a choice then. He can switch off the effects.
For most of the history of digital music/video/..., computers were not 
fast enough to do any serious processing. So to get anywhere, reliably, 
you simply *had to* use hardware wherever you could.
Nowadays even digital mixing consoles tend towards software effect 
processing / plugins on general-purpose CPUs instead of dedicated DSP 
chips. The earlier digital audio consoles were also quite limited by 
their hardware implementation, e.g. "you can route X to Y, but not to Z, 
because reasons". On the other hand, I believe the reason the Focusrite 
Scarlett is so flexible is that it's based on the xCORE chip, which has 
up to 32 CPU/microcontroller cores that can easily implement tasks like 
"gain", "mux", ... - because it is all *software*. It's all about 
*controlling* latency and load - which is exactly what JACK tries to do.

   Tobias
