[LAD] [LAU] Open Source Audio Interface

Len Ovens len at ovenwerks.net
Tue Sep 9 22:57:06 UTC 2014

I thought I would move this over here. I know there is already work being 
done on this hardware-wise. These thoughts are for a point-to-point raw 
Ethernet audio transport that still allows some normal network traffic as 
well.

On Tue, 2 Sep 2014, Len Ovens wrote:

> My thought is something like this:
> We control all network traffic. Let's try for 4 words of audio. For sync 
> purposes, at each word boundary a short audio packet of 10 channels is sent. 
> This would be close to the minimum Ethernet packet size. Then there should 
> be room for one full-size Ethernet packet. In fact, even at 100M the small 
> sync packet could contain more than 10 channels. (I have basically said 10M 
> would not be supported, but if no network traffic were supported, 10M could 
> do 3 or 4 channels with no word sync.) So:
> word 1 - audio sync plus 10 tracks - one full network traffic packet
> word 2 - audio sync plus 10 tracks - one full audio packet 40 tracks
> 					split between word 1 and 2
> word 3 - audio sync plus 10 tracks - one full audio packet 40 tracks
> 					split between word 2 and 3
> word 4 - audio sync plus 10 tracks - one full audio packet 40 tracks
> 					split between word 3 and 4
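A quick check of the "close to minimum packet size" remark. This is a minimal Python sketch, not part of any existing driver; it assumes standard Ethernet framing (8-byte preamble/SFD, 14-byte header, 4-byte FCS, 12-byte inter-frame gap, payloads padded up to 46 bytes) and 3-byte (24-bit) samples, since the sample width is not stated here. The helper names are hypothetical.

```python
# Standard Ethernet framing, in bytes of link time (assumed; no VLAN tag).
PREAMBLE_SFD, HEADER, FCS, IFG = 8, 14, 4, 12
MIN_PAYLOAD = 46  # shorter payloads are padded up to this size


def frame_wire_bytes(payload: int) -> int:
    """Bytes of link time one frame occupies, including padding and gap."""
    return PREAMBLE_SFD + HEADER + max(payload, MIN_PAYLOAD) + FCS + IFG


def sync_packet_payload(channels: int, bytes_per_sample: int = 3) -> int:
    """One sample per channel; 3-byte (24-bit) samples are an assumption."""
    return channels * bytes_per_sample


def word_budget_bytes(link_bps: int = 100_000_000, rate_hz: int = 48_000) -> float:
    """Bytes of link time available in one sample ("word") period."""
    return link_bps / 8 / rate_hz


payload = sync_packet_payload(10)
print(payload, frame_wire_bytes(payload), round(word_budget_bytes(), 1))  # 30 84 260.4
```

With 4-byte sample containers the payload becomes 40 bytes, which is still padded to 46, so the sync packet costs the same 84 bytes of link time out of roughly 260 bytes per 48k sample period at 100 Mbit.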

Nobody commented that this could not work  :)  The time taken by 4 samples 
on a 100 Mbit link is still less than the time needed to send one full 
1500-byte data packet. The reason I am thinking about this now is that my 
studio has been flooded  :P  and so I have no access to work on my control 
surface project at the moment.
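To put numbers on that comparison, here is a rough sketch, assuming standard Ethernet framing (a 1500-byte payload plus 38 bytes of preamble/SFD, header, FCS and inter-frame gap). The function names are illustrative only.

```python
def samples_time_us(n_samples: int, rate_hz: int) -> float:
    """Duration of n samples, in microseconds."""
    return n_samples * 1e6 / rate_hz


def frame_time_us(wire_bytes: int, link_bps: int) -> float:
    """Time one frame occupies the link, in microseconds."""
    return wire_bytes * 8 * 1e6 / link_bps


# 8 preamble/SFD + 14 header + 1500 payload + 4 FCS + 12 inter-frame gap
FULL_FRAME = 8 + 14 + 1500 + 4 + 12

print(round(samples_time_us(4, 48_000), 2))                # 83.33
print(round(frame_time_us(FULL_FRAME, 100_000_000), 2))    # 123.04
print(round(frame_time_us(FULL_FRAME, 1_000_000_000), 2))  # 12.3
```

At 1 Gbit the same full frame takes about 12.3 microseconds, which is shorter than one 20.8 microsecond sample period at 48k.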

So some background thinking:

- The idea is to replace FW audio interfaces with something at least as 
good, maybe better.
- Really low latency available (even if a lot of uses don't need it).
- Really stable operation. On a desktop/rack computer where the 
user has access to a PCIe slot, a second NIC is obviously the best 
solution. Laptops should work too.
- Normal network traffic will make it through this mess without ever 
disturbing the audio. A laptop may be used with only one NIC and still 
need to access network traffic.
- It should "just work" on newer network hw as it is developed.
- It should handle a switch in the middle for use as a range extender, but 
never as a network traffic mixer. (On a 1 Gbit link, other traffic may not 
disturb things if the channel count is low enough.) What I have thought of 
so far would tend to ignore other traffic anyway, and a switch should end 
up just sending our traffic through our ports. This situation may require 
some user intervention, such as pointing out which box they wish to 
connect to.
- It should deal well with hot-plugging. This feels messy, but it should 
be possible to let things like NetworkManager play with the NIC first and 
still be able to detect that an audio IF has been connected and reinit the 
interface for this use.

In the end, the kernel module for this device should be loaded any time a 
NIC is detected (detected and has a connection). It should create both an 
eth* device and an ALSA device, but only if it detects an audio IF online. 
To begin with, this would mean the user would have to do some setup to get 
around all the auto-setup stuff already running (DHCP, etc.). But as the IF 
came into wider use, normal networking tools might be expected to detect an 
audio IF and leave it alone. Maybe the audio IF could run a DHCP server 
that refuses to hand out an IP, so that the DHCP client gives up and goes 
away. That is, use network protocols that are already available when 
possible.
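The probe-then-claim behaviour described above (and the hot-plug requirement in the list before it) can be modelled as a tiny state machine. This is a hypothetical Python sketch: the role names and action strings are invented for illustration, and how the driver actually probes for an audio IF is left open, just as in the post.

```python
from enum import Enum, auto


class Role(Enum):
    IDLE = auto()       # no link: nothing created
    PLAIN_NIC = auto()  # link up, no audio IF answered: leave the NIC to the normal stack
    AUDIO_IF = auto()   # link up and an audio IF answered: claim the NIC


def next_role(link_up: bool, audio_if_found: bool) -> Role:
    """Decide what the (hypothetical) module should do with this NIC."""
    if not link_up:
        return Role.IDLE
    return Role.AUDIO_IF if audio_if_found else Role.PLAIN_NIC


def actions(old: Role, new: Role) -> list:
    """Hypothetical device actions for a role change (hot-plug or re-probe)."""
    if old is new:
        return []
    steps = []
    if old is Role.AUDIO_IF:
        steps += ["remove ALSA device", "remove eth* device"]
    if new is Role.AUDIO_IF:
        steps += ["create eth* device", "create ALSA device"]
    return steps


# Hot-plug case: the normal network stack brought the link up first,
# then a later probe finds the audio IF.
role = next_role(link_up=True, audio_if_found=False)  # Role.PLAIN_NIC
print(actions(role, next_role(link_up=True, audio_if_found=True)))
# ['create eth* device', 'create ALSA device']
```

Keeping the decision separate from the actions means an unplug (link down) tears down both devices, while a plain NIC with no audio IF is never touched.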

This whole effort assumes the Ethernet device is connected by twisted 
pair from the host computer to the audio device, with a separate path for 
each direction. This is very important, as it _should_ make for a 
collision-free environment. This interface will control both audio and 
data flow. Any network traffic will only be sent during times when audio 
sending is not needed. This can be done because all network data will go 
through the audio driver. There are still a lot of 100 Mbit networks out 
there, and lots of new equipment still has them too. I have chosen a 
4-sample frame at 48k (which happens to be 16 samples at 192k... if you 
must) because it seems to give the lowest latency with a reasonable use of 
overhead. (On Gbit and faster lines this is no longer true: the Internet 
still runs on a 1500-byte MTU, and more than one full packet will travel in 
one sample's time.) I have done all my calculations based on a 100 Mbit 
line because that happens to be what I have to play with, and it appears 
to be able to handle up to 60 audio channels with some left over for 
control. I understand that there are venues that use more, but Gbit links 
will handle lots more (600 plus; but realize that the systems on each end 
have to be able to deal with the data as it comes in, so it is not just 
about the link capabilities).
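A rough channel-count estimate for one 4-sample group can be sketched as below. It is a minimal sketch under stated assumptions, not the author's calculation: standard Ethernet framing (38 bytes of overhead per frame, 1500-byte MTU, 46-byte minimum payload), no per-packet protocol header of our own, audio split into as many MTU-sized frames as needed, and one minimum-size frame (84 bytes of link time) reserved for control traffic. Sample width is a parameter because the post does not fix it.

```python
import math

OVERHEAD = 8 + 14 + 4 + 12  # preamble/SFD + header + FCS + inter-frame gap, per frame
MTU = 1500                  # standard Ethernet payload limit per frame
MIN_PAYLOAD = 46


def audio_wire_bytes(channels: int, bytes_per_sample: int, samples: int = 4) -> int:
    """Link time, in bytes, for one group of audio split into MTU-sized frames."""
    payload = channels * samples * bytes_per_sample
    frames = max(1, math.ceil(payload / MTU))
    # Padding to the minimum payload only matters for a single small frame.
    return max(payload, MIN_PAYLOAD) + frames * OVERHEAD


def max_channels(link_bps: int, bytes_per_sample: int, samples: int = 4,
                 rate_hz: int = 48_000, reserve: int = 84) -> int:
    """Largest channel count that still leaves `reserve` bytes for control."""
    budget = link_bps / 8 * samples / rate_hz  # bytes of link time per group
    n = 0
    while audio_wire_bytes(n + 1, bytes_per_sample, samples) + reserve <= budget:
        n += 1
    return n


for bps in (100_000_000, 1_000_000_000):
    print(bps, max_channels(bps, 3), max_channels(bps, 4))
# 100000000 76 57
# 1000000000 838 629
```

With 4-byte sample containers this lands in the same ballpark as the figures above (57 on 100 Mbit, 629 on Gbit); any real protocol header would lower them a little. Running 16 samples at 192k gives the same 83 microsecond window but a quarter of the channels (14 with 4-byte samples on 100 Mbit).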

So, my thought is that each group of packets sent will be timed by a group 
of 4 samples. The driver will attempt to send a packet with 4 samples' 
worth of audio for all channels at the end of those 4 samples. The driver 
then calculates how much time it has before another 4 sample times are up 
and sends as much data as it has time for. This calculation should only 
need to happen once for any given channel setup, as long as the hw is not 
doing anything too fancy (like waiting for more than one packet before it 
sends). It assumes the hardware/driver uses standard-size guard bands 
(inter-frame gaps, preambles, etc.). So for each 4 samples there would be 
two packets minimum (probably maximum too, for a 100M link): one audio and 
one data. On a 100M link these packets would always be smaller than a 
1500-byte MTU. It would not be possible to use the arrival of an audio 
packet as a sync signal; sync, as always, would be an external line if two 
audio interfaces needed to be synced.
I expect to have an Atom-based MB to play with soon that has two NICs on 
board, as well as an audio IF. This will not be a true low-latency test, 
because the onboard AI has higher latency to begin with (it runs at p64/n3 
minimum), so I will have 192 samples to play with at a time, which I will 
still try sending in 4-sample bundles. (I may try putting an older PCI AI 
in to see if I can get that down a bit.) My thought is to make the AI side 
a JACK client (I think I can do that much :) and the host side an ALSA 
device (something new to learn).
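Splitting a 192-sample block into 4-sample bundles could look like the sketch below. It deliberately avoids the JACK API: it takes a plain list of frames (each a list of per-channel integer samples) and packs each bundle as interleaved big-endian 32-bit signed integers, which is an assumed wire format, not one the post specifies. The function name is hypothetical.

```python
import struct


def period_to_payloads(period, frames_per_bundle=4):
    """Split a block of frames into 4-frame bundles and pack each one.

    `period` is a list of frames; each frame is a list of per-channel integer
    samples. The wire format (interleaved big-endian 32-bit signed) is assumed.
    """
    if len(period) % frames_per_bundle != 0:
        raise ValueError("block size must be a multiple of the bundle size")
    payloads = []
    for start in range(0, len(period), frames_per_bundle):
        bundle = period[start:start + frames_per_bundle]
        # Interleave: frame 0 ch0..chN, then frame 1 ch0..chN, ...
        flat = [sample for frame in bundle for sample in frame]
        payloads.append(struct.pack(f">{len(flat)}i", *flat))
    return payloads


block = [[i, -i] for i in range(192)]  # 192 frames of 2 channels (p64/n3)
payloads = period_to_payloads(block)
print(len(payloads), len(payloads[0]))  # 48 32
```

Each 192-sample block becomes 48 bundles, so the send side still runs on the 4-sample schedule even though the audio hardware hands over data in larger chunks.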

All control will be MIDI-able. Because there are two NICs and one of them 
is expected to do real IP-based networking, OSC is possible as well as 
web-based control. In the end this is also a general computer running Linux 
that can be SSHed into (even ssh -Y), so almost anything is possible... but 
I would guess the first box that has a real DIY digital CODEC, S/PDIF, ADAT 
or MADI IF will be pretty basic... then again, looking at the R-Pi, basic 
seems to be pretty powerful these days.

In the long run this could be a very interesting device. There is no 
reason it could not also be an effects box (both with local analog ports 
and through the net), a softsynth (most of these boards have at least one 
serial port or USB), a remote mixer (drop the FOH box and use a networked 
control surface, an Android pad, or even a browser to control it), a FOH 
snake box, or even a standalone recording device.

Price point? Consider Ethernet switches, USB audio devices, Ethernet 
storage controllers, set-tops... I hesitate to call them boxes, some of 
them are so small. Even development boards look OK. I don't think it would 
be worthwhile to make a two-I/O box, but by the time we hit 8 or so it 
begins to look good.

Len Ovens

More information about the Linux-audio-dev mailing list