I thought I would move this over here. I know there is already work being
done on this hardware-wise. These thoughts are for a point-to-point raw
ethernet audio transport that still allows some normal network traffic as
well.
On Tue, 2 Sep 2014, Len Ovens wrote:
My thought is something like this:
We control all network traffic. Let's try for 4 words of audio. For sync
purposes, at each word boundary a short audio packet of 10 channels is
sent. This would be close to minimum enet packet size. Then there should
be room for one full size enet packet; in fact, even at 100m the small sync
packet could contain more than 10 channels (I have basically said 10m would
not be supported, but if no network traffic was supported then 10m could do
3 or 4 channels with no word sync). So:
Word 1 - audio sync plus 10 tracks - one full network traffic packet
Word 2 - audio sync plus 10 tracks - one full audio packet 40 tracks
         split between word 1 and 2
Word 3 - audio sync plus 10 tracks - one full audio packet 40 tracks
         split between word 2 and 3
Word 4 - audio sync plus 10 tracks - one full audio packet 40 tracks
         split between word 3 and 4
Nobody commented that this could not work :) 4 sample periods on a 100mbit
link is still less wire time than one full 1500 byte data packet needs. The
reason I am thinking about this right now is that my studio has been
flooded :P and so I have no access to work on my control surface project
right now.
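As a rough check of the schedule above, the following Python sketch works
out how many bytes of wire time one word (one sample period) and a group of
four words represent at 48k on a 100mbit link, and compares that with the
smallest and largest standard ethernet frames. The per-frame overhead used
(8 bytes preamble/SFD, 14 header, 4 FCS, 12 interframe gap, no VLAN tag) is
the standard figure; the function names are made up for this illustration.

```python
# Wire-time arithmetic for one "word" (sample period) and a 4-word group.
# Standard ethernet per-frame overhead, without a VLAN tag.
PREAMBLE_SFD = 8
HEADER = 14
FCS = 4
INTERFRAME_GAP = 12
OVERHEAD = PREAMBLE_SFD + HEADER + FCS + INTERFRAME_GAP  # 38 bytes
MIN_PAYLOAD = 46    # shorter payloads are padded up to this
MAX_PAYLOAD = 1500  # standard MTU


def wire_bytes(payload):
    """Bytes of wire time one frame occupies, padding short payloads."""
    return OVERHEAD + max(payload, MIN_PAYLOAD)


def window_bytes(link_bps, sample_rate, samples):
    """Bytes that can cross the link during `samples` sample periods."""
    return link_bps * samples / sample_rate / 8


one_word = window_bytes(100_000_000, 48_000, 1)
four_words = window_bytes(100_000_000, 48_000, 4)
print(f"one word: {one_word:.0f} bytes, four words: {four_words:.0f} bytes")
print(f"minimum frame: {wire_bytes(0)} bytes, "
      f"full frame: {wire_bytes(MAX_PAYLOAD)} bytes")
```

Under these assumptions one word is about 260 bytes of wire time and four
words about 1042, against 84 for a minimum frame and 1538 for a full one.
So a minimum-size sync packet fits in every word, but a full 1500 byte
packet does not fit in four words at 100mbit.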
So some background thinking:
- The idea is to replace FW audio interfaces with something at least as
good, maybe better.
- Really low latency available (even if a lot of uses don't need it)
- Really stable operation. On a desktop/rack computer where the
user has access to a PCIe slot, it is obvious that a second NIC would be
the best solution. Laptops should work too.
- Normal network traffic will make it through this mess without ever
disturbing the audio. A laptop may be used with only one NIC and still
need to access network traffic.
- It should "just work" on newer network hw as it is developed.
- It should handle a switch in the middle for use as a range extender, but
never as a network traffic mixer. (on a 1gbit link other traffic may not
disturb things with low enough channel count) What I have thought of so
far would tend to ignore other traffic anyway and a switch should end up
just sending our traffic through our ports. This situation may require
some user intervention such as pointing out which box they wish to connect
to.
- It should deal well with hot plugging. This feels messy, but it should
be possible to let things like netman play with things first and still be
able to detect an audio IF has been connected and reinit the interface for
this use.
In the end, the kernel module for this device should be loaded anytime a
NIC is detected. (detected and has a connection) It should create both an
eth* device and an ALSA device, but should only do so if it detects an
audio IF online. To begin with, this would mean the user would have to do
some setup to get around all the auto setup stuff already running (dhcp
etc.) But as the IF was used more, normal networking stuff might be
expected to detect an audio IF and leave it alone. Maybe the audio IF
could have a dhcp server that refused an IP so that the dhcp client gives
up and goes away. That is, use network protocols that are already
available when possible.
This whole effort assumes the ethernet device is connected by twisted
pair from the host computer to the audio device with a separate path for
each direction. This is very important as it _should_ make for a collision
free environment. This interface will control both audio and data flow.
Any network traffic will only be sent during times audio sending is not
needed. This can be done because all network data will go through the
audio driver. There are still a lot of 100mbit networks out. Lots of new
equipment still has them too. I have chosen a 4 sample frame at 48k (which
happens to be 16 samples for 192k... if you must) because it seems to be
the lowest latency with reasonable use of overhead. (On Gbit and higher
lines, this is no longer true. The internet still runs on a 1500 byte MTU
and more than one full packet will travel in one sample's time.) I have
done all my calculations based on a 100mbit line because that happens to
be what I have to play with, and it appears to be able to handle up to 60
audio channels with some left over for control. I understand that there
are venues that use more, but gbit links will handle lots more. (600 plus,
but realize the systems on each end have to be able to deal with the data
as it comes in; it is not just about the link capabilities.)
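The 60 channel figure can be sanity checked with a short Python sketch.
The sample width is not stated above, so the code assumes 24-bit samples
packed as 3 bytes (and also tries 4 bytes for comparison). It also assumes
one audio frame per 4 sample group and standard ethernet overhead of 38
bytes per frame. The function names are hypothetical.

```python
# How much of a 4-sample window is left after the audio frame?
ETH_OVERHEAD = 38   # preamble/SFD 8 + header 14 + FCS 4 + interframe gap 12
MIN_PAYLOAD = 46
MAX_PAYLOAD = 1500


def audio_frame_wire_bytes(channels, samples=4, bytes_per_sample=3):
    """Wire bytes for one frame carrying `samples` samples of every channel."""
    payload = channels * samples * bytes_per_sample
    if payload > MAX_PAYLOAD:
        raise ValueError("audio payload does not fit in one standard frame")
    return ETH_OVERHEAD + max(payload, MIN_PAYLOAD)


def leftover_bytes(channels, link_bps=100_000_000, sample_rate=48_000,
                   samples=4, bytes_per_sample=3):
    """Wire bytes left in the window once the audio frame has been sent."""
    window = link_bps * samples / sample_rate / 8
    return window - audio_frame_wire_bytes(channels, samples, bytes_per_sample)


for width in (3, 4):  # assumed sample widths in bytes
    left = leftover_bytes(60, bytes_per_sample=width)
    print(f"{width} bytes/sample: {left:.0f} wire bytes left per window")
```

With 3 byte samples, 60 channels take 758 bytes of wire time and leave
about 284 for one data frame (roughly 245 bytes of payload once its own
overhead is paid). With 4 byte samples only about 44 bytes remain, which
is less than a minimum frame, so the 60 channel estimate only holds here
for packed 24-bit audio.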
So, my thought is that each group of packets sent will be timed by a group
of 4 samples. The driver will attempt to send a packet with 4 samples
worth of audio for all channels at the end of those 4 samples. The driver
then calculates how much time it has before another 4 sample times are up
and sends as much data as it has time for. This calculation should only
need to happen once for any given channel count if the hw is not
doing anything too fancy (like waiting for more than one packet before it
sends). It assumes the hardware/driver uses standard size guard bands,
etc. So for each 4 samples there would be two packets minimum (probably
maximum too for a 100m link): one audio and one data. On a 100m link these
packets would always have an mtu of less than 1500. It would not be
possible to use the arrival of an audio packet as a sync signal; sync, as
always, would be an external line if two audio interfaces needed to be
used.
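The send cycle described above could be modelled in user space along these
lines. This is only a sketch of the idea in Python, not driver code: the
class name CycleScheduler, the 3 byte sample width and the use of a deque
as the pending network queue are all assumptions made for illustration.
The data budget is computed once when the channel count is set, then each
cycle sends the audio frame first and fills the rest with queued data.

```python
from collections import deque

ETH_OVERHEAD = 38   # preamble/SFD 8 + header 14 + FCS 4 + interframe gap 12
MIN_PAYLOAD = 46


def wire_bytes(payload_len):
    """Wire bytes one frame occupies, including padding of short payloads."""
    return ETH_OVERHEAD + max(payload_len, MIN_PAYLOAD)


class CycleScheduler:
    """Hypothetical model of one direction of the link."""

    def __init__(self, channels, link_bps=100_000_000, sample_rate=48_000,
                 samples=4, bytes_per_sample=3):
        self.audio_len = channels * samples * bytes_per_sample
        # Whole bytes of wire time in one group of `samples` (rounded down).
        window = link_bps * samples // (sample_rate * 8)
        # Done once per channel setup: wire bytes left after the audio frame.
        self.data_wire_budget = window - wire_bytes(self.audio_len)
        # Largest data payload that fits, i.e. the MTU the eth* side
        # would have to present to the normal network stack.
        if self.data_wire_budget >= wire_bytes(0):
            self.data_mtu = self.data_wire_budget - ETH_OVERHEAD
        else:
            self.data_mtu = 0

    def next_cycle(self, audio, data_queue):
        """Return the frames to send this cycle: audio first, then data."""
        assert len(audio) == self.audio_len
        frames = [("audio", audio)]
        budget = self.data_wire_budget
        while data_queue and wire_bytes(len(data_queue[0])) <= budget:
            packet = data_queue.popleft()
            budget -= wire_bytes(len(packet))
            frames.append(("data", packet))
        return frames


sched = CycleScheduler(channels=60)
pending = deque([bytes(200), bytes(100)])
cycle = sched.next_cycle(bytes(sched.audio_len), pending)
print(sched.data_mtu, [kind for kind, _ in cycle], len(pending))
```

With 60 channels at 100mbit this gives a data MTU of 245 bytes, so each
cycle carries one audio frame and one data frame, and the second queued
packet waits for the next cycle. A packet larger than the budget would
stall the queue in this sketch, which is why the data side would need to
advertise the reduced MTU.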
I expect to have an atom based MB to play with soon that has two NICs on
board, as well as an audio IF. This will not be a true test of low latency
because the onboard AI has higher latency to begin with (runs at p64/n3
min) and so I will have 192 samples to play with at a time, which I will
still try sending in 4 sample bundles. (I may try putting an older PCI AI
in to see if I can get that down a bit.) My thought is to make the AI side
a jack client (I think I can do that much :) and the host side an ALSA
device (something new to learn).
All control will be MIDI-able. Because there are two NICs and one of them
expects to do real IP based networking, OSC is possible as well as web
based control. In the end this is also a general computer running Linux
that can be SSHed into (even ssh -Y), so almost anything is possible... but
I would guess the first box that has a real DIY digital CODEC, S/PDIF, ADAT
or MADI IF will be pretty basic.... but looking at the R-Pi, basic seems
to be pretty powerful anymore.
In the long run this could be a very interesting device. There is no
reason this could not also be an effects box (both with local analog
ports as well as through net), a softsynth (most of these boards have at
least one serial port or USB), a remote mixer... drop the box FOH and use
a networked control surface, Android pad... even a browser to control, a
FOH snake box or even a standalone recording device.
Price point? Considering ethernet switches, USB audio devices, Ethernet
storage controllers, set top... I hesitate to call them boxes, some of
them are so small. Even development boards look ok. I don't think it would
be worthwhile to make a two i/o box, but by the time we hit 8 or so it
begins to look good.
--
Len Ovens
www.ovenwerks.net