[linux-audio-dev] Audio over Ethernet / Livewire

Benno Senoner sbenno at gardena.net
Tue Jun 22 06:20:24 UTC 2004


Audio traffic has a constant data rate:
e.g. 44.1 kHz 16-bit stereo is 44100 * 2 * 2 = 176400 bytes/sec.
Since audio cards work with audio fragments (or periods) of N frames,
it is natural to send the audio over the network in packets of that size
(or multiples of it).
UDP is the natural choice because of its low latency.
Of course there may be packet loss, but if you go TCP/IP or
implement your own retransmission scheme,
you lose the low-latency characteristics.
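
As a concrete illustration (a sketch of mine, not code from any existing
implementation), opening such a UDP socket on Linux could look like the
following; the port number and the non-blocking mode are assumptions:

----
/* Minimal sketch of a UDP socket for the audio stream.
 * Port choice and non-blocking mode are illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <fcntl.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int open_audio_socket(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); exit(1); }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind"); exit(1);
    }

    /* non-blocking, so a late or lost packet never stalls the caller */
    fcntl(fd, F_SETFL, O_NONBLOCK);
    return fd;
}
----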

On a local, non-congested LAN with working hardware, the packet loss
ratio is basically zero, and AFAIK most of the low-latency audio-over-IP
protocols are implemented with this assumption in mind.
So I think we should go that route too.
You cannot saturate the network to the point of congestion anyway: that
would not even work with TCP, which would prevent errors but cause so
much delay that the audio streams would stutter.

Regarding MIDI data over the network: MIDI is event based, and in theory
there is no upper limit on how many MIDI events per time unit one might
want to send.

But in practice, standard MIDI (over a 31250 bit/sec serial link, with
10 bits on the wire per byte) is limited to roughly 3000 bytes/sec
(31250 / 10 = 3125 bytes/sec).

This is enough in most cases, but we all know that big MIDI setups
need multiple MIDI interfaces
(most professional MIDI expanders provide 2 MIDI input connectors, to
achieve better timing in high-track setups
and to provide more than 16-way multitimbrality).

So far so good.
I think one of the simplest approaches would be to send audio packets
with embedded MIDI data over the network.
As we know, the more information we pack into a single data packet, and
the fewer handshakes the network devices do, the higher the probability
that the data flow is reliable and fast.

Let's do some math: the Ethernet frame limits the UDP payload of a
packet to about 1500 bytes.
That way it's guaranteed that when we send a block of data it gets
delivered atomically, without running into fragmentation issues.
(Of course if you send it across several routers where the MTU is
< 1500 then it's another story, but since we are talking LAN
(NIC -> switch -> NIC), fragmentation is not an issue.)

Assume the above stereo stream, but transmitting floats (jackd uses floats):
sizeof(float) (=4) * 44100 * 2 = 352800 bytes/sec

Assume we send the stereo stream with 128 frames per packet:
128 * 2 * sizeof(float) = 1024 bytes, i.e. 2.9 msec worth of audio data.
That still leaves over 400 bytes for our custom header (e.g. remote jack
client id, jack port number, etc.) and for MIDI data.
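
To make the layout concrete, here is one way (a sketch of mine, not an
existing protocol) such a packet could be declared in C; all field names
are hypothetical:

----
#include <stdint.h>

/* One UDP datagram: 1432 bytes total, comfortably inside the
 * 1472-byte UDP payload of a 1500-byte Ethernet frame.
 * Endianness handling is omitted for brevity. */
struct audio_packet {
    uint32_t sequence;         /* detect lost or reordered datagrams  */
    uint16_t client_id;        /* id of the remote jack client        */
    uint16_t port;             /* jack port number                    */
    uint8_t  midi[400];        /* MIDI block, encoding worked out below */
    float    samples[128 * 2]; /* 1024 bytes of interleaved stereo    */
};
----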

So my proposal would be to limit the number of MIDI events per audio
fragment.
Let's take a simplified MIDI encoding and see what we can fit into
400 bytes.
Since we want sample-accurate MIDI triggering (which traditional MIDI
over serial does not provide), we could do the following.
A MIDI command is usually not longer than 3 bytes (let's forget about
sysex etc. for now), so we could divide the 400 bytes into 100 MIDI
events, each consisting of:
1 byte timestamp relative to the start of the audio fragment (0-255);
this limits the fragment size to at most 256 frames
3 bytes of MIDI payload
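
In C, such an event record could look like this (again just a sketch of
the encoding proposed above):

----
#include <stdint.h>

/* 4 bytes per event; 400 / 4 = 100 events per audio fragment */
struct midi_event {
    uint8_t frame;    /* offset within the fragment, 0-255 */
    uint8_t status;   /* e.g. 0x90 = note-on on channel 1  */
    uint8_t data1;    /* e.g. note number                  */
    uint8_t data2;    /* e.g. velocity                     */
};
----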

This means we could have up to 100 MIDI events (note on/off, controller,
program change etc.) per 2.9 msec audio fragment!
This is a LOT of events.
Compare traditional MIDI over serial: a 3-byte NOTE-ON takes about
1 msec to transmit, so in 2.9 msec you can barely send a NOTE-ON and a
NOTE-OFF before you have consumed almost the full channel capacity.

In the case of serial MIDI you achieve a maximum of about 1000
events/sec (for 3-byte MIDI events).

In our case we send about 344 packets/sec (44100 / 128), which
multiplied by 100 MIDI events gives us
34400 events/sec, all of them SAMPLE ACCURATE!
Basically it would be like having the equivalent of about 34 MIDI
interfaces.

Such a (bidirectional) audio/MIDI stream would consume about
500 KByte/sec per direction, which means a 100 Mbit LAN could run 10-15
of those at the same time without losing data.

Basically it would work as follows:

client PC (has an audio card) <----> jack server PC (runs jackd and jack
clients like samplers, softsynths, HDR apps etc., no audio card)

The client PC runs a jack network client which handles the client PC
<---> server PC communication.

The server PC has a special (not yet implemented) jackd input/output
driver which receives/sends the audio data over the network.
Apart from that, the jack clients residing on the server PC don't know
anything about the network; to them it looks like a standard jack server.

The "clock" for the jackd residing on the server PC is provided by the
client PC.


client PC:
----
jack_process_callback() {
    send_to_network(local_jack_input_port);  // non blocking
    fetch_packet_from_local_queue_and_check_if_new_packet_arrived(local_jack_output_port);  // non blocking
}
----

server PC: (I'm not yet familiar with the jackd driver API, so the
function names will be wrong, but hopefully you'll understand what I'm
trying to explain.)
----
while(1) {
  receive_audio_input_data_from_network(); // blocking 
  call process() of local jack clients
  send_audio_output_data_to_network(); // non blocking
}
----
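
Expressed with plain sockets, that loop could look roughly like this
(using the struct audio_packet sketched earlier; run_local_jack_clients()
is a hypothetical stand-in for one jack process cycle, since the real
driver API will differ):

----
#include <sys/socket.h>

void run_local_jack_clients(struct audio_packet *in,
                            struct audio_packet *out); /* hypothetical */

void server_cycle(int fd, struct sockaddr *client, socklen_t client_len)
{
    static struct audio_packet in, out;

    /* blocking: the packet rate of the client PC is our word clock */
    recv(fd, &in, sizeof(in), 0);

    run_local_jack_clients(&in, &out);

    /* MSG_DONTWAIT: a full TX queue must never stall the audio cycle */
    sendto(fd, &out, sizeof(out), MSG_DONTWAIT, client, client_len);
}
----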

On the client PC we send out the data packet in non-blocking mode, and
the next call fetches the next audio fragment from a local queue.
Of course during the first iteration the queue does not contain anything
yet, so we prefill it with 1-2 fragments worth of silence. 1 fragment is
the minimum needed to work (since fetch_packet... is called only a few
usecs after send_to_network, the latter being non-blocking).
We can prefill with more fragments, which of course increases the
end-to-end latency but at the same time helps to absorb network jitter.
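
A minimal sketch of such a prefilled queue (names and sizes are
illustrative assumptions):

----
#include <string.h>

#define PREFILL   2   /* silent fragments available before any packet arrives */
#define QUEUE_LEN 8   /* must be larger than PREFILL */

static struct audio_packet queue[QUEUE_LEN];
static int rd, wr;    /* reader and writer positions, modulo QUEUE_LEN */

void queue_init(void)
{
    memset(queue, 0, sizeof(queue)); /* all-zero floats == silence */
    rd = 0;
    wr = PREFILL;  /* the reader sees PREFILL fragments already queued */
}
----
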
I think with the above approach end-to-end latencies of < 8-10 msec can
be achieved, which means you can play a cluster of samplers/softsynths
live and hear the audio, rendered in real time, through your client box
containing the audio card.

The audio-card-less PCs will all stay in sync with the PC containing
the audio card, thanks to the blocking receive_audio...() call.

Do you see any particular flaws in this proposal? (Of course it would
need to be adapted and made more flexible so that it can deal with an
arbitrary number of jack audio/MIDI ports.)
It's the simplest design I can think of, and as said above, adding all
sorts of error correction, synchronization etc. would buy us almost
nothing and would probably never achieve the same low latency this
system can deliver.

Of course we will not know before we actually turn ideas into working code.
But if my approach fails then I'll be glad to offer a few beers at next 
ZKM :)
(Steve H. ? :)) )

cheers,
Benno



Hans Fugal wrote:

>>Plus considering that midi over jack is being implemented too you would 
>>have both midi and audio over ethernet
>>through jack, available to any jack client without the application 
>>needing to be changed.
>>    
>>
>That would be convenient, yes. But at the implementation level there is
>quite a bit of difference between MIDI traffic and audio traffic. MIDI
>is much less forgiving of errors, or much more if you know which errors
>to make. You can do a lot of smart things when doing MIDI over a network
>that you can't do with audio, a la MWPP or whatever it's called now.
>
>That said, if you've got the bandwidth and latency issues worked out for
>audio, MIDI should be a piece of cake and you may not need to worry
>about the smart things you can do with it.
>
>FWIW, I've implemented basic MIDI over TCP/IP at [1], which is loosely
>based on MWPP and needs some TLC, but already outperforms aseqnet.
>
>1. http://hans.fugal.net/music/nmidi-0.1.0.tar.gz
>
>  
>



