[linux-audio-dev] Re: Audio over Ethernet

Anders Torger torger at ludd.luth.se
Thu Apr 15 10:07:07 UTC 2004

On Wednesday 14 April 2004 22.27, John Lazzaro wrote:
> If what you mean by "operating at the ethernet level" means
> "no Cobra-like hardware to help, but putting data directly
> into Etherframes w/o IP/RTP headers", then its unclear to me that
> working at the RTP/IP level is going to hurt you much.  The
> simplest implementation would have RTP/IP header overhead,
> but there are nice header compression schemes that get rid of it:
> http://www.ietf.org/rfc/rfc2508.txt
> and its improved versions.  By using RTP, you get a lot of
> protocol design you might otherwise need to do,
> within RTP (like RTP MIDI) and surrounding it
> (session management, etc).

Perhaps RTP is good enough. However, at several hundred megabits per 
second of throughput, it could be valuable not to have to calculate 
checksums in software; I assume the IP and UDP checksums usually are 
computed in software. Other layers that add CPU time or latency could 
possibly be bypassed by accessing the ethernet layer directly, and the 
use of jumbo frames etc. may be easier to control there.
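To make the "put data directly into Etherframes" idea concrete, here is 
a minimal sketch of what such a bare frame could look like: a 14-byte 
ethernet header with an experimental EtherType, a small sequence header, 
and raw samples, with no IP/UDP headers to checksum. The 0x88B5 
EtherType (IEEE local experimental) and the 32-bit sequence field are 
assumptions of mine, not a defined format:

```python
import struct

# IEEE Std 802 "Local Experimental Ethertype 1", reserved for experiments.
ETH_P_CUSTOM = 0x88B5

def build_audio_frame(dst_mac: bytes, src_mac: bytes,
                      seq: int, samples: bytes) -> bytes:
    """Build a bare Ethernet frame: 14-byte ethernet header, a 32-bit
    block/sequence counter (assumed), then the raw audio payload.
    No IP or UDP headers, so there are no checksums to compute."""
    eth_hdr = dst_mac + src_mac + struct.pack("!H", ETH_P_CUSTOM)
    audio_hdr = struct.pack("!I", seq)
    return eth_hdr + audio_hdr + samples

# On Linux such a frame would go out through a raw AF_PACKET socket
# (requires CAP_NET_RAW), roughly:
#   s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
#   s.bind(("eth0", 0))
#   s.send(frame)
frame = build_audio_frame(b"\xff" * 6,                     # broadcast
                          b"\x02\x00\x00\x00\x00\x01",     # made-up src MAC
                          42, b"\x00" * 1024)
```

A standard frame carries at most 1500 payload bytes; jumbo frames raise 
that to roughly 9000, which is what makes them interesting at these 
rates.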

RTP is also not designed for this purpose; it would be a bit of 
overkill. Using RTP would give the impression that the solution works 
over a large routed network, when in fact it will only work on a tight, 
isolated ethernet carrying no traffic but audio.

> One big thing you need to worry about are clocks -- unlike
> a protocol like AES/EBU or SPIDF, packet-based media is
> not sending an implicit clock along with the data.  So, the
> nominal "sender sampling rate" can't be precisely linked to
> the nominal "receiver sampling rate" in a simple way.  The
> consequence is either too much data piles up at the receiver,
> or not enough.  One solution to this problem is to continuously
> running a sample-rate converter at the receiver in software,
> to keep the two sampling rates locked.  See:

The sample clock is passed separately, with wordclock for example. In 
most cases I don't expect to use more than one computer with a sound 
card though, so then it is a non-issue.

The idea is basically: take the inputs on the sound card, broadcast 
them to the convolver nodes, each of which convolves and unicasts its 
result back to the machine with the sound card, which mixes the blocks 
from all nodes and puts the result out on the sound card outputs. For 
WFS it could for example be 20 megabit/s of broadcast and a total of 
400 megabit/s of unicast.
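The mixing step on the sound card machine is just a sample-by-sample 
sum of the unicast result blocks, with a clamp to keep the output in 
range. A minimal sketch (the clamp limits and float samples are my 
assumptions, not part of the design above):

```python
def mix_blocks(node_blocks, clamp=(-1.0, 1.0)):
    """Sum the result blocks unicast back by each convolver node into
    one output block, then clamp to the legal sample range."""
    n = len(node_blocks[0])
    assert all(len(b) == n for b in node_blocks), "blocks must align"
    out = [0.0] * n
    for block in node_blocks:
        for i, sample in enumerate(block):
            out[i] += sample
    lo, hi = clamp
    return [min(hi, max(lo, s)) for s in out]
```

In a real implementation this loop would run once per block period, 
mixing whichever node results have arrived within the tuned deadline.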

> A separate issue for your "many streams" case is synchronizing
> the streams to each other, in the case where not all share the
> same nominal clock.  RTP has tools for this, based on
> associating NTP timestamps from a common clock to each
> independent stream, that get used for audio/video lipsync,
> and can be repurposed here as well.

RTP is designed to transport realtime data over a routed network, where 
packets can be lost, re-ordered and so on. In the controlled ethernet 
environment of this system there will be no packet loss (I hope), and 
no reordering.

It would probably work like this: a roundtrip time is measured and 
tuned at startup, and that time plus some margin is added to the 
I/O delay of the sound card machine. That is probably quite easy. A 
small challenge will be to synchronise dynamic commands, such as filter 
changes, so that they take effect at the same block index 
(synchronised) on all nodes. By specifying the block index in the 
command, synchronisation is easily kept; the problem is then just to 
make the command reach all nodes in time. The simple way is to use 
quite a high latency for the commands, but one would want to minimise 
it.
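The block-index scheme can be sketched as follows: each node keeps a 
queue of pending commands tagged with the block at which they must take 
effect, and applies them just before convolving that block. The class 
and method names are hypothetical, purely for illustration:

```python
import heapq

class ConvolverNode:
    """Sketch of a node that applies queued commands (e.g. filter
    changes) exactly at the block index specified by the sender, so
    all nodes switch on the same block."""
    def __init__(self):
        self.block = 0           # index of the next block to process
        self.pending = []        # min-heap of (apply_at_block, command)
        self.applied = []        # log of (block, command), for testing

    def queue_command(self, apply_at_block, command):
        # A command for a block we have already processed arrived too
        # late; latency for commands must cover the worst-case delivery.
        if apply_at_block < self.block:
            raise ValueError("command arrived too late")
        heapq.heappush(self.pending, (apply_at_block, command))

    def process_block(self):
        # Apply every command scheduled for the current block index.
        while self.pending and self.pending[0][0] == self.block:
            _, cmd = heapq.heappop(self.pending)
            self.applied.append((self.block, cmd))
        # ... convolve the current audio block here ...
        self.block += 1

node = ConvolverNode()
node.queue_command(3, "set-filter")
for _ in range(5):
    node.process_block()
```

Since every node counts blocks from the same stream, this keeps the 
change synchronised even though the command packets arrive at slightly 
different times.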
