Hi!
This mail is just a heads-up about a finding discovered with the help of
the linux-usb mailing list and which I thought some other people might
benefit from. If this is nothing new to you, then please do ignore it.
Background:
The USB audio class 2.0 specification dictates that isochronous
transfers (i.e. audio frames to/from an audio interface) happen every
"micro-frame". USB micro-frames are 125 microseconds (us) apart. 125 us
= 0.125 milliseconds (ms).
The majority of USB audio interfaces (at least those that I have) use
synchronous audio-streaming, i.e. the sample clock is derived from the
bus clock. There are definitely interfaces that use e.g. adaptive or
asynchronous modes and this discussion would have to be altered for these.
Given the case of synchronous mode isochronous transfers at a sampling
rate of 48000 Hz (= 48 kHz) this would correspond to 6 audio frames per
USB micro-frame.
48000 frames/second * 0.000125 seconds = 6 frames
So in principle a very well behaved audio interface attached to a very
well behaved USB controller sitting in a well tuned system _should_ be
able to achieve a minimum round-trip latency of 2 * 6 frames = 12 frames
or 250 us. This leaves out additional buffering inside the audio
interface and additional latency by anti-aliasing and reconstruction
filters.
The first caveat to above in the context of Linux: The snd_usb_audio
driver does not seem to support period sizes of just 6 frames. It _does
support 12 frames though, which is nice.
And here is the other caveat which lead to the title of this mail: Some
controllers are behaving "worse" than others. There's this little thing
in the XHCI spec which amounts to the following: The XHCI can specify in
a register how many micro-frames have to buffered at all times for
outgoing isochronous endpoints. The Intel XHCI in my ASRock N100dc-itx
main-board for example request a whole USB frame (which corresponds to 8
micro-frames or 1 ms) to be buffered at all times. Another XHCI that I
have (a Renesas controller) only requires one micro-frame. This has
direct consequences on the kind of period sizes and number of periods
that are usable on these controllers. These consequences don't explain
everything but at least you know that you can't ever get better than
this limit.
For example for a period size of 48 frames I need to use 3 periods on
the Intel controller (resulting in 3 ms round-trip latency), but for the
Renesas I can use 2 periods at 48 frames (resulting in 2 ms round-trip
latency). On the Renesas controller even 2 periods at 24 frames works
fine (1 ms round-trip latency).
I can lower the latency on the Intel controller by using a smaller
period size but more of them as long as the buffering requirement of the
controller is satisfied. One stable setting is for example a period size
of 24 and 5 periods which results in a round-trip latency of 2.5 ms.
So what's the take-away here? In some cases, if you are chasing stable
low-latency operation using a USB audio class 2.0 device it might just
be worth installing a different XHCI in your computer (the above
mentioned Renesas controller is just a PCI-Express card which sits in a
slot in my N100DC-itx board) that is better behaved than the one you
currently have.
Another take-away is that the above limitation only applies to the
outgoing direction (playback). If all I was interested in would be the
capture direction then 2 periods at 48 frames would work fine even on
the Intel controller.
Kind regards,
FPS