[LAU] JACK2 hardware buffer size for browser-based video conferencing

David Kastrup dak at gnu.org
Tue Jan 5 17:45:28 CET 2021


"Andrew A. Grathwohl" <andrew at grathwohl.me> writes:

> Hi David,
>
> Thanks, this was super-informative!
>
> We can likely rule out the idea that the smaller buffer sizes tax the
> computer further, leading to more fan/PSU noise, since the machine
> itself is not in the same room as the microphone.
>
> I am intrigued by your comments about the sampling rate on my Babyface
> Pro. I have always set it to 48kHz whenever doing low-latency audio or
> any audio that will be transmitted over a network cable, which I mostly
> do superstitiously. Is there any guidance out there about what the
> correct sampling rate would be for my device, or about how to determine
> this answer for myself?

With RME, I'd trust the soundcard, no questions asked.  With other
soundcards offering substantially higher sample rates, there is a chance
that downsampling (after proper digital filtering) can lead to better
quality and/or lower latency.
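Here is a minimal sketch of why the "proper digital filtering" matters when downsampling. It assumes a 96 kHz source being reduced to 48 kHz; the rates are chosen for illustration and nothing here is RME- or sox-specific. Dropping every second sample folds a 30 kHz tone down to an audible 18 kHz alias. Low-pass filtering first removes the tone instead. The Hann-windowed-sinc filter and the helper names are my own choices, not what any particular driver or resampler uses.

```python
import math

def windowed_sinc_lowpass(num_taps, cutoff):
    """Linear-phase low-pass FIR (Hann-windowed sinc), unity gain at DC.

    cutoff is a fraction of the input sample rate (0 < cutoff < 0.5).
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]

def convolve_valid(signal, taps):
    """Convolution, keeping only outputs where the filter fully overlaps."""
    n = len(taps)
    return [sum(taps[k] * signal[i + n - 1 - k] for k in range(n))
            for i in range(len(signal) - n + 1)]

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def decimate_by_2(signal, taps=None):
    """Keep every second sample, optionally low-pass filtering first."""
    if taps is not None:
        signal = convolve_valid(signal, taps)
    return signal[::2]

fs_in = 96000
# 30 kHz is fine at 96 kHz but above the 24 kHz Nyquist limit of 48 kHz.
tone = [math.sin(2 * math.pi * 30000 * n / fs_in) for n in range(4000)]

naive = decimate_by_2(tone)                    # tone aliases to 18 kHz
lp = windowed_sinc_lowpass(101, cutoff=0.22)   # passes up to roughly 21 kHz
filtered = decimate_by_2(tone, lp)             # tone is filtered out first

print(f"naive RMS:    {rms(naive):.4f}")
print(f"filtered RMS: {rms(filtered):.6f}")
```

The naive version keeps the tone at full level, only now at a wrong, audible frequency. The filtered version leaves almost nothing behind.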

Cf. the parameters in sox:

	rate [-q|-l|-m|-h|-v] [override-options] RATE[k]
		Change  the  audio  sampling rate  (i.e.  resample  the
		audio) to any  given RATE (even non-integer  if this is
		supported by  the output  file format) using  a quality
		level defined as follows:
		     Quality      Band-width   Rej dB       Typical Use
		-q   quick        n/a          30 @ Fs/4    playback on ancient hardware
		-l   low          80%          100          playback on old hardware
		-m   medium       95%          100          audio playback
		-h   high         95%          125          16-bit mastering (use with dither)
		-v   very high    95%          175          24-bit mastering

		where  Band-width  is  the   percentage  of  the  audio
		frequency  band that  is preserved  and Rej  dB is  the
		level  of   noise  rejection.   Increasing   levels  of
		resampling quality  come at  the expense  of increasing
		amounts of  time to process  the audio.  If  no quality
		option is given, the quality  level used is `high' (but
		see  `Playing   &  Recording  Audio'   above  regarding
		playback).

		The  `quick' algorithm  uses  cubic interpolation;  all
		others use band-limited interpolation.  By default, all
		algorithms   have  a   `linear'  phase   response;  for
		`medium', `high' and `very high', the phase response is
		configurable (see below).
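The man page only says that `quick` uses "cubic interpolation". The following is a minimal sketch of that idea. It assumes a Catmull-Rom cubic, because the man page does not say which cubic sox actually uses. Each output sample comes from a cubic through the four nearest input samples, and no band-limiting filter is applied. That is what keeps it cheap, and it is consistent with the low rejection figure for `-q` in the table.

```python
import math

def catmull_rom(p0, p1, p2, p3, t):
    """Catmull-Rom cubic between p1 (t = 0) and p2 (t = 1)."""
    return 0.5 * (2 * p1
                  + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t * t * t)

def resample_cubic(signal, rate_in, rate_out):
    """Resample by cubic interpolation; no anti-aliasing filter is applied."""
    n_out = int(len(signal) * rate_out / rate_in)
    last = len(signal) - 1
    out = []
    for j in range(n_out):
        pos = j * rate_in / rate_out      # output sample position in input samples
        i = int(math.floor(pos))
        t = pos - i
        # Clamp the four neighbour indices at the edges of the signal.
        p = [signal[min(max(k, 0), last)] for k in (i - 1, i, i + 1, i + 2)]
        out.append(catmull_rom(*p, t))
    return out

ramp = [0.1 * n for n in range(480)]
y = resample_cubic(ramp, 48000, 44100)
```

A straight line passes through the interpolator unchanged, away from the clamped edges. Content near or above the new Nyquist limit, however, is not removed.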

		The rate  effect is  invoked automatically if  SoX's -r
		option specifies  a rate that  is different to  that of
		the input  file(s).  Alternatively,  if this  effect is
		given  explicitly, then  SoX's  -r option  need not  be
		given.   For example,  the following  two commands  are
		equivalent:

		   sox input.wav -r 48k output.wav bass -b 24
		   sox input.wav        output.wav bass -b 24 rate 48k

		though the second command is more flexible as it allows
		rate options to be given,  and allows the effects to be
		ordered arbitrarily.
		*   *   *

		Warning: technically detailed discussion follows.

		The simple  quality selection described  above provides
		settings that satisfy the needs of the vast majority of
		resampling  tasks.  Occasionally,  however,  it may  be
		desirable to fine-tune the resampler's filter response;
		this  can   be  achieved  using   override options,  as
		detailed in the following table:
		-M/-I/-L     Phase response = minimum/intermediate/linear
		-s           Steep filter (band-width = 99%)
		-a           Allow aliasing/imaging above the pass-band
		-b 74-99.7   Any band-width %
		-p 0-100     Any phase response (0 = minimum, 25 =
		             intermediate, 50 = linear, 100 = maximum)

		N.B.  Override options cannot  be used with the `quick'
		or `low' quality algorithms.

		All resamplers  use filters  that can  sometimes create
		`echo'  (a.k.a.   `ringing') artefacts  with  transient
		signals such as those that occur with `finger snaps' or
		other  highly percussive  sounds.   Such artefacts  are
		much more  noticeable to  the human  ear if  they occur
		before the  transient (`pre-echo')  than if  they occur
		after  it (`post-echo').   Note that  frequency of  any
		such  artefacts  is  related  to  the  smaller  of  the
		original and new sampling rates  but that if this is at
		least 44.1kHz, then the  artefacts will lie outside the
		range of human hearing.

		A phase  response setting  may be  used to  control the
		distribution of  any transient  echo between  `pre' and
		`post': with  minimum phase,  there is no  pre-echo but
		the longest post-echo; with  linear phase, pre and post
		echo are  in equal  amounts (in  signal terms,  but not
		audibility  terms);  the   intermediate  phase  setting
		attempts  to find  the best  compromise by  selecting a
		small  length  (and level)  of  pre-echo  and a  medium
		lengthed post-echo.

		Minimum,  intermediate,  or  linear phase  response  is
		selected using the -M, -I, or -L option; a custom phase
		response can be created with  the -p option.  Note that
		phase responses between `linear' and `maximum' (greater
		than 50) are rarely useful.
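The "equal amounts of pre and post echo" claim for linear phase can be checked directly. This minimal sketch assumes a Hann-windowed-sinc low-pass, which is not sox's actual filter design. A linear-phase FIR is symmetric about its centre tap, so the energy arriving before the main peak equals the energy after it. A minimum-phase filter with the same magnitude response would instead concentrate its energy at the start, which is the "no pre-echo" case described above.

```python
import math

def windowed_sinc_lowpass(num_taps, cutoff):
    """Linear-phase low-pass FIR: Hann-windowed sinc, unity gain at DC."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        taps.append(ideal * (0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))))
    gain = sum(taps)
    return [t / gain for t in taps]

taps = windowed_sinc_lowpass(63, cutoff=0.25)
peak = max(range(len(taps)), key=lambda n: abs(taps[n]))

# Energy before the main peak ("pre-echo") and after it ("post-echo").
pre_echo = sum(t * t for t in taps[:peak])
post_echo = sum(t * t for t in taps[peak + 1:])
print(f"peak at tap {peak}, pre {pre_echo:.6f}, post {post_echo:.6f}")
```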

		A resampler's band-width setting determines how much of
		the   frequency   content   of  the   original   signal
		(w.r.t. the  original sample rate when  up-sampling, or
		the new  sample rate  when down-sampling)  is preserved
		during  conversion.  The  term `pass-band'  is used  to
		refer  to all  frequencies up  to the  band-width point
		(e.g.  for  44.1kHz  sampling rate,  and  a  resampling
		band-width of 95%, the pass-band represents frequencies
		from  0Hz  (D.C.)  to  circa  21kHz).   Increasing  the
		resampler's band-width  results in a  slower conversion
		and  can increase  transient echo  artefacts (and  vice
		versa).
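The 95%-of-44.1kHz example works out as follows. This small helper (the name is my own) applies the band-width percentage to the Nyquist frequency of the lower of the two rates. That matches the paragraph: the original rate when up-sampling, the new rate when down-sampling.

```python
def passband_edge_hz(rate_in_hz, rate_out_hz, bandwidth_pct=95.0):
    """Upper edge of the pass-band for a resampling job, in Hz.

    Band-width is a percentage of the Nyquist frequency of the lower rate.
    """
    nyquist = min(rate_in_hz, rate_out_hz) / 2
    return nyquist * bandwidth_pct / 100

print(passband_edge_hz(96000, 44100))   # down-sampling to 44.1 kHz, default 95%
print(passband_edge_hz(44100, 48000))   # up-sampling from 44.1 kHz, default 95%
```

Both calls give 20947.5 Hz, i.e. the "circa 21kHz" quoted above.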

		The  -s   `steep  filter'  option   changes  resampling
		band-width  from  the default  95%  (based  on the  3dB
		point), to 99%.  The -b option allows the band-width to
		be set  to any value in  the range 74-99.7 %,  but note
		that  band-width  values  greater   than  99%  are  not
		recommended for normal use  as they can cause excessive
		transient echo.

		If the -a option  is given, then aliasing/imaging above
		the pass-band  is allowed.   For example,  with 44.1kHz
		sampling rate, and a resampling band-width of 95%, this
		means  that  frequency  content   above  21kHz  can  be
		distorted; however,  since this is above  the pass-band
		(i.e.      above    the     highest    frequency     of
		interest/audibility), this  may not be a  problem.  The
		benefits  of  allowing   aliasing/imaging  are  reduced
		processing time, and reduced (by almost half) transient
		echo  artefacts.  Note  that if  this option  is given,
		then the minimum band-width allowable with -b increases
		to 85%.
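The option constraints scattered through the last few paragraphs can be collected into one check:

- no override options with `-q` or `-l`;
- `-b` must lie in 74-99.7, with the minimum raised to 85 when `-a` is given;
- values above 99 are discouraged.

`check_rate_options` below is a hypothetical helper for illustration only. sox does its own validation, with its own messages, and `-M/-I/-L` are left out for brevity.

```python
def check_rate_options(quality="h", bandwidth=None, allow_aliasing=False,
                       steep=False, phase=None):
    """Return a list of problems with a combination of sox `rate` options.

    quality: "q", "l", "m", "h" or "v" ("h" is the default).
    bandwidth: value given to -b in percent, or None if -b is absent.
    allow_aliasing / steep: whether -a / -s are given.
    phase: value given to -p (0-100), or None.
    """
    problems = []
    overrides = (bandwidth is not None or allow_aliasing or steep
                 or phase is not None)
    if overrides and quality in ("q", "l"):
        problems.append("override options cannot be used with -q or -l")
    if bandwidth is not None:
        lowest = 85.0 if allow_aliasing else 74.0
        if not lowest <= bandwidth <= 99.7:
            problems.append(f"-b must lie in {lowest:g}-99.7")
        elif bandwidth > 99:
            problems.append("warning: -b above 99 can cause excessive transient echo")
    if phase is not None and not 0 <= phase <= 100:
        problems.append("-p must lie in 0-100")
    return problems
```

The two man page examples below (`rate -s -a` and `rate -v -I -b 90`) both pass this check.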

		Examples:

		   sox input.wav -b 16 output.wav rate -s -a 44100 dither -s

		default  (high)  quality resampling;  overrides:  steep
		filter,  allow   aliasing;  to  44.1kHz   sample  rate;
		noise-shaped dither to 16-bit WAV file.

		   sox input.wav -b 24 output.aiff rate -v -I -b 90 48k

		very high  quality resampling;  overrides: intermediate
		phase, band-width 90%; to 48k sample rate; store output
		to 24-bit AIFF file.
		*   *   *

		The  pitch and  speed effects  use the  rate effect  at
		their core.


As you can see, downsampling is a science...  If you sacrifice linear
phase response (which makes the main difference at very high
frequencies), you can achieve lower latency, though the difference at
lower frequencies will be comparatively minimal.
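To put the latency point in numbers for this thread: a linear-phase FIR delays every frequency by half its length, (N - 1)/2 samples. The sketch below assumes a hypothetical 255-tap filter running at 48 kHz. sox's real filter lengths depend on quality and rates and aren't stated above. The filter's delay is compared with one 128-frame JACK period at the same rate.

```python
def frames_to_ms(frames, rate_hz):
    """Duration of a number of sample frames, in milliseconds."""
    return frames / rate_hz * 1000.0

def linear_phase_delay_frames(num_taps):
    """Group delay of a symmetric (linear-phase) FIR, in samples."""
    return (num_taps - 1) / 2

filter_ms = frames_to_ms(linear_phase_delay_frames(255), 48000)  # hypothetical filter
period_ms = frames_to_ms(128, 48000)                              # one JACK period
print(f"linear-phase filter delay: {filter_ms:.2f} ms")
print(f"one 128-frame period:      {period_ms:.2f} ms")
```

Both come out at roughly 2.6 to 2.7 ms. A linear-phase resampler of that length therefore costs about as much as one extra period. If the filter runs at the higher rate before decimation, use that rate in the conversion.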

-- 
David Kastrup

