On Tue, Apr 20, 2010 at 11:51 PM, Arnold Krille <arnold@arnoldarts.de> wrote:
> CDs actually don't cut off sharply at the Nyquist frequency of half the
> sampling rate.
> The highest frequency that can be reproduced -with correct amplitude- is half
> the sampling rate _only_ if the phase is aligned to the sampling clock so that
> the minima/maxima of the sine are actually sampled. If it's out of phase, the
> amplitude is not reproduced correctly.
> It is easy to see that this correlation between phase and correct
> amplitude also affects frequencies below half the sampling rate. It might reach
> as low as a quarter of the sampling rate, which in the case of the CD is 11 kHz.
> Below that you will have more than four samples per cycle of the sine wave.
> That is in fact another reason to do the recording, mixing and mastering at
> more than 44 kHz...

I agree, and this is something that is rarely talked about "scientifically" because the psychoacoustics of the phase information is so poorly understood.
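A quick numerical sketch of the phase effect being described (setup and names are my own, purely illustrative): a sine sampled exactly at the Nyquist frequency only has its full amplitude captured when the sample instants line up with its peaks.

```python
import numpy as np

fs = 44100.0            # CD sampling rate
f = fs / 2.0            # tone exactly at the Nyquist frequency
n = np.arange(64)       # 64 sample instants

def sampled_peak(phase):
    """Largest amplitude actually captured by the samples."""
    return np.abs(np.sin(2 * np.pi * f * n / fs + phase)).max()

print(sampled_peak(np.pi / 2))  # peaks aligned with samples -> 1.0
print(sampled_peak(0.0))        # zero crossings aligned -> ~0.0
```

At exactly fs/2 the captured amplitude collapses to |sin(phase)|. Just below fs/2 the same phase mismatch shows up as a slow beat across the sample values; an ideal reconstruction filter does recover the tone from that, but only by integrating over many samples.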

It's the spatial aspects of the sound that can be lost in the digital recording process, and that are carried in the phase information at the high frequencies.

IMHO, compression psychoacoustics and human perceptual modelling also need to take into account how bats (or blind people) can echo-locate and come to an understanding of the physical geometries surrounding them without traditional vision. Once that is better modelled, together with the "internal deconvolution" the brain performs in deciphering what's echo versus source, what's "air" versus "ground", etc., one could properly compress the "source sound" and then reconstruct the "air" to sound more realistic, without resorting to extremely high sampling rates (there's a reason 192K exists, and it's not just marketing).

It is the muddling of our natural recognition processes that can make some recordings tiring to listen to.

I think this "internal deconvolution" is also something worth investigating as a better way of compressing sound. If there were a way of "reverse convolving" the echoed/heard sound back to the original source sound (given an impulse response of that instrument/track-->output), then one could use traditional MP3/OGG encoding for the "original source sound", where it won't matter if the phase and spatial information, especially at higher frequencies, is lost. Concurrently with each frame of compressed audio, there'd also be a frame of "impulse response" that is separately coded and streamed. The "heard signal" is then recomposed by decompressing the MP3/OGG stream and convolving each frame with its frame of "impulse response". (The smooth inbetweening of convolutions between frames of impulse response is left as an exercise for the reader :-) ).
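A minimal NumPy sketch of the decode side of that idea (all names hypothetical; it shows only the per-frame convolution and overlap-add, not the IR coding or the inbetweening between impulse responses):

```python
import numpy as np

def recompose(frames, irs):
    """Rebuild the "heard" signal: convolve each decoded source frame
    with its per-frame impulse response, overlap-adding each
    convolution tail into the following frames."""
    hop = len(frames[0])                      # frame length (source frames abut)
    tail = max(len(h) for h in irs) - 1       # longest convolution tail
    out = np.zeros(hop * len(frames) + tail)
    for i, (x, h) in enumerate(zip(frames, irs)):
        y = np.convolve(x, h)                 # "source" frame * impulse response
        out[i * hop : i * hop + len(y)] += y  # overlap-add into the output
    return out
```

Sanity check: with a one-sample impulse response of [1.0] per frame, this degenerates to simply concatenating the decoded frames.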

-- Niels
http://nielsmayer.com

PS: I think learning how MP3 works is like finding out what went into the sausage you previously thought so delicious... it might even make you go vegan. What's interesting is that all the material on psychoacoustic modelling is presented as "total fact", with no consideration for any of the "perspective" raised above. For example, the only place the time domain is mentioned is in the unverified assumptions about "temporal masking":
http://www1.cs.columbia.edu/6181/slides/05-music-coding.pdf
http://www.cs.rutgers.edu/~elgammal/classes/cs443/slide14_short.pdf
IMHO, the literature on compression ends up reading like medical literature where they've already decided what medicine they're going to sell you, then devise tests, diagnoses, and even new diseases to justify the "sale", while purposefully ignoring (as part of their misuse of the scientific method) anything that doesn't fit the idea they're desperately trying to sell...