On Tue, Apr 20, 2010 at 11:51 PM, Arnold Krille <arnold(a)arnoldarts.de> wrote:
CDs actually don't cut off sharply at the Nyquist frequency of half the
sampling rate.
The highest frequency that can be reproduced with correct amplitude is half
the sampling rate _only_ if the phase is aligned to the sampling clock so that
the minima/maxima of the sine wave are correctly sampled. If it's out of phase,
the amplitude is not reproduced correctly.
It is easy to understand that this correlation between phase and correct
amplitude also affects frequencies below half the sampling rate. It might
reach as low as a quarter of the sampling rate, which in the case of the CD
is about 11 kHz. Below that you will have more than four samples to reproduce
the sine wave.
That is in fact another reason to do the recording, mixing and mastering at
more than 44 kHz...
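
The phase dependence described above can be checked numerically. The
following is only a minimal sketch in Python: the function name, the choice
of 64 samples and the test phases are my own assumptions, not anything from
the quoted post. It looks at the raw sample values only and says nothing
about what a reconstruction filter does with them afterwards.

```python
import math

def peak_sampled_amplitude(freq_hz, sample_rate_hz, phase_rad, n_samples=64):
    """Largest absolute raw sample value of a unit-amplitude sine wave.

    Hypothetical helper for illustration: it only inspects the samples,
    it does not model the DAC's reconstruction filter.
    """
    return max(
        abs(math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz + phase_rad))
        for n in range(n_samples)
    )

SAMPLE_RATE = 44100.0          # CD sampling rate in Hz
NYQUIST = SAMPLE_RATE / 2.0    # half the sampling rate
QUARTER = SAMPLE_RATE / 4.0    # about 11 kHz

for phase_deg in (0, 30, 45, 90):
    phase = math.radians(phase_deg)
    at_nyquist = peak_sampled_amplitude(NYQUIST, SAMPLE_RATE, phase)
    at_quarter = peak_sampled_amplitude(QUARTER, SAMPLE_RATE, phase)
    print(f"phase {phase_deg:2d} deg: peak at fs/2 = {at_nyquist:.3f}, "
          f"peak at fs/4 = {at_quarter:.3f}")
```

At exactly half the sampling rate the samples are (-1)^n * sin(phase), so the
largest sample equals |sin(phase)| and vanishes when the phase is zero. At a
quarter of the sampling rate there are four samples per cycle and the largest
sample is max(|sin(phase)|, |cos(phase)|), which never drops below about 0.707.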
I agree, and this is something that is rarely talked about "scientifically"
because the psychoacoustics of the phase information is so poorly
understood.
It's the spatial aspects of the sound that can be lost in the digital
recording process, and that are carried in the phase information at the high
frequencies.
IMHO, compression psychoacoustics and human perceptual modelling also need
to take into account how bats (or blind people) can echo-locate sound and
come to an understanding of the physical geometries surrounding them
without traditional vision. Once that is better modeled, along with the
"internal deconvolution" that the brain accomplishes in deciphering what's
echo versus source, what's "air" versus "ground", etc., one can then properly
compress the "source sound" and then reconstruct the "air" to sound more
realistic, without resorting to extremely high sampling rates (there's a
reason 192K exists, and it's not just marketing).
It is the muddling of our natural recognition processes that can make some
recordings tiring to listen to.
I think this "internal deconvolution" is also something that might be worth
investigating as a better way of compressing sound. If there were a way of
"reverse convolving" the echoed/heard sound back to the original source
sound (given an impulse response of that instrument/track-->output), then
one could use traditional MP3/OGG music encoding for the "original source
sound" (where it won't matter if the phase and spatial information, esp. at
higher freqs, is lost). Concurrently with each frame of compressed audio,
there'd also be a frame of "impulse response" that is separately coded and
streamed. The "heard signal" is then recomposed by convolving the decompressed
MP3/OGG stream with the matching frame of "impulse response". (The
smooth inbetweening of convolutions between frames of impulse response is
left as an exercise for the reader :-) ).
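
A toy version of that pipeline, for a single frame, might look like the
sketch below. Everything in it is an assumption made for illustration: the
function names are hypothetical, the "codec" is a plain quantizer standing in
for MP3/OGG, and the impulse response is short, known exactly, and has a
nonzero first tap so that simple time-domain inverse filtering works. A real
room response would need something far more robust.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def deconvolve(heard, impulse_response, source_len):
    """Recover the source by inverse filtering (needs impulse_response[0] != 0).

    This recursion is only well behaved for noise-free input and a
    minimum-phase impulse response; it is a sketch, not a general method.
    """
    h0 = impulse_response[0]
    source = []
    for n in range(source_len):
        acc = heard[n]
        for k in range(1, min(n, len(impulse_response) - 1) + 1):
            acc -= impulse_response[k] * source[n - k]
        source.append(acc / h0)
    return source

def lossy_codec_stand_in(samples, step=1.0 / 64):
    """Placeholder for an MP3/OGG encode/decode round trip: plain quantization."""
    return [round(s / step) * step for s in samples]

# One frame of "original source sound" and its "impulse response".
dry = [0.0, 1.0, 0.5, -0.25, -1.0, 0.0, 0.75, 0.0]
room = [1.0, 0.5, 0.25]

heard = convolve(dry, room)                    # what the microphone picks up
recovered = deconvolve(heard, room, len(dry))  # "reverse convolving"
decoded = lossy_codec_stand_in(recovered)      # source goes through the codec
recomposed = convolve(decoded, room)           # decoder re-applies the "air"

error = max(abs(a - b) for a, b in zip(heard, recomposed))
print("max reconstruction error:", error)
```

The impulse response frame travels next to the compressed source frame and
is applied only at the decoder. Crossfading between successive impulse
response frames is not attempted here.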
-- Niels
http://nielsmayer.com
PS: I think learning how MP3 works is like finding out what went into the
sausage you previously thought so delicious... might even make you go vegan.
What's interesting is that all the stuff on psychoacoustic modelling is
presented as a "total fact" with no consideration for any of the
"perspective" raised above. For example, the only place "time domain" is
mentioned is in the unverified assumptions about "temporal masking":
http://www1.cs.columbia.edu/6181/slides/05-music-coding.pdf
http://www.cs.rutgers.edu/~elgammal/classes/cs443/slide14_short.pdf ...
IMHO, the literature on compression ends up reading like medical literature
where they've already decided what medicine they're going to sell you, then
devise tests, diagnoses, and even new diseases to justify the "sale", while
purposefully ignoring (as part of their misuse of the scientific method)
anything that doesn't fit with the idea they're desperately trying to
sell...