Excerpts from Adrian Knoth's message of 2011-12-10 15:01:54 +0100:
On Sat, Dec 10, 2011 at 01:10:38PM +0100, Philipp Überbacher wrote:
2.
Players like VLC have a normalize function. I don't know if it's
one-pass (on the fly, more like automated gain control) or two-pass,
but it basically solves your problem of audio levels at playback
time.
Counter argument:
If normalization were sufficient, there would have been no
need for replaygain in the first place. Explanation on
http://www.replaygain.org/.
Huh? I cannot find anything on this website that says "normalization is
not sufficient":
"Although music is encoded to a digital format with a clearly defined
maximum peak amplitude, and although most recordings are normalized
to utilize this peak amplitude, not all recordings sound equally loud.
This is because once this peak amplitude is reached, perceived loudness
can be further increased through signal-processing techniques such as
dynamic range compression and equalization"
This page tells you that peak normalization is not sufficient, but
that's nothing new, it's logical that the highest peak has basically no
relation to the perceived loudness of a piece of audio.
I have to admit I'm not too familiar with the details of replaygain, but
I'm not aware that it actually does compression or equalization. In
contrast, they explicitly state:
"The player reads the corresponding gain metadata value from the file
and scales the audio data as appropriate. Scaling the audio data simply
means multiplying each sample value by a constant value."
That's gain, nothing more, nothing less.
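To make the "multiply each sample by a constant" point concrete, here is a minimal sketch. It assumes floating-point samples in the range -1.0 to 1.0 and a gain given in dB; the function names are made up for illustration and are not from any replaygain implementation.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Multiply every sample by the same constant; nothing else changes."""
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]


# A gain of -6 dB roughly halves every sample value.
quiet = apply_gain([0.5, -0.25, 0.1], -6.0)
```

The relative shape of the waveform is untouched, which is why this counts as neither compression nor equalization.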
Exactly, no equalisation, no compression, that's why purists like it.
According to the wiki, they store four values:
1. Peak track amplitude
2. Peak album amplitude
3. Track replay gain
4. Album replay gain
We can directly forget about the album values, since we're talking
about videos here.
I agree, album mode makes little sense unless we have music video albums
or similar.
Let's assume we're brave and normalize to 0 dBFS, then this is obviously
the resulting peak track amplitude.
Asking for additional replay gain would simply cause distortion.
Long story short: only the peak track amplitude could be useful
information if you don't want to apply automatic gain control or
read the entire file every time you play it just to determine the
correct gain for normalization.
Wrong. This is in short how (track) replaygain works:
1) Calculate the track's loudness. Loudness implies psychoacoustics. It
uses some algorithm that takes the human perception into account.
2) Compare the measurement value to a reference (pink noise at -14 dB)
and store the difference as metadata.
3) The audio player reads the metadata and adjusts its output level
according to the metadata.
The result: levels of 'loud' songs get lowered to the reference level,
levels of 'soft' songs get raised to the reference level. No
compression, no EQ, no need to twist the volume knob all the time.
Again, the reference level is below 0 dBFS, loud songs get attenuated.
Peak normalization in contrast just tries to make everything as loud as
possible and doesn't take human perception into account at all.
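A rough sketch of the three steps, and of the difference from peak normalization. Big assumption: plain RMS level stands in for the loudness measurement here, whereas real replaygain uses a psychoacoustic model. The -14 dB reference is treated simply as an RMS level in dBFS, and all function names are hypothetical.

```python
import math

# Reference level as quoted above; treated here as an RMS level in dBFS.
REFERENCE_DB = -14.0


def rms_db(samples):
    """Stand-in loudness measure: RMS level in dBFS (assumes non-silent input)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


def track_gain_db(samples):
    """Steps 1 and 2: measure, compare to the reference, return the difference."""
    return REFERENCE_DB - rms_db(samples)


def peak_normalize_gain_db(samples):
    """For contrast: gain that pushes the highest peak to 0 dBFS."""
    peak = max(abs(s) for s in samples)
    return -20.0 * math.log10(peak)


def play(samples, gain_db):
    """Step 3: the player scales the samples by the stored gain."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


loud = [0.9, -0.9] * 100    # square-ish signal close to full scale
soft = [0.05, -0.05] * 100  # same shape, much quieter

loud_gain = track_gain_db(loud)  # negative: the loud track is attenuated
soft_gain = track_gain_db(soft)  # positive: the soft track is raised
```

After `play(loud, loud_gain)` and `play(soft, soft_gain)` both tracks sit at the reference level under this stand-in measure. `peak_normalize_gain_db` only looks at the single highest sample and never returns a negative gain for in-range input, so it can only make things louder.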
There are special tools. If nothing helps, vlc and mplayer can do this.
mplayer with -dumpvideo and -dumpaudio, vlc with the transcode commands.
Right, this should do. Is there an API for that?
Sure, libavformat from ffmpeg or libav, whatever you prefer.
Ok, thanks.
After that, the scanning process should work as with any audio file.
Afterwards the calculated replaygain values have to be added to the
metadata of the file.
Provided that the audio track supports metadata. Depending on the
container, you can embed almost everything from pcm to ogg, mp3, mp4/aac
and so on.
Depending on the container, it might be possible to add the metadata to
the muxed file.
Ah, I didn't think of adding metadata to the audio itself, another
possibility. However, adding it to the container would probably be more
universal.
It's probably not that simple. If the container doesn't provide the
possibility to add sane metadata, you'd be lost. Likewise, there might
be multiple audio streams (stereo, multichannel, different languages).
You'd need a way to relate to those substreams from within your global
metadata.
This should be possible, shouldn't it? The player needs to be able to
identify those anyway.
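One way this could look, purely as a sketch: keep one gain value per audio stream in the container-level metadata, keyed by the stream index the player already uses. The "stream<N>:" key prefix is invented for this example and not defined by any container format; the value format ("-6.50 dB") follows the common replaygain tag style.

```python
def build_container_tags(gain_db_by_stream):
    """Flatten per-stream gains into container-level key/value tags.

    The "stream<N>:" prefix is a made-up convention for this sketch.
    """
    return {
        f"stream{index}:REPLAYGAIN_TRACK_GAIN": f"{gain_db:+.2f} dB"
        for index, gain_db in gain_db_by_stream.items()
    }


def gain_for_stream(tags, index):
    """Look up the gain for the selected stream; fall back to 0 dB if untagged."""
    value = tags.get(f"stream{index}:REPLAYGAIN_TRACK_GAIN")
    if value is None:
        return 0.0
    return float(value.split()[0])


# Example: stream 0 is a loud stereo mix, stream 1 a quieter commentary track.
tags = build_container_tags({0: -6.5, 1: 2.25})
```

A player that switches audio streams would call `gain_for_stream(tags, index)` with the index of the newly selected stream and leave untagged streams alone.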
It could be easier to work on the audio streams directly, but this would
require re-muxing the file, causing subtle problems like A-V sync or the
necessity to write arbitrary containers (MPEG, MPEG-TS, MP4, ogv...).
ffmpeg/libav might help.
Either way, it's complex. ;)
Yes, especially with all those containers and formats out there.
However, I'd start with one where it's reasonably easy to do :)
[jackd connection handling]
Alex Stone, I guess :)
Exactly.
If you have a good idea, please tell me. I'll have to find a team on
Monday and it might help to have some good proposals.
I'm still looking for somebody to rewrite hdspmixer. ;) While we're at
it, how about an HTML based approach: you fire up your browser, either
have a matrix mixer for everything or can select individual output
buses, and then have the input/playback faders for this destination
only. (I have a couple of details, if you want to go this route)
I don't have such a mixer. The html approach is probably outside the
scope of this class, it's a beginner C/C++ class.
Other idea: a P1722 streamer, no idea if Christoph Kuhr is still working
on that.
I guess my teacher could like it, but I know basically nothing about
network stuff at this point.
Or ask Paul if he needs some help with jack3. ;)
Cheers
Haven't heard of that one yet and I doubt we need another implementation
:)
Thanks for your help,
regards,
Philipp