[LAU] Analysis of monophonic audio signals on the commandline

Jeanette C. julien at mail.upb.de
Sun Feb 27 14:06:06 CET 2022


Thank you for the detailed elaboration. Those were indeed factors I have so 
far ignored, even though I should be aware of them. Somehow I didn't link the 
analysis of a sonogram to FFT-like challenges.

At least the song of the chaffinch appears to be free of broadband features, 
certainly going by ear alone. That, together with data already coded from 
other bird calls, may greatly help in a semi-analytical approach, which may 
yield good enough results.

I will start a few experiments and see how good it gets.
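
For a first attempt I might use a little Python sketch like this (assuming 
numpy and scipy and a mono WAV recording; the frame sizes are just guesses 
to be tuned):

    # pitch_track.py - print a rough per-frame dominant frequency
    # of a mono WAV file; a first sketch, not a real pitch tracker
    import sys
    import numpy as np
    from scipy.io import wavfile

    rate, data = wavfile.read(sys.argv[1])
    data = data.astype(np.float64)
    frame = 1024                    # about 23 ms at 44.1 kHz
    hop = 256
    for start in range(0, len(data) - frame, hop):
        block = data[start:start + frame] * np.hanning(frame)
        spectrum = np.abs(np.fft.rfft(block))
        peak = int(np.argmax(spectrum))     # strongest bin
        print("%.4f %.1f" % (start / rate, peak * rate / frame))

Run as e.g. python pitch_track.py chaffinch.wav > track.txt, and the output 
is one time/frequency pair per line.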

Best wishes,

Jeanette

On Feb 27 2022, Fons Adriaensen wrote:

> On Sun, Feb 27, 2022 at 11:37:45AM +0100, Jeanette C. wrote:
>
>> Hm, if such images are clean, I suppose a program can be written to
>> translate the sonogram to values.
>
> They are not always very clean and there are two reasons for this:
> - the quality of the recording (filtering will help),
> - the complexity of the sound.
>
>> 2D representations of all kinds are unfeasible, really.
>
> The problem here is that some bird sounds can only be represented
> correctly in 2D parameter space.
>
> Some contain a clear single frequency, usually sweeping and modulated.
> Such modulation can be quite fast, in the tens of Hz region.
> Some others contain very short broadband features, and the whole notion
> of a single frequency is not valid at all. I've seen impulsive waveforms
> of only a few milliseconds in some recordings. And many bird sounds are
> a mix of those two extremes. A sonogram deals with both of them; that
> is why it is useful. So what would be needed is some form of analysis
> that produces less output but is still able to handle both cases and
> everything in between.
>
> Using classical analysis methods, there is a limit to the product of
> resolution in time and frequency, similar to the uncertainty principle
> in quantum physics. Human (and animal) hearing can in some cases go
> beyond that limit - this is possible only by making some a priori
> assumptions about the signal.
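>
> As a quick numerical sketch of that trade-off (plain Python, though
> only the arithmetic matters): for a single FFT frame the product of
> time and frequency resolution is constant, whatever the frame size:
>
>     fs = 44100.0                        # sample rate in Hz
>     for n in (256, 1024, 4096, 16384):  # FFT frame sizes
>         dt = n / fs                     # frame length in seconds
>         df = fs / n                     # bin spacing in Hz
>         print("N=%5d  dt=%6.1f ms  df=%6.2f Hz  dt*df=%.1f"
>               % (n, 1e3 * dt, df, dt * df))
>
> To separate sidebands only 10 Hz apart you need a frame of at least
> about 100 ms, and anything faster than that in the time domain is
> then smeared.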
>
> The problem is similar to one that occurs in time-stretching of audio:
> the algorithm must decide if some feature should be regarded as
> significant in the time domain or in the frequency domain. Which is
> why software such as rubberband has both user options and some
> not-so-simple internal decision making.
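>
> For example, with the rubberband command line tool (going from
> memory, so check its man page) a stretch to twice the original
> duration is
>
>     rubberband --time 2.0 input.wav output.wav
>
> and options such as --crisp let the user bias that decision towards
> transient or tonal material.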
>
> As a simple example, take a 1 kHz sinewave that is amplitude modulated
> by a 10 Hz signal. The actual frequencies present then are 990, 1000,
> and 1010 Hz. Now how should this be analysed?
>
> Option 1: as a modulated 1 kHz signal. When time-stretched, e.g. by
> a factor of 2, the amplitude as a function of time is preserved,
> the modulation frequency becomes 5 Hz, and the output frequencies
> are 995, 1000, and 1005 Hz.
>
> Option 2: as three separate and unrelated frequencies. Each of them
> is stretched separately, and the output is 990, 1000, and 1010 Hz.
> So this will still sound as 10 Hz modulation, just longer.
>
> Which one is correct? The simple fact is that both are; it is just
> a matter of interpretation. Deciding this is something our brains
> are good at, based on experience and expectations.
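>
> The ambiguity is not in the data itself. A few lines of Python (a
> sketch assuming numpy; with a one second frame every component falls
> exactly on a bin) show that both readings describe the same spectrum:
>
>     import numpy as np
>     fs = 8000                       # sample rate in Hz
>     t = np.arange(fs) / fs          # one second of samples
>     x = (1.0 + 0.5 * np.cos(2 * np.pi * 10 * t)) \
>         * np.cos(2 * np.pi * 1000 * t)
>     spectrum = np.abs(np.fft.rfft(x)) / len(x)
>     for k in np.nonzero(spectrum > 1e-6)[0]:
>         print("%4d Hz  amplitude %.3f" % (k, spectrum[k]))
>
> This prints exactly three components: 990, 1000 and 1010 Hz.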
>
> Exactly the same question arises when trying to reduce a signal
> to something that can be described by a 1D function.
>
> Ciao,
>
> -- 
> FA

-- 
  * Website: http://juliencoder.de - for summer is a state of sound
  * Youtube: https://www.youtube.com/channel/UCMS4rfGrTwz8W7jhC1Jnv7g
  * Audiobombs: https://www.audiobombs.com/users/jeanette_c
  * GitHub: https://github.com/jeanette-c

Make all the clouds disappear
Put all your fears to rest <3
(Britney Spears)

