[linux-audio-dev] Fixed vs. floating point

Phil Frost indigo at bitglue.com
Fri Oct 14 20:28:55 UTC 2005


On Fri, Oct 14, 2005 at 09:00:37PM +0200, cpolymeris at gmx.net wrote:
> First of all, thank you all very much for your comments, the picture is much
> clearer now.
> 
> I just don't fully understand the floating-point precision part. If numbers
> from binary 0.1 to 1.0 are represented using 24 bits (sign bit + mantissa, I
> think the implicit 1 does not count), and the numbers from binary 0.01 to
> 0.1 also have 24 bits of precision, and so on for 0.001..0.01, etc. wouldn't
> that mean we have a higher resolution?
> (We are using 7 of 8 exponent bits too, just wasting the cases where the
> exponent is larger than 0, or has some special meaning.)
> That would give you 31 bits, minus a couple of useless and redundant cases
> (when the exponent is -128, and denormalizing occurs).
> 
> Then I also fail to see why it's bad for overflows to occur in fixed point.
> Those signals (above 0dB FS) would clip on the hardware anyway, and are
> expected to do so, since they were either badly recorded or amplified.
> 
> Greetings, Dimitri

Obviously, too strong a signal will clip. Clipping cleanly, however,
requires explicit checks for overflow. More likely, such checks will be
too slow and will be left out, so overflow will result in the value
"wrapping around", that is, maxint + 1 => minint. This of course sounds
really, really bad.
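
To make the difference concrete, here is a minimal Python sketch,
assuming 16-bit signed samples. The helper names wrap16 and sat16 are
made up for this illustration and do not come from any library.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def wrap16(x):
    """Reduce x modulo 2^16 into the signed 16-bit range (unchecked overflow)."""
    return (x + 32768) % 65536 - 32768

def sat16(x):
    """Clamp x to the signed 16-bit range (checked overflow, i.e. clipping)."""
    return max(INT16_MIN, min(INT16_MAX, x))

a, b = 30000, 5000       # two loud samples being mixed together
print(wrap16(a + b))     # -30536: the sum jumps to the opposite rail
print(sat16(a + b))      # 32767: the sum clips at full scale
```

Clipping flattens the peak; wrapping replaces it with a full-scale jump
in the wrong direction, which is why it sounds so much worse.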

Less obviously, too weak a signal will also distort. If the signal is in
the range [-6, 5] then there are only 12 discrete values a sample may
have, and your signal will sound like it's being played through your PC
beeper with a pencil stuck through the cone.
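
A small Python sketch of this effect, assuming samples are simply
rounded to the nearest integer (quantize_sine is a hypothetical helper
written for this example):

```python
import math

def quantize_sine(amplitude, n=1000):
    """Round one cycle of a sine with the given peak amplitude to integers."""
    return [round(amplitude * math.sin(2 * math.pi * i / n)) for i in range(n)]

quiet = quantize_sine(5)       # peak of only 5 integer steps
loud = quantize_sine(32767)    # full-scale 16-bit signal

print(len(set(quiet)))         # 11 distinct levels, -5 through 5
print(len(set(loud)))          # far more levels for the same waveform
```

The quiet sine is reduced to a coarse staircase, while the loud one
keeps most of its shape.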

Also, any DC offset pushes the signal toward one end of the range,
leaving fewer values available for the signal itself and resulting in
either clipping or a lack of precision.

These problems don't seem so bad if only the output is considered, but
in any audio processing pipeline, there are many more signal paths that
must be considered. Imagine all the different signal paths in a filter.
It is possible to avoid the mentioned problems using fixed point
arithmetic where the "point" is fixed at a different place depending on
the expected signal level. However, it's not much fun, and I very much
doubt it is significantly faster.
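
As a rough sketch of what "moving the point" involves, here is some
Python using plain integers as fixed-point values. The helpers
to_fixed, from_fixed and fixed_mul are invented for this example, and
the Q15/Q14 formats are only an assumption about what a 16-bit design
might pick.

```python
def to_fixed(x, frac_bits):
    """Convert a real number to an integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))

def from_fixed(q, frac_bits):
    """Convert a fixed-point integer back to a real number."""
    return q / (1 << frac_bits)

def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point values that share the same format.

    The raw product carries 2 * frac_bits fractional bits, so it is
    shifted back down (the low bits are simply dropped).
    """
    return (a * b) >> frac_bits

# With 15 fractional bits (Q15), values in [-1, 1) fit in 16 bits.
gain = to_fixed(0.5, 15)
sample = to_fixed(0.25, 15)
print(from_fixed(fixed_mul(gain, sample, 15), 15))   # 0.125

# A coefficient of 1.8 does not fit in 16 bits at Q15, so that one
# signal path would need a different format, for example Q14.
print(to_fixed(1.8, 15) > 32767)    # True: overflows a 16-bit word
print(to_fixed(1.8, 14) <= 32767)   # True: fits, with one bit less precision
```

Every signal path needs this kind of bookkeeping, which is the "not much
fun" part.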

By using floating point, you guarantee that the biggest step between two
consecutive values is somewhere between 1/2^23 and 1/2^24 of your
signal range (provided you don't run up against the limits of floating
point, which is hard to do). Of course there are smaller steps between
consecutive values as values approach 0; this is what makes
floating-point desirable. In other words, you do have more than 2^24
values available to you, but the worst case difference between
consecutive values is never much worse than if you had exactly 2^24
values.
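
This can be checked with a short Python sketch, assuming IEEE 754
single precision and positive, normalized values. next_float32 and
relative_step are hypothetical helpers that reinterpret the bits via
the standard struct module.

```python
import struct

def next_float32(x):
    """Return the next representable float32 above a positive, finite x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

def relative_step(x):
    """Gap to the next float32 above x, as a fraction of x itself."""
    x32 = struct.unpack("<f", struct.pack("<f", x))[0]  # round x to float32
    return (next_float32(x32) - x32) / x32

for x in (1.0, 1.5, 1.999, 0.001, 1e-6):
    print(x, relative_step(x))
```

For every value tried, the relative step stays between 2^-24 and 2^-23,
whether the value is near 1.0 or a million times smaller.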


