[linux-audio-dev] Denormal numbers


Steve Harris

31 Jul 2003

4:40 a.m.

Several people have asked me what denormal numbers are over the last few weeks. Well, here's a much better description than my rambling head scratching: http://www.ecs.soton.ac.uk/~swh/denormal.ps It's an extract from David Goldberg's article, "What Every Computer Scientist Should Know About Floating-Point Arithmetic". - Steve


Alfons Adriaensen

31 Jul 2003

5:04 a.m.

On Thu, Jul 31, 2003 at 09:26:49AM +0100, Steve Harris wrote:

...

Do you know of a *very fast* (probably inline assembly) way to force denormal FP numbers to zero ? -- FA

Steve Harris

5:19 a.m.

On Thu, Jul 31, 2003 at 10:45:01 +0200, Alfons Adriaensen wrote:

...

On Thu, Jul 31, 2003 at 09:26:49AM +0100, Steve Harris wrote:

Do you know of a *very fast* (probably inline assembly) way to force denormal FP numbers to zero ?

No, the thing I've been using is (from the music-dsp mailing list):

    #define FLUSH_TO_ZERO(fv) (((*(unsigned int*)&(fv))&0x7f800000)==0)?0.0f:(fv)

but I'm beginning to suspect it doesn't always work. Sometimes you can just add low-amplitude white noise instead. - Steve

Simon Jenkins

1 Aug 2003

5:49 p.m.

Steve Harris wrote:

...

On Thu, Jul 31, 2003 at 10:45:01 +0200, Alfons Adriaensen wrote:

On Thu, Jul 31, 2003 at 09:26:49AM +0100, Steve Harris wrote:

Do you know of a *very fast* (probably inline assembly) way to force denormal FP numbers to zero ?

No, the thing I've been using is (from the music-dsp mailing list): #define FLUSH_TO_ZERO(fv) (((*(unsigned int*)&(fv))&0x7f800000)==0)?0.0f:(fv) but I'm beginning to suspect it doesn't always work.

This macro gives completely erroneous results if you use it in an expression such as

    a = FLUSH_TO_ZERO(x) * FLUSH_TO_ZERO(y);

because the operator precedence gets messed up. If you're doing this sort of thing you need to fix it with a couple more brackets...

    #define FLUSH_TO_ZERO(fv) ((((*(unsigned int*)&(fv))&0x7f800000)==0)?0.0f:(fv))

...and then, as far as I can see, the macro always does exactly what it is supposed to do. There are some limitations though:

1. Portability. It only works if unsigned int is 32 bits, stored in the same byte order as float. (Which it probably is.)

2. Minus zero. Gets converted into plus (i.e. normal) zero. Personally I won't be losing too much sleep over this one :)

3. Only works on lvalues. You can't flush an rvalue, so your code has to generate *and store* the value you want to flush. Admittedly the optimiser might optimise away the store but OTOH...

4. The optimiser might defer the flush. I think (but am not entirely sure) that you can suffer a performance hit just by moving a denormal from one location to another, never mind actually doing any arithmetic on it. If so then an optimiser could conceivably "optimise" source code which says flush-then-copy into object code which does copy-then-flush, so incurring this penalty.

5. It's a bit too late. The macro doesn't do anything until after a denormal has been generated, so you have already incurred the denormal penalty on one calculation. You avoid the performance catastrophe of the denormal propagating through subsequent calculations, but still suffer the minor performance glitch of generating it in the first place.

6. Performance. I'd classify this as "quite fast" but not "very fast". This is in some ways a good thing: it's probably cheap enough to be carefully deployed in exactly the places where it's needed, but not so cheap that it can be sprinkled around all over the place where it isn't.

Simon Jenkins (Bristol, UK)
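The precedence problem described in this message can be reproduced without any pointer casts. What follows is only a sketch: TINY, FLUSH_BAD, FLUSH_GOOD, product_bad and product_good are made-up names, and the threshold comparison is a stand-in for the exponent test, not the macro from this thread.

```c
#include <assert.h>   /* only needed for the checks mentioned below */

/* Hypothetical stand-in for "is this value too small to keep?" */
#define TINY(v) ((v) > -1.0e-30f && (v) < 1.0e-30f)

/* Same shape as the original macro: the conditional is not bracketed. */
#define FLUSH_BAD(v)  TINY(v) ? 0.0f : (v)

/* Same shape as the fixed macro: one extra pair of brackets. */
#define FLUSH_GOOD(v) (TINY(v) ? 0.0f : (v))

/* Expands to: TINY(x) ? 0.0f : (x) * TINY(y) ? 0.0f : (y)
   which the compiler reads as
   TINY(x) ? 0.0f : (((x) * TINY(y)) ? 0.0f : (y)) */
float product_bad(float x, float y)
{
    return FLUSH_BAD(x) * FLUSH_BAD(y);
}

/* Expands to two bracketed conditionals multiplied together. */
float product_good(float x, float y)
{
    return FLUSH_GOOD(x) * FLUSH_GOOD(y);
}
```

With x = 2 and y = 3, product_good returns 6, whereas product_bad returns 3: the multiplication is swallowed by the condition of the second conditional.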

Simon Jenkins

8:44 p.m.

Simon Jenkins wrote:

...

If you're doing this sort of thing you need to fix it with a couple more brackets... #define FLUSH_TO_ZERO(fv) ((((*(unsigned int*)&(fv))&0x7f800000)==0)?0.0f:(fv)) ...and then, as far as I can see, the macro always does exactly what it is supposed to do.

Whoops... I spoke too soon. I've found a way to make the macro fail:

    /*====================================================
      Program to demonstrate FLUSH_TO_ZERO macro failure.
      S.Jenkins 2003

      Using GCC 2.95.4
      Macro fails at optimisation -O1 or above
      if -fstrict-aliasing is also used.
    ====================================================*/

    #include <stdio.h>
    #include <stdlib.h>   /* for exit() */

    #define FLUSH_TO_ZERO(fv) ((((*(unsigned int*)&(fv))&0x7f800000)==0)?0.0f:(fv))

    float TestResult;

    int main (int argc, char **argv)
    {
        int j;

        /* set TestResult to a value that is just barely
           above the denormal float threshold */
        *(unsigned int *)&TestResult = 0x00800001;

        /* now decay TestResult down into the denormals... */
        for( j = 0; j < 1024; j++ )
        {
            TestResult = TestResult * 0.999f;

            /* ...but flush it to zero (causing all subsequent
               values to be zero) as soon as it actually
               becomes denormal */
            TestResult = FLUSH_TO_ZERO(TestResult);
        }

        /* Result should be zero and - if we instrument the
           code - we should only see ourselves paying the
           penalty of a single denormal calculation */
        printf( "TestResult: %e\n", TestResult );

        exit(0);
    }
    /*=== the end ==========================================*/

When compiled with -O1 -fstrict-aliasing, this produces a denormal result and takes a relatively long time doing so.

I guess that the -fstrict-aliasing option is telling the optimiser to ignore the possibility that a pointer to unsigned int might actually point to a global float variable, so it believes the "unsigned int" that the macro is testing cannot be the same entity as the float that the loop is modifying. So it moves the test outside of the loop.

Anyway, enough doom and gloom. Here's the fix:

    #define FLUSH_TO_ZERO(fv) ((((*(volatile unsigned int*)&(fv))&0x7f800000)==0)?0.0f:(fv))

(Note: This will stop working once Juhana gets the "volatile" keyword removed from the C language :) )

Simon Jenkins (Bristol, UK)
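A different way to sidestep the aliasing question, instead of leaning on volatile, is to copy the bits out with memcpy, which C allows for any object type. This is a minimal sketch of that alternative and not something proposed in the thread: flush_to_zero is a hypothetical name, and the code assumes float is a 32-bit IEEE 754 single.

```c
#include <assert.h>   /* only needed for the checks mentioned below */
#include <stdint.h>
#include <string.h>

/* Returns 0.0f if v has an all-zero exponent field (zero or denormal),
   otherwise returns v unchanged. Assumes a 32-bit IEEE 754 float. */
float flush_to_zero(float v)
{
    uint32_t bits;

    /* memcpy reads the bit pattern without casting a float pointer
       to an integer pointer, so strict aliasing is not involved. */
    memcpy(&bits, &v, sizeof bits);

    return ((bits & 0x7f800000u) == 0) ? 0.0f : v;
}
```

Because the value is passed by copy, this also accepts rvalues, for example flush_to_zero(a * b). Like the macro, it turns minus zero into plus zero, and it still runs only after a denormal has already been produced.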

Jack O'Quin

9:02 p.m.

Simon Jenkins <sjenkins(a)blueyonder.co.uk> writes:

...

#define FLUSH_TO_ZERO(fv) ((((*(volatile unsigned int*)&(fv))&0x7f800000)==0)?0.0f:(fv))

Nice work.

...

(Note: This will stop working once Juhana gets the "volatile" keyword removed from the C language :) )

I can just picture the skies filling with planes carrying all the people going to *that* standards group meeting! ;-) Regards, -- Jack O'Quin Austin, Texas, USA

Steve Harris

2 Aug 2003

6:10 a.m.

On Fri, Aug 01, 2003 at 11:36:22 +0100, Simon Jenkins wrote:

...

There are some limitations though:

Those are all good points, but my concern is that I don't think it actually does what it's supposed to do :) It could be the pointer aliasing + optimisation thing though, I'll check it out. - Steve

Simon Jenkins

8:38 a.m.

Steve Harris wrote:

...

On Fri, Aug 01, 2003 at 11:36:22 +0100, Simon Jenkins wrote:

There are some limitations though:

Those are all good points, but my concern is that I don't think it actually does what it's supposed to do :) It could be the pointer aliasing + optimisation thing though, I'll check it out. - Steve

The actual maths (checking whether the exponent field is zero) looks perfect to me apart from the minus zero quibble, so it's got to be a language or compiler subtlety. So far we've got:

Problem: Operator precedence in complex expressions
Symptom: Totally incorrect calculations in some circumstances
Fix: Extra pair of brackets

Problem: Pointer aliasing + optimisation
Symptom: Calculations correct, but denormals not flushed in some circumstances
Fix: Cast to volatile unsigned int *
Note: After the fix the affected code runs slightly slower in ordinary (non-denormal) calculations because the compiler is no longer optimising away the checking.

Both of the above are working solutions to demonstrable problems. If it still doesn't work after they are applied then you could try -ffloat-store. The compiler manual says that programs which rely on the exact storage format of IEEE floats should use this option, but I am unable to break the macro in a way that this fixes. (Possibly because the macro only works on lvalues and -ffloat-store seems to be concerned with intermediate values that the compiler generates.)

=== Alternatively: ===

I'm hoping to post some code in the next day or two which prevents denormal values from being generated in the first place. (I haven't been torturing FLUSH_TO_ZERO in isolation... I've got my own code hanging by its feet in the cell next door :) )

Simon Jenkins (Bristol, UK)

Simon Jenkins

3 Aug 2003

2 p.m.

Simon Jenkins wrote:

...

[...]If it still doesn't work after they are applied then you could try -ffloat-store. The compiler manual says that programs which rely on the exact storage format of IEEE floats should use this option, but I am unable to break the macro in a way that this fixes.

Unfortunately -ffloat-store slows code down a *lot*. More unfortunately, I can't prove that the macro will always function correctly without it. I still can't actually break it, but things I have read whilst searching the web have given me definite cause for concern that it could fail. I might mail a compiler list about this one.

...

I'm hoping to post some code in the next day or two which prevents denormal values from being generated in the first place.

Here's where I've got to so far. Comments are welcome. (Note: This might, or might not, suffer from the same problems that FLUSH_TO_ZERO might or might not suffer from :))

    /* === benormal.h - Copyright (C) 2003 Simon Jenkins === */

    #define FLOAT_EXP_MASK (0x7F800000)
    #define FLOAT_AS_BITS(x) (*(volatile unsigned int *)&(x))

    /*============================================================================
      Function: FlushMultiplyQuick
      ==============================================================================
      Purpose:  Returns the product of its parameters, but flushes the result
                to zero if there is suspicion (not proof!) that the result
                might have been a denormal float.
      Method:   The result is flushed to zero if the magnitude of either
                parameter is below 2**-63.
      Comments: This never produces a denormal, but sometimes flushes a result
                that would not have been denormal.
    ============================================================================*/
    static inline float FlushMultiplyQuick( float a, float b )
    {
        return ( ( FLOAT_AS_BITS(a) & 0x60000000 )
              && ( FLOAT_AS_BITS(b) & 0x60000000 ) ) ? (a * b) : 0.0f;
    }

    /*============================================================================
      Function: FlushMultiplyQuickAsym
      ==============================================================================
      Purpose:  Returns the product of its parameters, but flushes the result
                to zero if there is suspicion (not proof!) that the result
                might have been a denormal float.
      Method:   The result is flushed to zero if the magnitude of the first
                parameter is below 2**-63.
      Comments: Faster than FlushMultiplyQuick, but might produce a denormal
                if the second parameter is non-zero with magnitude < 2**-63.
                Never produces a denormal if second parameter has
                magnitude >= 2**-63.
    ============================================================================*/
    static inline float FlushMultiplyQuickAsym( float a, float b )
    {
        return ( FLOAT_AS_BITS(a) & 0x60000000 ) ? (a * b) : 0.0f;
    }

    /*============================================================================
      Function: FlushMultiply
      ==============================================================================
      Purpose:  Returns the product of its parameters, but flushes the result
                to zero if there is suspicion (not proof!) that the result
                might have been a denormal float.
      Method:   Computes a lower bound for the exponent of the result by adding
                the exponents of the parameters. Flushes the result if the
                lower bound suggests a possible denormal result.
      Comments: A bit slower than the other methods, but much less likely to
                flush a non-denormal result. (A non-denormal will only be
                flushed if the mantissas of the parameters would have "saved"
                an otherwise denormal result by having a product >= 2).
                Never produces a denormal result.
    ============================================================================*/
    static inline float FlushMultiply( float a, float b )
    {
        return ( ( FLOAT_AS_BITS(a) & FLOAT_EXP_MASK )
               + ( FLOAT_AS_BITS(b) & FLOAT_EXP_MASK ) > 0x3F800000 ) ? (a * b) : 0.0f;
    }

    /*=== end of file ===*/

Simon Jenkins (Bristol, UK)
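To see how one flushing multiply placed in a feedback path plays out, here is a small sketch that applies the same exponent-sum test as FlushMultiply to a decaying value. It is an illustration under assumptions rather than part of benormal.h: exponent_bits, flush_multiply and decay are hypothetical names, the bits are read with memcpy instead of a pointer cast, and float is assumed to be a 32-bit IEEE 754 single.

```c
#include <assert.h>   /* only needed for the checks mentioned below */
#include <stdint.h>
#include <string.h>

/* Exponent field of x, still in its bit position (bits 23..30). */
uint32_t exponent_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits & 0x7F800000u;
}

/* Multiply only when the summed exponent fields guarantee a normal
   result; otherwise return zero without performing the multiply. */
float flush_multiply(float a, float b)
{
    return (exponent_bits(a) + exponent_bits(b) > 0x3F800000u)
               ? (a * b)
               : 0.0f;
}

/* Hypothetical feedback path: scale state by coeff, steps times. */
float decay(float state, float coeff, int steps)
{
    int i;
    for (i = 0; i < steps; i++)
        state = flush_multiply(state, coeff);
    return state;
}
```

Starting from 1.0 with a coefficient of 0.5, the state is still non-zero after 126 steps (it has reached the smallest normal float) and is flushed to zero on step 127, so no denormal is ever computed and every later step multiplies zero.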

Steve Harris

2:55 p.m.

On Sun, Aug 03, 2003 at 07:47:31 +0100, Simon Jenkins wrote:

...

Here's where I've got to so far. Comments are welcome.

Looks fine in theory, but it forces an extra branch per MUL, which is not really practical. I haven't had a chance to try your fixes to FLUSH_TO_ZERO, but I'll try this week. - Steve

Simon Jenkins

5:52 p.m.

Steve Harris wrote:

...

On Sun, Aug 03, 2003 at 07:47:31 +0100, Simon Jenkins wrote:

Here's where I've got to so far. Comments are welcome.

Looks fine in theory, but it forces an extra branch per MUL, which is not really practical.

It's only an extra branch on the MULs where you actually use it. I'm not suggesting replacing every multiplication everywhere with this, but pinpointing *exactly* the MUL that might first produce a denormal (*if* that is actually a potential problem in an app) and replacing just that one crucial multiplication with a flushing function, which hopefully will propagate the zero through the rest of the calculation.

...

I haven't had a chance to try your fixes to FLUSH_TO_ZERO, but I'll try this week.

They definitely fix actual problems, but I'm not at all convinced that they fix all problems. I suspect that it's still broken. If so then my flushing multiply functions are possibly (but not definitely) broken in the same way. Simon Jenkins (Bristol, UK)

Jussi Laako

2 Aug 2003

4:29 p.m.

On Thu, 2003-07-31 at 11:45, Alfons Adriaensen wrote:

...

Do you know of a *very fast* (probably inline assembly) way to force denormal FP numbers to zero ?

If I remember correctly, SSE does flush-denormals-to-zero by default. -- Jussi Laako <jussi.laako(a)pp.inet.fi>

Steve Harris

3 Aug 2003

2:56 p.m.

On Sat, Aug 02, 2003 at 11:14:25 +0300, Jussi Laako wrote:

...

On Thu, 2003-07-31 at 11:45, Alfons Adriaensen wrote:

Do you know of a *very fast* (probably inline assembly) way to force denormal FP numbers to zero ?

If I remember correctly, SSE does flush-denormals-to-zero by default.

Yes, I think that's correct, but I got a performance penalty when I tried telling gcc to produce SSE code instead of 387. I should play with it some more; it's possible that you can tell the 387 to ignore the exceptions. - Steve

Jussi Laako

4 Aug 2003

12:50 p.m.

On Sun, 2003-08-03 at 21:42, Steve Harris wrote:

...

I should play with it some more; it's possible that you can tell the 387 to ignore the exceptions.

See /usr/include/fpu_control.h -- Jussi Laako <jussi.laako(a)pp.inet.fi>

Steve Harris

1:30 p.m.

On Mon, Aug 04, 2003 at 07:35:42 +0300, Jussi Laako wrote:

...

On Sun, 2003-08-03 at 21:42, Steve Harris wrote:

I should play with it some more; it's possible that you can tell the 387 to ignore the exceptions.

See /usr/include/fpu_control.h

Thanks, and hmmm... If it's possible to disable it (on a per-process basis, we don't want to turn it off for every process), maybe it should be a recommendation for realtime LADSPA hosts? It's hard to imagine that any realtime audio app would require denormal support for anything. - Steve

Paul Davis

2:39 p.m.

...

If it's possible to disable it (on a per-process basis, we don't want to turn it off for every process), maybe it should be a recommendation for realtime LADSPA hosts?

Quasimodo used to do this, modelled on a Csound hack. It is per-process, since the kernel is supposed to restore FPU flags during context switches.

Simon Jenkins

6:01 p.m.

Jussi Laako wrote:

...

On Sun, 2003-08-03 at 21:42, Steve Harris wrote:

I should play with it some more; it's possible that you can tell the 387 to ignore the exceptions.

See /usr/include/fpu_control.h

The trouble with turning off the exceptions is, erm, that they are already turned off. (The following discussion is specific to 32-bit Intel and equivalents.)

There are two exceptions relevant to denormals:

- A denormal operand exception occurs when an operand is denormal.
- An underflow exception occurs when the result of an operation is denormal.

You can either enable these exceptions and deal with them in software, or you can mask them and let the FPU deal with them. If you let the FPU deal with them (which is the default) then it proceeds with calculations that have denormal operands, and it produces denormal results when calculations underflow. There is no hardware option to flush either denormal operands or denormal results to zero. (I think that there is such an option on Itanium processors though, and on some other processor families.)

The slow-down that happens with denormal calculations isn't the result of exception handling; it's the result of the FPU hardware itself taking a lot longer to perform the calculations.

So, at least until the next hardware upgrade, we're stuck with detecting and dealing with denormals ourselves. Here are some possibly useful macros:

    typedef unsigned int fpu_status_t __attribute__ ((__mode__ (__HI__)));
    #define GET_FPU_STATUS_WORD(x) __asm__ ("fnstsw %0" : "=m" (*&x))
    #define CLEAR_FPU_STATUS_BITS __asm__ ("fnclex")
    #define FPU_SW_DENORMAL_EXCEPTION_MASK (0x0002)
    #define FPU_SW_UNDERFLOW_EXCEPTION_MASK (0x0010)

The exception bits are "sticky" so we can do things like...

    fpu_status_t FPUStatus;
    /*..*/
    CLEAR_FPU_STATUS_BITS;
    /*..*/
    /* do a chunk of FP math here */
    /*..*/
    GET_FPU_STATUS_WORD(FPUStatus);
    /* the & is bracketed because != binds more tightly than & */
    if( ( FPUStatus & ( FPU_SW_DENORMAL_EXCEPTION_MASK
                      | FPU_SW_UNDERFLOW_EXCEPTION_MASK ) ) != 0 )
    {
        /* either we were passed a denormal or we generated one ourselves */
        /*..*/
        /* flush results to zero or - depending what the math was -
           take some other appropriate action */
        /*..*/
    }

Simon Jenkins (Bristol, UK)
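The same sticky-flag idea carries over to SSE arithmetic, where the MXCSR register keeps denormal-operand and underflow flags at the same bit positions (0x0002 and 0x0010) as the x87 status word. The sketch below is an illustration, not code from this thread: multiply_saw_denormal is a made-up name, the _mm_getcsr and _mm_setcsr intrinsics come from <xmmintrin.h>, and that path is compiled only when the compiler reports SSE float math through __SSE_MATH__. Elsewhere it falls back to comparing against FLT_MIN.

```c
#include <assert.h>   /* only needed for the checks mentioned below */
#include <float.h>
#if defined(__SSE_MATH__)
#include <xmmintrin.h>
#endif

/* Multiplies a by b, stores the product in *result, and returns 1 if
   a denormal was involved, 0 otherwise. Volatile temporaries keep the
   compiler from folding or moving the multiply. */
int multiply_saw_denormal(float a, float b, float *result)
{
    volatile float x = a, y = b, r;

#if defined(__SSE_MATH__)
    /* Clear the sticky exception flags (low six bits of MXCSR). */
    _mm_setcsr(_mm_getcsr() & ~(unsigned int)_MM_EXCEPT_MASK);

    r = x * y;
    *result = r;

    /* Denormal flag: an operand was denormal. Underflow flag (with the
       exception masked): the result was tiny and had to be rounded. */
    return (_mm_getcsr() & (_MM_EXCEPT_DENORM | _MM_EXCEPT_UNDERFLOW)) != 0;
#else
    float p;

    r = x * y;
    p = r;
    *result = p;

    /* Portable fallback: non-zero values below FLT_MIN are denormal. */
    return (a != 0.0f && a < FLT_MIN && a > -FLT_MIN)
        || (b != 0.0f && b < FLT_MIN && b > -FLT_MIN)
        || (p != 0.0f && p < FLT_MIN && p > -FLT_MIN);
#endif
}
```

Multiplying 1.0e-20f by itself gives roughly 1.0e-40, which is below FLT_MIN, so the function reports 1; multiplying 2 by 3 reports 0. In practice the flags would be cleared once before a whole block of arithmetic and tested once after it, as in the message above.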

Jussi Laako

6 Aug 2003

2:46 p.m.

On Tue, 2003-08-05 at 01:48, Simon Jenkins wrote:

...

that have denormal operands, and it produces denormal results when calculations underflow. There is no hardware option to flush either denormal operands or denormal results to zero. (I think that there is such an option on Itanium processors though, and on some other processor families).

There is no such option for x86 family afaik. Flush-denormals-to-zero by CPU/FPU requires use of SSE/3DNow for floating point calculations. To get rid of at least some of the denormal performance problems one could use "-march=pentium4 -msse2 -mfpmath=sse" on GCC or "-tpp7 -xW" on ICC.

...

The slow-down that happens with denormal calculations isn't the result of exception handling; it's the result of the FPU hardware itself taking a lot longer to perform the calculations.

This performance penalty is _very_ significant on Intel CPUs and rather low on AMD ones. However, using SSE2 for fp math on P4 fixes this. -- Jussi Laako <jussi.laako(a)pp.inet.fi>
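For SSE floating point, flush-to-zero is controlled by a bit in the MXCSR register. The reset value of MXCSR leaves that bit clear, so a program that wants the behaviour has to set it, and the setting belongs to the calling thread. A minimal sketch, assuming a compiler that provides <xmmintrin.h> and defines __SSE_MATH__ when float arithmetic is done in SSE registers; enable_flush_to_zero and tiny_product are hypothetical names:

```c
#include <assert.h>   /* only needed for the checks mentioned below */
#if defined(__SSE_MATH__)
#include <xmmintrin.h>
#endif

/* Turns on flush-to-zero for the calling thread.
   Returns 1 on success, 0 if float math is not done in SSE. */
int enable_flush_to_zero(void)
{
#if defined(__SSE_MATH__)
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    return 1;
#else
    return 0;
#endif
}

/* Computes 1e-20 * 1e-20, which is about 1e-40 and therefore in the
   denormal range for a 32-bit float. Volatile stops constant folding. */
float tiny_product(void)
{
    volatile float a = 1.0e-20f;
    volatile float b = 1.0e-20f;
    volatile float c = a * b;
    return c;
}
```

Once enable_flush_to_zero has returned 1, tiny_product returns exactly zero instead of a denormal. Flush-to-zero only affects results; on processors that have it, a separate denormals-are-zero bit governs denormal inputs.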

Steve Harris

3:24 p.m.

On Wed, Aug 06, 2003 at 09:31:17 +0300, Jussi Laako wrote:

...

This performance penalty is _very_ significant on Intel CPUs and rather low on AMD ones.

That explains why I've been getting bug reports from P4 users and can't reproduce it on my Athlon, thanks. There were several serious bugs with gcc's SSE2 code generation, unfortunately. I don't know when, or if, they have been fixed. - Steve

Frank Barknecht

31 Jul 2003

5:23 a.m.

Hello, Steve Harris wrote:

...

Interesting article. For those who, like me, wanted to get the full text: it's available at Citeseer: http://citeseer.nj.nec.com/253514.html ciao -- Frank Barknecht _ ______footils.org__


participants (7)

Alfons Adriaensen
Frank Barknecht
Jack O'Quin
Jussi Laako
Paul Davis
Simon Jenkins
Steve Harris