Steve Harris wrote:
> On Thu, Jul 31, 2003 at 10:45:01 +0200, Alfons Adriaensen wrote:
> > On Thu, Jul 31, 2003 at 09:26:49AM +0100, Steve Harris wrote:
> > > Several people have asked me what denormal numbers are over the last
> > > few weeks, well here's a much better description than my rambling
> > > head scratching:
> > > http://www.ecs.soton.ac.uk/~swh/denormal.ps
> > > It's an extract from David Goldberg's article, "What Every Computer
> > > Scientist Should Know about Floating-Point Arithmetic".
> > Do you know of a *very fast* (probably inline assembly) way to force
> > denormal FP numbers to zero?
> No, the thing I've been using is (from the music-dsp mailing list):
> #define FLUSH_TO_ZERO(fv) (((*(unsigned int*)&(fv))&0x7f800000)==0)?0.0f:(fv)
> but I'm beginning to suspect it doesn't always work.
This macro gives completely erroneous results if you use it in an expression
such as
a = FLUSH_TO_ZERO(x) * FLUSH_TO_ZERO(y);
because the operator precedence gets messed up.
If you're doing this sort of thing you need to fix it with a couple more
brackets...
#define FLUSH_TO_ZERO(fv) ((((*(unsigned int*)&(fv))&0x7f800000)==0)?0.0f:(fv))
...and then, as far as I can see, the macro always does exactly what it
is supposed to do.
There are some limitations though:
1. Portability.
It only works if unsigned int is 32 bits, stored in the same byte order
as float. (Which it probably is.)
2. Minus zero.
Gets converted into plus (i.e. normal) zero. Personally I won't be
losing too much sleep over this one :)
3. Only works on lvalues.
You can't flush an rvalue, so your code has to generate *and store* the
value you want to flush. Admittedly the optimiser might optimise away
the store, but OTOH...
4. The optimiser might defer the flush.
I think (but am not entirely sure) that you can suffer a performance hit
just by moving a denormal from one location to another, never mind
actually doing any arithmetic on it. If so, then an optimiser could
conceivably "optimise" source code which says flush-then-copy into
object code which does copy-then-flush, so incurring this penalty.
5. It's a bit too late.
The macro doesn't do anything until after a denormal has been generated,
so you have already incurred the denormal penalty on one calculation.
You avoid the performance catastrophe of the denormal propagating through
subsequent calculations, but still suffer the minor performance glitch of
generating it in the first place.
6. Performance.
I'd classify this as "quite fast" but not "very fast". This is in some
ways a good thing: it's probably cheap enough to be carefully deployed in
exactly the places where it's needed, but not so cheap that it can be
sprinkled around all over the place where it isn't.
Simon Jenkins
(Bristol, UK)