On Tue, Nov 12, 2002 at 08:23:50 -0800, Bob Colwell wrote:
Yes, you have
to specify the use of sse explicitly (I think I mentioned it
on IRC when we were benchmarking). It appeared to make zero difference on
the Athlon, but I didn't check the assembler to see exactly what it was
doing. I've heard that just using sse instructions instead of 387 on the
P4 is quicker, but I've not tried it. Gcc will do that if you specify -msse.
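
One way to see which floating-point unit gcc actually targeted, without reading the assembler, is to test the compiler's predefined macros. The sketch below is only an illustration: the function names are made up for this example, and it assumes a GCC-compatible compiler, which defines __SSE__ when SSE instructions are enabled and __SSE_MATH__ when scalar float math is routed through SSE. As far as I know, on 32-bit x86 that second step needs -mfpmath=sse in addition to -msse (gcc 3.1 or later), so -msse on its own may leave ordinary float arithmetic on the 387.

```c
#include <stdio.h>

/* Hypothetical helper: 1 if the compiler was allowed to emit SSE
   instructions (GCC defines __SSE__ under -msse), else 0. */
int sse_enabled(void)
{
#if defined(__SSE__)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if scalar float math is compiled to SSE
   (GCC defines __SSE_MATH__ under -mfpmath=sse), else 0. */
int sse_math_selected(void)
{
#if defined(__SSE_MATH__)
    return 1;
#else
    return 0;
#endif
}

/* Print what the compiler was told to do for this translation unit. */
void report_fp_unit(void)
{
    printf("SSE instructions enabled: %s\n", sse_enabled() ? "yes" : "no");
    printf("scalar float math via:    %s\n",
           sse_math_selected() ? "SSE" : "387 (or non-x86 FPU)");
}
```

Building the same file once with `gcc -O2 -msse` and once with `gcc -O2 -msse -mfpmath=sse`, then calling report_fp_unit() from main, should show whether the second flag changes anything on a given setup.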
The sse instructions ought to be substantially faster. There are many more
registers available to support the flops, and they aren't organized into the
ridiculous 387 stack, so they're easier to reach. I believe they also default
to round-to-nearest and flush-denormals, but if you care about such niceties
you should check.
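
Checking is cheap: the SSE control/status register (MXCSR) can be read from C. The following is a minimal sketch, assuming an x86 compiler that ships <xmmintrin.h>; the two function names are invented for this example, and each returns -1 when SSE is not enabled at compile time. As far as I know the hardware reset state of MXCSR is round-to-nearest with flush-to-zero switched off, but a runtime or a compiler option such as -ffast-math may turn flush-to-zero on, so the result depends on how the program was built.

```c
#if defined(__SSE__)
#include <xmmintrin.h>  /* _MM_GET_ROUNDING_MODE, _MM_GET_FLUSH_ZERO_MODE */
#endif

/* Hypothetical helper: 1 if MXCSR selects round-to-nearest,
   0 if another rounding mode is set, -1 if SSE is unavailable. */
int sse_rounding_is_nearest(void)
{
#if defined(__SSE__)
    return _MM_GET_ROUNDING_MODE() == _MM_ROUND_NEAREST;
#else
    return -1;
#endif
}

/* Hypothetical helper: 1 if flush-to-zero is on (denormal results
   are replaced by zero), 0 if it is off, -1 if SSE is unavailable. */
int sse_flush_to_zero_enabled(void)
{
#if defined(__SSE__)
    return _MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_ON;
#else
    return -1;
#endif
}
```

If a plugin wants denormals flushed regardless of the default, _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON) from the same header sets the bit for the calling thread.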
Yes, that is correct AFAIK. However the particular benchmark we are
talking about has a lot of memory access in it, and in that case it didn't
make any difference (or gcc was doing something stupid, I didn't check).
I don't understand processor issues well enough to know what the
bottleneck would be, but it doesn't appear to be the maths instructions.
I will check the effect of sse on my plugins as they are generally less
ram hungry, but I don't have a gcc3 machine around at the moment.
- Steve