2009/8/17 Chris Cannam <cannam(a)all-day-breakfast.com>:
On Mon, Aug 17, 2009 at 5:21 AM, Ken Restivo <ken(a)restivo.org> wrote:
I'm trying to squeeze the last little bit of juice out of my EEE.
The CPU I have is this:
http://restivo.org/projects/eee/cpu.txt
This nifty script at
http://www.pixelbeat.org/scripts/gcccpuopt says I should use
"-march=core2 -mtune=pentium -mfpmath=sse"
However, the Gentoo people (who I take to be an -funrollloops authority on performance
tuning) say I should use "-march=core2 -mtune=generic -fomit-frame-pointer -pipe".
And then there is -march=native which many say is just easier and faster. And others
recommend putting "-msse2" and other such things.
What say you-all?
If you want the fastest possible floating point code, then you
probably want something like:
-march=core2 -msse -msse2 -mfpmath=sse -ffast-math -fomit-frame-pointer -O3
... but with caveats.
Discussion:
Supplying -ffast-math causes the use of non-IEEE-compliant math
functions. Among other things, this screws up any code that
explicitly deals with infinity or NaN values or signed zeroes, and
makes assumptions about properties like associativity for the purposes
of optimisation which may not be true in the floating-point world. In
other words, it can give you the wrong results. In _most_ cases,
audio applications are fine with it, but you need to be aware that it
can be problematic.
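To make the associativity point concrete, here is a minimal sketch in C. The function names sum_left and sum_right are made up for illustration. Built without -ffast-math, the two groupings of the same three values give different answers, so a compiler that is allowed to regroup additions can change the result:

```c
/* Hypothetical helpers, for illustration only.
   The volatile temporaries force each intermediate sum to be stored
   as a double before the second addition happens. */

/* Computes (a + b) + c. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* Computes a + (b + c). */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}
```

With a = 1e20, b = -1e20 and c = 1.0, sum_left returns 1.0, while sum_right returns 0.0 because the 1.0 is lost when it is added to -1e20 first.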
However, -ffast-math in combination with -mfpmath=sse has the very
nice quality that it enables denormal flush to zero throughout, thus
avoiding denormal slowdowns in filters and the like. It's also much
faster for some of the apparently simple operations like floor() that
are surprisingly slow in IEEE compliant mode.
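For anyone who hasn't met denormals before, here is a rough sketch of how a decaying feedback value wanders into that range. The function name steps_until_subnormal is hypothetical, and the loop is only a stand-in for the state variable of a real filter:

```c
#include <math.h>

/* Hypothetical helper, for illustration only.
   Starts a state at 1.0f and multiplies it by `decay` repeatedly, the
   way a filter's feedback term decays after the input goes silent.
   Returns the number of multiplications after which the state first
   becomes subnormal (denormal), or -1 if it reaches zero first or
   stays normal for max_steps multiplications. */
int steps_until_subnormal(float decay, int max_steps)
{
    volatile float state = 1.0f;   /* volatile: keep it a real float */
    for (int n = 1; n <= max_steps; n++) {
        state = state * decay;
        int cls = fpclassify(state);
        if (cls == FP_SUBNORMAL)
            return n;
        if (cls == FP_ZERO)
            return -1;
    }
    return -1;
}
```

In a default IEEE build, steps_until_subnormal(0.5f, 1000) returns 127, since 2^-127 is just below the smallest normal float. If flush to zero is active, as described above for -ffast-math with -mfpmath=sse, I would expect the state to go straight to zero and the function to return -1 instead, though I have not checked that on every GCC version.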
It might be interesting to know what the authors of the programs
you're trying to optimise thought about the use of -ffast-math...
Perhaps you could compile them both ways and compare the output.
On the SuperCollider dev list we're just having a conversation about
exactly this. NaNs are used in some cases for signalling, and since
compiling with -ffast-math implies -ffinite-math-only, that trashes
the NaN signalling. This combination seems OK though: "-ffast-math
-fno-finite-math-only". The moral of the story is probably that it
depends strongly on the app. Who knows if your chosen software makes
use of NaNs and infinities? Hard to tell.
Dan
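As an illustration of the kind of code that is at risk, here is a minimal sketch that uses NaN as a "no value" marker. The convention and the function name count_valid are assumptions made up for this example, not taken from SuperCollider:

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical convention, for illustration only: a NaN in the buffer
   means "no value here". Returns how many entries hold a real value. */
size_t count_valid(const float *buf, size_t n)
{
    size_t valid = 0;
    for (size_t i = 0; i < n; i++) {
        /* Under -ffinite-math-only the compiler may assume this test
           is never true and optimise it away. */
        if (!isnan(buf[i]))
            valid++;
    }
    return valid;
}
```

In a default build, a buffer holding 0.5f, NAN, -1.0f, NAN gives a count of 2. With -ffinite-math-only in effect the isnan() test can no longer be relied on, which is the breakage described above.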
-fomit-frame-pointer is pretty much guaranteed to make things
marginally faster but harder to debug. It won't break anything and it
won't make any huge improvements.
-O3 rather than -O2 because it enables -ftree-vectorize, which does
some limited auto-vectorization of loops for things like
floating-point copy into SSE operations. This doesn't always do
anything (depends on the code, obviously) but sometimes it makes a
significant difference, for example it helps when compiling my Rubber
Band library. I've never yet seen any problems with the results, but
of course there's always an increased risk of running into
optimisation bugs the more optimisation you do. You can get
interesting (?) debug output about vectorization successes and
failures (mostly failures) with e.g. -ftree-vectorizer-verbose=2.
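The sort of loop the vectorizer has a chance with is a plain element-wise pass over float buffers. A minimal sketch, assuming a hypothetical apply_gain function (not taken from Rubber Band); the restrict qualifiers are my addition, to tell the compiler that the two buffers do not overlap:

```c
#include <stddef.h>

/* Hypothetical example, for illustration only.
   Writes in[i] * gain to out[i] for each of the n samples. A simple
   counted loop with no branches and no dependency between iterations
   is the shape that auto-vectorization can usually turn into SSE
   operations. */
void apply_gain(float *restrict out, const float *restrict in,
                float gain, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * gain;
}
```

Whether a given loop really gets vectorized depends on the GCC version and flags, so the vectorizer's debug output is the thing to check. More recent GCC releases report this through the -fopt-info-vec family of options rather than -ftree-vectorizer-verbose.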
I would be slightly suspicious of anyone who recommends -pipe as an
optimisation -- it makes no difference to the resulting code, it just
makes compiling faster.
If you're using a 64-bit distro, then you can omit the options with
SSE in them (they're all enabled by default in 64-bit gcc).
Chris
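One way to check which of these settings actually reached a given build is to look at GCC's predefined macros. A minimal sketch, with function names made up for this example; each one returns 1 if the corresponding macro was set when the file was compiled, 0 otherwise:

```c
/* Hypothetical helpers, for illustration only. Each reports whether a
   compiler-predefined macro was set for this translation unit. */

/* 1 if scalar float math is done in SSE registers. */
int built_with_sse_math(void)
{
#ifdef __SSE_MATH__
    return 1;
#else
    return 0;
#endif
}

/* 1 if scalar double math is done in SSE2 registers. */
int built_with_sse2_math(void)
{
#ifdef __SSE2_MATH__
    return 1;
#else
    return 0;
#endif
}

/* 1 if -ffast-math was in effect. */
int built_with_fast_math(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* 1 if the compiler assumes no NaNs or infinities. */
int built_with_finite_math_only(void)
{
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
    return 1;
#else
    return 0;
#endif
}
```

The same information is available without writing any code by running gcc with your chosen flags plus "-dM -E - < /dev/null" and searching the output for these macro names.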
--
http://www.mcld.co.uk