[LAU] Performance tuning audio apps

Mon Aug 17 06:44:32 EDT 2009

2009/8/17 Chris Cannam <cannam at all-day-breakfast.com>:
> On Mon, Aug 17, 2009 at 5:21 AM, Ken Restivo<ken at restivo.org> wrote:
>> I'm trying to squeeze the last little bit of juice out of my EEE.
>>
>> The CPU I have is this:
>> http://restivo.org/projects/eee/cpu.txt
>>
>> This nifty script at http://www.pixelbeat.org/scripts/gcccpuopt says I should use "-march=core2 -mtune=pentium -mfpmath=sse"
>>
>> However, the Gentoo people (who I take to be an -funrollloops authority on performance tuning), say I should "-march=core2 -mtune=generic -fomit-frame-pointer -pipe".
>>
>> And then there is -march=native which many say is just easier and faster. And others recommend putting "-msse2" and other such things.
>>
>> What say you-all?
>
> If you want the fastest possible floating point code, then you
> probably want something like:
>
>  -march=core2 -msse -msse2 -mfpmath=sse -ffast-math -fomit-frame-pointer -O3
>
> ... but with caveats.
>
> Discussion:
>
> Supplying -ffast-math causes the use of non-IEEE-compliant math
> functions.  Among other things, this screws up any code that
> explicitly deals with infinity or NaN values or signed zeroes, and
> makes assumptions about properties like associativity for the purposes
> of optimisation which may not be true in the floating-point world.  In
> other words, it can give you the wrong results.  In _most_ cases,
> audio applications are fine with it, but you need to be aware that it
> can be problematic.
>
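
To make the associativity point concrete, here is a minimal sketch in C. The function names are made up for illustration. It computes the same three-term sum with two different groupings, which is the kind of regrouping the optimiser is allowed to do once -ffast-math is on.

```c
/* Grouping as written in the source: (a + b) + c */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Regrouped, as a reassociating optimiser might do: a + (b + c) */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

Built without -ffast-math, sum_left(1e20, -1e20, 1.0) gives 1.0 while sum_right(1e20, -1e20, 1.0) gives 0.0, because the 1.0 is lost when it is added to -1e20 first. With -ffast-math the compiler may pick either grouping for you.
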
> However, -ffast-math in combination with -mfpmath=sse has the very
> nice quality that it enables denormal flush to zero throughout, thus
> avoiding denormal slowdowns in filters and the like.  It's also much
> faster for some of the apparently simple operations like floor() that
> are surprisingly slow in IEEE compliant mode.
>
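
If changing the flags is not an option, denormals can also be flushed by hand in the filter itself. This is a different technique from what the flags do (it is not how -ffast-math works internally), and the function names and the one-pole filter below are only an illustrative sketch.

```c
#include <float.h>
#include <math.h>

/* Treat anything smaller in magnitude than the smallest normal float as zero. */
float flush_denormal(float x)
{
    return (fabsf(x) < FLT_MIN) ? 0.0f : x;
}

/* One step of a one-pole lowpass: y[n] = y[n-1] + coeff * (x[n] - y[n-1]).
 * The state is flushed so a decaying tail cannot linger in the denormal range. */
float one_pole_step(float state, float input, float coeff)
{
    return flush_denormal(state + coeff * (input - state));
}
```

The extra comparison costs a little on every sample, so it only pays off where a decaying state would otherwise sit in the denormal range for a long time.
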
> It might be interesting to know what the authors of the programs
> you're trying to optimise thought about the use of -ffast-math...
> Perhaps you could compile them both ways and compare the output.

On the SuperCollider dev list we're just having a conversation about
exactly this. NaNs are used in some cases for signalling, and since
compiling with -ffast-math implies -ffinite-math-only, that trashes
the NaN signalling. This combination seems OK though: "-ffast-math
-fno-finite-math-only". The moral of the story is probably that it
depends strongly on the app. Who knows whether your chosen software
makes use of NaNs and infinities? Hard to tell.
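
As a rough illustration of the kind of code that is at risk, here is a sketch in C. It is not taken from SuperCollider; the "NaN means no new value" convention and the function names are assumptions made up for the example.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical convention: a NaN in a control stream means "no new value",
 * so the previous value is held. */
float next_control_value(float incoming, float previous)
{
    return isnan(incoming) ? previous : incoming;
}

/* Count how many entries in a control buffer carry a real update. */
size_t count_updates(const float *ctl, size_t n)
{
    size_t updates = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!isnan(ctl[i]))
            ++updates;
    }
    return updates;
}
```

With -ffinite-math-only in effect the compiler is allowed to assume no value is ever NaN, so the isnan() checks may be optimised away and the signalling silently stops working. Building with "-ffast-math -fno-finite-math-only" keeps the checks meaningful.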

Dan


> -fomit-frame-pointer is pretty much guaranteed to make things
> marginally faster but harder to debug.  It won't break anything and it
> won't make any huge improvements.
>
> -O3 rather than -O2 because it enables -ftree-vectorize, which does
> some limited auto-vectorization of loops for things like
> floating-point copy into SSE operations.  This doesn't always do
> anything (depends on the code, obviously) but sometimes it makes a
> significant difference, for example it helps when compiling my Rubber
> Band library.  I've never yet seen any problems with the results, but
> of course there's always an increased risk of running into
> optimisation bugs the more optimisation you do.  You can get
> interesting (?) debug output about vectorization successes and
> failures (mostly failures) with e.g. -ftree-vectorizer-verbose=2.
>
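
For reference, this is the sort of loop the vectorizer has a good chance with. It is a generic sketch, not code from Rubber Band, and the function name is made up. It uses C99 features (restrict, loop-scoped declarations), so older gcc versions need -std=c99 or similar.

```c
#include <stddef.h>

/* Scale a buffer of samples by a constant gain.
 * "restrict" promises the compiler that in and out do not overlap,
 * which removes one common obstacle to vectorization. */
void apply_gain(float *restrict out, const float *restrict in,
                float gain, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * gain;
}
```

Whether it actually gets vectorized depends on the compiler version and flags; the -ftree-vectorizer-verbose output mentioned above is the way to check.
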
> I would be slightly suspicious of anyone who recommends -pipe as an
> optimisation -- it makes no difference to the resulting code, it just
> makes compiling faster.
>
> If you're using a 64-bit distro, then you can omit the options with
> SSE in them (they're all enabled by default in 64-bit gcc).
>
>
> Chris


-- 
http://www.mcld.co.uk