[LAD] GCC Vector extensions

Gabriel Beddingfield gabrbedd at gmail.com
Wed Jul 20 16:19:44 UTC 2011


On Wed, Jul 20, 2011 at 10:47 AM, Robin Gareus <robin at gareus.org> wrote:
>> On gcc/Linux (gcc 4.5.2), the same code produces a *slowdown* of around
>> 2.5x.
>>
>> Well, does anybody have an idea why?
>>
>> I am actually running Linux (Ubuntu 11.04) under a VMware virtual
>> machine; I do not know if this may have any implications.
>
> Maybe. A better comparison would be: clang/Linux vs. gcc/Linux and
> clang/MacOSX vs gcc/MacOSX compiled binaries.
>
> Also, as Dan already pointed out: gcc has a whole lot of optimization
> flags which are not enabled by default. Try '-O3 -msse2 -ffast-math'.
> '-ftree-vectorizer-verbose=2' is handy while optimizing code.

In addition, inspecting the disassembly is helpful (compile with -S -o
myprogram.s).  As a rule of thumb, you should see `movaps` (MOVe
Aligned Packed Single-precision) and `mulps` (MULtiply Packed
Single-precision) instructions when multiplying vectors of
single-precision floats.  If you see `mulss` (the scalar form) instead,
the multiply was not vectorized.
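
For something small to try this on, here is a minimal sketch using the
GCC vector extensions.  The file name vecmul.c, the type name v4sf and
the function vec_mul are my own picks for illustration, not anything
from the code discussed in this thread.

```c
/* vecmul.c (hypothetical file name): multiply two arrays of
 * 4-float vectors using GCC's vector extensions. */
#include <stddef.h>

/* Four packed single-precision floats, 16 bytes wide, i.e. the size
 * of one SSE register.  The type also carries 16-byte alignment. */
typedef float v4sf __attribute__((vector_size(16)));

/* out[i] = a[i] * b[i], element-wise, for n vectors (4*n floats). */
void vec_mul(v4sf *out, const v4sf *a, const v4sf *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i)
        out[i] = a[i] * b[i];   /* one packed multiply per iteration */
}
```

Then generate and search the assembly:

  $ gcc -O3 -msse2 -S -o vecmul.s vecmul.c
  $ grep -E 'movaps|mulps' vecmul.s

The exact output depends on the compiler version and target.  For
example, the compiler may fold one of the loads into `mulps` as a
memory operand, so you might see fewer `movaps` than you expect.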

Profiling with valgrind/callgrind is also helpful, especially if you
have it collect counts per instruction so the assembly can be annotated:

  $ valgrind --tool=callgrind --dump-instr=yes ./myprogram

Open the output file (callgrind.out.<pid> by default) with kcachegrind
and it'll save you a lot of time.

-gabriel


