Re: [LAD] GCC Vector extensions

20 Jul 2011

On Wed, Jul 20, 2011 at 10:47 AM, Robin Gareus <robin(a)gareus.org> wrote:
...
>> On gcc/Linux (gcc 4.5.2), the same code produces a *slowdown* of
>> around 2.5x.
>> Well, does anybody have an idea why?
>> I am actually running Linux (Ubuntu 11.04) under a VMware virtual
>> machine; I do not know if this may have any implications.
>
> Maybe. A better comparison would be clang/Linux vs. gcc/Linux and
> clang/MacOSX vs. gcc/MacOSX compiled binaries.
> Also, as Dan already pointed out, gcc has a whole lot of optimization
> flags which are not enabled by default. Try '-O3 -msse2 -ffast-math'.
> '-ftree-vectorizer-verbose=2' is handy while optimizing code.
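The flags above are easiest to evaluate on a loop the vectorizer can actually handle. A minimal sketch, assuming a C99 compiler; the function name `mul_arrays` is hypothetical and not from the thread. The `restrict` qualifiers promise that the arrays do not overlap, which removes one common obstacle to auto-vectorization. Compiling it with something like `gcc -std=gnu99 -O3 -msse2 -ffast-math -ftree-vectorizer-verbose=2 -c mul.c` should report whether the loop was vectorized (newer GCC releases replaced `-ftree-vectorizer-verbose` with `-fopt-info-vec`).

```c
#include <stddef.h>

/* Hypothetical example kernel: element-wise multiply of two float arrays.
 * 'restrict' tells the compiler out, a and b never alias, so it is free
 * to process several elements per instruction. */
void mul_arrays(float *restrict out, const float *restrict a,
                const float *restrict b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}
```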
In addition... inspecting the generated assembly is helpful (-S -o
myprogram.s). A rule of thumb is that you should see `movaps` (MOVe
Aligned Packed Single-precision) and `mulps` (MULtiply Packed
Single-precision) instructions for multiplying vectors of
single-precision floats.
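To check that rule of thumb, a small function written with GCC's vector extensions gives the compiler an obvious packed multiply to emit. A minimal sketch; `v4sf` and `mul_v4sf` are illustrative names, not from the thread. Compiling it with `gcc -O2 -msse2 -S -o mul.s mul.c` on x86 and searching `mul.s` for `mulps` is one way to confirm; the exact output varies with GCC version and flags (with `-mavx`, for example, you may see `vmulps` instead).

```c
/* GCC vector extension type: four packed single-precision floats in one
 * 16-byte value, i.e. the size of an SSE register. The type is naturally
 * 16-byte aligned, which allows aligned loads such as movaps. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical example: multiply n4 groups of four floats. Each
 * a[i] * b[i] is a whole-vector multiply, a candidate for one mulps. */
void mul_v4sf(v4sf *out, const v4sf *a, const v4sf *b, unsigned n4)
{
    unsigned i;
    for (i = 0; i < n4; i++)
        out[i] = a[i] * b[i];
}
```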
In addition... profiling with valgrind/callgrind is helpful (esp. if
you have it dump instructions/assembly)...
  $ valgrind --tool=callgrind --dump-instr=yes ./myprogram
Open the output file with kcachegrind and it'll save you a lot of time.
-gabriel
