Re: [LAD] vectorization

7 May 2008

On Wed, 2008-05-07 at 01:45 +0300, Jussi Laako wrote:
...
  Fons Adriaensen wrote:
 > Which will determine performance for every algorithm that
 >
 > - is working on a data set that is larger than the cache,
 > - does not produce multiple results from the same inputs.  -<snip>-
...
  There are several use cases where the data set is rather small and is
 used in several subsequent loops, thus cache can help.
 After profiling, I've identified a number of algorithms which
 significantly benefit from handwritten vectorized asm.

One thing I wonder is what the pattern of addition in Fons's
application really looks like. I assume fftA * fftB is the product of
some windowed, precalculated impulse response and a signal? The
addition/accumulate suggests that the output fftD has been touched
before, implying that there could be more variables to work on at once,
or that the vectors would still be in the cache if the order of
addition were changed.
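
To make the question concrete, here is a minimal sketch of what I assume
the accumulate looks like. The names fftA, fftB and fftD follow the
thread; the partition count K, the contiguous layout, the block size and
both function names are my own assumptions for illustration, not Fons's
actual code. The first function sweeps the whole of fftD once per
partition; the second finishes a small block of fftD across all
partitions before moving on, so that block has a chance to stay cached.

```c
#include <complex.h>
#include <stddef.h>

/* Assumed layout: K partitions of N complex bins each, stored back to
 * back in fftA and fftB; fftD holds N accumulated output bins. */

/* Plain order: every partition walks the whole of fftD. If N bins do
 * not fit in the cache, fftD is re-fetched from memory K times. */
static void mac_plain(float complex *fftD, const float complex *fftA,
                      const float complex *fftB, size_t K, size_t N)
{
    for (size_t k = 0; k < K; k++)
        for (size_t i = 0; i < N; i++)
            fftD[i] += fftA[k * N + i] * fftB[k * N + i];
}

/* Blocked order: accumulate all K partitions into one block of fftD
 * before touching the next block. Per bin, the additions happen in the
 * same order (k = 0 .. K-1) as in mac_plain. */
static void mac_blocked(float complex *fftD, const float complex *fftA,
                        const float complex *fftB, size_t K, size_t N,
                        size_t block)
{
    for (size_t i0 = 0; i0 < N; i0 += block) {
        size_t i1 = (i0 + block < N) ? i0 + block : N;
        for (size_t k = 0; k < K; k++)
            for (size_t i = i0; i < i1; i++)
                fftD[i] += fftA[k * N + i] * fftB[k * N + i];
    }
}
```

Both versions do the same arithmetic per bin; only the memory traversal
differs. Whether the blocked order actually helps would depend on the
cache size and on how fftA and fftB are really laid out, so it would
need profiling.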
/j
PS: Your fastest calculation is when the data floods the cache:
N=(1024*1024), n=1000, gcc, clock: 8410 ms (_Complex). Is that a typo?
