Re: [LAD] vectorization

15 Apr 2008

On Tue, 2008-04-15 at 19:45 +0200, Christian Schoenebeck wrote:
...
  Yeah, I'm respawning this topic ...

There is something funny with this benchmark. If we compare your
numbers:
...
  Benchmarking mixdown (WITH coeff):
  pure C++                : 890 ms
  ASM SSE                 : 300 ms
  GCC vector extensions   : 230 ms

...to mine (on a 1.1 GHz Celeron):
  Benchmarking mixdown (WITH coeff):
  pure C++                : 390 ms
  ASM SSE                 : 170 ms
  GCC vector extensions   : 140 ms
...there is definitely a similar pattern showing up, BUT the loops
appear to interfere with each other, as you can see when I comment out
everything but the ASM version:
  Benchmarking mixdown (WITH coeff):
  ASM SSE                 : 160 ms <-- faster?
...or leave in the C++ version as well:
  Benchmarking mixdown (WITH coeff):
  pure C++                : 400 ms <-- slower?
  ASM SSE                 : 170 ms
...or take out only the ASM version:
  Benchmarking mixdown (WITH coeff):
  pure C++                : 380 ms <-- faster?
  GCC vector extensions   : 160 ms <-- slower?
Methinks it is very difficult to predict what -O3 will or will not do:
the timing of each loop depends on which other loops are compiled into
the same binary.
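The benchmark source is not quoted here, so the following is only a hypothetical C++ sketch of what "mixdown (WITH coeff)" presumably computes, dst[i] += gain * src[i], once as a plain scalar loop and once with GCC vector extensions, plus a harness that times one kernel at a time. The names mix_scalar, mix_vector, time_kernel_ms and kernels_agree are made up for illustration. Timing each variant on its own, after a warm-up pass and taking the best of several runs, is one way to check whether the interference above comes from running the loops back to back.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical scalar mixdown with gain: dst[i] += gain * src[i].
void mix_scalar(float* dst, const float* src, std::size_t n, float gain) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] += gain * src[i];
}

// Same operation with GCC/Clang vector extensions, 4 floats per step.
typedef float v4sf __attribute__((vector_size(16)));

void mix_vector(float* dst, const float* src, std::size_t n, float gain) {
    const v4sf g = {gain, gain, gain, gain};      // gain broadcast to 4 lanes
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4sf s, d;
        std::memcpy(&s, src + i, sizeof s);       // alignment-agnostic load
        std::memcpy(&d, dst + i, sizeof d);
        d += s * g;                               // 4 multiply-adds at once
        std::memcpy(dst + i, &d, sizeof d);
    }
    for (; i < n; ++i)                            // scalar tail when n % 4 != 0
        dst[i] += gain * src[i];
}

// Time ONE kernel in isolation: fresh buffers, a warm-up pass, then the
// minimum over several runs (less noisy than a single measurement).
template <typename Kernel>
double time_kernel_ms(Kernel kernel, std::size_t frames, int passes, int runs) {
    assert(frames > 0 && passes > 0 && runs > 0);
    std::vector<float> src(frames, 0.5f), dst(frames, 0.0f);
    kernel(dst.data(), src.data(), frames, 0.7f);     // warm-up
    double best = 1e300;
    for (int r = 0; r < runs; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        for (int p = 0; p < passes; ++p)
            kernel(dst.data(), src.data(), frames, 0.7f);
        auto t1 = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (ms < best) best = ms;
    }
    volatile float sink = dst[frames / 2];            // keep the work observable
    (void)sink;
    return best;
}

// Check that both kernels produce the same result (an odd n exercises the tail).
bool kernels_agree(std::size_t n) {
    std::vector<float> src(n), a(n, 1.0f), b(n, 1.0f);
    for (std::size_t i = 0; i < n; ++i) src[i] = 0.001f * static_cast<float>(i);
    mix_scalar(a.data(), src.data(), n, 0.25f);
    mix_vector(b.data(), src.data(), n, 0.25f);
    for (std::size_t i = 0; i < n; ++i)
        if (std::fabs(a[i] - b[i]) > 1e-6f) return false;
    return true;
}
```

For example, time_kernel_ms(mix_vector, 1 << 16, 1000, 5) returns the best of five runs in milliseconds. Even when kernels are timed one at a time, code placement and alignment can still shift whenever other loops are added to or removed from the same binary, so building one executable per variant is the stricter test.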
mvh // Jens M Andreasen
g++ (GCC) 4.2.2 20071128 (prerelease) (4.2.2-3.1mdv2008.0)
BTW: I slightly reordered the instructions in the inner loop of
x86_sse_mix_buffers_with_gain for speed:
.MBWG_SSELOOP:
        movaps  (%esi), %xmm0    #; source => xmm0
        addl    $16, %esi        #; src+=4
        mulps   %xmm1, %xmm0     #; apply gain to source
        addps   (%edi), %xmm0    #; mix with destination
        movaps  %xmm0, (%edi)    #; copy result to destination
        subl    $4, %ecx         #; nframes-=4
        addl    $16, %edi        #; dst+=4
        cmp     $4, %ecx
        jge     .MBWG_SSELOOP
[...]
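For readers who prefer C++ to AT&T syntax, here is a hypothetical intrinsics rendering of the loop above: broadcast the gain (the role %xmm1 plays), load four source samples, scale them, add the destination, store, and advance by four floats. The function name mix_buffers_with_gain_sse is an assumption, and so is the scalar tail for leftover frames, since the original remainder handling is elided. Like movaps, _mm_load_ps and _mm_store_ps require 16-byte aligned pointers, and the compiler is free to schedule these instructions itself, so this does not reproduce the hand-tuned ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical C++ counterpart of the .MBWG_SSELOOP body.
// Assumes dst and src are 16-byte aligned when SSE is available.
void mix_buffers_with_gain_sse(float* dst, const float* src,
                               std::size_t nframes, float gain) {
    std::size_t i = 0;
#if defined(__SSE__)
    // movaps faults on unaligned addresses; check the same precondition here.
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(src) % 16 == 0);
    const __m128 g = _mm_set1_ps(gain);              // gain in all 4 lanes (%xmm1)
    for (; i + 4 <= nframes; i += 4) {
        __m128 s = _mm_load_ps(src + i);             // movaps (%esi), %xmm0
        s = _mm_mul_ps(s, g);                        // mulps  %xmm1, %xmm0
        s = _mm_add_ps(s, _mm_load_ps(dst + i));     // addps  (%edi), %xmm0
        _mm_store_ps(dst + i, s);                    // movaps %xmm0, (%edi)
    }
#endif
    for (; i < nframes; ++i)                         // leftover frames, scalar
        dst[i] += gain * src[i];
}
```

Calling it on a 10-frame buffer processes frames 0 to 7 in two SSE iterations and the last two frames in the scalar tail.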
