[LAD] vectorization
    Jens M Andreasen 
    jens.andreasen at comhem.se
       
    Wed Apr 16 00:10:20 UTC 2008
    
    
  
On Tue, 2008-04-15 at 19:45 +0200, Christian Schoenebeck wrote:
> Yeah, I'm respawning this topic ...
> 
There is something funny with this benchmark. If we compare your
numbers:
> Benchmarking mixdown (WITH coeff):
> pure C++                : 890 ms
> ASM SSE                 : 300 ms
> GCC vector extensions   : 230 ms
> 
.. to mine (on a 1.1G Celeron):
  Benchmarking mixdown (WITH coeff):
  pure C++                : 390 ms
  ASM SSE                 : 170 ms
  GCC vector extensions   : 140 ms
.. there is definitely a similar pattern showing up, BUT the loops
appear to interfere with each other, as you can see when I comment out
everything but ASM:
  Benchmarking mixdown (WITH coeff):
  ASM SSE                 : 160 ms <-- faster?
.. or leave in C++ as well:
  Benchmarking mixdown (WITH coeff):
  pure C++                : 400 ms <-- slower?
  ASM SSE                 : 170 ms
.. or take out only the ASM:
  Benchmarking mixdown (WITH coeff):
  pure C++                : 380 ms <-- faster?
  GCC vector extensions   : 160 ms <-- slower?
Methinks it is very difficult to predict what -O3 will or will not do.
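One way to reduce this kind of cross-talk between the timed loops is to time
each variant on its own, after a warm-up call, and keep the best of several
runs. The following is only a minimal sketch of that idea, not the benchmark
from this thread: the names mix_with_gain and best_time_ms are hypothetical,
and the scalar kernel merely stands in for the "pure C++" case. It does not
remove compiler effects such as inlining or code layout under -O3; for that,
each variant would probably need its own translation unit or its own binary.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical scalar reference kernel: dst[i] += src[i] * gain.
void mix_with_gain(float* dst, const float* src, std::size_t n, float gain) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] += src[i] * gain;
}

// Hypothetical helper: call `kernel` once as a warm-up, then `runs` more
// times, and return the shortest wall-clock time of the timed runs in ms.
template <typename Kernel>
double best_time_ms(Kernel kernel, int runs) {
    using clock = std::chrono::steady_clock;
    kernel();  // warm-up: touch caches and page in the buffers
    double best = std::numeric_limits<double>::max();
    for (int r = 0; r < runs; ++r) {
        const auto t0 = clock::now();
        kernel();
        const auto t1 = clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        best = std::min(best, ms);
    }
    return best;
}
```

A caller would wrap one variant per measurement, for example
best_time_ms([&]{ mix_with_gain(dst.data(), src.data(), n, 0.5f); }, 10),
and compare the numbers from separate runs of the program.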
mvh // Jens M Andreasen
g++ (GCC) 4.2.2 20071128 (prerelease) (4.2.2-3.1mdv2008.0)
BTW: I slightly modified the order in x86_sse_mix_buffers_with_gain for
speed:
.MBWG_SSELOOP:
	movaps  (%esi), %xmm0   #; source => xmm0
	addl    $16, %esi       #; src+=4 //////////////
	mulps   %xmm1, %xmm0    #; apply gain to source
	addps   (%edi), %xmm0   #; mix with destination
	movaps  %xmm0, (%edi)   #; copy result to destination
	subl    $4, %ecx        #; nframes-=4
	addl    $16, %edi       #; dst+=4

	cmp     $4, %ecx
	jge .MBWG_SSELOOP
[...]
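For comparison, the same loop can be sketched with GCC vector extensions.
This is an illustration only, not the code that was benchmarked: the function
name mix_buffers_with_gain_v4 and the v4sf typedef are hypothetical, both
buffers are assumed to be 16-byte aligned (as movaps requires), and the scalar
tail loop is an assumption about what the elided part of the routine does with
leftover frames.

```cpp
#include <cstddef>

// Four packed floats; may_alias lets us read float buffers through this type.
typedef float v4sf __attribute__((vector_size(16), may_alias));

// Hypothetical counterpart of the assembly loop: dst[i] += src[i] * gain.
// Assumes dst and src are 16-byte aligned (GCC/Clang only).
void mix_buffers_with_gain_v4(float* dst, const float* src,
                              std::size_t nframes, float gain) {
    const v4sf g = {gain, gain, gain, gain};  // gain broadcast, like %xmm1
    std::size_t i = 0;
    for (; i + 4 <= nframes; i += 4) {
        const v4sf s = *reinterpret_cast<const v4sf*>(src + i);  // load source
        v4sf* d = reinterpret_cast<v4sf*>(dst + i);
        *d = *d + s * g;  // apply gain, mix with destination, store
    }
    // Assumed tail handling: remaining 0..3 frames done one at a time.
    for (; i < nframes; ++i)
        dst[i] += src[i] * gain;
}
```

Whether the compiler emits the loads and the pointer increments in the same
order as the hand-written version depends on the optimization level, which is
the same unpredictability noted above.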
    
    