[LAD] vectorization

Jens M Andreasen jens.andreasen at comhem.se
Wed Apr 16 00:10:20 UTC 2008


On Tue, 2008-04-15 at 19:45 +0200, Christian Schoenebeck wrote:
> Yeah, I'm respawning this topic ...
> 

There is something funny with this benchmark. If we compare your
numbers:

> Benchmarking mixdown (WITH coeff):
> pure C++                : 890 ms
> ASM SSE                 : 300 ms
> GCC vector extensions   : 230 ms
> 

.. to mine (on a 1.1 GHz Celeron):

  Benchmarking mixdown (WITH coeff):
  pure C++                : 390 ms
  ASM SSE                 : 170 ms
  GCC vector extensions   : 140 ms

.. there is definitely a similar pattern showing up, BUT the loops
appear to interfere with each other, as you can see when I comment out
everything but ASM:

  Benchmarking mixdown (WITH coeff):
  ASM SSE                 : 160 ms <-- faster?

.. or leave in C++ as well:

  Benchmarking mixdown (WITH coeff):
  pure C++                : 400 ms <-- slower?
  ASM SSE                 : 170 ms

.. or take out only the ASM:

  Benchmarking mixdown (WITH coeff):
  pure C++                : 380 ms <-- faster?
  GCC vector extensions   : 160 ms <-- slower?

Methinks it is very difficult to predict what -O3 will or will not do.
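
One way to make such comparisons a bit less sensitive to what else is
compiled into the same file is to keep every kernel in its own
non-inlined function and time it through a function pointer. The
following is only a minimal sketch, not the benchmark from this thread:
mix_scalar and time_kernel_ms are made-up names, the buffer contents and
the gain of 0.5 are arbitrary, and code placement and cache state can
still shift the numbers.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Plain scalar reference kernel (hypothetical name): dst[i] += src[i] * gain.
// noinline (GCC/Clang attribute) keeps the loop out of the timing function.
__attribute__((noinline))
void mix_scalar(float* dst, const float* src, std::size_t n, float gain)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] += src[i] * gain;
}

typedef void (*mix_fn)(float*, const float*, std::size_t, float);

// Time `runs` calls of one kernel on freshly allocated buffers.
// Returns the elapsed wall-clock time in milliseconds.
double time_kernel_ms(mix_fn fn, std::size_t nframes, int runs)
{
    assert(nframes > 0 && runs > 0);
    std::vector<float> dst(nframes, 0.0f), src(nframes, 1.0f);

    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < runs; ++r)
        fn(dst.data(), src.data(), nframes, 0.5f);
    const auto t1 = std::chrono::steady_clock::now();

    // Read the result back so the compiler has to keep the calls.
    volatile float sink = dst[0];
    (void)sink;

    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Calling time_kernel_ms(mix_scalar, 4096, 100000) would time the scalar
loop alone; the other variants could be passed in the same way, provided
they share the same signature.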

mvh // Jens M Andreasen
g++ (GCC) 4.2.2 20071128 (prerelease) (4.2.2-3.1mdv2008.0)


BTW: I slightly modified the order in x86_sse_mix_buffers_with_gain for
speed:

.MBWG_SSELOOP:

	movaps  (%esi), %xmm0   #; source => xmm0
	addl    $16, %esi       #; src+=4 //////////////
	mulps   %xmm1, %xmm0    #; apply gain to source
	addps   (%edi), %xmm0   #; mix with destination
	movaps  %xmm0, (%edi)   #; copy result to destination

	subl    $4, %ecx        #; nframes-=4
	addl    $16, %edi       #; dst+=4

	cmp     $4, %ecx
	jge     .MBWG_SSELOOP
[...]
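
For comparison, roughly the same operation (dst[i] += src[i] * gain,
four floats at a time) can be written with GCC vector extensions. This
is a sketch under assumptions, not the code that was benchmarked:
mix_buffers_with_gain_vec and v4sf are names chosen here for
illustration. Unlike the movaps loop above it does not assume 16-byte
aligned buffers, and it handles a frame count that is not a multiple of
four with a scalar tail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Four packed floats, the width of one SSE register (GCC/Clang extension).
typedef float v4sf __attribute__((vector_size(16)));

// Hypothetical name: mixes src into dst with a gain, dst[i] += src[i] * gain.
void mix_buffers_with_gain_vec(float* dst, const float* src,
                               std::size_t nframes, float gain)
{
    assert(dst != nullptr && src != nullptr);
    const v4sf g = { gain, gain, gain, gain };

    std::size_t i = 0;
    for (; i + 4 <= nframes; i += 4) {
        v4sf s, d;
        std::memcpy(&s, src + i, sizeof s);  // load 4 source frames, any alignment
        std::memcpy(&d, dst + i, sizeof d);  // load 4 destination frames
        d += s * g;                          // apply gain and mix
        std::memcpy(dst + i, &d, sizeof d);  // store the result
    }

    // Scalar tail for the remaining 0..3 frames.
    for (; i < nframes; ++i)
        dst[i] += src[i] * gain;
}
```

The memcpy calls are normally compiled down to plain vector loads and
stores at -O2 or -O3; if the buffers are known to be 16-byte aligned,
casting the pointers to v4sf* instead would match the movaps version
more closely.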




