On Tue, 2008-04-15 at 19:45 +0200, Christian Schoenebeck wrote:
Yeah, I'm respawning this topic ...
There is something funny with this benchmark. If we compare your
numbers:
Benchmarking mixdown (WITH coeff):
pure C++ : 890 ms
ASM SSE : 300 ms
GCC vector extensions : 230 ms
.. to mine (on a 1.1 GHz Celeron):
Benchmarking mixdown (WITH coeff):
pure C++ : 390 ms
ASM SSE : 170 ms
GCC vector extensions : 140 ms
.. there is definitely a similar pattern showing up, BUT the loops
appear to interfere with each other as you can see when I comment out
everything but ASM:
Benchmarking mixdown (WITH coeff):
ASM SSE : 160 ms <-- faster?
.. or leave in C++ as well:
Benchmarking mixdown (WITH coeff):
pure C++ : 400 ms <-- slower?
ASM SSE : 170 ms
.. or take out only the ASM:
Benchmarking mixdown (WITH coeff):
pure C++ : 380 ms <-- faster?
GCC vector extensions : 160 ms <-- slower?
Methinks it is very difficult to predict what -O3 will or will not do.
mvh // Jens M Andreasen
g++ (GCC) 4.2.2 20071128 (prerelease) (4.2.2-3.1mdv2008.0)
BTW: I slightly modified the order in x86_sse_mix_buffers_with_gain for
speed:
.MBWG_SSELOOP:
movaps (%esi), %xmm0 #; source => xmm0
addl $16, %esi #; src+=4 //////////////
mulps %xmm1, %xmm0 #; apply gain to source
addps (%edi), %xmm0 #; mix with destination
movaps %xmm0, (%edi) #; copy result to destination
subl $4, %ecx #; nframes-=4
addl $16, %edi #; dst+=4
cmp $4, %ecx
jge .MBWG_SSELOOP
[...]
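
For comparison, here is roughly what the loop above does when written with SSE intrinsics. This is a sketch only: mix_buffers_with_gain_sketch is a hypothetical name, the broadcast of the gain into xmm1 is presumed to happen in the setup code not shown above, and the scalar tail is my own assumption about how leftover frames could be handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical name; computes dst[i] += src[i] * gain.
// Both pointers must be 16-byte aligned, as movaps requires.
static void mix_buffers_with_gain_sketch(float* dst, const float* src,
                                         std::size_t nframes, float gain) {
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(src) % 16 == 0);

    std::size_t i = 0;
#if defined(__SSE__)
    const __m128 g = _mm_set1_ps(gain);           // gain in all four lanes
    for (; i + 4 <= nframes; i += 4) {
        __m128 s = _mm_load_ps(src + i);          // movaps (%esi), %xmm0
        s = _mm_mul_ps(s, g);                     // mulps  %xmm1, %xmm0
        s = _mm_add_ps(s, _mm_load_ps(dst + i));  // addps  (%edi), %xmm0
        _mm_store_ps(dst + i, s);                 // movaps %xmm0, (%edi)
    }
#endif
    // Assumed scalar tail for the remaining 0..3 frames (or for all
    // frames when SSE is not available at compile time).
    for (; i < nframes; ++i) {
        dst[i] += src[i] * gain;
    }
}
```

The compiler is free to schedule the pointer arithmetic itself here, so this version cannot show the effect of moving the addl by hand; it only spells out the data flow of the loop.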