Hi,
Very interesting findings, thanks for looking into this!
Here are some more results from my Core 2 PC.
flags: CFLAGS=-O3 -mmmx -msse -mfpmath=sse -ftree-vectorize
compiler: GCC 4.1.3 or 4.2.1 (the version made no difference)
1.
#define FRAGMENTSIZE 32
Benchmarking mixdown (WITH coeff):
Process time for pure C++: 1505 useconds
Process time for ASM SSE: 2871 useconds
Process time for GCC vector extensions: 503 useconds
2.
#define FRAGMENTSIZE 64
Benchmarking mixdown (WITH coeff):
Process time for pure C++: 3006 useconds
Process time for ASM SSE: 5072 useconds
Process time for GCC vector extensions: 1568 useconds
3.
#define FRAGMENTSIZE 128
Benchmarking mixdown (WITH coeff):
Process time for pure C++: 6793 useconds
Process time for ASM SSE: 8232 useconds
Process time for GCC vector extensions: 6091 useconds
4.
#define FRAGMENTSIZE 512
Benchmarking mixdown (WITH coeff):
Process time for pure C++: 19843 useconds
Process time for ASM SSE: 31141 useconds
Process time for GCC vector extensions: 17669 useconds
5.
#define FRAGMENTSIZE 1024
Benchmarking mixdown (WITH coeff):
Process time for pure C++: 27083 useconds
Process time for ASM SSE: 42730 useconds
Process time for GCC vector extensions: 31436 useconds
I only modified the example to use gettimeofday() instead of clock().
Maybe a GCC developer can shed some light on this?
Greetings,
Remon