Steve Harris wrote:
On Tue, Sep 17, 2002 at 11:49:26 +0200, Frank Neumann wrote:
PS: No, I haven't been doing any Linux assembly programming yet. I only
wondered every now and then how much performance one could get out of
some of the existing Linux sound apps if there were optimized versions of
such programs for certain CPUs.
From experimenting with the Intel vectorising compiler I would estimate you
could get a 10-30% speed increase over gcc 2.96 for inner loops. I think
gcc 3 does a better job, and maybe even does some 3DNow vectorisation?
unfortunately there's nothing in
http://gcc.gnu.org/gcc-3.0/features.html that would suggest this.
The vector instructions (SSE, 3DNow, Altivec etc.) are mostly useful
when you have parallel streams to be processed identically, but you can
also get wins just by dropping in vector instructions in place of their
float equivalents: they are sometimes less accurate, but faster, and they
include common branch-avoiding instructions (e.g. min and max).
applying static gain seems to be a good candidate for SIMD even if
there's only one channel. needs some additional logic if buffers
aren't an integral multiple of the multiword size, though. finding
peaks without branching is very attractive, and a routine integrating
both would be very useful.
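
A rough sketch of such a combined gain-plus-peak routine, in portable C rather than real SSE or 3DNow code. It assumes a vector width of four floats and processes the buffer in blocks of that size, with a scalar tail loop for the leftover samples. The names gain_and_peak, LANES, abs_sel and max_sel are made up for illustration. The absolute value and maximum are written as simple selects, which a compiler can often map to branch-free max instructions, though that depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* assumed vector width: four floats per register */

/* Simple selects; compilers can often turn these into and/max instructions. */
static float abs_sel(float x)          { return x < 0.0f ? -x : x; }
static float max_sel(float a, float b) { return a > b ? a : b; }

/* Scale buf[0..n) by gain in place and return the peak absolute value
 * of the scaled samples (0.0 for an empty buffer). */
float gain_and_peak(float *buf, size_t n, float gain)
{
    assert(buf != NULL || n == 0);

    float peak[LANES] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t blocks = n - (n % LANES);  /* samples covered by whole blocks */
    size_t i;

    /* Main loop: whole blocks only, one running peak per lane. */
    for (i = 0; i < blocks; i += LANES) {
        for (size_t k = 0; k < LANES; k++) {
            float s = buf[i + k] * gain;
            buf[i + k] = s;
            peak[k] = max_sel(peak[k], abs_sel(s));
        }
    }

    /* Fold the per-lane peaks into a single value. */
    float p = max_sel(max_sel(peak[0], peak[1]), max_sel(peak[2], peak[3]));

    /* Tail: the n % LANES samples that do not fill a whole block. */
    for (; i < n; i++) {
        float s = buf[i] * gain;
        buf[i] = s;
        p = max_sel(p, abs_sel(s));
    }
    return p;
}
```

Keeping one peak per lane and folding them only at the end mirrors what a vector max instruction would do; the tail loop is the additional logic needed when the buffer length is not a multiple of the vector size.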
I think there is some cost in switching between float mode and vector mode
(on x86, probably not PowerPC), and I don't know how high that is.
i don't know either. it seems that all register content is lost, so
i think it is safe to assume it is internal to the cpu, and that the
cost will be fairly small. rdtsc will tell for sure. :)
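
A minimal sketch of how such an rdtsc measurement could look, assuming GCC or Clang on x86, where x86intrin.h provides __rdtsc(). On other targets it falls back to clock(), which is not a cycle counter. The names read_ticks, min_ticks, gain_job and run_gain are hypothetical, and the gain loop is only a stand-in workload. Taking the minimum over several runs is one simple way to reduce noise from interrupts and cache misses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>             /* GCC/Clang: declares __rdtsc() */
static uint64_t read_ticks(void) { return __rdtsc(); }
#else
#include <time.h>                  /* fallback only, not a cycle counter */
static uint64_t read_ticks(void) { return (uint64_t)clock(); }
#endif

/* Stand-in workload (hypothetical): apply a static gain to a buffer. */
struct gain_job { float *buf; size_t n; float gain; };

void run_gain(void *arg)
{
    struct gain_job *job = arg;
    for (size_t i = 0; i < job->n; i++)
        job->buf[i] *= job->gain;
}

/* Call fn(arg) reps times and return the smallest tick count
 * observed for a single call. */
uint64_t min_ticks(void (*fn)(void *), void *arg, int reps)
{
    assert(fn != NULL && reps > 0);

    uint64_t best = UINT64_MAX;
    for (int r = 0; r < reps; r++) {
        uint64_t t0 = read_ticks();
        fn(arg);
        uint64_t t1 = read_ticks();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Timing the same routine once with and once without the mode switch and comparing the two minima would give a rough estimate of the switching cost. The numbers will vary between CPUs, and the counter may not tick at the core clock rate on every machine.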
with the need for [f]emms and gcc's crude inline assembly syntax, it
seems very desirable to me to isolate vectorized routines into .s
sources.
tim