On 21 Jul 2011, at 20:15, Maurizio De Cecco wrote:
On Thursday 21/7/11 2:22 PM, Stéphane Letz wrote:
Or you can use LLVM to *directly* generate vector
code, as in the following example, the result of some experiments done with Faust and its
LLVM backend:
I would love to have enough time to implement this :->
This is not completely trivial: it requires going into the Faust FIR (Faust Intermediate
Representation) and improving it. Then some new code has to be added to the LLVM
backend itself.
Anyway, under clang/Mac OS X, using the vector types reduced the time spent in simple
jmax DSP operators and in the general DSP virtual machine execution to around 15-18% for
some "typical" jmax patches. By simple operators I mean the basic signal operations, like
multiplication, constants and so on.
The rest of the time is spent in complex objects that implement
specific DSP algorithms.
This means that any further performance improvement in the jmax DSP execution code (other
than specific objects), whatever the technique used, cannot be higher than around 10% (as
long as these typical patches really are typical, of course).
Anyway, I repeated the tests with clang and gcc on Mac OS X, and for a given example
patch I got 30 Msamples/second for clang with vector extensions, 13 Msamples/second for gcc
with normal code, and 11 Msamples/second for gcc with vector operators (gcc 4.2, latest
Apple clang).
Using -msse2 or -msse3 on Linux improved the performance of the code using the vector
extensions, but only marginally, leaving it much slower than the basic code.
By the way, by vector extensions I mean what is described here:
http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html
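For anyone not familiar with them, here is a minimal sketch of what a "simple operator"
written with these extensions might look like. It is only an illustration, not the actual
jmax code: the names v4sf and mul_block are hypothetical, and the sketch assumes the block
length is a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four packed floats (16 bytes). Both gcc and clang accept this attribute. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical simple operator: out[i] = a[i] * b[i].
 * Assumption of this sketch: n is a multiple of 4. */
void mul_block(float *out, const float *a, const float *b, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        v4sf va, vb, vr;
        /* memcpy keeps the loads and stores valid for unaligned buffers. */
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        vr = va * vb;               /* element-wise multiply on 4 floats */
        memcpy(out + i, &vr, sizeof vr);
    }
}
```

Whether the compiler turns the multiply into a single packed SSE instruction depends on
the compiler and the flags used, which is probably where the clang/gcc difference comes from.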
Maurizio
You should really look at the generated SSE code. Can you paste the result of each test on
this list?
Thanks.
Stéphane