[LAD] vectorization

Mario Lang mlang at delysid.org
Wed Apr 16 20:41:36 UTC 2008

Christian Schoenebeck <cuse at users.sourceforge.net> writes:

> On Wednesday, 16 April 2008 19:25:01 you wrote:
>> The distributor in me cried out when I read over this.  I know I am kind
>> of nitpicking here, but please consider how much nicer this would be if
>> the small benchmark was run at start up time instead of compilation time.
>> It's not much more work to compile both algorithms into different object
>> files and set up a function pointer at startup.  And it will
>> give you best performance even if you move the precompiled binary
>> from one machine to the other, or if you change the CPU in your
>> computer without reinstalling.
> Mmm right, but in that case the upstream author has to do the compilation for 
> all kinds of architectures ... not even knowing what kind of architectures are 
> covered by distribution X. Isn't that the "job" of a distribution maintainer 
> to do such compilation task? ;-)

I am not talking about compilation.  I am trying to explain that it
saves a lot of pain on all sides if the benchmarking and the choice of the
best-performing variant happen at startup time, instead of forcing users to
compile a package from source just so that a configure-based benchmark
can guess their hardware correctly.  I think the Linux kernel
is a nice example.  At startup, you may see some messages in dmesg
where Linux tries differently optimized variants of functions for
RAID-related things.  The winner is used.  So if you shut down, change your
CPU, and boot up again, the best optimisation for the new CPU is chosen,
and you do not have to recompile the kernel.  The same should
be true for user space.
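
To make the idea concrete, here is a minimal sketch in C of a startup
benchmark that sets a function pointer.  All names in it (mix_fn,
mix_scalar, mix_unrolled, select_mix) are made up for illustration, the
unrolled loop merely stands in for a real hand-vectorized variant, and
clock() is a rather crude timer.  In a real package the variants would
probably live in separate object files built with different compiler flags.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Signature shared by all variants (hypothetical): dst[i] += src[i] * gain. */
typedef void (*mix_fn)(float *dst, const float *src, size_t n, float gain);

/* Variant A: plain scalar loop. */
void mix_scalar(float *dst, const float *src, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i] * gain;
}

/* Variant B: four-way unrolled loop, a stand-in for a vectorized version. */
void mix_unrolled(float *dst, const float *src, size_t n, float gain)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     += src[i]     * gain;
        dst[i + 1] += src[i + 1] * gain;
        dst[i + 2] += src[i + 2] * gain;
        dst[i + 3] += src[i + 3] * gain;
    }
    for (; i < n; i++)          /* leftover samples */
        dst[i] += src[i] * gain;
}

/* The rest of the program only ever calls through this pointer. */
mix_fn mix = mix_scalar;

/* Keeps the compiler from discarding the benchmark work. */
static volatile float sink;

/* Time `rounds` calls of one variant on scratch buffers, in clock() ticks. */
clock_t time_variant(mix_fn fn, int rounds)
{
    enum { N = 4096 };
    static float dst[N], src[N];

    assert(fn != NULL && rounds > 0);
    for (size_t i = 0; i < N; i++) {
        dst[i] = 0.0f;
        src[i] = (float)i / N;
    }
    clock_t start = clock();
    for (int r = 0; r < rounds; r++)
        fn(dst, src, N, 0.5f);
    clock_t elapsed = clock() - start;
    sink = dst[N - 1];
    return elapsed;
}

/* Run the small benchmark once at startup and keep the fastest variant. */
void select_mix(void)
{
    static const mix_fn candidates[] = { mix_scalar, mix_unrolled };
    clock_t best = time_variant(candidates[0], 2000);

    mix = candidates[0];
    for (size_t i = 1; i < sizeof candidates / sizeof candidates[0]; i++) {
        clock_t t = time_variant(candidates[i], 2000);
        if (t < best) {
            best = t;
            mix = candidates[i];
        }
    }
}
```

The program would call select_mix() once early in main() and from then on
use mix(dst, src, n, gain) everywhere.  The same binary then picks whatever
is fastest on the machine it actually runs on.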

>> Besides, a binary distribution has no chance of knowing the exact
>> hardware in use on the user's side.  The configure test will only benchmark
>> the build host, which is not really useful.
> Yeah I know, that's why the benchmarks can be circumvented by configure script 
> parameters for cross compilation. Somebody has to compile and somebody has to 
> run the benchmarks ... whoever that will be ... :-)

You are not really following what I am trying to get across.  Cross compilation
isn't the issue.  The issue is that something as generic as i386 (or i686 for
rpm-based distros IIRC) actually targets a lot of different types of hardware.
It can run on pretty old Pentium-based CPUs, but also on modern
systems.  A binary distributor has no way of knowing which
CPU is going to be used; in fact, a single binary package
is going to be used on many of the supported variants.
So if you want working performance optimisations in stock Linux apps that
are not being recompiled by geeky users locally, you need to use runtime
benchmarking, instead of hardcoding the choice into the binary at compilation
time.

  ⡍⠁⠗⠊⠕ | Debian Developer <URL:http://debian.org/>
  .''`. | Get my public key via finger mlang at db.debian.org
 : :' : | 1024D/7FC1A0854909BCCDBE6C102DDFFC022A6B113E44
 `. `'
   `-      <URL:http://delysid.org/>  <URL:http://www.staff.tugraz.at/mlang/>

More information about the Linux-audio-dev mailing list