On Mon, 2008-05-05 at 22:33 +0200, Fons Adriaensen wrote:
> ...
> #define N 1024
> ...
> int n = 1000000;
> ...
> Looping a million times over the same small data vector
> is _not_ very realistic.
 
OK, good point!
#define N (1024 * 1024 ) // flood the level 2 cache ...
...
int n = 1000; // we need to loop for clock() to take notice
...
With icc I then get these numbers:
  clock:  4020 ms (_Complex)
  clock:  1550 ms (cvec_t)
  clock: 70000 ms (original float array[N][2])
... which supports the notion that the shape of the data structure is an
important factor, not only for vectorization but also for data
throughput.
(With gcc, results for all versions were in the 50000 - 60000 ms range.)
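
For anyone who wants to try the layout comparison themselves, here is a minimal sketch of what "shape of the data structure" means in practice. It is not the code that produced the numbers above. The kernel (multiplying every element by one complex constant), the names rotate_interleaved, rotate_split and elapsed_ms, and the split_vec_t type are all hypothetical; in particular, whether cvec_t is laid out like split_vec_t is only an assumption.

```c
#include <stddef.h>
#include <time.h>

#define N (1024 * 1024)  /* same size as above: larger than a typical L2 cache */

/* Hypothetical split layout: real parts and imaginary parts in separate arrays. */
typedef struct {
    float *re;
    float *im;
} split_vec_t;

/* Interleaved layout (float array[N][2]): multiply each element by (cr + i*ci). */
void rotate_interleaved(float (*a)[2], size_t n, float cr, float ci)
{
    for (size_t i = 0; i < n; i++) {
        float re = a[i][0], im = a[i][1];
        a[i][0] = re * cr - im * ci;
        a[i][1] = re * ci + im * cr;
    }
}

/* Split layout: same arithmetic, but each inner access is unit-stride. */
void rotate_split(split_vec_t v, size_t n, float cr, float ci)
{
    for (size_t i = 0; i < n; i++) {
        float re = v.re[i], im = v.im[i];
        v.re[i] = re * cr - im * ci;
        v.im[i] = re * ci + im * cr;
    }
}

/* CPU time in milliseconds between two clock() readings. */
double elapsed_ms(clock_t start, clock_t end)
{
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}
```

To time a layout, allocate N elements, take a clock() reading, call the kernel n times with n = 1000 as above, take a second reading and pass both to elapsed_ms. Use the result afterwards (print a checksum, say) so the optimizer cannot drop the loop.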