when i look into the code in question the next time (it works well now,
so not much incentive), i'll probably do a complete assembler rewrite.
no more messing with my pointers, mr. gcc, thanks.
fwiw,
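float * t = &four_floats_somewhere_on_a_128_bit_boundary;
/* xmm4 is expected to already hold the four floats to be summed */
asm volatile (
    "movl %0, %%eax\n\t"          /* eax = t */
    "movaps %%xmm4, (%%eax)\n\t"  /* spill xmm4 to t[0..3], needs 16-byte alignment */
    "flds (%%eax)\n\t"            /* st(0) = t[0] */
    "fadds 4(%%eax)\n\t"          /* st(0) += t[1] */
    "fadds 8(%%eax)\n\t"          /* st(0) += t[2] */
    "fadds 12(%%eax)\n\t"         /* st(0) += t[3] */
    "fstps (%%eax)"               /* t[0] = sum, pop the x87 stack */
    : : "r" (t) : "%eax", "%st", "memory");  /* "memory": the asm writes through t */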
sums the four floats in xmm4 into t[0] (t[1] to t[3] are overwritten with
the raw contents of xmm4 along the way).
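
for comparison, a minimal sketch of the same horizontal sum written with sse
intrinsics instead of inline asm, assuming gcc and <xmmintrin.h> are
available. the helper names hsum4_spill and hsum4_shuffle are made up for
illustration and are not from the code discussed above. the first variant
mirrors the asm (spill to aligned memory, add the four lanes in order); the
second stays in registers and adds the lanes pairwise, so rounding can differ
slightly from the asm, which accumulates on the x87 stack.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_store_ps, ... */

/* Hypothetical helper: spill the vector to 16-byte aligned memory and
 * add the four lanes in order, like the asm version does. */
static float hsum4_spill(__m128 v)
{
    float tmp[4] __attribute__((aligned(16)));  /* GCC-specific alignment */
    _mm_store_ps(tmp, v);                       /* same job as movaps to memory */
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}

/* Hypothetical helper: register-only variant using shuffles. */
static float hsum4_shuffle(__m128 v)
{
    __m128 hi   = _mm_movehl_ps(v, v);          /* lanes: v2, v3, v2, v3 */
    __m128 sum2 = _mm_add_ps(v, hi);            /* lane 0: v0+v2, lane 1: v1+v3 */
    __m128 l1   = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_cvtss_f32(_mm_add_ss(sum2, l1)); /* (v0+v2) + (v1+v3) */
}
```

either helper takes the vector by value, so the compiler keeps track of the
register and the store itself; no clobber lists to get wrong.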
tim