Simon Jenkins wrote:
I can definitely get
asm ("movaps %%xmm1, %0" : "=m" (t[0]));
to exhibit the optimisation problem (the one I couldn't get your
original line to show) and then fix it again by removing the [0].
I was getting a segfault on about 50% of compiles, as I modified
the code, because the array was being aligned to 8 byte boundaries
but not to 16 bytes. Declaring it as
float t[4] __attribute__ ((aligned(16)));
got rid of those. Note though that this attribute doesn't work for
automatic variables.
ok, here is a distilled test of how i allocate the aligned buffer
and use the instructions:
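#include <stdio.h>

int main (int argc, char ** argv)
{
    /* over-allocate by 15 bytes so a 16-byte aligned block always fits */
    char scratch [128 + 15];
    float f = 2.3;
    /* only the low 4 bits of the address are used below */
    int s = (int) scratch;
    s &= 0xF;
    if (s)
        s = 16 - s;
    /* d is the first 16-byte aligned address inside scratch */
    float * d = (float *) (((char *) scratch) + s);
    fprintf (stderr, "%p\n", (void *) d);
    /* load f into the low lane, copy it to all four lanes, store to d */
    asm ("movss %0, %%xmm0" : : "m" (f));
    asm ("shufps $0, %xmm0, %xmm0");
    asm ("movaps %%xmm0, %0" : "=m" (d[0]));
    printf ("%.2f %.2f %.2f %.2f\n", d[0], d[1], d[2], d[3]);
    return 0;
}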
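as an aside, the round-up step in that test can be pulled out into a
small helper. this is only a sketch: the name align_up_16 is made up
for illustration, and it uses uintptr_t rather than int so that it
also holds on targets where pointers are wider than int. the caller
still has to over-allocate by 15 bytes, as scratch does.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the test program): round p up to the
   next 16-byte boundary. Returns p unchanged if it is already aligned.
   The buffer behind p must be over-allocated by at least 15 bytes. */
static float *align_up_16(void *p)
{
    uintptr_t addr = (uintptr_t) p;

    /* adding 15 and clearing the low 4 bits rounds up to a multiple of 16 */
    addr = (addr + 15) & ~(uintptr_t) 15;
    return (float *) addr;
}
```

with that, the allocation would read float * d = align_up_16 (scratch);
and nothing else in the test changes.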
you'll agree that the program should print "2.30 2.30 2.30 2.30".
it does if you use "=m" (d[0]). if you say "=m" (d), it doesn't.
here's what the assembly block compiles to with "=m" (d):
#APP
movss -148(%ebp), %xmm0
shufps $0, %xmm0, %xmm0
movaps %xmm0, -156(%ebp)
#NO_APP
and here's with "=m" (d[0]):
#APP
movss -148(%ebp), %xmm0
shufps $0, %xmm0, %xmm0
#NO_APP
movl -156(%ebp),%eax
#APP
movaps %xmm0, (%eax)
#NO_APP
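so saying "=m" (d) makes the operand the pointer variable d itself,
and xmm0 is written to &d rather than to the memory d points to, which
is what was intended. if &d isn't 128-bit aligned, movaps will
segfault. even if it is, that's not where we wanted the numbers from
xmm0 to go, and the 16-byte store runs past the end of the pointer ...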
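one more point on the operand: "=m" (d[0]) names a single float, while
movaps writes four. a possible way to describe the whole store to the
compiler is to cast the pointer to a pointer-to-array and dereference
that, which is the idiom the gcc manual suggests for memory operands
of known size. the sketch below assumes gcc-style inline asm on x86
with sse, and falls back to a plain loop elsewhere. the name splat4 is
made up for illustration. it also puts the three instructions in one
asm statement and lists xmm0 as clobbered, which the test above does
not do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store four copies of f at dst.
   dst must be 16-byte aligned because movaps faults otherwise. */
static void splat4(float *dst, float f)
{
    assert(((uintptr_t) dst & 15) == 0);
#if defined(__GNUC__) && defined(__SSE__)
    /* The output operand is the 16-byte array dst points to, not the
       pointer variable and not just dst[0], so the compiler knows that
       all four floats are written. */
    __asm__ ("movss %1, %%xmm0\n\t"
             "shufps $0, %%xmm0, %%xmm0\n\t"
             "movaps %%xmm0, %0"
             : "=m" (*(float (*)[4]) dst)
             : "m" (f)
             : "xmm0");
#else
    /* Portable fallback for targets without SSE. */
    for (int i = 0; i < 4; i++)
        dst[i] = f;
#endif
}
```

calling splat4 (d, f) with the aligned d from the test should fill
d[0] through d[3] with the same value.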
tim