Tim Goetze wrote:
Simon Jenkins wrote:
I can definitely get
asm ("movaps %%xmm1, %0" : "=m" (t[0]));
to exhibit the optimisation problem (the one I couldn't get your
original line to show) and then fix it again by removing the [0].
[snip]
ok, here is a distilled test of how i allocate and use the
instructions:
#include <stdio.h>
int main (int argc, char ** argv)
{
char scratch [128 + 15];
float f = 2.3;
int s = (int) scratch;
s &= 0xF;
if (s)
s = 16 - s;
float * d = (float *) (((char *) scratch) + s);
fprintf (stderr, "%p\n", d);
asm ("movss %0, %%xmm0" : : "m" (f));
asm ("shufps $0, %xmm0, %xmm0");
asm ("movaps %%xmm0, %0" : "=m" (d[0]));
printf ("%.2f %.2f %.2f %.2f\n", d[0], d[1], d[2], d[3]);
}
you'll agree that the program should print "2.30 2.30 2.30 2.30".
it does if you use "=m" (d[0]). if you say "=m" (d), it doesn't.
here's what the assembly block compiles to with "=m" (d):
#APP
movss -148(%ebp), %xmm0
shufps $0, %xmm0, %xmm0
movaps %xmm0, -156(%ebp)
#NO_APP
and here's with "=m" (d[0]):
#APP
movss -148(%ebp), %xmm0
shufps $0, %xmm0, %xmm0
#NO_APP
movl -156(%ebp),%eax
#APP
movaps %xmm0, (%eax)
#NO_APP
so saying "=m" (d) causes xmm0 to be written to &d, not d, as
intended. if &d isn't 128-bit aligned, it will segfault now.
even if it is, that's not where we wanted the numbers from xmm0
to go ...
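A side note on the alignment arithmetic in the test above: casting the
pointer to int only works where a pointer fits in an int (true on 32-bit
x86, not on x86-64). A minimal sketch of the same round-up using
uintptr_t follows. The name align16 is made up for illustration and is
not part of the original program; the buffer still has to be
over-allocated by 15 bytes, as scratch is.

```c
#include <stdint.h>

/* Hypothetical helper: round p up to the next 16-byte boundary.
   Returns p unchanged if it is already aligned. The caller must have
   over-allocated the buffer by at least 15 bytes. */
static float *align16(void *p)
{
    uintptr_t a = (uintptr_t) p;

    a = (a + 15u) & ~(uintptr_t) 15u;
    return (float *) a;
}
```

With this, the d in the test could be obtained as align16 (scratch).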
The discrepancy here is because you originally said you were trying to
get the data into a named array of floats:
float t[4];
but it turns out you're actually trying to get them into some memory
to which you have a named pointer:
float *d;
Now, there are a great many circumstances in which you could treat
such names interchangeably, but this isn't one of them.
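For the pointer case, one way to describe the pointed-to memory to the
compiler is to cast the pointer to "pointer to array of 4 floats" and
dereference that, so the "=m" operand names all 16 bytes rather than the
pointer variable or a single float. What follows is only a sketch under
assumptions: GCC-style extended asm on x86 with SSE, a 16-byte aligned
destination, and a made-up helper name splat4. The three instructions
are folded into one asm statement with an xmm0 clobber so the compiler
knows the register is used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: broadcast value into the four floats at dst.
   dst must be 16-byte aligned, because movaps faults otherwise. */
static void splat4(float *dst, float value)
{
    assert(((uintptr_t) dst & 15u) == 0);
#if defined(__GNUC__) && defined(__SSE__)
    __asm__ ("movss %1, %%xmm0\n\t"
             "shufps $0, %%xmm0, %%xmm0\n\t"
             "movaps %%xmm0, %0"
             : "=m" (*(float (*)[4]) dst)   /* all 16 bytes are written */
             : "m" (value)
             : "xmm0");
#else
    /* Plain C fallback so the sketch still builds without SSE. */
    for (int i = 0; i < 4; i++)
        dst[i] = value;
#endif
}
```

Called as splat4 (d, f) with d pointing at aligned storage, this stores
through the pointer and tells the compiler that d[0] to d[3] have all
changed.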
The following code demonstrates
asm ("movaps %%xmm0, %0" : "=m" (d));
working correctly if d is an aligned array of floats. Also, if
you change the d to d[0], it exhibits the optimization problem.
/* start */
#include <stdio.h>
float d[4] __attribute__ ((aligned(16))) = { 1.1f, 1.1f, 1.1f, 1.1f };
int main (int argc, char ** argv)
{
float z = 1.1f;
float f = 2.3f;
z += 3.3f;
asm ("movss %0, %%xmm0" : : "m" (f));
asm ("shufps $0, %xmm0, %xmm0");
asm ("movaps %%xmm0, %0" : "=m" (d));
z += d[1];
printf ("%.2f %.2f %.2f %.2f\n", d[0], d[1], d[2], d[3]);
printf ("z is %.2f\n", z );
}
/* end */
We're expecting (and we get):
2.30 2.30 2.30 2.30
z is 6.70
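but using d[0] instead of d we end up getting:
2.30 2.30 2.30 2.30
z is 5.50
because the compiler has only been told that d[0] is written, so it is
free to use the value d[1] held before the asm (1.10) in "z += d[1]".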
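A blunter alternative, sketched here under the same assumptions
(GCC-style extended asm, x86 with SSE, aligned destination), is to pass
the address in a register and add a "memory" clobber, which tells the
compiler that the asm may have changed any memory. The name
splat4_clobber is invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: same broadcast, but the store goes through a
   register operand and a "memory" clobber instead of an "=m" output.
   dst must be 16-byte aligned. */
static void splat4_clobber(float *dst, float value)
{
    assert(((uintptr_t) dst & 15u) == 0);
#if defined(__GNUC__) && defined(__SSE__)
    __asm__ __volatile__ ("movss %1, %%xmm0\n\t"
                          "shufps $0, %%xmm0, %%xmm0\n\t"
                          "movaps %%xmm0, (%0)"
                          : /* no outputs */
                          : "r" (dst), "m" (value)
                          : "xmm0", "memory");
#else
    /* Plain C fallback so the sketch still builds without SSE. */
    for (int i = 0; i < 4; i++)
        dst[i] = value;
#endif
}
```

The clobber makes the compiler reload d[1] after the asm, at the cost of
also discarding anything else it had cached from memory, so the
whole-array "=m" operand is the more precise description.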
Simon Jenkins
(Bristol, UK)