[linux-audio-dev] [i686] xmm regs + gcc inline assembly
Simon Jenkins
sjenkins at blueyonder.co.uk
Fri Feb 13 21:12:53 UTC 2004
Tim Goetze wrote:
>Simon Jenkins wrote:
>
>>I can definitely get
>>
>> asm ("movaps %%xmm1, %0" : "=m" (t[0]));
>>
>>to exhibit the optimisation problem (the one I couldn't get your
>>original line to show) and then fix it again by removing the [0].
>>
>>[snip]
>>
>
>ok, here is a distilled test of how i allocate and use the
>instructions:
>
>int main (int argc, char ** argv)
>{
> char scratch [128 + 15];
> float f = 2.3;
>
> int s = (int) scratch;
> s &= 0xF;
> if (s)
> s = 16 - s;
> float * d = (float *) (((char *) scratch) + s);
> fprintf (stderr, "%p\n", d);
>
> asm ("movss %0, %%xmm0" : : "m" (f));
> asm ("shufps $0, %xmm0, %xmm0");
> asm ("movaps %%xmm0, %0" : "=m" (d[0]));
>
> printf ("%.2f %.2f %.2f %.2f\n", d[0], d[1], d[2], d[3]);
>}
>
>you'll agree that the program should print "2.30 2.30 2.30 2.30".
>it does if you use "=m" (d[0]). if you say "=m" (d), it doesn't.
>
>here's what the assembly block compiles to with "=m" (d):
>
>#APP
> movss -148(%ebp), %xmm0
> shufps $0, %xmm0, %xmm0
> movaps %xmm0, -156(%ebp)
>#NO_APP
>
>and here's with "=m" (d[0]):
>
>#APP
> movss -148(%ebp), %xmm0
> shufps $0, %xmm0, %xmm0
>#NO_APP
> movl -156(%ebp),%eax
>#APP
> movaps %xmm0, (%eax)
>#NO_APP
>
>so saying "=m" (d) causes xmm0 to be written to &d, not d, as
>intended. if &d isn't 128-bit aligned, it will segfault now.
>even if it is, that's not where we wanted the numbers from xmm0
>to go ...
>
The discrepancy here is because you originally said you were trying to
get the data into a named array of floats:
float t[4];
but it turns out you're actually trying to get them into some memory
to which you have a named pointer:
float *d;
Now, there are a great many circumstances in which you could treat
such names interchangeably, but this isn't one of them.
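To make the difference concrete, here is a small sketch. The names t and
d are the ones used in this thread; the four helper functions are made up
purely for illustration. An array name denotes the 16 bytes of float data
themselves, whereas a pointer name denotes a separate, pointer-sized
object that merely holds their address. An "m" operand refers to whatever
object the expression denotes, so "=m" (d) means something different in
each case.

```c
#include <assert.h>
#include <stddef.h>

float t[4];      /* named array: the object IS the four floats        */
float *d = t;    /* named pointer: the object only holds an address   */

/* Size of the object that each name denotes. */
static size_t size_of_array_object(void)   { return sizeof t; }
static size_t size_of_pointer_object(void) { return sizeof d; }

/* Does the storage of the name itself coincide with the float data? */
static int array_storage_is_data(void)   { return (void *) &t == (void *) t; }
static int pointer_storage_is_data(void) { return (void *) &d == (void *) d; }
```

For the array, the name's own storage and the data are the same 16 bytes.
For the pointer they are two different places, which is why a store
through "=m" (d) lands on the pointer variable instead of on the floats.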
The following code demonstrates
asm ("movaps %%xmm0, %0" : "=m" (d));
working correctly if d is an aligned array of floats. Also, if
you change the d to d[0], it exhibits the optimization problem.
/* start */
#include <stdio.h>

float d[4] __attribute__ ((aligned(16))) = { 1.1f, 1.1f, 1.1f, 1.1f };

int main (int argc, char ** argv)
{
  float z = 1.1f;
  float f = 2.3f;

  z += 3.3f;

  asm ("movss %0, %%xmm0" : : "m" (f));    /* load f into the low lane */
  asm ("shufps $0, %xmm0, %xmm0");         /* copy the low lane to all four */
  asm ("movaps %%xmm0, %0" : "=m" (d));    /* d is a float[4]: all 16 bytes are output */

  z += d[1];

  printf ("%.2f %.2f %.2f %.2f\n", d[0], d[1], d[2], d[3]);
  printf ("z is %.2f\n", z );
  return 0;
}
/* end */
We're expecting (and we get):
2.30 2.30 2.30 2.30
z is 6.70
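but using d[0] instead of d (which tells gcc that only the first float
is written, so it is free to keep using the old 1.1 for d[1]) we end up
getting: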
2.30 2.30 2.30 2.30
z is 5.50
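For the pointer case in your test program, one way to get both the right
address and the right extent is to cast the pointer to a pointer-to-array
and dereference it, so the operand is the whole float[4] that the pointer
refers to. What follows is only a sketch under some assumptions: splat4
is a hypothetical helper name, the three instructions are merged into one
asm statement with an xmm0 clobber (my choice, so that nothing relies on
xmm0 surviving between separate statements), and there is a plain C
fallback for builds where SSE is not enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: write f into all four floats at dst.
   dst must be 16-byte aligned, because movaps faults otherwise. */
static void splat4(float *dst, float f)
{
    assert(((uintptr_t) dst & 0xF) == 0);
#if defined(__GNUC__) && defined(__SSE__)
    __asm__ ("movss  %1, %%xmm0\n\t"
             "shufps $0, %%xmm0, %%xmm0\n\t"
             "movaps %%xmm0, %0"
             : "=m" (*(float (*)[4]) dst)  /* output: all 16 bytes at dst */
             : "m" (f)
             : "xmm0");                    /* we overwrite xmm0 */
#else
    /* Fallback when SSE is unavailable: same result in plain C. */
    for (int i = 0; i < 4; i++)
        dst[i] = f;
#endif
}
```

With the operand written this way the compiler knows that d[1], d[2] and
d[3] change as well, so it should not reuse stale values for them after
the asm.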
Simon Jenkins
(Bristol, UK)