[linux-audio-dev] [i686] xmm regs + gcc inline assembly

Simon Jenkins sjenkins at blueyonder.co.uk
Fri Feb 13 21:12:53 UTC 2004


Tim Goetze wrote:

>Simon Jenkins wrote:
>
>>I can definitely get
>>
>>   asm ("movaps %%xmm1, %0" : "=m" (t[0]));
>>
>>to exhibit the optimisation problem (the one I couldn't get your
>>original line to show) and then fix it again by removing the [0].
>>
>>[snip]
>>
>
>ok, here is a distilled test of how i allocate and use the
>instructions:
>
>int main (int argc, char ** argv)
>{
>  char scratch [128 + 15];
>  float f = 2.3;
>
>  int s = (int) scratch;
>  s &= 0xF;
>  if (s)
>    s = 16 - s;
>  float * d = (float *) (((char *) scratch) + s);
>  fprintf (stderr, "%p\n", d);
>
>  asm ("movss %0, %%xmm0" : : "m" (f));
>  asm ("shufps $0, %xmm0, %xmm0");
>  asm ("movaps %%xmm0, %0" : "=m" (d[0]));
>
>  printf ("%.2f %.2f %.2f %.2f\n", d[0], d[1], d[2], d[3]);
>}
>
>you'll agree that the program should print "2.30 2.30 2.30 2.30".
>it does if you use "=m" (d[0]). if you say "=m" (d), it doesn't.
>
>here's what the assembly block compiles to with "=m" (d):
>
>#APP
>  movss -148(%ebp), %xmm0
>  shufps $0, %xmm0, %xmm0
>  movaps %xmm0, -156(%ebp)
>#NO_APP
>
>and here's with "=m" (d[0]):
>
>#APP
>  movss -148(%ebp), %xmm0
>  shufps $0, %xmm0, %xmm0
>#NO_APP
>  movl -156(%ebp),%eax
>#APP
>  movaps %xmm0, (%eax)
>#NO_APP
>
>so saying "=m" (d) causes xmm0 to be written to &d, not d, as
>intended. if &d isn't 128-bit aligned, it will segfault now.
>even if it is, that's not where we wanted the numbers from xmm0
>to go ...
>
The discrepancy here is because you originally said you were trying to
get the data into a named array of floats:

    float t[4];

but it turns out you're actually trying to get them into some memory
to which you have a named pointer:

  float *d;

Now, there are a great many circumstances in which you could treat
such names interchangeably, but this isn't one of them.
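
To make the difference concrete, here is a minimal sketch in plain C
(no inline assembly; the function names are made up for illustration).
An lvalue operand such as "=m" (x) refers to the object x itself. For
an array that object is the four floats. For a pointer it is the
pointer variable, which is why "=m" (d) ended up at &d in your listing.

```c
#include <stddef.h>

/* Hypothetical helpers, for illustration only. */

/* An array name designates its own storage: &t and &t[0] are the same
   address, and the object is 4 floats wide. */
int array_names_its_storage(void)
{
  float t[4] = { 0 };
  return (void *) &t == (void *) &t[0];   /* 1 */
}

size_t array_object_size(void)
{
  float t[4] = { 0 };
  return sizeof t;                        /* 4 * sizeof (float) */
}

/* A pointer name designates the pointer variable: &d is where the
   pointer lives, &d[0] is where the floats live. */
int pointer_names_its_storage(void)
{
  float t[4] = { 0 };
  float *d = t;
  return (void *) &d == (void *) &d[0];   /* 0 */
}

size_t pointer_object_size(void)
{
  float t[4] = { 0 };
  float *d = t;
  (void) d;
  return sizeof d;                        /* sizeof (float *) */
}
```

So with a pointer, d as an operand means "the pointer variable" and
d[0] (or *d) means "the first float it points at".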

The following code demonstrates

  asm ("movaps %%xmm0, %0" : "=m" (d));

working correctly if d is an aligned array of floats. Also, if
you change the d to d[0], it exhibits the optimization problem.


/* start */

#include <stdio.h>

float d[4] __attribute__ ((aligned(16))) = { 1.1f, 1.1f, 1.1f, 1.1f };

int main (int argc, char ** argv)
{
  float z = 1.1f;
  float f = 2.3f;

  z += 3.3f;

  asm ("movss %0, %%xmm0" : : "m" (f));
  asm ("shufps $0, %xmm0, %xmm0");
  asm ("movaps %%xmm0, %0" : "=m" (d));

  z += d[1];

  printf ("%.2f %.2f %.2f %.2f\n", d[0], d[1], d[2], d[3]);
  printf ("z is %.2f\n", z );
  return 0;
}
/* end */


We're expecting (and we get):

2.30 2.30 2.30 2.30
z is 6.70

but using d[0] instead of d we end up getting:

2.30 2.30 2.30 2.30
z is 5.50
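
A plausible reading of the 5.50 (an interpretation, not something
checked against the generated assembly): with "=m" (d[0]) the compiler
is told that only d[0] is written, so it is free to keep using the
initial d[1] of 1.1, and 1.1 + 3.3 + 1.1 gives 5.5.

One way to describe all 16 bytes when only a pointer is available is to
cast the operand to an array type. The sketch below assumes a
GCC-compatible compiler with SSE enabled and a 16-byte-aligned
destination. splat4 and demo_z are hypothetical names, and the plain C
branch is only a fallback for other targets.

```c
/* Hypothetical helper: store f into all four floats at dst.
   dst is assumed to be 16-byte aligned (movaps faults otherwise). */
void splat4 (float *dst, float f)
{
#if defined(__GNUC__) && defined(__SSE__)
  /* One asm statement, so xmm0 cannot be disturbed between steps.
     The output operand covers the whole float[4], and xmm0 is
     declared as clobbered. */
  __asm__ ("movss  %1, %%xmm0\n\t"
           "shufps $0, %%xmm0, %%xmm0\n\t"
           "movaps %%xmm0, %0"
           : "=m" (*(float (*)[4]) dst)
           : "m" (f)
           : "xmm0");
#else
  /* Fallback for targets without SSE inline assembly. */
  int i;
  for (i = 0; i < 4; i++)
    dst[i] = f;
#endif
}

/* Same arithmetic as the test program above, routed through splat4. */
float demo_z (void)
{
  static _Alignas (16) float d[4] = { 1.1f, 1.1f, 1.1f, 1.1f };
  float z = 1.1f;

  z += 3.3f;
  splat4 (d, 2.3f);
  z += d[1];      /* must see the 2.3 stored by the asm */
  return z;
}
```

With the whole array named as the output, demo_z () should come out at
roughly 6.70 regardless of optimization level, because the compiler now
knows d[1] was written.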

Simon Jenkins
(Bristol, UK)




