Simon Jenkins wrote:
for a simplified example, i'm using
float t[4];
...
asm ("movaps %%xmm1, %0" : : "m" (t[0]));
to move 4 packed floats from xmm1 into 't'.
I couldn't get this to fail in practice - though I didn't
try all that hard - unless t isn't on a 16 byte boundary
in which case it segfaults.
it failed here just a minute ago, with g++ -O6. not a segfault, but
gcc seemed to think that some members of t are zero and omitted them
from the final summation in my code (r = t[0] + t[1] + t[2] + t[3]).
In theory, however, your code is telling the compiler that array
element t[0] is in memory from which the instruction reads. It
should be more like:
asm ("movaps %%xmm1, %0" : "=m" (t) );
which now tells the compiler that the entire array t
is in memory to which the instruction writes. This
*ought* to discourage the optimiser from doing
anything too drastic. (Maybe/AFAIK/IANAL/etc).
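For what it's worth, here is a minimal sketch of the output-operand form, assuming gcc on x86-64 with the default AT&T syntax. The helper name store4_sum and the load from src are made up for illustration; the original code fills xmm1 elsewhere. Doing the load and the store in one asm statement just keeps the example self-contained, since the compiler makes no promise to leave xmm1 alone between separate asm statements.

```c
#include <assert.h>
#include <stdint.h>

/* File scope, because the thread reports that the aligned attribute
   was not honoured for automatic variables. */
static float t[4] __attribute__ ((aligned(16)));

/* store4_sum is a hypothetical helper, not part of the original code.
   It moves four packed floats through xmm1 into t and sums them. */
static float store4_sum(const float *src)
{
    /* movaps faults if its memory operand is not 16-byte aligned. */
    assert(((uintptr_t) t & 15) == 0);

    __asm__ ("movups %1, %%xmm1\n\t"            /* unaligned load: src -> xmm1 */
             "movaps %%xmm1, %0"                /* aligned store:  xmm1 -> t   */
             : "=m" (t)                         /* the whole array is written  */
             : "m" (*(const float (*)[4]) src)  /* 16 bytes at src are read    */
             : "xmm1");                         /* xmm1 is clobbered           */

    /* All four elements are known to the compiler as asm outputs. */
    return t[0] + t[1] + t[2] + t[3];
}
```

Because t appears as a whole-array "=m" output, the optimiser should have no grounds for treating any of its elements as unwritten.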
you're right of course, 't' should be an output, not an input.
however,
asm ("movaps %%xmm1, %0" : "=m" (t));
segfaults, but
asm ("movaps %%xmm1, %0" : "=m" (t[0]));
works. think i'll have to resort to 128 bit wide data types, a
simple cast should do. all this gcc inline asm stuff is ugly anyway,
and what's another cast among friends.
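a rough sketch of what the cast could look like, assuming gcc's vector_size extension on x86-64. the names v4sf, t128 and copy4_sum are invented here and are not from the real code; the load from src only exists to make the example stand on its own.

```c
#include <assert.h>
#include <stdint.h>

/* v4sf: assumed name for a 128 bit wide vector of four floats.
   may_alias allows a v4sf lvalue to refer to plain float storage. */
typedef float v4sf __attribute__ ((vector_size(16), may_alias));

/* File scope and explicitly aligned, since movaps needs 16 bytes. */
static float t128[4] __attribute__ ((aligned(16)));

/* copy4_sum is a hypothetical helper for illustration only. */
static float copy4_sum(const float *src)
{
    assert(((uintptr_t) t128 & 15) == 0);

    __asm__ ("movups %1, %%xmm1\n\t"            /* load four floats into xmm1   */
             "movaps %%xmm1, %0"                /* store all 128 bits to t128   */
             : "=m" (*(v4sf *) t128)            /* the cast: one 16-byte output */
             : "m" (*(const float (*)[4]) src)  /* 16 bytes at src are read     */
             : "xmm1");

    return t128[0] + t128[1] + t128[2] + t128[3];
}
```

the cast makes the output operand cover all 16 bytes, so the compiler sees every element of the array as written.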
I can definitely get
asm ("movaps %%xmm1, %0" : "=m" (t[0]));
to exhibit the optimisation problem (the one I couldn't get your
original line to show) and then fix it again by removing the [0].
I was getting a segfault on about 50% of compiles, as I modified
the code, because the array was being aligned to 8 byte boundaries
but not to 16 bytes. Declaring it as
float t[4] __attribute__ ((aligned(16)));
got rid of those. Note though that this attribute doesn't work for
automatic variables.
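A small sketch of how the alignment could be checked before relying on movaps, assuming gcc and a file-scope array. The names t16, is_aligned16 and require_aligned16 are hypothetical helpers made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* File-scope array with the attribute applied, as in the declaration above. */
static float t16[4] __attribute__ ((aligned(16)));

/* Returns nonzero if p sits on a 16 byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t) p & 15) == 0;
}

/* Debug-build guard: aborts via assert instead of letting movaps segfault. */
static const float *require_aligned16(const float *p)
{
    assert(is_aligned16(p));
    return p;
}
```

Calling require_aligned16(t16) just before the asm statement turns the intermittent segfault into an assertion failure that names the problem.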
Simon Jenkins
(Bristol, UK)