can anybody help out with gcc inline assembly syntax applied to sse
registers/memory locations?
for a simplified example, i'm using
float t[4];
...
asm ("movaps %%xmm1, %0" : "=m" (t[0]));
to move 4 packed floats from xmm1 into 't'.
my suspicion is that gcc concludes from the operand that only t[0]
has changed, so if i'm unlucky the optimizer assumes t[1] through t[3]
are untouched and keeps using stale values. it does the right thing
right now, but i want to make sure this is reliable under all
conditions.
one possible solution seems to be to use a struct/union instead of
float[4], so that the operand covers all 16 bytes (iirc that's the way
fftw-3 does it), but i'm aesthetically inclined towards the more
direct float[] notation.
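for reference, here is a rough sketch of the struct variant as i understand it. the names vec4 and copy_via_xmm1 are made up for illustration, and i have not checked how fftw-3 actually writes it. the load is included only so the snippet stands on its own.

```c
#include <assert.h>

/* hypothetical wrapper type: all four floats form one 16-byte object */
typedef struct { float f[4]; } __attribute__ ((aligned (16))) vec4;

/* copy *src to *dst through xmm1 (illustrative name, not from any library) */
static void copy_via_xmm1 (vec4 *dst, const vec4 *src)
{
    /* movaps faults on addresses that are not 16-byte aligned */
    assert (((unsigned long) dst & 15) == 0);
    assert (((unsigned long) src & 15) == 0);

    __asm__ ("movaps %1, %%xmm1\n\t"
             "movaps %%xmm1, %0"
             : "=m" (*dst)   /* output: the whole struct, not just f[0] */
             : "m" (*src)    /* input: the whole source struct */
             : "xmm1");      /* tell gcc that xmm1 is overwritten */
}
```

since the operand is *dst rather than dst->f[0], gcc should treat all four floats as written by the asm, which is the property i'm after.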
i've dug deep into the gcc info files, but they barely touch on the
issue (and are darn tedious to read). using gcc-3.x builtins is not an
option.
so, can anybody help me code this properly?
thanks,
tim