On Thursday, 7 February 2008 19:42:46, Jens M Andreasen wrote:
It did change (or will do?) with the Penryn Core 2:
The INSERTPS and PINSR instructions read 8, 16 or 32 bits from an x86
register or memory location and insert it into a field in the destination
register given by an immediate operand, EXTRACTPS and PEXTR read a field
from the source register and insert it into an x86 register or memory
location. For example, PEXTRD eax, [xmm0], 1; EXTRACTPS [addr+4*eax],
xmm1, 1 stores the first field of xmm1 in the address given by the first
field of xmm0.
http://en.wikipedia.org/wiki/SSE4
I didn't mean the hardware restriction, i.e. the limited power of current
SSE instructions. What I meant was the (IMO) still incomplete implementation
of vector extensions in gcc.
I would expect to be able to access the cells of a vector in C(++) with pure
high-level constructs, and gcc to compile them automatically to e.g.
"workaround" SHUFPS instruction sequences. But there is no such high-level
access support in gcc yet (or at least there was none). As soon as you want
to do things at the element / cell level (e.g. rotating the vector cells),
you have to go low level, and in that case there's not much sense in using
gcc's vector extensions at all.
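
To illustrate the gap, here is a minimal sketch of what I mean. Whole-vector
arithmetic works at the high level, but for cell access it falls back on a
union. The names v4sf_cells, add, cell and rotate_left are made up for this
example and are not part of gcc; the union is only one possible workaround,
and whether gcc turns the loop into a single shuffle is not guaranteed.

```c
/* GCC vector extension: four packed floats in one 16-byte vector. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper union (not a gcc API): gives per-cell access
 * by overlaying the vector with a plain float array. */
typedef union {
    v4sf  v;
    float f[4];
} v4sf_cells;

/* Whole-vector arithmetic is available as a plain high-level expression. */
static v4sf add(v4sf a, v4sf b)
{
    return a + b;
}

/* Read a single cell through the union. */
static float cell(v4sf x, int i)
{
    v4sf_cells u;
    u.v = x;
    return u.f[i & 3];
}

/* Rotate the cells one position to the left: {a,b,c,d} -> {b,c,d,a}.
 * This is the kind of operation one would like to express directly on
 * the vector type instead of going through memory. */
static v4sf rotate_left(v4sf x)
{
    v4sf_cells in, out;
    in.v = x;
    for (int i = 0; i < 4; ++i)
        out.f[i] = in.f[(i + 1) % 4];
    return out.v;
}
```

With v = {1, 2, 3, 4}, rotate_left(v) yields {2, 3, 4, 1}, and add(v, v)
yields {2, 4, 6, 8}.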
CU
Christian