On Mon, 2009-05-11 at 20:55 +0300, Jussi Laako wrote:
> I did some testing on this in past when developing bunch on SSE
> routines. The performance difference was around 2x.
>

What was 2x and compared to what? Unaligned SSE or exact cacheline match?