On Thu, 1 Jul 2004 15:15:22 -0700 (PDT)
Dan Hollis <goemon(a)sasami.anime.net> wrote:
On Fri, 2 Jul 2004, Erik de Castro Lopo wrote:
On Thu, 01 Jul 2004 18:18:41 +0200
Benno Senoner <sbenno(a)gardena.net> wrote:
Eric what do you think ? can something like that
be coded efficiently
using SSE/SSE2 ?
Probably not. There are some algorithms which simply can't be
vectorized.
Would an opteron be any better at it?
No, its not an issue of the CPU. The problem is that SSE and SSE2 on
the x86 extended family and Altivec on the PowerPC all require that
the algorithm can be refactored to allow blocks of data to be
processed at time. However, the refactoring can in some cases slow
the algorithm down to an extent that it ends up slower on the vector
processor than on the standard FPU.
Here's an example (where a, b anc c are arrays of floats) :
for (k = 0 ; k < 128 ; k++)
a [k] = b [k] * c [k] ;
which can be written as:
for (k = 0 ; k < 128 / 4 ; k+ = 4)
{ a [k] = b [k] * c [k] ;
a [k + 1] = b [k + 1] * c [k + 1] ;
a [k + 2] = b [k + 2] * c [k + 2] ;
a [k + 3] = b [k + 3] * c [k + 3] ;
}
This can then be vectorized so that four entries in the b array can
be added to four entries in the c array in a single instruction and
then assigned to four entries in the a array in another single
instruction. like:
for (k = 0 ; k < 128 / 4 ; k+ = 4)
{ load_four_b_values () ; /* single instruction. */
load_four_c_values () ; /* single instruction. */
add_four_b_to_four_c_vals () ; /* single instruction. */
store_4_a_values () ; /* single instruction. */
}
The above should run at close to four times the speed of the original.
Now look at a new example:
a [0] = b [0] * c [0];
for (k = 1 ; k < 128 ; k++)
a [k] = a [k - 1] + b [k] * c [k] ;
The problem here is that each a [k] depends on the value of a [k - 1].
Refactoring this to be vectorized is likely to require so much
shuffling of data that it ends up slower than the non-vectorized
version.
Erik
--
+-----------------------------------------------------------+
Erik de Castro Lopo nospam(a)mega-nerd.com (Yes it's valid)
+-----------------------------------------------------------+
"This is like creating laws against blasphemy and then complaining that
unbelievers can't come up with any logical argument against the existence
of God" --
www.infoanarchy.org on the Digital Millenium Copyright Act