Hi!
This was indeed interesting. I set out to prove that floating point would be
the better choice from a performance perspective, but I can't do that,
at least not judging from the data from my systems!
I've only tested with GCC, running a few tests with different parameters. My
conclusion is that floating point performance is extremely sensitive to the
architecture and to the optimizations you enable for that architecture.
Integer, on the other hand, behaves almost completely predictably.
I've not looked at the code and what it _actually_ does; I trust that it was
written to give a fair comparison of the different methods.
I experimented with enabling SSE and SSE2 to see what happened... a lot
happened, it seems. Interestingly, it helped in only one case: the float
test on the Athlon box built with -march=i686, which went from 6.29 to
3.56 (!?!). Everywhere else it made no difference or made things slower.
Anyway, here are the figures:
###### P4 1.5 w/ -march=i686
time ./fixp 2.56user
time ./fixpsse 2.57user
time ./fixpsse2 2.83user
time ./float 4.86user
time ./floatsse 5.33user
time ./floatsse2 5.31user
time ./fistl 4.91user
time ./fistlsse 10.09user
time ./fistlsse2 10.09user
time ./lrintf 3.73user
time ./lrintfsse 8.01user
time ./lrintfsse2 8.00user
###### Athlon XP 1700+ w/ -march=i686
time ./fixp 1.87user
time ./fixpsse 1.87user
time ./fixpsse2 2.58user
time ./float 6.29user
time ./floatsse 3.56user
time ./floatsse2 3.58user
time ./fistl 3.29user
time ./fistlsse 4.05user
time ./fistlsse2 4.04user
time ./lrintf 2.19user
time ./lrintfsse 3.07user
time ./lrintfsse2 3.09user
###### Athlon XP 1700+ w/ -march=athlon-xp
time ./fixp 1.72user
time ./fixpsse 1.72user
time ./fixpsse2 1.72user
time ./float 3.99user
time ./floatsse 3.99user
time ./floatsse2 3.99user
time ./fistl 4.50user
time ./fistlsse 4.48user
time ./fistlsse2 4.49user
time ./lrintf 3.75user
time ./lrintfsse 3.75user
time ./lrintfsse2 3.75user
(The SSE settings are probably ignored with -march=athlon-xp.)
Comments? Improvements?
/Robert
Friday 17 October 2003 09.55, Robert Jonsson wrote:
Ohh, this is very interesting... Thanks Juan for the
code snippets!
I did some short tests with extremely varying and surprising results.
But... I've got no time right now, so I will proceed with more tests tonight.
As much as I don't like it, I have to support the claim that integer is
faster as of now...
But I'll do my best to prove otherwise tonight. I'll post my results here.
/Robert
Friday 17 October 2003 01.09, Juan Linietsky wrote:
> Benno Sennoner and I were discussing on IRC today the usual fixed
> point vs floating point question (with regard to some resampling
> code). We developed some tests and ran them on a variety of computers.
> It would be interesting if ladders here could run them on different
> computers (especially non x86, like amd64 or the Gx processors) so we
> can see what performance we can expect on each, and how things
> seem to be shaping up for the future. It will also be a key factor
> in how our projects develop in the future.
>
> The code is available at:
>
> http://reduz.dyndns.org/resamp_fixp.c // fixed point version
>
> http://reduz.dyndns.org/resamp_float.c // floating point version,
> portable
>
> http://reduz.dyndns.org/resamp_float_fistl.c // X86 VERSION
> ONLY!! Uses fistl instruction
>
> The results don't just measure int vs float performance. They
> also test float->int conversion, which is common
> in most algorithms that work with buffers. The test is a linearly
> interpolated resampler with volume control.
>
> Please use the GCC options
> -O3 -ffast-math -march=<yourcpu> so it's a fair comparison.
> Results from other compilers are also appreciated!
>
> Here are some results for reference:
> ****************************************************************
>
> vendor_id : AuthenticAMD
> model name : AMD-K6(tm) 3D processor
> cpu MHz : 412.508
> cache size : 64 KB
> flags : fpu vme de pse tsc msr mce cx8 pge mmx syscall 3dnow
> k6_mtrr
> bogomips : 822.47
>
> resamp_fixp - 0m8.460s
> resamp_float_fistl - 0m27.390s
>
> ****************************************************************
> vendor_id : AuthenticAMD
> model name : AMD Duron(tm) Processor
> cpu MHz : 951.701
> cache size : 64 KB
> flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca
> cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
> bogomips : 1900.54
>
> resamp_float - 0m11.180s
> resamp_float_fistl - 0m5.810s
> resamp_fixp_optimized - 0m2.790s
>
> ************************************************
> Benno gave me some results:
>
> Intel p4 1800 celeron
>
> resamp_float_fistl - 4.00user
> resamp_fixp - 4.51user
> (float faster?)
>
> --------------------------------
>
> VIA nehemiah 1GHz
> resamp_float_fistl - 0m21.079s
> resamp_fixp - 0m7.129s
>
> *************************
>
> Some results on SPARC:
>
> 2x 125MHz HyperSPARC
>
> resamp_fixp - user 2m59.010s
> resamp_float - user 0m44.290s
>
> (float faster! :)
> ****************************
>
> Intel 2.4 GHz P4:
>
> resamp_float - 0m3.440s
> resamp_float_fistl - 0m2.960s
> resamp_fixp - 0m1.450s
> ************************
>
> 1.25GHz G4 (laptop)
>
> resamp_float - 0m11.170s
> resamp_fixp - 0m2.130s
>
>
>
> -----------------------------------
>
> Conclusions SO FAR.
>
> My own conclusion on the subject is that float -> int conversion
> is STILL the biggest bottleneck on most common architectures. Until
> this is sorted out, fixed point is still the best solution for some
> specific cases, and I don't see any problem mixing it with floating point
> code. If you look closely at the algorithm in the source, you could
> use purely fixed point values for "counter/increment"
> and do the rest (managing the samples) in
> float. This will undoubtedly speed it up.
>
>
>
> Cheers!
> Juan Linietsky
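
Juan's suggestion above (fixed point for the counter/increment, float for
the samples) could look roughly like the following. This is a minimal
sketch, not the code from resamp_fixp.c or resamp_float.c: the function
name, the 16.16 fixed point format and the parameter list are all
assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAC_BITS 16                    /* assumed 16.16 position format */
#define FRAC_ONE  (1u << FRAC_BITS)
#define FRAC_MASK (FRAC_ONE - 1u)

/* Linearly interpolated resampling with volume control.
 * The read position and its increment are fixed point, so the sample
 * index comes from a shift instead of a float->int conversion.
 * The samples themselves stay in float.
 * Returns the number of output samples written. */
size_t resample_linear_hybrid(const float *in, size_t in_len,
                              float *out, size_t out_max,
                              uint32_t increment, float volume)
{
    const float frac_scale = 1.0f / (float)FRAC_ONE;
    uint64_t pos = 0;                   /* fixed point read position */
    size_t n = 0;

    while (n < out_max) {
        size_t idx = (size_t)(pos >> FRAC_BITS);    /* integer part */
        if (idx + 1 >= in_len)
            break;                      /* in[idx + 1] must exist */

        /* fractional part as a float in [0, 1): int->float only */
        float frac = (float)(pos & FRAC_MASK) * frac_scale;

        out[n++] = (in[idx] + (in[idx + 1] - in[idx]) * frac) * volume;
        pos += increment;
    }
    return n;
}
```

Passing FRAC_ONE / 2 as the increment reads the input at half speed (two
output samples per input sample), and FRAC_ONE * 2 skips every other
sample. The only conversion left in the loop is int->float for the
fractional part, which is the cheap direction on the x86 machines
discussed in this thread.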