Hi all,
A couple of days ago Steve Harris posted results from a quick benchmark on
the jackit-devel mailing list.
On Wed, 12 Feb 2003 23:57:02 +0000
Steve Harris <S.W.Harris(a)ecs.soton.ac.uk> wrote:
Just to follow up, I rebuilt my test code on gcc3, and it went from C++
being around 30% slower, to about 1% faster, which is what I expected to
see in the first place. I guess they fixed the optimizer deficiency.
As a C fan I was rather curious about this. I didn't want people getting the
wrong impression that C++ is automatically faster than C (it isn't) or that in
the long term improvements in the C++ compiler will make it faster than C
(it won't). So I asked Steve for more details and he pointed me at the code:
http://plugin.org.uk/filter/
and suggested that further discussion of this issue should probably be moved
from the jackit-devel list to this one. So here we are. (Steve, please don't
take this as a slight on you, I simply want to get the facts right).
For those interested, my machine is a dual 450MHz PIII, with SCSI disks,
2.4.20 kernel and running Debian Testing. All compilers and toolchain
programs are from Debian Testing; none have been compiled from source.
Debian is rather nice in that it allows more than one compiler to be
installed on a machine at any one time. Here are the compilers I used for
my testing:
erikd@coltrane > gcc-2.95 -v
Reading specs from /usr/lib/gcc-lib/i386-linux/2.95.4/specs
gcc version 2.95.4 20011002 (Debian prerelease)
erikd@coltrane > g++-2.95 -v
Reading specs from /usr/lib/gcc-lib/i386-linux/2.95.4/specs
gcc version 2.95.4 20011002 (Debian prerelease)
erikd@coltrane > gcc-3.2 -v
Reading specs from /usr/lib/gcc-lib/i386-linux/3.2.1/specs
Configured with: /mnt/data/gcc-3.1/gcc-3.2-3.2.1ds2/src/configure -v
--enable-languages=c,c++,java,f77,proto,objc,ada --prefix=/usr
--mandir=/usr/share/man --infodir=/usr/share/info
--with-gxx-include-dir=/usr/include/c++/3.2 --enable-shared
--with-system-zlib --enable-nls --without-included-gettext
--enable-java-gc=boehm --enable-objc-gc i386-linux
Thread model: posix
gcc version 3.2.1 20020924 (Debian prerelease)
erikd@coltrane > g++-3.2 -v
Reading specs from /usr/lib/gcc-lib/i386-linux/3.2.1/specs
Configured with: /mnt/data/gcc-3.1/gcc-3.2-3.2.1ds2/src/configure -v
--enable-languages=c,c++,java,f77,proto,objc,ada --prefix=/usr
--mandir=/usr/share/man --infodir=/usr/share/info
--with-gxx-include-dir=/usr/include/c++/3.2 --enable-shared
--with-system-zlib --enable-nls --without-included-gettext
--enable-java-gc=boehm --enable-objc-gc i386-linux
Thread model: posix
gcc version 3.2.1 20020924 (Debian prerelease)
And here are the results running Steve's tests (neglecting the Objective
C tests because my system seems to be missing something):
Compiler    Program    Cycles
--------    -------    ------
gcc-2.95    ansic.c     85058
g++-2.95    cpp.C       90719
gcc-3.2     ansic.c    196581
g++-3.2     cpp.C       85042
That's all a little strange. So let's look at the code. In cpp.C, Steve
defines a LowPass class and its process member function like this:
class LowPass {
private:
    float ym1;
    float a;
public:
    void setA(float newa);
    void reset();
    float process(float x);
};

float LowPass::process(float x)
{
    float y = ym1 * (1.0f - a) + x * a;
    ym1 = y;
    return y;
}
while the C version is defined like this:
typedef struct {
    float ym1;
    float a;
} lowpass;

float lowpass_process(lowpass *this, float x)
{
    float y;
    y = this->ym1 * (1.0f - this->a) + x * this->a;
    this->ym1 = y;
    return y;
}
So what is going on here? My guess is that both versions of g++ and the
older version of gcc are applying an optimisation that the new gcc isn't,
and that the missing optimisation is function in-lining. So why was
in-lining present in gcc-2.95 and absent in gcc-3.2? That's probably because
gcc-3.2 is working towards compliance with the ISO C99 standard. In C99,
inline is a new C keyword.
If I now change the lowpass_process function as follows:
inline float lowpass_process(lowpass *this, float x)
{
    float y;
    y = this->ym1 * (1.0f - this->a) + x * this->a;
    this->ym1 = y;
    return y;
}
and add -std=c99 to the gcc-3.2 command line, the new results are:
Compiler    Program    Cycles
--------    -------    ------
gcc-3.2     ansic.c     86299
g++-3.2     cpp.C       85042
So, g++-3.2 is still beating gcc-3.2. However, if g++ is in-lining one
function, it's probably in-lining them all. Adding inline to all functions
in ansic.c gets these results:
Compiler    Program    Cycles
--------    -------    ------
gcc-3.2     ansic.c     85044
g++-3.2     cpp.C       85042
which is a 0.002 percent difference. My guess is that even this small
difference is purely in the start-up code which is completely irrelevant
when the speed that matters is in the processing loop.
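For anyone wanting to check the arithmetic on the cycle counts quoted above, here is a small sketch in C. The helper names percent_diff and print_diffs are made up for this illustration and are not part of Steve's test code.

```c
#include <stdio.h>

/* Hypothetical helper: percentage by which `slow` exceeds `fast`. */
double percent_diff(double slow, double fast)
{
    return (slow - fast) / fast * 100.0;
}

/* Print the differences for the cycle counts quoted in the tables. */
void print_diffs(void)
{
    /* gcc-3.2 with every function marked inline, against g++-3.2 */
    printf("gcc-3.2 (all inline) vs g++-3.2: %.4f%%\n",
           percent_diff(85044.0, 85042.0));

    /* gcc-3.2 with no inline keyword at all, against g++-3.2 */
    printf("gcc-3.2 (no inline)  vs g++-3.2: %.1f%%\n",
           percent_diff(196581.0, 85042.0));
}
```

Calling print_diffs() from a main() of your own should show a gap of roughly 0.002 percent for the fully in-lined build, against roughly 131 percent for the original gcc-3.2 build.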
This little exercise shows that there is more to benchmarking than just
getting results. Once you have results, you have to:
1) Compare the results with other results (i.e. gcc-3.2 and g++-3.2 vs
gcc-2.95 and g++-2.95).
2) Analyze the results and figure out why you get the results you get
(in this case, by compiling to assembler it would have been obvious that
gcc-3.2 was not by-default in-lining functions while the others were).
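The check in point 2 can be tried on a reduced test case. What follows is only a sketch, not Steve's code: the names lowpass_run and time_lowpass are invented here, the struct pointer is called lp rather than this, and static inline is used in place of plain inline so that the file still links if the compiler decides not to in-line. Timing uses the standard clock() function, so it reports CPU seconds rather than the cycle counts in the tables.

```c
#include <stddef.h>
#include <time.h>

typedef struct {
    float ym1;  /* previous output sample */
    float a;    /* filter coefficient */
} lowpass;

/* Same arithmetic as lowpass_process in ansic.c.
 * `static inline` is an assumption of this sketch, see the note above. */
static inline float lowpass_process(lowpass *lp, float x)
{
    float y = lp->ym1 * (1.0f - lp->a) + x * lp->a;
    lp->ym1 = y;
    return y;
}

/* Filter n samples in place. The call inside this loop is the one
 * that should disappear from the assembler output when in-lined. */
void lowpass_run(lowpass *lp, float *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        buf[i] = lowpass_process(lp, buf[i]);
}

/* Hypothetical timing helper: CPU seconds taken to filter the
 * buffer `reps` times, measured with clock(). */
double time_lowpass(float *buf, size_t n, int reps)
{
    lowpass lp = { 0.0f, 0.5f };
    clock_t start = clock();
    int r;

    for (r = 0; r < reps; r++)
        lowpass_run(&lp, buf, n);

    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Compiling this with something like "gcc -std=c99 -O2 -S" writes the assembler to a .s file instead of producing an object file. The -O2 level is an assumption, since the flags used for the numbers above are not given here. If a call to lowpass_process is still present inside lowpass_run in that output, the function was not in-lined.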
I think that for applications like audio processing, where speed is one of
the main goals, benchmarking is extremely important. Personally I would
love to see more people do it properly and publish their results like I
did here:
http://www.mega-nerd.com/FPcast/
Results like these are repeatable and verifiable.
Cheers,
Erik
--
+-----------------------------------------------------------+
Erik de Castro Lopo nospam(a)mega-nerd.com (Yes it's valid)
+-----------------------------------------------------------+
When a user mailbombs me with 100,000 messages, we
call it denial of service and the guy can be thrown
in jail. When 100,000 SPAMMERS send me one mail each,
we call it marketing.