GCC performance testing request


birdie
07-08-2009, 08:43 AM
Hello, Phoronix guys!

Can you PLEASE test these versions of GCC: 4.2.4, 4.3.4 and 4.4.2 using whatever benchmarks you like (the more the better).

I suggest running at least these benchmarks:

Multimedia
1) LAME compression/decompression (mp3)
2) FLAC compression/decompression
3) VORBIS compression/decompression (ogg)
4) x264 encoding (H.264)

Web
1) Firefox page rendering time/miscellaneous JS benchmarks
2) Chrome
3) Arora (rebuild Qt, as it's based on Qt's WebKit component)

Compression
1) GZIP/BZIP2/7-ZIP compression (of big enough TAR archive)
2) If you manage to compile e.g. GCC 4.2.4 with all these compilers (which is difficult, since GCC rebuilds itself as part of its compilation process), then please test compilation times of heavy C code (say, the entire 2.6.30 kernel tree) and heavy C++ code (say, the Qt library)

Games
1) Quake 2 performance when running in software rendering mode
2) Quake 3 performance using software GL mode (say, running on the VESA X.org driver); the Mesa GL library should be rebuilt with each compiler, of course

Rendering
1) Blender rendering time
2) PovRay rendering time

Please try to avoid testing I/O-bound applications, since those results will be meaningless for comparing compilers.

Thank you!
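One way to check whether a benchmark is I/O-bound before trusting it for a compiler comparison is to compare CPU time with wall-clock time. The following is only a minimal sketch, assuming a Unix-like system where os.times() reports child process CPU time; the names time_command and cpu_fraction are made up for illustration and are not part of any Phoronix tooling.

```python
import os
import subprocess
import time


def time_command(cmd, runs=3):
    """Run cmd several times; return a list of (wall_seconds, cpu_seconds)."""
    samples = []
    for _ in range(runs):
        before = os.times()
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        wall = time.perf_counter() - start
        after = os.times()
        # CPU time (user + system) used by children that have been waited for.
        cpu = ((after.children_user - before.children_user)
               + (after.children_system - before.children_system))
        samples.append((wall, cpu))
    return samples


def cpu_fraction(samples):
    """Share of wall-clock time spent on the CPU, summed over all runs."""
    wall = sum(w for w, _ in samples)
    cpu = sum(c for _, c in samples)
    return cpu / wall if wall > 0 else 0.0
```

A fraction well below 1.0 for a single-threaded program suggests it mostly waits on disk or network, so compiler differences would be hidden. Multi-threaded programs can legitimately exceed 1.0.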

birdie
07-08-2009, 08:48 AM
And if you have enough time, please also test GCC 3.4.6 and ICC 11 (in that case your CPU must be an Intel Core 2 or newer, as the Intel compiler is known to pessimize code for AMD CPUs).

birdie
07-17-2009, 01:23 AM
If you haven't yet started testing, please use the same CFLAGS/CXXFLAGS across all compiled applications.

I suggest using this set:

-O2 -march=native -pipe (for GCC >= 4.2)

and

-O2 -march=pentium4 -pipe (for GCC 3.4.6 if you are going to use it).

All other flags can be detrimental or add unnecessary noise to the results.
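The flag rule suggested above could be captured in a small helper so every build gets the same settings. This is just a sketch; the function names are invented for illustration, and treating every pre-4.2 release like 3.4.6 is an assumption (it relies on -march=native having first appeared in GCC 4.2).

```python
def parse_version(text):
    """Turn '4.4.2' into (4, 4, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def cflags_for(gcc_version, fallback_march="pentium4"):
    """Return the flag list suggested in the post for a given GCC version."""
    # -march=native is available from GCC 4.2 on; older releases need an
    # explicit target, here the pentium4 suggested for GCC 3.4.6.
    if parse_version(gcc_version) >= (4, 2):
        march = "native"
    else:
        march = fallback_march
    return ["-O2", "-march=" + march, "-pipe"]


def build_env(gcc_version):
    """Environment variables that give C and C++ code identical flags."""
    flags = " ".join(cflags_for(gcc_version))
    return {"CFLAGS": flags, "CXXFLAGS": flags}
```

For example, build_env("4.4.2") yields the same string for CFLAGS and CXXFLAGS, which can be merged into the environment passed to each package's configure and make steps.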

birdie
07-31-2009, 05:40 AM
Are you going to run these tests?

StringCheesian
10-20-2009, 02:37 PM
Probably he'd want to publish benchmarks like that in response to some news. Like when gcc 4.5 is released.

The next time something happens in the world of compilers, bring it to his attention and say it calls for benchmarks like the ones you want. It could work.

Ant P.
10-20-2009, 04:48 PM
It'd be nice to see a benchmark with and without the -omg-rice CFLAGS, just to see how much difference it makes compared to Ubuntu's i686 binaries. Total size of /usr/{lib,bin} would be interesting to see too.

I've noticed a few people saying the new flags in GCC 4.4 actually make a difference.
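The size comparison of /usr/{lib,bin} mentioned above could be measured with something like the following. It is a rough sketch, not an established tool: tree_size is a made-up name, and counting only regular files while counting each hard-linked file once is one possible choice among several.

```python
import os
import stat


def tree_size(*roots):
    """Total bytes of regular files under the given directories.

    Symlinks are not followed, and a file reachable through several
    hard links (or several roots) is counted only once.
    """
    total = 0
    seen = set()
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)
                except OSError:
                    continue  # vanished or unreadable entry; skip it
                if not stat.S_ISREG(st.st_mode):
                    continue
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue
                seen.add(key)
                total += st.st_size
    return total
```

Calling tree_size("/usr/lib", "/usr/bin") on two installs built with different flags would give the two numbers to compare.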

svrocket
10-21-2009, 11:55 PM
If you really want to see something dramatic, run your own bake-off between GCC 2.95, 3.2, and 4.x.

Yes, even GCC gets stricter, more bloated, and slower over time. All software suffers from creeping featuritis.

birdie
11-13-2009, 01:44 AM
You could at least reply to this thread.

"No" will qualify just fine.

bnolsen
11-13-2009, 04:16 AM
birdie wrote:
> And if you have enough time, please also test GCC 3.4.6 and ICC 11 (in that case your CPU must be an Intel Core 2 or newer, as the Intel compiler is known to pessimize code for AMD CPUs).

Seems like you just gave every reason needed not to include ICC in this specific test, since it would then fail as a good general-purpose compiler. Writing a separate article about ICC's AMD compatibility might be interesting.

bnolsen
11-13-2009, 04:18 AM
birdie wrote:
> If you haven't yet started testing, please use the same CFLAGS/CXXFLAGS across all compiled applications.
>
> I suggest using this set:
>
> -O2 -march=native -pipe (for GCC >= 4.2)
>
> and
>
> -O2 -march=pentium4 -pipe (for GCC 3.4.6 if you are going to use it).
>
> All other flags can be detrimental or add unnecessary noise to the results.

Probably -march=nocona for 64-bit. At this point 32-bit testing should be nothing more than a sideshow.

Thev00d00
12-06-2009, 06:27 PM
We all love ricer flags, ask any Gentoo user about how much faster they make your system. =O

But seriously, I would be interested in the difference the Graphite framework makes in the newer (4.4+) GCCs on multicore systems.

Smorg
12-09-2009, 06:34 AM
Currently on my i7 920 box with gcc-4.4.2

CFLAGS="-march=native -O3 -msse4 -mmmx -floop-interchange -floop-strip-mine -floop-block -pipe"
CXXFLAGS="${CFLAGS}"

With the big 8 MB shared L3 cache I think -O3 is probably beneficial more often than not. The last benchmark I read, using Core 2s, showed -O2 and -O3 to be about even, with -O3 maybe pulling a bit ahead.
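To put a number on "about even" versus "pulling a bit ahead", repeated timings of the same program built with each flag set can be compared by their medians. A minimal sketch, with speedup_percent being an illustrative name rather than anything from an existing benchmark suite:

```python
import statistics


def speedup_percent(baseline_times, candidate_times):
    """Percent by which the candidate build is faster than the baseline.

    Both arguments are lists of run times in seconds for the same workload.
    A positive result means the candidate is faster; negative means slower.
    """
    base = statistics.median(baseline_times)
    cand = statistics.median(candidate_times)
    return (base - cand) / base * 100.0
```

Here the -O2 timings would be the baseline and the -O3 timings the candidate. Using the median of several runs keeps one noisy run from deciding the result.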

hmmm
12-09-2009, 06:53 AM
I thought I came across Intel doing some GCC compile testing in the latest Linux kernel podcast. Here's the link: http://patchwork.kernel.org/patch/63078/

birdie
12-12-2009, 01:02 AM
hmmm wrote:
> I thought I came across Intel doing some GCC compile testing in the latest Linux kernel podcast. Here's the link: http://patchwork.kernel.org/patch/63078/

The net result doesn't seem obvious to me :)

The first post clearly shows -O2 winning over -Os, but later on the -O2 kernel won only one test, and it seemed that, given more time and diligence, a kernel built with -Os would equal the -O2 one. :)

However I digress. I still want someone to run Phoronix tests with these three compilers.

sabriah
12-14-2009, 06:14 AM
birdie wrote:
> Can you PLEASE test these versions of GCC: 4.2.4, 4.3.4 and 4.4.2 using whatever benchmarks you like (the more the better).


There was a GCC comparison earlier this year: http://www.phoronix.com/scan.php?page=article&item=sun_studio_gcc&num=1


As for the compiler flag choices, check out ACOVEA and http://www.phoronix.com/forums/showpost.php?p=38044&postcount=3


An article by Dunlop and others from 2008, "On the Use of a Genetic Algorithm in High Performance Computer Benchmark Tuning", concluded:

This paper has addressed the issue of extracting the best adapted parameters for the HPL reference benchmark. Adjustment of the seventeen tuning parameters to achieve maximum performance is a time-consuming task that must be performed by hand. The use of a genetic algorithm is proposed here to manage this task with individuals corresponding to an HPL run. Indeed we do not provide here a description of a particular version of a GA. The Acovea framework has been used to validate the approach over a Beowulf cluster composed of heterogeneous resources: a majority of so-called “small” nodes and two “large” nodes. In particular, starting from a hand-tuned performance of 84 Gflops, it was possible to attain the peak performance of 111.6 Gflops on the cluster using a set of parameters determined nearly automatically by Acovea.
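The same genetic-algorithm idea applied to compiler flags can be sketched in a few lines. This is not Acovea's actual implementation, only a toy illustration: evolve_flags and its parameters are hypothetical, and the fitness function is assumed to be supplied by the caller (in practice it would build and time a benchmark, which is slow, so real tools cache results).

```python
import random


def evolve_flags(flags, fitness, generations=20, pop_size=12, rng=None):
    """Search for a high-fitness subset of flags with a tiny genetic algorithm.

    flags must be non-empty and pop_size at least 2. fitness takes a list
    of flags and returns a number, where higher is better.
    """
    rng = rng or random.Random()
    n = len(flags)

    def decode(genome):
        # A genome is a list of booleans: is flag i switched on?
        return [flag for flag, on in zip(flags, genome) if on]

    def score(genome):
        return fitness(decode(genome))

    population = [[rng.random() < 0.5 for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[:max(2, pop_size // 2)]
        children = [ranked[0][:]]  # elitism: keep the best genome unchanged
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = mum[:cut] + dad[cut:]  # one-point crossover
            if rng.random() < 0.2:  # occasional single-bit mutation
                i = rng.randrange(n)
                child[i] = not child[i]
            children.append(child)
        population = children
    return decode(max(population, key=score))
```

With a fitness function that returns, say, the negated median run time of a benchmark built with the given flags, the returned list is the best flag subset found, not necessarily the global optimum.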

Cheers