Sun Studio 12 vs. GCC3 vs. GCC4 Benchmarks
Phoronix: Sun Studio 12 vs. GCC3 vs. GCC4 Benchmarks
Earlier this month we published an article looking at Linux versus OpenSolaris performance on the new AMD Shanghai Opteron CPUs. Ubuntu Linux was faster than OpenSolaris 2008.11 in nearly all of the tests, but as mentioned in that article, OpenSolaris still depends on GCC 3.4, whereas Ubuntu and most other Linux distributions now ship with the newer and much-improved GCC 4 series. After that article was published, Sun Microsystems requested some compiler tests, since they were confident the results would have been different had their Sun Studio compiler been used. In this article we have some OpenSolaris benchmarks from the same AMD setup using GCC 3.4, GCC 4.0, and Sun Studio 12.
Not a fair test. GCC 4.0 was not tuned in any way and was the result of a big merge of some new technologies (SSA) into mainline, AFAIK. That happened some *years* ago. The current version is 4.3.3 (!). I don't know anything about Sun's compiler, but I think Intel's icc is much more widely considered the performance king on Linux (at least that's what the MySQL people said), so I would go with that compiler (or, even better, all four: gcc, icc, LLVM and Sun). Anyway, criticizing is easier than doing something, so I will stop here. ;-)
"gcc, icc, LLVM and Sun"
..this would be interesting indeed.
On the graphicsmagick.org frontpage it says it uses OpenMP, which was added in GCC 4.2. How does GCC >=4.2 compare with Sun Studio? Otherwise you are giving Sun the advantage.
Yeah, if you have a no-openmp versus openmp on a quadcore, that's not really fair.
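On the OpenMP point: the pragmas are only honored when the compiler is told to enable them, so the flag has to match the compiler or the build ends up single-threaded. A rough sketch of picking the flag, assuming GCC 4.2 or later takes -fopenmp and Sun Studio takes -xopenmp; the function name and the "gcc"/"sunstudio" labels are made up for illustration.

```shell
# Hypothetical helper (not part of any test suite): print the flag that
# enables OpenMP for a given compiler family.
openmp_flag() {
    case "${1:-}" in
        gcc)       echo "-fopenmp" ;;   # GCC 4.2 and later
        sunstudio) echo "-xopenmp" ;;   # Sun Studio cc/CC
        *)         echo "unknown compiler: ${1:-}" >&2; return 1 ;;
    esac
}

# Example: show the flag a GCC build would need.
echo "gcc needs: $(openmp_flag gcc)"
```

Comparing an OpenMP build from one compiler with a non-OpenMP build from the other mostly measures the thread count, not the code generator.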
Also the "timed compilation" isn't very interesting, or at least shouldn't be buried with the others.
According to this thread on opensolaris.org you can download GCC 4.3 from here:
It was built against build 104, but should work on build 101b (OpenSolaris 2008.11).
What flags were used for Sun Studio, btw? Those mp3 and ogg encoding results look almost as if SSE(2/3) wasn't enabled, or as if Sun's compiler didn't understand the SSE intrinsics used in the code.
(.. or there's just a bug in Sun Studio here)
I was thinking the same thing. It looks like LAME needs nasm to assemble its asm sources (libmp3lame/i386/*.nas). I don't know why gcc is fast, though. Maybe the nasm objects won't link with Sun Studio .o files? Maybe because of a -xarch=native/-xarch=native64 mixup?
Some projects have their asm optimizations in GNU-extension asm() statements, which they'd have to disable to build with Sun's compiler. Or anything else that needs GNU C is not going to work with Sun's compiler.
I tried to find the benchmark results on http://global.phoronix-test-suite.com/?k=search_results by searching for "solaris" in the operating system field. I did find a 40 second LAME result labeled SunStudio_OpenSolaris: http://global.phoronix-test-suite.co...186-4598-27835
and this is probably from the first story: http://global.phoronix-test-suite.co...84-28573-17746
So if you do it right, you can get asm optimizations with Sun's Studio compiler on LAME. (Although that result says compiler: gcc 3.4.3, so maybe it was just a trial run trying to get the parameters right.) I didn't find anything really useful googling for "mp3 lame sun studio". Hmm, I did find something with "nasm lame sun studio": LAME 3.98.2 has a commit: "Disable MMX when using Sun Studio." Maybe that's because Studio optimizes the C to better MMX/SSE itself (probably only with -xvector=simd, unless that's on by default these days), or because something is broken. It's just a few lines added to the Makefile.
Unfortunately, the global.p-t-s.com results don't show compiler flags used or anything.
As others are saying: what compiler flags were used!! I have no idea what the results mean without seeing them. I don't even know which of the tests used multiple cores. That kind of matters, because if you have an 8 core machine, you usually plan on keeping at least some of the cores busy most of the time. So you can't just compile every program to bust out multiple threads, because what if you want to run multiple things at the same time?
I'm coming at this from an HPC cluster background, where we tended to have embarrassingly parallel workloads, so we'd run the same single-threaded program on a hundred different input files, with Grid Engine or just make -j 8 style parallelism. I guess a desktop would be different: someone might conceivably buy a dual quad-core just so threaded apps could run fast, and not have number-crunching jobs using up CPUs most of the time.
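To make the "same single-threaded program on many input files" idea concrete, here is a minimal sketch that keeps a fixed number of worker processes busy. It assumes an xargs with -0 and -P (GNU and BSD have them, POSIX does not); the function and variable names are hypothetical.

```shell
# Hypothetical helper: run COMMAND once per file, at most JOBS at a time.
# Usage: run_per_file JOBS COMMAND FILE...
run_per_file() {
    rpf_jobs=$1
    rpf_cmd=$2
    shift 2
    [ "$#" -gt 0 ] || return 0          # no input files, nothing to do
    # NUL-separate the names so spaces in file names survive the pipe.
    printf '%s\0' "$@" | xargs -0 -n 1 -P "$rpf_jobs" "$rpf_cmd"
}

# Example (commented out): eight workers, one process per input file.
# run_per_file 8 cksum data/*.dat
```

Each worker is an ordinary single-threaded process, so the cores stay busy without any program needing to be threaded itself.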
BTW, flags you should use with Sun CC (unless these are outdated now): cc -fast -xarch=native64 -xvector=simd -xipo
-xarch=native64: make a binary that doesn't waste time being compatible with anything but your machine (in 64-bit mode).
-xipo: cross-file optimization by putting source analysis into .o files, so the optimizer can run at link time.
Read the docs. You can use -xjobs=8 to let cc fork off worker jobs when it has a lot of work to do, e.g. at link time with -xipo, if I recall correctly.
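For anyone who wants to rerun a test with those flags, a small sketch of assembling them in one place. The flags are the ones quoted above; the helper name is made up, and the commented build line assumes an autoconf-style project with Sun's cc first in PATH, which may not match how the test suite builds things.

```shell
# Hypothetical helper: print the CFLAGS string for a Sun Studio build.
# An optional first argument adds -xjobs=N for parallel -xipo work.
sun_cflags() {
    sc_flags="-fast -xarch=native64 -xvector=simd -xipo"
    if [ -n "${1:-}" ]; then
        sc_flags="$sc_flags -xjobs=$1"
    fi
    echo "$sc_flags"
}

# Assumed autoconf-style build; adjust to however the project is really built:
# CC=cc CFLAGS="$(sun_cflags 8)" ./configure && make
```

Printing the flags next to each result would also answer the "what flags were used" question for whoever reads the numbers later.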
It is my result. I was very surprised by the first LAME result, so I ran the test on my machine (Intel Core 2 Quad Q8200 2.33GHz), but I built LAME with Sun Studio Express using the "-fast" option.