How to improve performance with Sun Studio 12
Note: With Sun Studio 12, use the optimization flags '-fast -xarch=sse3a -xopenmp=parallel'!
1. The OS 2008.11 Ogg library is not optimized for the FPU or features
of anything above the normal i386. You need to set the optimization flags
for SSE2/FPU features of Pentium4/AMD64 or higher.
2. The LAME MP3 and GnuPG library dependencies were not properly optimized with Sun Studio 12 (see the Ogg library info above).
3. Blastwave.org provides the latest GCC 4.3.2 binaries for Solaris/OpenSolaris at:
Use Sun Studio 12 optimizations when using the Phoronix Test Suite. This will provide you with more realistic results on OpenSolaris performance!
Hopefully the gcc people will see it and improve on the less good aspects.
However, why produce graphs with axis steps of 9 and 29? I find them much harder to read, even though the message is there and otherwise clear.
How do you produce the histograms? Surely there must be a way of having them in steps of 1,2,5,10,20,25,50,100, etc.
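For what it's worth, picking "round" axis steps is a small, well-known calculation. The sketch below is only an illustration of the idea: the function names are made up for this example, and it says nothing about how the article's graphs are actually generated.

```python
import math

# Multipliers tried within each power of ten; 2.5 gives steps such as 25 and 250.
NICE_MULTIPLIERS = (1, 2, 2.5, 5, 10)

def nice_step(lo, hi, max_ticks=10):
    """Smallest step from the 1-2-2.5-5 series that is at least (hi - lo) / max_ticks."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    raw = (hi - lo) / max_ticks
    magnitude = 10 ** math.floor(math.log10(raw))
    for multiplier in NICE_MULTIPLIERS:
        step = multiplier * magnitude
        if step >= raw:
            return step
    # Only reached if rounding pushed the magnitude one decade too low.
    return 10 * magnitude

def tick_positions(lo, hi, max_ticks=10):
    """Tick values on multiples of the nice step that cover [lo, hi]."""
    step = nice_step(lo, hi, max_ticks)
    first = math.floor(lo / step)
    last = math.ceil(hi / step)
    # Multiply by an integer index instead of adding repeatedly, to avoid drift.
    return [index * step for index in range(first, last + 1)]

if __name__ == "__main__":
    print(tick_positions(0, 87))
    print(nice_step(0, 290))
```

With a data range of 0 to 87 this picks a step of 10 instead of something like 9, which is the kind of axis the post is asking for.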
Otherwise, please more of this! Great stuff!
Now to get a benchmark of the latest GCC vs. the latest Intel C++ on an Intel CPU.
How is this a fair test?
SS12 was released in October 2008 (according to the article)
GCC 4.0.2 was released September 2005 (gcc.gnu.org)
GCC 3.4.3 was released November 2004 (gcc.gnu.org)
A lot has happened in the GCC 4.x branch. If you want to do a fair comparison, compare with GCC 4.3.2, released August 2008 and not some unsupported and obsolete version from ancient times.
(and no, I did not want to register and remember yet another account on some random website. Please support OpenID instead.)
I was thinking the same thing. It looks like LAME needs nasm to assemble its asm sources (libmp3lame/i386/*.nas). I don't know why gcc is fast, though. Maybe the nasm objects won't link with Sun Studio .o files? Maybe because of a -xarch=native/-xarch=native64 mixup?
Originally Posted by npcomplete
Some projects have their asm optimizations in GNU-extension asm() statements, which they'd have to disable to build with Sun's compiler. Or anything else that needs GNU C is not going to work with Sun's compiler.
I tried to find the benchmark results on http://global.phoronix-test-suite.com/?k=search_results by searching for "solaris" in the operating system field. I did find a 40 second LAME result labeled SunStudio_OpenSolaris: http://global.phoronix-test-suite.co...186-4598-27835
and this is probably from the first story: http://global.phoronix-test-suite.co...84-28573-17746
So if you do it right, you can get asm optimizations with Sun's Studio compiler on LAME. (Although that result says compiler: gcc 3.4.3, so maybe it was just a trial run trying to get the parameters right.) I didn't find anything really useful googling for "mp3 lame sun studio". Hmm, I did find something with "nasm lame sun studio": LAME 3.98.2 has a commit: "Disable MMX when using Sun Studio." Maybe that's because Studio optimizes the C to better MMX/SSE itself (probably only with -xvector=simd, unless that's on by default these days), or because something is broken. It's just a few lines added to the Makefile.
Unfortunately, the global.p-t-s.com results don't show compiler flags used or anything.
As others are saying: what compiler flags were used!! I have no idea what the results mean without seeing them. I don't even know which of the tests used multiple cores. That kind of matters, because if you have an 8 core machine, you usually plan on keeping at least some of the cores busy most of the time. So you can't just compile every program to bust out multiple threads, because what if you want to run multiple things at the same time?
I'm coming at this from an HPC cluster background, where we tended to have embarrassingly parallel workloads, so we'd run the same single-threaded program on a hundred different input files, using grid engine or just make -j 8 style parallelism. I guess a desktop would be different, and someone might conceivably buy a dual quad-core just so threaded apps could run fast, and not tend to have any number crunching jobs using up any CPUs most of the time.
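To make the "same single-threaded program, many input files" pattern concrete, here is a minimal sketch. The job command and the input file names are hypothetical stand-ins, and this is not how grid engine or make schedule work; it only shows the idea of keeping a fixed number of independent jobs running at once.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_one(command):
    """Run one single-threaded job and return (command, exit status)."""
    completed = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return command, completed.returncode

def run_all(commands, workers=8):
    """Keep up to `workers` independent jobs running, in the spirit of `make -j 8`."""
    # The threads only wait on child processes, so they do not compete for CPU.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))

if __name__ == "__main__":
    # Hypothetical inputs; the "job" here is a trivial Python one-liner.
    inputs = [f"input_{i}.dat" for i in range(4)]
    commands = [
        [sys.executable, "-c", "import sys; print(len(sys.argv[1]))", name]
        for name in inputs
    ]
    for command, status in run_all(commands, workers=2):
        print(command[-1], "exit status", status)
```

Each job stays single-threaded, so the number of busy cores is controlled by `workers` rather than by the compiler's auto-parallelization.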
BTW, flags you should use with Sun CC (unless these are outdated now): cc -fast -xarch=native64 -xvector=simd -xipo
-xarch=native64 : make a binary that doesn't waste time being compatible with anything but your machine (in 64-bit mode).
-xipo : cross-file optimization by putting source analysis into .o files, so the optimizer can run at link time.
Read the docs. You can use -xjobs=8 to let cc fork off worker jobs when it has a lot of work to do, e.g. at link time with -xipo, if I recall correctly.
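As a rough sketch of how those flags fit together on one command line, the helper below only assembles the argument list and never runs the compiler. The helper name, the source file names and the output name are made up for illustration; whether each flag is still current should be checked against the docs for your Studio release.

```python
import shlex

# Flags quoted in the post above; check them against your compiler's documentation.
SUGGESTED_FLAGS = ["-fast", "-xarch=native64", "-xvector=simd", "-xipo"]

def cc_command(sources, output, jobs=None):
    """Build (but do not run) a cc command line using the suggested flags."""
    command = ["cc", *SUGGESTED_FLAGS]
    if jobs is not None:
        # -xjobs is described in the post as letting cc fork worker jobs.
        command.append(f"-xjobs={jobs}")
    command += ["-o", output, *sources]
    return command

if __name__ == "__main__":
    # Hypothetical source files, for illustration only.
    print(shlex.join(cc_command(["main.c", "util.c"], "demo", jobs=8)))
```

Keeping the flags in one list makes it easy to rebuild the same program with different combinations and compare timings.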
It is my result. I was very surprised by the LAME result, so I ran the test on my own machine (Intel Core 2 Quad Q8200 2.33GHz), but I built LAME with the "-fast" option.
Originally Posted by llama
It is my result. I was surprised by the first result, so I ran the LAME test on my own machine (Intel Core 2 Quad Q8200 2.33GHz), but I built LAME with Sun Studio Express using the "-fast" option.
Originally Posted by llama
GCC 3.4.3 with default options - 43.49s
CXXFLAGS="-fast" - 40.71s
CXXFLAGS="-fast -xarch=native64 -xvector=simd -xipo" - 38.25s
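To put the three timings above in proportion, here is a small worked calculation. The numbers are copied from the post; the function name is just for this example, and a single run per configuration says nothing about run-to-run variance.

```python
# LAME encode times in seconds, copied from the post above.
timings = {
    "GCC 3.4.3 with default options": 43.49,
    'CXXFLAGS="-fast"': 40.71,
    'CXXFLAGS="-fast -xarch=native64 -xvector=simd -xipo"': 38.25,
}

def percent_reduction(baseline, measured):
    """Percentage of run time saved relative to the baseline."""
    return (baseline - measured) / baseline * 100.0

if __name__ == "__main__":
    baseline = timings["GCC 3.4.3 with default options"]
    for label, seconds in timings.items():
        saved = percent_reduction(baseline, seconds)
        print(f"{label}: {seconds:.2f}s ({saved:.1f}% less time than the GCC 3.4.3 run)")
```

Relative to the GCC 3.4.3 run, "-fast" alone saves about 6.4% of the run time and the full flag set about 12.0%.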
With the correct compiler flags for Sun Studio 12 on the tested AMD64 hardware (i.e. -fast -xarch=amd64a -xipo=2), we saw the LAME test performance improve enough to beat the Ubuntu 8.10 scores (i.e. 38s-40s) mentioned in the article!
We believe that the OS-2008.11 and the newer OS 2008.11-b107 (i.e. which properly matches the Ubuntu 8.10 specs) and the use of Sun Studio 12 or Blastwave.org's GCC 4.3.3 port (see: http://blastwave.network.com/testing...386-CSW.pkg.gz) can match or beat all of the Ubuntu 8.10/9.10 benchmarks hands down.
I think this article is fairer than the earlier articles. But I still have some complaints.
For instance, why focus on compile time? If the resulting code is twice as slow but compiles 10 secs faster, is it good? UPDATE: see below.
Obviously it is difficult to do good benchmarks with compilers. Maybe the Sun and GCC people should have given their input. But this is a better article, I think. Thanks, Phoronix, for listening and being willing to try again!
UPDATE: As Ex-Cyber pointed out, there is no focus on compile time. I take it back. In fact, pointers on compile time can be important. I like this test better than the earlier ones.
Last edited by kebabbert; 02-25-2009 at 06:43 AM.