Thread: AMD Llano Compiler Performance

  1. #1
    Join Date
    Jan 2007

    AMD Llano Compiler Performance

    Phoronix: AMD Llano Compiler Performance

    Last week there was a set of AMD Fusion A8-3850 Linux benchmarks on Phoronix; this week we look at the AMD Fusion "Llano" APU performance when trying out a few different compilers: in particular, the latest GCC release and the highly promising Clang compiler on LLVM, the Low-Level Virtual Machine.

  2. #2
    Join Date
    Aug 2009
    Russe, Bulgaria


    Nice benchmarks. Michael, there are many programmers in your forum, so benchmarks like this are very useful. They are not just about the compilers; they also help answer the question: "Which CPU should I buy to get the most compiler performance for the buck?"

  3. #3
    Join Date
    Oct 2007
    Under the bridge

    Obviously kidding, but...

    LLVM is used by Mono? Well, I guess I'll have to remove it from my system. I wouldn't use any applications infected by that.

    You probably shouldn't, either.

  4. #4
    Join Date
    Jun 2010
    ฿ 16LDJ6Hrd1oN3nCoFL7BypHSEYL84ca1JR


    "As usual, each test case was using its stock compiler flags during the build process for each of the tested compilers."
    That's not even -O2, right?

  5. #5
    Join Date
    Dec 2007
    Edinburgh, Scotland


    Originally posted by ChrisXY:
    That's not even -O2, right?
    It depends on the software and what's in the makefiles. Most software uses -O2; libav / ffmpeg and now Firefox use -O3, I believe.

  6. #6

    Compiler flags matter

    I'd like to note that the effect of the Phoronix habit of using whatever compiler flags a package ships with, without caring much which flags those are, can be seen e.g. in the PovRay numbers.
    povray-3.6.1, being a 2004ish package, has this line in configure:
    k8-*|x86_64-*) pov_arch="k8"; pov_arch_fallback="i686";;
    This means that if the compiler accepts -march=k8 -mtune=k8 (along with -O3 -msse2 and a couple of other options), those flags are used to compile it, instead of tuning for the CPU you are compiling on, or at least for contemporary CPUs.
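    The selection logic described above can be sketched roughly as follows. This is only an illustration in Python, not povray's actual configure code: `pick_arch_flags` and `compiler_accepts` are hypothetical names, the table holds just the one entry quoted above, and passing the fallback as -march/-mtune is an assumption.

```python
from fnmatch import fnmatchcase

# Only the single configure entry quoted above; the real script has more.
# Each row: (host triplet patterns, preferred arch, fallback arch).
ARCH_TABLE = [
    (("k8-*", "x86_64-*"), "k8", "i686"),
]

def pick_arch_flags(host_triplet, compiler_accepts):
    """Return hardcoded -march/-mtune flags for a host triplet.

    compiler_accepts is a hypothetical callback that reports whether the
    compiler takes the given list of flags. Assumption: the fallback arch
    is passed the same way as the preferred one.
    """
    for patterns, arch, fallback in ARCH_TABLE:
        if any(fnmatchcase(host_triplet, p) for p in patterns):
            for candidate in (arch, fallback):
                flags = [f"-march={candidate}", f"-mtune={candidate}"]
                if compiler_accepts(flags):
                    return flags
            return []
    return []
```

    Note that nothing in this logic looks at the CPU actually present: any x86_64 host whose compiler accepts the flags ends up with k8 tuning.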
    Looking at the speed of a program optimized for a completely different CPU than the one you are using is uninteresting. Either you tune for contemporary CPUs (as most distributions do, and as several compilers do by default), or you optimize for your own CPU.
    In povray 3.7.0 rc3 this has changed quite a bit (though it is still at least two years behind on the CPUs and features it wants to use).
    E.g. gcc by default uses -mtune=generic, which tunes for recentish Intel and AMD CPUs, and it also supports -march=native/-mtune=native, which tune for the CPU running the compiler. (-Ofast is a separate optimization level: -O3 plus options such as -ffast-math that disregard strict standards compliance.)
    I don't have a Llano CPU, so I couldn't repeat the measurements there, but I ran it on an Intel i7-2600 CPU (just a single run each, only to show that the compiler flags really matter):

    gcc 4.6.0 20110603 (Red Hat 4.6.0-10) -O3 -march=k8 -mtune=k8
    Total Time: 0 hours 11 minutes 30 seconds (690 seconds)
    gcc 4.6.0 20110603 (Red Hat 4.6.0-10) -O3 -march=x86-64 -mtune=generic
    Total Time: 0 hours 8 minutes 24 seconds (504 seconds)
    gcc 4.6.0 20110603 (Red Hat 4.6.0-10) -O3 -march=corei7 -mtune=corei7
    Total Time: 0 hours 8 minutes 10 seconds (490 seconds)
    gcc 4.6.0 20110603 (Red Hat 4.6.0-10) -O3 -march=corei7-avx -mtune=corei7-avx
    Total Time: 0 hours 8 minutes 10 seconds (490 seconds)
    gcc 4.7.0 20110822 (experimental) -O3 -march=corei7-avx -mtune=corei7-avx
    Total Time: 0 hours 7 minutes 57 seconds (477 seconds)
    clang 2.9 (tags/RELEASE_29/final) -O3 -march=k8 -mtune=k8
    Total Time: 0 hours 9 minutes 28 seconds (568 seconds)
    clang 2.9 (tags/RELEASE_29/final) -O3 -march=corei7 -mtune=corei7
    Total Time: 0 hours 9 minutes 32 seconds (572 seconds)
    clang 2.9 (tags/RELEASE_29/final) -O3 -march=corei7 -mtune=corei7 -mavx
    Compiler Crash
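
    To put the timings above on one scale, here is a small Python sketch that expresses each run relative to the gcc 4.6.0 -march=k8 build. The times are copied from the list above; the short labels and the `relative_speed` helper are my own, and since each configuration was run only once, small differences should not be over-read.

```python
# Wall-clock seconds copied from the runs listed above (one run each).
runs = {
    "gcc 4.6.0 k8": 690,
    "gcc 4.6.0 x86-64/generic": 504,
    "gcc 4.6.0 corei7": 490,
    "gcc 4.6.0 corei7-avx": 490,
    "gcc 4.7.0 corei7-avx": 477,
    "clang 2.9 k8": 568,
    "clang 2.9 corei7": 572,
}

baseline = runs["gcc 4.6.0 k8"]

def relative_speed(seconds, base=baseline):
    """How many times faster a run is than the baseline (1.0 = equal)."""
    return base / seconds

for name, secs in runs.items():
    print(f"{name:26s} {secs:4d} s  {relative_speed(secs):.2f}x")
```

    On these numbers, gcc 4.6.0 with generic tuning is about 1.37 times as fast as the same compiler with the hardcoded k8 flags.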

    As can be seen, yes, gcc's k8-tuned code on Intel SandyBridge is significantly slower than clang's k8-tuned code, but every other gcc tuning, even just tuning for a generic CPU, is faster than clang, in some cases significantly. From what I've seen, similar hardcoded options exist in many other Phoronix benchmarks. Some time ago I even saw plain -O being used in one of them (I think it was the byte benchmark; I haven't rechecked whether it has been fixed since). -O, which is equivalent to -O1, is (at least for GCC) defined to do only some cheap optimizations, with the emphasis on fast compilation and on not making the code much harder to debug.

    Until this is changed, I think the compiler benchmarks aren't really useful at all.
