
Thread: GCC vs. LLVM/Clang On The AMD Richland APU

  1. #1
    Join Date
    Jan 2007
    Posts
    14,834

    Default GCC vs. LLVM/Clang On The AMD Richland APU

    Phoronix: GCC vs. LLVM/Clang On The AMD Richland APU

    Along with benchmarking the AMD A10-6800K "Richland" APU on Linux and its Radeon HD 8670D graphics, I provided some GCC compiler tuning benchmarks for this AMD APU with Piledriver cores. The latest Linux testing from the A10-6800K is a comparison of GCC 4.8.1 to LLVM/Clang 3.3 on this latest-generation AMD low-power system.

    http://www.phoronix.com/vr.php?view=18877

  2. #2
    Join Date
    Dec 2012
    Posts
    97

    Default

    I'm actually fairly impressed by these results. LLVM/Clang is really kicking ass; it had numerous huge wins and just about any time it was behind (minus where OpenMP plays a huge factor) it wasn't by much.

  3. #3
    Join Date
    Oct 2012
    Location
    Washington State
    Posts
    462

    Default

    The gap is only going to widen with LLVM/Clang 3.4 pushing ahead in large areas of performance and scalability.

  4. #4
    Join Date
    Jul 2011
    Posts
    73

    Default

    And there I thought the language had gotten better…

    “Of course, LLVM/Clang 3.3 still lacks OpenMP support, so those tests are obviously in favor of GCC.”

    ugh… you couldn’t have found a better way to say that those tests are completely useless?

  5. #5
    Join Date
    Jul 2011
    Posts
    73

    Default

    Quote Originally Posted by MWisBest View Post
    I'm actually fairly impressed by these results. LLVM/Clang is really kicking ass; it had numerous huge wins and just about any time it was behind (minus where OpenMP plays a huge factor) it wasn't by much.
    Please have a look at

    * Timed MAFFT alignment,
    * BLAKE2,
    * Botan MAC,
    * Himeno and
    * C-Ray (please call this one “not by much” again…)

    LLVM seems to be very fast at matrix multiplication, though.

    So my summary of the results would be:

    * LLVM is great at Successive Jacobi Relaxation.
    * GCC is great at C-Ray.
    * LLVM has no OpenMP support, so don’t even try to use it for scientific code, except if you want to go all the way and use explicit MPI (which makes the SciMark test somewhat less useful).

  6. #6
    Join Date
    Oct 2009
    Posts
    845

    Default

    Well, there were some impressive results from clang-llvm here; that said, the Botan tests were absolutely pointless. Comparing two compilers against each other at -O2 (or lower) means nothing: there's no 'standard' between compilers on which optimizations should be added at the -O2 level.

    If Clang/LLVM or GCC add more optimizations at -O2 than the other, it will win at that level, but that says nothing about their relative performance when they are set to generate the fastest code they can, which is at -O3.

    As such the Botan benchmarks are pointless in this context.

    This is why, if you are measuring performance of the generated code, you default to -O3, the setting at which the compilers strive to generate the _fastest_ code, which is after all what is being benchmarked here. This has been stated over and over, so I can't help but wonder if Michael is deliberately using these flawed settings in order to sway results to his liking.

  7. #7
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,111

    Default

    Quote Originally Posted by ArneBab View Post
    “Of course, LLVM/Clang 3.3 still lacks OpenMP support, so those tests are obviously in favor of GCC.”

    ugh… you couldn’t have found a better way to say that those tests are completely useless?
    "Those tests are useless"

    now switch perspective to someone who needs OpenMP

    "That compiler is useless"

    Funny, isn't it.

  8. #8
    Join Date
    Nov 2012
    Posts
    169

    Default

    Quote Originally Posted by XorEaxEax View Post
    Well, there were some impressive results from clang-llvm here; that said, the Botan tests were absolutely pointless. Comparing two compilers against each other at -O2 (or lower) means nothing: there's no 'standard' between compilers on which optimizations should be added at the -O2 level.

    If Clang/LLVM or GCC add more optimizations at -O2 than the other, it will win at that level, but that says nothing about their relative performance when they are set to generate the fastest code they can, which is at -O3.

    As such the Botan benchmarks are pointless in this context.

    This is why, if you are measuring performance of the generated code, you default to -O3, the setting at which the compilers strive to generate the _fastest_ code, which is after all what is being benchmarked here. This has been stated over and over, so I can't help but wonder if Michael is deliberately using these flawed settings in order to sway results to his liking.
    -O3 does not necessarily generate the fastest code. It enables the most optimization but is intended for smaller segments of code and inner loops. If used for entire applications it may cause slowdown due to a larger memory footprint and more cache misses.

  9. #9
    Join Date
    Oct 2009
    Posts
    845

    Default

    Quote Originally Posted by carewolf View Post
    -O3 does not necessarily generate the fastest code. It enables the most optimization but is intended for smaller segments of code and inner loops. If used for entire applications it may cause slowdown due to a larger memory footprint and more cache misses.
    Yes, sometimes -O2 actually beats -O3, but that is because the optimizer sometimes fails in its job of accurately weighing things like increased cache use against the improved performance of a larger code segment (through inlining, unrolling, etc). Also, -O3 is not specifically intended for 'smaller segments of code': the compiler heuristics typically do a good job of deciding which code benefits from unrolling and inlining, and which codepaths are hot and cold. Just because an optimization is enabled doesn't mean it will end up used on all segments of code. So yes, you can use -O3 on entire applications just fine, and most CPU-intensive ones default to -O3 in their configurations.

    Of course if you want to give the compiler the best help, you can always use profile guided optimization where you let the compiler gather runtime data which it can then use to better optimize the code.

    But despite the fact that -O2 sometimes beats -O3 due to failed compiler heuristics, if you only test ONE optimization level then of course it must be -O3; again, there is no 'standard' on which compiler optimizations are enabled per 'level' between compilers. The ONLY standard is that -O3 is supposed to generate the _fastest_ code.

    So unless you know beforehand that -O2 in a particular test generates the fastest code for BOTH compilers on a particular benchmark, using -O2 means nothing in a benchmark where you want to see which compiler generates the _fastest_ code, as that is what -O3 is supposed to do and also does in the vast majority of cases.

  10. #10
    Join Date
    Jul 2011
    Posts
    73

    Default

    Quote Originally Posted by curaga View Post
    "Those tests are useless"

    now switch perspective to someone who needs OpenMP

    "That compiler is useless"
    Actually that’s what I’m talking about: The tests are useless, because their result is useless. If you need OpenMP, you don’t need to look at the results. The compiler is not for you. And if you don’t need OpenMP you don’t need the results either: They have no meaning for you.
