
Thread: AMD Bulldozer "bdver1" Compiler Performance

  1. #1
    Join Date
    Jan 2007
    Posts
    14,787

    AMD Bulldozer "bdver1" Compiler Performance

    Phoronix: AMD Bulldozer "bdver1" Compiler Performance

    It is time for another round of compiler benchmarks on AMD's latest FX-8150 Bulldozer processor. This article compares the GCC 4.6.1, GCC 4.7 development, Open64 4.2.4, and AMD Open64 4.2.5.2 compilers in three configurations: their stock configuration; with the binaries rebuilt with the march/mtune flags set to "bdver1"; and with the "bdver1" architecture and tuning flags plus "-Ofast" for the highest level of compiler optimization.

    http://www.phoronix.com/vr.php?view=16688

  2. #2
    Join Date
    Oct 2009
    Posts
    845


    Again, these kinds of benchmarks just make me wonder whether Michael knows anything about compiler flags.

    We see HUGE differences between stock/bdver1+Ofast and bdver1 on some tests, and I'd like to think Michael can't be so stupid that 'bdver1' means he is ONLY setting "-march=bdver1" without any -O flag. But I fear that this is exactly what's happening in those tests.

    By stock settings I assume he means the CFLAGS/CXXFLAGS the test packages ship with, which are most likely -O2 or -O3. On GCC, -Ofast is essentially '-O3 -ffast-math'. When NO -O flag is given, GCC defaults to NO optimization (-O0).

    So in the tests where (I assume) Michael only sets CFLAGS to '-march=bdver1', the optimization level is GCC's default, -O0, which is NO optimization, making that benchmark totally pointless. '-march=' only tells the compiler which CPU target to optimize for; -O is what actually turns on optimization, at different levels of strength.
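    The defaulting rules above can be modelled in a few lines. This is a minimal Python sketch, assuming GCC's documented behaviour that the last -O option on the command line wins, that a bare -O means -O1, and that no -O option at all means -O0. It treats -Ofast as -O3 plus -ffast-math (newer GCC releases have -Ofast enable a few more flags, which the sketch ignores), and it does not handle -fno-fast-math. The name effective_opt_level is hypothetical.

```python
import re

# Matches GCC optimization-level options: -O, -O0..-O3, -Os, -Og, -Ofast.
_OPT_RE = re.compile(r"^-O(fast|[0-3sg]?)$")


def effective_opt_level(flags):
    """Return (level, fast_math) for a list of GCC command-line flags.

    Simplified model: the last -O option wins, a bare -O means -O1, no -O
    option at all means -O0, and -Ofast counts as -O3 plus -ffast-math.
    -fno-fast-math and other overrides are deliberately ignored.
    """
    level = "0"  # GCC's default when no -O option is given
    for flag in flags:
        match = _OPT_RE.match(flag)
        if match:
            level = match.group(1) or "1"  # bare "-O" is the same as "-O1"
    fast_math = level == "fast" or "-ffast-math" in flags
    return level, fast_math


for cflags in (["-march=bdver1"], ["-O2", "-march=bdver1"], ["-march=bdver1", "-Ofast"]):
    print(" ".join(cflags), "->", effective_opt_level(cflags))
```

    Under this model the plain "-march=bdver1" configuration resolves to level "0", which is exactly the situation suspected above.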

    Still, I'm glad he is now venturing beyond the stock settings when benchmarking. Stock settings (as in the author's tarball) may be tuned for one particular compiler, or set very low (even -O0), with the author(s) expecting the end user/packager to choose appropriate optimization levels for their needs.

    To make the Phoronix tests at least somewhat reflect the capabilities of the compilers, at least -O2 and -O3 should be tested for each package. For tests relying on floating-point math, I would say -ffast-math is warranted as well.
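    A test matrix along those lines is easy to generate. The Python sketch below is only an illustration: flag_matrix and the compiler driver names are placeholders, and it only produces the CC/CFLAGS combinations, leaving the actual build and run to whatever harness is in use.

```python
from itertools import product


def flag_matrix(compilers, opt_levels=("-O2", "-O3"), fp_heavy=False):
    """Yield (compiler, cflags) pairs covering every compiler/level combination.

    For floating-point-heavy tests, each level is also tried with -ffast-math.
    """
    math_variants = ([], ["-ffast-math"]) if fp_heavy else ([],)
    for compiler, level, extra in product(compilers, opt_levels, math_variants):
        yield compiler, " ".join([level, *extra])


# Hypothetical compiler driver names; substitute whatever is installed.
for compiler, cflags in flag_matrix(["gcc-4.6", "opencc"], fp_heavy=True):
    print(f"CC={compiler} CFLAGS='{cflags}'")
```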

  3. #3


    Michael needs to spend some quality time with Gentoo to learn which CFLAGS actually matter.

    On that note, the results would be semi-useful if he benchmarked with -O2 -march=native, which is what most Gentoo users tend to use. That lets GCC pick what it thinks is best for the CPU, and GCC does a good job of it. For benchmarking purposes, it would be even better to find out which flags -march=native sets and then set those manually, so people don't need to guess what GCC decided for that CPU. He can do that by running "gcc -Q -v -march=native -O2 --help=target", which will usually output the real flags that GCC uses.
    Last edited by Shining Arcanine; 11-15-2011 at 05:45 AM.
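    That command can also be scripted so the resolved options are recorded next to the results. A rough Python sketch that runs the command from the post and parses its option table; it assumes the table goes to stdout with one indented option per line followed by its value (e.g. "[enabled]" or a CPU name), and simply skips lines that don't fit that shape. The helper names are hypothetical.

```python
import subprocess


def parse_target_help(text):
    """Parse 'gcc -Q --help=target' style output into {option: value}.

    Assumes each option line is indented and holds the option name followed
    by whitespace and a single value token; other lines are skipped.
    """
    options = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and line[:1].isspace() and parts[0].startswith("-"):
            options[parts[0]] = parts[1]
    return options


def native_target_options(cc="gcc"):
    """Run the command from the post and return the parsed option table."""
    result = subprocess.run(
        [cc, "-Q", "-v", "-march=native", "-O2", "--help=target"],
        capture_output=True, text=True, check=True,
    )
    return parse_target_help(result.stdout)


# Usage (needs gcc on PATH):
#   enabled = {opt: val for opt, val in native_target_options().items()
#              if val != "[disabled]"}
```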

  4. #4
    Join Date
    Dec 2007
    Location
    Edinburgh, Scotland
    Posts
    577


    I wonder whether the GCrypt tests failed because of -ffast-math.

    Is anyone here with Gentoo and PTS installed able to repeat these tests with proper compiler flags set?

    Any idea why Open64 5 wasn't tested?

  5. #5
    Join Date
    Jun 2010
    Location
    ฿ 16LDJ6Hrd1oN3nCoFL7BypHSEYL84ca1JR
    Posts
    1,052


    Quote Originally Posted by XorEaxEax
    So in the tests where (I assume) Michael only sets CFLAGS to '-march=bdver1', the optimization level is GCC's default, -O0, which is NO optimization, making that benchmark totally pointless. '-march=' only tells the compiler which CPU target to optimize for; -O is what actually turns on optimization, at different levels of strength.
    So am I reading the POV-Ray graph right when I say: going from generic binaries + optimization to bdver1 binaries + no optimization makes it 25% faster, and then going from bdver1 binaries + no optimization to bdver1 binaries + Ofast optimization is only a very small improvement?

    Also, AFAIK "-march=native -mtune=native" is redundant, since -mtune only optimizes within the range -march allows, and with -march=native that range is a single CPU.
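    The redundancy follows from how GCC resolves the two options: its manual states that -march=cpu-type implies -mtune=cpu-type unless -mtune is given explicitly, and -mtune on its own only changes tuning, not the instruction set. A minimal sketch of that resolution rule; effective_target is a hypothetical helper, and it does not model what the driver substitutes for "native".

```python
def effective_target(flags):
    """Return (arch, tune) as resolved from x86 -march/-mtune flags.

    Models the documented rule that -march=CPU also implies -mtune=CPU
    unless an explicit -mtune is given; the last occurrence of each option
    wins. None means the option was not given (built-in default applies).
    """
    arch = tune = None
    for flag in flags:
        if flag.startswith("-march="):
            arch = flag.split("=", 1)[1]
        elif flag.startswith("-mtune="):
            tune = flag.split("=", 1)[1]
    return arch, (tune if tune is not None else arch)


print(effective_target(["-march=native", "-mtune=native"]))
print(effective_target(["-march=native"]))  # same result: the -mtune is redundant
```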

  6. #6
    Join Date
    Jul 2008
    Location
    Berlin, Germany
    Posts
    822


    It would be interesting to compare performance with generic CFLAGS against the CPU-specific ones. Benchmarking -O0 against -Ofast is mostly pointless.

    In particular, I'd like to see the following compared:
    • -O2: this is what most binary distributions use
    • -O2 -mtune=native: this produces code which is optimized for the current platform, but still runs on others (maybe slower)
    • -O2 -march=native: in addition, this can use instructions which are specific to the CPU, so the binary may not run on other CPUs
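    The three configurations in that list map directly onto compiler command lines. A small Python sketch that builds (but does not run) one command per configuration; the configuration names, bench.c, and build_commands are hypothetical, and the "portable" field assumes the compiler's default -march is a generic x86-64 target.

```python
# The three configurations from the list above. "portable" records whether the
# binary should still run on other x86-64 CPUs (assuming a generic default
# -march): -mtune only changes tuning, while -march allows CPU-specific
# instructions.
CONFIGS = {
    "distro": {"cflags": ["-O2"], "portable": True},
    "tuned": {"cflags": ["-O2", "-mtune=native"], "portable": True},
    "cpu-specific": {"cflags": ["-O2", "-march=native"], "portable": False},
}


def build_commands(source, cc="gcc"):
    """Return one compiler command line per configuration (not executed here)."""
    commands = {}
    for name, config in CONFIGS.items():
        output = f"{source.rsplit('.', 1)[0]}-{name}"
        commands[name] = [cc, *config["cflags"], source, "-o", output]
    return commands


for name, cmd in build_commands("bench.c").items():
    print(name, " ".join(cmd))
```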

  7. #7
    Join Date
    Oct 2009
    Posts
    845


    Quote Originally Posted by ChrisXY
    So am I reading the POV-Ray graph right when I say: going from generic binaries + optimization to bdver1 binaries + no optimization makes it 25% faster, and then going from bdver1 binaries + no optimization to bdver1 binaries + Ofast optimization is only a very small improvement?
    No. On the POV-Ray test there was obviously an -O flag set, since the difference between 'bdver1' and 'bdver1 + Ofast' was so small. Assuming the -O level was -O3 for both, the remaining difference would be due to -ffast-math, which -Ofast enables and which can improve floating-point performance.

    However, looking at the other tests (GraphicsMagick, GCrypt), it's obvious that they are not using the same optimization level, since -O is the only thing that can differ between '-march=bdver1' and '-march=bdver1 -Ofast'. Given the HUGE performance differences in those benchmarks, totally different optimization levels are clearly at play. GCC defaults to -O0 (NO optimization) when no -O level is set, and Michael lists no -O setting for those incredibly poorly performing runs, so this explanation makes sense.

    So the GraphicsMagick and GCrypt results indicate that the flags were:

    'stock' = -O2 or -O3 I assume
    'bdver1 + Ofast' = -Ofast -march=bdver1
    'bdver1' = -march=bdver1 (thus GCC defaulting to -O0 which is no optimization at all)
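    Rather than inferring this from the graphs, it can be checked directly: GCC predefines __OPTIMIZE__ in every optimizing compilation (any level above -O0), and -ffast-math (and therefore -Ofast) defines __FAST_MATH__. A Python sketch that asks the compiler for its predefined macros via -dM -E; the helper names are hypothetical and it assumes a gcc on PATH that knows bdver1.

```python
import subprocess


def parse_defines(text):
    """Turn '#define NAME VALUE' lines into a {NAME: VALUE} dict."""
    macros = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[0] == "#define":
            macros[parts[1]] = parts[2] if len(parts) == 3 else ""
    return macros


def predefined_macros(cflags, cc="gcc"):
    """Return the macros the compiler predefines for the given flags.

    Preprocesses empty C input from stdin; -dM with -E makes GCC print
    every #define in effect, including the predefined ones.
    """
    result = subprocess.run(
        [cc, *cflags, "-dM", "-E", "-x", "c", "-"],
        input="", capture_output=True, text=True, check=True,
    )
    return parse_defines(result.stdout)


def describe(macros):
    """Summarise what the predefined macros reveal about optimization."""
    return {
        "optimized": "__OPTIMIZE__" in macros,   # any level above -O0
        "fast_math": "__FAST_MATH__" in macros,  # -ffast-math, hence -Ofast
    }


# Usage (needs gcc on PATH):
#   describe(predefined_macros(["-march=bdver1"]))            # suspected -O0 case
#   describe(predefined_macros(["-march=bdver1", "-Ofast"]))
```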

    Again, it's sad that there's no rhyme or reason to the way Michael organizes these tests. To properly test each compiler, you would at least compile each package with -O2 and -O3, preferably adding tests with -march=native as well. As it is, Michael states that he uses whatever settings the packages came with, which may be optimal for compiler A but not for compiler B, and he doesn't even declare those settings in the tests, so we can't judge the results with the proper data at our disposal.
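    Declaring the settings is mostly a matter of storing them next to every number. A minimal Python sketch of that bookkeeping; BenchResult and to_csv are hypothetical names, not part of PTS or any existing tool.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass
class BenchResult:
    """One benchmark run, stored together with the settings that produced it."""
    test: str
    compiler: str
    compiler_version: str
    cflags: str
    seconds: float


def to_csv(results):
    """Serialise results with a header row, so every number keeps its flags."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([f.name for f in fields(BenchResult)])
    writer.writerows(astuple(r) for r in results)
    return buffer.getvalue()
```

    Each run appends one BenchResult, so a published table can always show the exact compiler version and CFLAGS behind each figure.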

    Quote Originally Posted by ChrisXY
    Also, AFAIK "-march=native -mtune=native" is redundant, since -mtune only optimizes within the range -march allows, and with -march=native that range is a single CPU.
    Yes, which is why I only mention '-march'.

  8. #8
    Join Date
    Oct 2007
    Location
    Sweden
    Posts
    174

    Optimizations

    All performance tests should be done on optimized code (the most commonly used level is -O2).
    The -march=bdver1 option alone does not turn on any code optimizations, such as constant propagation or even dead-code removal. Optimizations rewrite the code to use fewer instructions, which gives rather dramatic results. The -march option tells the compiler which instructions are faster than others doing the same job; the effect of that is negligible compared to the time saved by not executing unnecessary instructions in the first place.
    I agree: the GraphicsMagick results suggest optimizations were turned off for the plain bdver1 run.
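    To make "rewriting the code into fewer instructions" concrete, here is a toy constant-folding pass (folding is the simplest relative of the constant propagation mentioned above) over Python expressions, using the standard ast module (Python 3.9+ for ast.unparse). It only illustrates the idea of doing work at compile time instead of at run time; GCC's real passes operate on its own intermediate representation and do far more.

```python
import ast
import operator

# Binary operators this toy folder knows how to evaluate at "compile time".
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    """Replace arithmetic on two numeric literals with its result."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold sub-expressions first (bottom-up)
        op = _OPS.get(type(node.op))
        left, right = node.left, node.right
        if (
            op is not None
            and isinstance(left, ast.Constant) and isinstance(right, ast.Constant)
            and isinstance(left.value, (int, float))
            and isinstance(right.value, (int, float))
        ):
            folded = ast.Constant(op(left.value, right.value))
            return ast.copy_location(folded, node)
        return node


def fold(source):
    """Constant-fold a single Python expression and return it as source text."""
    tree = ConstantFolder().visit(ast.parse(source, mode="eval"))
    return ast.unparse(ast.fix_missing_locations(tree))


print(fold("x * (60 * 60 * 24)"))
print(fold("(4 - 1) * 2 + y"))
```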

  9. #9
    Join Date
    Jul 2009
    Posts
    257


    Anything new on the "bedevere" optimisations?
    A fully optimised Bulldozer system would be interesting to see in action...
