
Thread: Open64 Compiler Tuning On AMD Bulldozer FX-8150

  1. #1
    Join Date
    Jan 2007
    Posts
    13,467

    Default Open64 Compiler Tuning On AMD Bulldozer FX-8150

    Phoronix: Open64 Compiler Tuning On AMD Bulldozer FX-8150

    After recently comparing the GCC, Open64, and LLVM/Clang compilers on AMD Bulldozer, this article looks at the performance of AMD's Open64 compiler when building software with the compiler tuning options AMD recommends for Bulldozer.

    http://www.phoronix.com/vr.php?view=16638

  2. #2

    Default

    Why test only single options? At least one -O2 -march=bdver1 run and one with all recommended options would give more relevant results I think.
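    The test matrix suggested here (stock, each option alone, and everything combined) can be sketched as below. This is only an illustration: `flag_configurations` is a hypothetical helper, and apart from `-O2` and `-march=bdver1`, which are named in this thread, the second tuning flag is a placeholder rather than one of the article's actual options.

```python
def flag_configurations(base_flags, tuning_flags):
    """Map a label to the flag list for one benchmark run.

    Produces the stock build, one build per single tuning flag,
    and (when there is more than one flag) a build with all of them.
    """
    base_flags = list(base_flags)
    tuning_flags = list(tuning_flags)

    configs = {"stock": list(base_flags)}
    for flag in tuning_flags:
        # One run per individual option, as in the article.
        configs[flag] = base_flags + [flag]
    if len(tuning_flags) > 1:
        # The extra run asked for here: every option together.
        configs["all"] = base_flags + tuning_flags
    return configs


if __name__ == "__main__":
    # "<another-recommended-flag>" is a placeholder, not a real compiler option.
    runs = flag_configurations(["-O2"], ["-march=bdver1", "<another-recommended-flag>"])
    for label, flags in runs.items():
        print(f"{label}: {' '.join(flags)}")
```

    Each entry would then be exported as CFLAGS/CXXFLAGS for one benchmark run.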

  3. #3
    Join Date
    Jan 2010
    Location
    Ghent
    Posts
    198

    Default

    Great. It shows that the "stock" options are pretty OK normally. I remember getting more and more requests for options to add when trying to compare an array of compilers, which in the end turned into an unmaintainable amount of work.

    By the way... Any news on PCC (last I read was that it could compile the FreeBSD kernel - what about the OpenBSD migration to PCC etc - any progress?) and/or the mob-branch of TCC (which can be statically compiled to musl libc or uclibc, which might make it even faster)?

  4. #4
    Join Date
    May 2007
    Location
    Third Rock from the Sun
    Posts
    6,532

    Default

    Pretty predictable results. There are no consistent "magic" flags or compiler that deliver huge gains across all applications. Every application out there responds differently to different flags or compilers, and usually just leaving the compiler to auto-detect the capabilities and features will end up giving you the best overall performance. One just has to ask whether it is worth going through all the effort of finding the right combination for what are usually minimal gains.

  5. #5
    Join Date
    Oct 2009
    Posts
    845

    Default

    Quote Originally Posted by deanjo View Post
    Pretty predictable results. There are no consistent "magic" flags or compiler that deliver huge gains across all applications.
    No, certainly no 'magic' flags. However, by giving the compiler the best possible data with which to make its optimization decisions, you do get the best result, which can mean very substantial gains. The option which enables this is profile-guided optimization (also known as feedback-driven optimization).

    The reason this is not enabled by default is that it takes three steps: first you compile the program you wish to optimize in an information-gathering stage, then you execute it to gather runtime data (branching, cache usage, etc.), and finally you compile it again, this time using the gathered runtime data to optimize the code in the most efficient way.
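    For GCC, the three stages can be sketched as follows. This is a minimal illustration, not a build system: `pgo_build_commands` and the file names are hypothetical, the function only assembles the command lines without running them, and the flags shown are GCC's `-fprofile-generate` and `-fprofile-use` (Open64 uses its own feedback options, which are not shown here).

```python
import shlex


def pgo_build_commands(sources, output, workload_args=(), compiler="gcc", opt_level="-O2"):
    """Return the three PGO stages as argument lists (nothing is executed)."""
    sources = list(sources)

    # Stage 1: instrumented build that records execution counts at run time.
    instrumented = [compiler, opt_level, "-fprofile-generate", *sources, "-o", output]

    # Stage 2: run the instrumented binary on a representative workload.
    # With GCC this writes the profile data files next to the object files.
    training_run = ["./" + output, *workload_args]

    # Stage 3: rebuild, letting the compiler read the recorded profile.
    optimized = [compiler, opt_level, "-fprofile-use", *sources, "-o", output]

    return [instrumented, training_run, optimized]


if __name__ == "__main__":
    # "main.c", "app" and "sample-input.dat" are made-up names for illustration.
    for cmd in pgo_build_commands(["main.c"], "app", ["sample-input.dat"]):
        print(shlex.join(cmd))
```

    How much this helps depends on the training run: the profile only reflects the code paths that the chosen workload actually exercises.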

    This can of course be automated. One example is Firefox, which allows you to generate these PGO-optimized binaries, leading to much faster performance (many may recall the debate surrounding the Windows binary of Firefox running much faster under Wine than the native Linux version, which was due to the Linux builds not enabling PGO at that time).

    While I've never come across a program that wasn't faster with PGO than without, it's worth mentioning that the gains depend very much on how badly the optimization heuristics guessed when compiling without PGO. Many optimizations which have the potential to bring huge performance gains, such as loop unrolling, are notoriously hard to estimate, which is why no compiler I know of enables them by default. When using PGO, however, I know that at least GCC enables them automatically, since they can be employed accurately with the given profile data.

    The performance increases are (in my tests) generally 5-20% depending on application (prime candidates are archivers/compressors, emulation, encoders).

    Apart from that, link-time optimization can also yield a decent performance increase by being able to look at an entire program as a whole rather than as separate code chunks. From my tests, link-time optimization yields performance gains of 5% at best, but I'm sure there are exceptions.
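    As a rough sketch of what enabling link-time optimization looks like with GCC: the `-flto` flag is passed both when compiling each source file and again at the link step, so the whole-program view is available when the final binary is generated. `lto_build_commands` and the file names are hypothetical, and the function only assembles the command lines.

```python
def lto_build_commands(sources, output, compiler="gcc", opt_level="-O2"):
    """Return per-file compile commands plus the final link command, all with -flto."""
    sources = list(sources)
    # foo.c -> foo.o
    objects = [src.rsplit(".", 1)[0] + ".o" for src in sources]

    # Compile step: with -flto each object file also carries the compiler's
    # intermediate representation.
    compile_cmds = [
        [compiler, opt_level, "-flto", "-c", src, "-o", obj]
        for src, obj in zip(sources, objects)
    ]

    # Link step: -flto again, so optimization runs across all objects at once.
    link_cmd = [compiler, opt_level, "-flto", *objects, "-o", output]

    return compile_cmds + [link_cmd]


if __name__ == "__main__":
    # "main.c", "util.c" and "app" are made-up names for illustration.
    for cmd in lto_build_commands(["main.c", "util.c"], "app"):
        print(" ".join(cmd))
```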

  6. #6
    Join Date
    Jan 2008
    Posts
    177

    Default

    Link-time optimization is always a good idea; code placement can be crucial particularly if you can cram more of the critical path into L1 instruction cache.

    Speaking of which - I'm always annoyed by the lack of analysis in these articles. "We ran foo and it yielded this number X. Next."

    Articles like this teach readers next to nothing; they offer pretty much zero enhancement to understanding.

    I would look at results for, e.g. GraphicsMagick and ask myself "why aren't the BD-specific optimizations helping?"
    And then I'd re-run the test using valgrind cachegrind and see what the code that the compiler generated is actually doing, in an instruction-level profile, and look at the cache hits and misses. (Of course, this assumes that you've built a new enough valgrind that has already been updated to support the new AVX etc. instructions....)
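    A cachegrind pass of that kind could look roughly like the sketch below. `cachegrind_commands`, the binary name and the output file name are hypothetical; the valgrind options used (`--tool=cachegrind`, `--cache-sim=yes`, `--cachegrind-out-file=`) and the `cg_annotate` tool are standard, but whether a given valgrind build can decode the generated instructions is a separate matter, as noted above.

```python
def cachegrind_commands(binary, args=(), out_file="cachegrind.out.bench"):
    """Return (profile_cmd, annotate_cmd) as argument lists; nothing is executed."""
    # Run the benchmark under cachegrind with cache simulation switched on
    # explicitly, writing the results to a known file name.
    profile_cmd = [
        "valgrind",
        "--tool=cachegrind",
        "--cache-sim=yes",
        f"--cachegrind-out-file={out_file}",
        binary,
        *args,
    ]
    # Summarize the recorded counts (instruction fetches, cache misses)
    # per function.
    annotate_cmd = ["cg_annotate", out_file]
    return profile_cmd, annotate_cmd


if __name__ == "__main__":
    # "./benchmark" and "input.png" are made-up names for illustration.
    for cmd in cachegrind_commands("./benchmark", ["input.png"]):
        print(" ".join(cmd))
```

    Comparing the annotated output of the stock build with that of the -march=bdver1 build would show whether the tuned code actually changes the miss counts in the hot functions.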

  7. #7

    Default

    Quote Originally Posted by AnonymousCoward View Post
    Why test only single options? At least one -O2 -march=bdver1 run and one with all recommended options would give more relevant results I think.
    It could be that Michael doesn't use Gentoo. People who use Gentoo know to do that. Users of other distributions do not unless they are kernel/systems programmers.

  8. #8
    Join Date
    Oct 2011
    Posts
    6

    Default

    The default Optimization level of open64 is -O2 .. if you specify nothing you get O2's.

    In other words: the "-march=bdver1" runs are with -O2.

    + quote from the article:

    The options tested included stock (not overriding any CFLAGS/CXXFLAGS and Open64 defaults to the -O2 optimization level)

  9. #9

    Default

    Quote Originally Posted by AgY! View Post
    The default Optimization level of open64 is -O2 .. if you specify nothing you get O2's.
    Oh, good, then at least that one is covered. Still it would be nice to know how all recommended flags together change the results.
