Results 1 to 10 of 16

Thread: Open64 Compiler Tuning On AMD Bulldozer FX-8150

  1. #1
    Join Date
    Jan 2007
    Posts
    15,125

    Default Open64 Compiler Tuning On AMD Bulldozer FX-8150

    Phoronix: Open64 Compiler Tuning On AMD Bulldozer FX-8150

    After recently comparing the AMD Bulldozer across the GCC, Open64, and LLVM/Clang compilers, this article looks at the performance of AMD's Open64 compiler when using AMD's recommended compiler tuning options for Bulldozer when building software.

    http://www.phoronix.com/vr.php?view=16638

  2. #2

    Default

    Why test only single options? At least one -O2 -march=bdver1 run and one with all of the recommended options combined would give more relevant results, I think.

  3. #3
    Join Date
    Jan 2010
    Location
    Ghent
    Posts
    216

    Default

    Great. It shows that the "stock" options are normally pretty OK. I remember getting more and more requests for options to add when trying to compare an array of compilers, which in the end turned into an unmaintainable amount of work.

    By the way... any news on PCC (the last I read, it could compile the FreeBSD kernel - what about the OpenBSD migration to PCC, any progress?) and/or the mob branch of TCC (which can be statically compiled against musl libc or uClibc, which might make it even faster)?

  4. #4
    Join Date
    May 2007
    Location
    Third Rock from the Sun
    Posts
    6,584

    Default

    Pretty predictable results. There are no consistent "magic" flags or compiler that deliver huge gains across all applications. Every application responds differently to different flags and compilers, and usually just letting the compiler auto-detect the capabilities and features will end up giving you the best overall performance. One just has to ask whether it is worth going through all the effort of finding the right combination for what are usually minimal gains.

  5. #5
    Join Date
    Oct 2009
    Posts
    845

    Default

    Quote Originally Posted by deanjo View Post
    Pretty predictable results. There are no consistent "magic" flags or compiler that deliver huge gains across all applications.
    No, certainly no 'magic' flags. However, by giving the compiler the best possible data with which to make its optimization decisions, you get the best results, which can mean very substantial gains. The option that enables this is profile-guided optimization (also known as feedback-directed optimization).

    The reason this is not enabled by default is that it requires you to first compile the program you wish to optimize in an information-gathering stage, then execute it to gather runtime data (branching, cache usage, etc.), and finally compile it a second time, where the gathered runtime data is used to optimize the code in the most efficient way.

    This can of course be automated; an example would be Firefox, which allows you to generate these PGO-optimized binaries, leading to much faster performance (many may recall the debate surrounding the Windows binary of Firefox running much faster under Wine than the native Linux version, which was due to the Linux builds not enabling PGO at the time).

    While I've never come across a program that wasn't faster with PGO than without, it's worth mentioning that the gains depend very much on how badly the optimization heuristics guessed when compiling without PGO. Many optimizations with the potential to bring huge performance gains, such as loop unrolling, are notoriously hard to estimate, which is why no compiler I know of enables them by default; however, when using PGO, I know at least GCC enables them automatically, since they can be accurately employed with the given profile data.

    The performance increases are (in my tests) generally 5-20% depending on application (prime candidates are archivers/compressors, emulation, encoders).

    Apart from that, link-time optimization can also yield a decent performance increase by being able to look at an entire program as a whole rather than as separate code chunks. From my tests, link-time optimization yields performance gains of 5% at best, but I'm sure there are exceptions.

  6. #6
    Join Date
    Jan 2008
    Posts
    194

    Default

    Link-time optimization is always a good idea; code placement can be crucial particularly if you can cram more of the critical path into L1 instruction cache.

    Speaking of which - I'm always annoyed by the lack of analysis in these articles. "We ran foo and it yielded this number X. Next."

    Articles like this teach readers next to nothing; they offer pretty much zero enhancement to understanding.

    I would look at results for, e.g. GraphicsMagick and ask myself "why aren't the BD-specific optimizations helping?"
    And then I'd re-run the test using valgrind cachegrind and see what the code that the compiler generated is actually doing, in an instruction-level profile, and look at the cache hits and misses. (Of course, this assumes that you've built a new enough valgrind that has already been updated to support the new AVX etc. instructions....)

  7. #7
    Join Date
    Feb 2010
    Posts
    519

    Default

    Quote Originally Posted by highlandsun View Post
    Speaking of which - I'm always annoyed by the lack of analysis in these articles. "We ran foo and it yielded this number X. Next."

    Articles like this teach readers next to nothing; they offer pretty much zero enhancement to understanding.

    I would look at results for, e.g. GraphicsMagick and ask myself "why aren't the BD-specific optimizations helping?"
    And then I'd re-run the test using valgrind cachegrind and see what the code that the compiler generated is actually doing, in an instruction-level profile, and look at the cache hits and misses. (Of course, this assumes that you've built a new enough valgrind that has already been updated to support the new AVX etc. instructions....)
    Thank you, I was starting to think I was alone on that point.
    Countless sites have already thrown together graphs and called it a day, leaving readers to do the "e-peen = f(bar length)" math.

    I can read the data just fine, please focus on giving me information instead.

  8. #8
    Join Date
    Mar 2008
    Location
    Milan, Italy
    Posts
    108

    Default

    Quote Originally Posted by PsynoKhi0 View Post
    Thank you, I was starting to think I was alone on that point.
    Countless sites have already thrown together graphs and called it a day, leaving readers to do the "e-peen = f(bar length)" math.

    I can read the data just fine, please focus on giving me information instead.
    Totally agree!!
    I have to rely on readers' comments to gain some info/clues about what's happening in the figures! (provided no troll wars start or OT hijacking occurs)
    Maybe a follow-up article (or some later editing of the article) derived from the forum discussions would do much to increase the overall Phoronix value and interest.

    Even a wrong guess is better than no guess: the forums are there to correct/insult/discuss 'em

  9. #9
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,186

    Default

    Apart from that, link time optimization can also yield decent performance increase by being able to look at an entire program as a whole rather than as separate code chunks. From my tests, link-time optimizations yield performance gains of 5% at best but I'm sure there are exceptions.
    It was still somewhat broken for me in GCC 4.6.1, failing to build some apps and libs altogether. Apparently it's also not working too well with MinGW.

  10. #10
    Join Date
    Jul 2011
    Posts
    125

    Default Open64 Compiler on Ubuntu

    How do you install the Open64 compiler on Ubuntu?
    I followed their instructions on Ubuntu 10.04 through 11.10 and it never worked...
