Extensive Benchmarks Looking At AMD Znver1 GCC 9 Performance, EPYC Compiler Tuning


    Phoronix: Extensive Benchmarks Looking At AMD Znver1 GCC 9 Performance, EPYC Compiler Tuning

    With the GCC 9 compiler due to be officially released as stable in the next month or two, we've been running benchmarks of this near-final state of the GNU Compiler Collection on a diverse range of processors. In recent weeks that has included extensive compiler benchmarks on a dozen x86_64 systems, POWER9 compiler testing on the Talos II, and AArch64 compiler performance on recent releases of GCC and LLVM Clang. This latest installment of our GCC 9 compiler benchmarking is an extensive look at AMD EPYC Znver1 performance across various releases of the GCC compiler, as well as at various optimization levels under this new compiler on the Znver1 processor.


  • #2
    So this confirms my choice to do -O3 -march=native for my Gentoo installation


    • #3
      I'm going to continue to object to compiling C-Ray without -ffast-math at any optimisation level, as it's in the original Makefile, which should put to bed any arguments about whether it is valid to enable that optimisation. I haven't looked at the Makefiles of other projects in the benchmark suite but there might be more.
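For context on why the flag is argued over at all: among other things, -ffast-math lets the compiler reorder floating-point additions, which can change results. The following is a minimal sketch of that effect only; it is not taken from C-Ray, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the array front to back, in exactly the order written. */
double sum_forward(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Sum the same values back to front. Mathematically identical, but
   floating-point addition is not associative, so the rounding differs.
   Reassociation under -ffast-math permits this kind of reordering. */
double sum_backward(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double s = 0.0;
    for (size_t i = n; i-- > 0;)
        s += v[i];
    return s;
}
```

With strict IEEE double arithmetic and the input {1e20, -1e20, 1.0}, sum_forward returns 1.0 and sum_backward returns 0.0, because the 1.0 is lost when it is added to -1e20 first. A project that ships -ffast-math in its own Makefile has effectively accepted that kind of difference.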


      • #4
        If you want reasonable link-times with LTO, use -flto=<number_hyperthreads>


        • #5
          It would have been interesting to see how Intel-optimized builds run compared to znver1. IIRC, back at release, Haswell optimization ran much better than the intended znver1.


          • #6
            Originally posted by Spam
            So this confirms my choice to do -O3 -march=native for my Gentoo installation
            Ditto with my FreeBSD installs from ports. I've had many people ask why I bother when binary pkg is easier. "This isn't 2001 any more..." etc.

            It's good to see that optimising with -march still yields tangible performance results!


            • #7
              Note that the GraphicsMagick/ImageMagick performance issue is known and has a patch (or workaround) posted here: https://gcc.gnu.org/ml/gcc-patches/2.../msg01380.html

              Since GCC 7, memcpy is expanded using SSE/AVX instructions where profitable. The problem here is that the function takes as a parameter a structure, which is copied. Since the structure is built by the callers element-wise and those stores are narrower than the copy, this triggers a store-to-load forwarding issue on modern CPUs.

              This problem affects all compilers which do not copy aggregates element-wise. For example, compiling:
              struct a { char a, b, c, d; } aa, bb;
              void test(void)
              {
                  aa = bb;  /* whole-aggregate copy of a 4-byte struct */
              }

              Depending on the context in which test() is invoked, it may be a loss to use a 32-bit move to copy the structure. Both GCC and Clang do that.
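The caller-side pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the GraphicsMagick code: struct px and every function name are hypothetical, and whether a forwarding stall actually occurs depends on the compiler version, inlining decisions and the CPU.

```c
#include <assert.h>

/* Hypothetical 4-byte aggregate, same shape as the struct in the post. */
struct px { char a, b, c, d; };

/* Callee takes the struct by value, so it reads back what the caller
   just stored; a compiler may do that with a single 32-bit load. */
int sum_px(struct px p)
{
    return p.a + p.b + p.c + p.d;
}

/* Caller builds the struct with four separate 1-byte stores and then
   passes it on immediately. A wide load that follows narrow stores is
   the case that store-to-load forwarding handles poorly. */
int build_and_sum(char a, char b, char c, char d)
{
    struct px p;
    p.a = a;
    p.b = b;
    p.c = c;
    p.d = d;
    return sum_px(p);
}

/* Whole-aggregate copy: typically compiled to one 32-bit move. */
void copy_whole(struct px *dst, const struct px *src)
{
    assert(dst != 0 && src != 0);
    *dst = *src;
}

/* Element-wise copy: four 1-byte moves, matching the width of the
   byte stores that built the source. */
void copy_elementwise(struct px *dst, const struct px *src)
{
    assert(dst != 0 && src != 0);
    dst->a = src->a;
    dst->b = src->b;
    dst->c = src->c;
    dst->d = src->d;
}
```

Both copy functions produce the same bytes; the difference is only in the width of the loads and stores the compiler is likely to emit, which can be checked by inspecting the output of gcc -O2 -S.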


              • #8
                As we have said before, -ftree-vectorize implies -ftree-slp-vectorize, so setting both is pointless. You probably meant to set -ftree-loop-vectorize and -ftree-slp-vectorize, though that is exactly the same as just setting -ftree-vectorize.
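To make the distinction between the two sub-flags concrete, here is a small sketch of the kind of code each one targets. The function names are hypothetical, and whether GCC actually vectorizes either function depends on the version, the target and the cost model.

```c
#include <assert.h>
#include <stddef.h>

/* Loop vectorization candidate (-ftree-loop-vectorize): the same
   operation repeated across loop iterations with no dependency
   between them. */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float k, size_t n)
{
    assert(out != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * k + b[i];
}

/* SLP vectorization candidate (-ftree-slp-vectorize): four independent,
   identically shaped statements in straight-line code, which can be
   packed into one vector operation without any loop. */
void add4(float *restrict out, const float *restrict a,
          const float *restrict b)
{
    assert(out != NULL && a != NULL && b != NULL);
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
    out[3] = a[3] + b[3];
}
```

Compiling with something like gcc -O2 -ftree-vectorize -fopt-info-vec should report which of the two was vectorized; -ftree-vectorize alone covers both cases, which is the point made above.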


                • #9
                  Originally posted by hubicka
                  If you want reasonable link-times with LTO, use -flto=<number_hyperthreads>
                  And -fno-fat-lto-objects when compiling. Otherwise you end up needlessly optimizing everything twice.


                  • #10
                    Originally posted by carewolf

                    And -fno-fat-lto-objects when compiling. Otherwise you end up needlessly optimizing everything twice.
                    -fno-fat-lto-objects has been the default for all targets which support the linker plugin for quite a few releases.
                    But yes, another thing you want to be sure about is that the linker plugin path is working. Without the plugin, neither the code quality nor the compile time is very good.
