**S.Pam** · 20 February 2019, 01:12 PM

So this confirms my choice to do -O3 -march=native for my Gentoo installation

**campbell** · 20 February 2019, 01:42 PM

I'm going to continue to object to compiling C-Ray without -ffast-math at any optimisation level, as it's in the original Makefile, which should put to bed any arguments about whether it is valid to enable that optimisation. I haven't looked at the Makefiles of other projects in the benchmark suite but there might be more.
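
For context on why the flag is debated at all, here is a minimal sketch (the function names are illustrative and not taken from C-Ray). Floating-point addition is not associative, and -ffast-math allows GCC to reassociate expressions like these, so a fast-math build is free to produce either result.

```c
/* Two groupings of the same three-term sum. Under strict IEEE
   evaluation (no -ffast-math) they are computed exactly as written. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

double sum_right(double a, double b, double c)
{
    return a + (b + c);
}

/* With a = 1e20, b = -1e20, c = 1.0 and strict evaluation:
   sum_left  gives (1e20 - 1e20) + 1.0 = 1.0
   sum_right gives 1e20 + (-1e20 + 1.0) = 0.0, because the 1.0 is
   rounded away when added to -1e20 in double precision. */
```

For a ray tracer such differences are usually harmless, which is presumably why the original Makefile enables the flag.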

**hubicka** · 20 February 2019, 02:47 PM

If you want reasonable link-times with LTO, use -flto=<number_hyperthreads>

**eydee** · 20 February 2019, 06:15 PM

It would have been interesting to see how Intel-optimized stuff runs compared to znver1. IIRC, back at release, Haswell optimization ran much better than the intended znver1.

**edenist** · 20 February 2019, 10:22 PM

Originally posted by Spam

So this confirms my choice to do -O3 -march=native for my Gentoo installation

Ditto with my FreeBSD installs from ports. I've had many people ask why I bother when binary pkg is easier. "This isn't 2001 any more...", etc.

It's good to see that optimising with -march still yields tangible performance results!

**hubicka** · 21 February 2019, 10:05 AM

Note that the GraphicsMagick/ImageMagick performance issue is known and has a patch (or workaround) posted here: https://gcc.gnu.org/ml/gcc-patches/2.../msg01380.html

Since GCC 7, memcpy is expanded using SSE/AVX instructions where profitable. The problem here is that the function takes a structure as a parameter, and that structure is copied. Since the structure is built by the callers element-wise and those stores are smaller than the copy, this triggers a store->load forwarding issue on modern CPUs.

This problem affects all compilers that do not copy aggregates element-wise. For example, compiling:

    struct a { char a, b, c, d; } aa, bb;

    void test(void)
    {
        aa = bb;  /* whole-structure copy */
    }

Depending on the context in which test() is invoked, using a single 32-bit move to copy the structure may be a loss. Both GCC and Clang do that.
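
To make the problematic context concrete, here is a minimal sketch of the pattern described above. The names `struct pixel`, `fill_src` and `copy_pixel` are hypothetical and not taken from GraphicsMagick or ImageMagick. Whether the copy really stalls depends on the compiler, the flags, inlining and the CPU.

```c
#include <stdint.h>

/* Hypothetical 4-byte aggregate, analogous to struct a above. */
struct pixel { uint8_t r, g, b, a; };

struct pixel src, dst;

/* The structure is built element-wise: four separate 1-byte stores. */
void fill_src(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    src.r = r;
    src.g = g;
    src.b = b;
    src.a = a;
}

/* Whole-structure copy. A compiler may emit this as one 32-bit load
   and one 32-bit store. If it runs right after fill_src(), the wide
   load spans four narrower pending stores, which is the case where
   store-to-load forwarding may fail. */
void copy_pixel(void)
{
    dst = src;
}
```

Calling `fill_src(1, 2, 3, 4)` immediately followed by `copy_pixel()` is the sequence of interest; the result is the same either way, only the latency differs.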

**carewolf** · 21 February 2019, 11:40 AM

As we have said before, -ftree-vectorize implies -ftree-slp-vectorize, so setting both is pointless. You probably meant to set -ftree-loop-vectorize and -ftree-slp-vectorize, though that is exactly the same as just setting -ftree-vectorize.
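
For anyone unsure what the two sub-flags cover, here is a rough sketch. The function names are hypothetical, and whether GCC actually vectorizes either function depends on the target, the optimisation level and the cost model. Roughly, -ftree-loop-vectorize targets loops like `add_arrays`, while -ftree-slp-vectorize targets groups of similar straight-line statements like those in `add4`.

```c
#include <stddef.h>

/* Candidate for the loop vectorizer: the same operation is
   repeated across the iterations of a loop. */
void add_arrays(float *restrict out, const float *restrict a,
                const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Candidate for the SLP vectorizer: independent statements of the
   same shape in straight-line code, operating on adjacent elements. */
void add4(float *restrict out, const float *restrict a,
          const float *restrict b)
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
    out[3] = a[3] + b[3];
}
```

Both functions compute the same thing for n = 4; only the shape of the code, and therefore the pass that handles it, differs.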

**carewolf** · 21 February 2019, 11:45 AM

Originally posted by hubicka

If you want reasonable link-times with LTO, use -flto=<number_hyperthreads>

And -fno-fat-lto-objects when compiling. Otherwise you end up needlessly optimizing everything twice.

**hubicka** · 21 February 2019, 12:12 PM

Originally posted by carewolf

And -fno-fat-lto-objects when compiling. Otherwise you end up needlessly optimizing everything twice.

-fno-fat-lto-objects has been the default on all targets that support the linker plugin for quite a few releases.

But yes, another thing you want to be sure about is that the linker plugin path is working. Without the plugin, neither the code quality nor the compile time is very good.

Extensive Benchmarks Looking At AMD Znver1 GCC 9 Performance, EPYC Compiler Tuning
