**CochainComplex** · 20 May 2021, 11:23 AM

Is there something wrong with -flto on GCC 11?

**indepe** · 20 May 2021, 08:56 PM

What surprises me is how big the differences are, in both directions, especially among the first half of the benchmarks.

**hubicka** · 21 May 2021, 05:07 AM

Originally posted by CochainComplex

is there something wrong with flto on gcc 11?

It seems that -flto regresses only on the ncnn benchmark, but since only a few benchmarks were run, it affects the geomean noticeably. I will take a look at it.

As mentioned in the other thread, when comparing performance with -fpic/-fPIC one needs to take into account that GCC defaults to -fsemantic-interposition (as specified by the ELF standard) while Clang defaults to -fno-semantic-interposition. This affects performance noticeably, since semantic interposition blocks inter-procedural optimization. So I would use -fno-semantic-interposition for GCC (note that -fno-semantic-interposition in Clang is buggy in that it localises variables).
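To make the interposition point concrete, here is a minimal sketch (the file and function names are made up for illustration, they are not from the benchmarks). With default visibility and -fPIC, GCC has to assume `scale` may be replaced at run time by another definition, so it should keep a real call inside `scale_sum` instead of inlining it. With -fno-semantic-interposition it is free to inline.

```c
/* interpose.c: hypothetical file name, used only for this illustration.
 *
 * Build both ways and compare the code generated for scale_sum:
 *   gcc -O2 -fPIC -shared interpose.c -o libdefault.so
 *   gcc -O2 -fPIC -fno-semantic-interposition -shared interpose.c -o libnointerp.so
 */

/* Exported with default visibility, so in a shared library this symbol
 * can be interposed by a definition from another module. */
int scale(int x)
{
    return 3 * x;
}

/* Calls scale() from the same translation unit. Under
 * -fsemantic-interposition the compiler must not assume that the body
 * above is the one that runs, which blocks inlining here. */
int scale_sum(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += scale(v[i]);
    return total;
}
```

Disassembling both libraries (for example with `objdump -d`) and looking at `scale_sum` should show whether the call to `scale` survived.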

For -O2 the main difference is that Clang enables vectorization, while GCC needs -ftree-vectorize -ftree-slp-vectorize for that. I hope the default will be changed in the future. For the type of benchmarks tested here, vectorization makes a noticeable difference.
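A small way to check the vectorization difference yourself, assuming a simple loop like the one below (the `saxpy` function is just an illustrative stand-in, not taken from any of the tested benchmarks): compile it at -O2 with and without -ftree-vectorize and ask GCC to report what it vectorized via -fopt-info-vec.

```c
/* saxpy.c: hypothetical file name, used only for this illustration.
 *
 * Compare the vectorizer reports on GCC 11:
 *   gcc -O2 -fopt-info-vec -c saxpy.c
 *   gcc -O2 -ftree-vectorize -fopt-info-vec -c saxpy.c
 */
#include <stddef.h>

/* y[i] = a * x[i] + y[i]; restrict tells the compiler the arrays do
 * not overlap, which makes the loop a straightforward candidate for
 * vectorization. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

If the second command reports the loop as vectorized and the first one stays silent, that is the -O2 default difference described above.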

**CochainComplex** · 26 May 2021, 05:14 AM

Originally posted by hubicka

It seems that -flto regresses only on the ncnn benchmark, but since only a few benchmarks were run, it affects the geomean noticeably. I will take a look at it.

As mentioned in the other thread, when comparing performance with -fpic/-fPIC one needs to take into account that GCC defaults to -fsemantic-interposition (as specified by the ELF standard) while Clang defaults to -fno-semantic-interposition. This affects performance noticeably, since semantic interposition blocks inter-procedural optimization. So I would use -fno-semantic-interposition for GCC (note that -fno-semantic-interposition in Clang is buggy in that it localises variables).

For -O2 the main difference is that Clang enables vectorization, while GCC needs -ftree-vectorize -ftree-slp-vectorize for that. I hope the default will be changed in the future. For the type of benchmarks tested here, vectorization makes a noticeable difference.

Thx for your insights. I always like your additional information.

LLVM Clang 12 Compiler Is Performing Very Well For AMD Ryzen 9 5950X / Zen 3