Announcement

**Anty** · 31 October 2018, 06:44 AM

I just wonder whether they are going to fix the scheduler cost tables, because right now code compiled with znver1 is in most cases slower than code compiled with haswell or skylake when run on a Ryzen CPU.

**hubicka** · 31 October 2018, 06:50 AM

Originally posted by Anty

I just wonder whether they are going to fix the scheduler cost tables, because right now code compiled with znver1 is in most cases slower than code compiled with haswell or skylake when run on a Ryzen CPU.

The latencies and scheduler were retuned after the final hardware arrived (for GCC 8). If you have benchmarks where GCC 8 produces worse code with Zen tuning than with Skylake tuning, I would be interested in looking into them.

**ms178** · 31 October 2018, 07:28 AM

Originally posted by hubicka

The latencies and scheduler were retuned after the final hardware arrived (for GCC 8). If you have benchmarks where GCC 8 produces worse code with Zen tuning than with Skylake tuning, I would be interested in looking into them.

Very quick response from the developer himself.

I was about to reply that you had done some optimization work in this area, but I wasn't sure anymore whether it went into GCC 8 or GCC 9.

By the way, I noticed your June 2018 update on LTO and that it went through smoothly. Well done! One question, though: have you experimented with enabling GRAPHITE optimizations on top of LTO for packages where it could be beneficial? I don't have any hard numbers, but I did notice a notable reduction in RAM usage with these flags on my custom kernel build. However, since I also used a custom config, I cannot attribute all of it to GRAPHITE. Many GRAPHITE improvements went into GCC 8, so it has become more usable (with GCC 7 I previously got ICEs, internal compiler errors, on ffmpeg and VLC), and I'd like to see all of this optimization work used more commonly across the Linux world. Could you tell me what is holding back wider use of these processor-agnostic optimizations by the distros?

**Anty** · 31 October 2018, 07:42 AM

Originally posted by hubicka

The latencies and scheduler were retuned after the final hardware arrived (for GCC 8). If you have benchmarks where GCC 8 produces worse code with Zen tuning than with Skylake tuning, I would be interested in looking into them.

In my free time I can look for the cases I mentioned, but AFAIR GCC 8 also exhibits this behavior in AVX/AVX2-heavy code. The impact was not huge, but it was measurable and reproducible.

**BoMbY** · 31 October 2018, 07:55 AM

You previously wrote:

When checking the latest model data, the later Family 17h Models up through 2Fh (47) are indeed for Zen 2.

but this patch says:

```diff
+  if (model >= 0x30)
+    __cpu_model.__cpu_subtype = AMDFAM17H_ZNVER2;
```
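To make the mismatch concrete, here is a minimal sketch of where `model` comes from and what the quoted condition does with it. It assumes the usual x86 convention for CPUID leaf 1 EAX: when the base family is Fh, the extended family is added to it (Fh + 8h = 17h) and the extended model becomes the high nibble of the model. The helper names (`cpu_family`, `cpu_model`, `is_znver2_by_patch`, `describe`) are hypothetical and are not the libgcc code. The sample EAX values are constructed purely for illustration, and the check encodes only the one line quoted from the patch.

```c
#include <assert.h>   /* static_assert (C11 macro) */
#include <stdint.h>
#include <stdio.h>

/* 2Fh is model 47; the patch's threshold 0x30 is model 48, the next one up. */
static_assert(0x2F == 47, "2Fh is model 47");
static_assert(0x30 == 0x2F + 1, "0x30 is the first model past 2Fh");

/* Display family from CPUID leaf 1 EAX: the extended family (bits 27:20)
   is added to the base family (bits 11:8) only when the base family is 0xF. */
static unsigned cpu_family(uint32_t eax)
{
    unsigned base = (eax >> 8) & 0xFu;
    unsigned ext = (eax >> 20) & 0xFFu;
    return base == 0xFu ? base + ext : base;
}

/* Display model: with base family 0xF, the extended model (bits 19:16)
   becomes the high nibble above the base model (bits 7:4). */
static unsigned cpu_model(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xFu;
    unsigned model = (eax >> 4) & 0xFu;
    unsigned ext_model = (eax >> 16) & 0xFu;
    return base_family == 0xFu ? (ext_model << 4) | model : model;
}

/* Hypothetical helper: encodes only the quoted patch condition
   (family 17h and model >= 0x30). Models below 0x30, including
   20h-2Fh, simply return 0 here. */
static int is_znver2_by_patch(uint32_t eax)
{
    return cpu_family(eax) == 0x17u && cpu_model(eax) >= 0x30u;
}

/* Print the decoded fields for one EAX signature. */
static void describe(uint32_t eax)
{
    printf("eax=%08x family=%02xh model=%02xh (%u) znver2-by-patch=%d\n",
           (unsigned)eax, cpu_family(eax), cpu_model(eax), cpu_model(eax),
           is_znver2_by_patch(eax));
}
```

For instance, an EAX of 0x00820FF0 (extended model 2, base model Fh) decodes to family 17h, model 0x2F (47) and is not flagged, while 0x00830F00 decodes to model 0x30 and is. That one-model step is exactly where "up through 2Fh" and `model >= 0x30` disagree.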

**Zan Lynx** · 31 October 2018, 04:49 PM

If I remember correctly, those cache instructions are for PMEM (Persistent Memory) such as Intel's Optane NVDIMMs. This looks like AMD adding these for compatibility.

Does anyone know what AMD's plans are regarding NVDIMM and PMEM?

**Anty** · 01 November 2018, 11:24 AM

Originally posted by hubicka

The latencies and scheduler were retuned after the final hardware arrived (for GCC 8). If you have benchmarks where GCC 8 produces worse code with Zen tuning than with Skylake tuning, I would be interested in looking into them.

Jan, I have generated a comparison of sandybridge, ivybridge, haswell, broadwell, skylake and znver1 at optimization levels -O2, -O3, -Ofast and -Os, with both GCC 7.3 and GCC 8.2.
What I realize now is that GCC 8.2 has serious problems compared to 7.3, sometimes more than 50%! Anyway, my report shows that -march=haswell, broadwell or skylake is a win-win when using -Ofast.

I will send you full info to private email soon with all the details.

AMD Publishes Zen 2 Compiler Patch "znver2" Exposing Some New Instructions