
AMD AOCC 1.3 Compiler Benchmarks vs. GCC 8.2 vs. LLVM Clang 7.0


  • AMD AOCC 1.3 Compiler Benchmarks vs. GCC 8.2 vs. LLVM Clang 7.0

    Phoronix: AMD AOCC 1.3 Compiler Benchmarks vs. GCC 8.2 vs. LLVM Clang 7.0

    Earlier this month marked the release of the AMD Optimizing C/C++ Compiler 1.3 (AOCC 1.3) with a re-base to the LLVM 7.0 code-base, enhanced loop optimizations, improved vectorization and code generation, integration of the optimized AMD Math Library, and other enhancements. Here are some fresh benchmarks of AMD AOCC 1.3 against LLVM Clang 7.0 upstream as well as GCC 8.2.0.

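    For anyone curious how such a comparison is typically run, here is a minimal sketch in C: the same trivially vectorizable hot loop compiled by each of the three compilers at comparable optimization levels. The flags in the comment are assumptions for illustration, not necessarily the article's exact settings.

    /* saxpy.c -- a tiny vectorizable kernel, useful as a smoke test when
     * comparing code generation across compilers. Illustrative invocations
     * (AOCC is a Clang-based compiler and is driven via its own clang binary):
     *   gcc-8   -O3 -march=znver1 -S saxpy.c
     *   clang-7 -O3 -march=znver1 -S saxpy.c
     *   clang   -O3 -march=znver1 -S saxpy.c
     */
    void saxpy(int n, float a, const float *restrict x, float *restrict y)
    {
        /* One multiply-add per element; a vectorizer should turn this into
         * packed AVX/FMA instructions at -O3 on znver1. */
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }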

  • #2
    The HPCG and SciMark sparse matrix multiply results show that they improved instruction scheduling for memory-bandwidth-sensitive code a lot, which is exactly what the HPC world likes.
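
    For readers unfamiliar with those benchmarks, a minimal CSR sparse matrix-vector multiply in C is roughly the shape of loop they stress; this is only an illustrative sketch, not the actual HPCG or SciMark code.

    #include <stddef.h>

    /* y = A * x with A stored in compressed sparse row (CSR) format. */
    void spmv_csr(size_t nrows,
                  const size_t *row_ptr,  /* nrows+1 offsets into col_idx/vals */
                  const size_t *col_idx,  /* column index of each nonzero */
                  const double *vals,     /* nonzero values */
                  const double *x,        /* dense input vector */
                  double *y)              /* dense output vector */
    {
        for (size_t i = 0; i < nrows; i++) {
            double sum = 0.0;
            /* The indirect loads of x[col_idx[j]] make this loop bandwidth-
             * and latency-bound, so instruction scheduling and prefetching
             * matter far more than raw FLOPs. */
            for (size_t j = row_ptr[i]; j < row_ptr[i + 1]; j++)
                sum += vals[j] * x[col_idx[j]];
            y[i] = sum;
        }
    }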



    • #3
      I'd like to see the same comparison with an Intel CPU.

      Is AMD planning to upstream these improvements to LLVM?



      • #4
        It's an OK benchmark run, but it's actually kind of hard to interpret. Here's my interpretation, based on my own prior assumptions. It seems like AOCC catches LLVM up to GCC in some ways. Look at LLVM vs. GCC going way back and you can see LLVM is behind GCC in many performance benchmarks. Based on that prior assumption and the little bit I see here in these benchmarks, AOCC produces binaries that are as fast as, or a bit faster than, what GCC produces. That's not to say LLVM was ever that far behind (I'm talking about fractions represented on bar graphs here), just that AOCC catches the LLVM codebase up to, or maybe a little past, GCC.



        • #5
          Originally posted by GrayShade:
          I'd like to see the same comparison with an Intel CPU.

          Is AMD planning to upstream these improvements to LLVM?
          They upstream the patches where appropriate for upstreaming.

          AOCC isn't designed to be run on anything but Znver1. There are GCC vs. Clang Intel benchmarks regularly on Phoronix, including earlier this month. No ICC benchmarks, if that is what you mean, since I don't have a commercial license for it.
          Michael Larabel
          https://www.michaellarabel.com/



          • #6
            Originally posted by Michael:
            They upstream the patches where appropriate for upstreaming.

            AOCC isn't designed to be run on anything but Znver1. There are GCC vs. Clang Intel benchmarks regularly on Phoronix, including earlier this month. No ICC benchmarks, if that is what you mean, since I don't have a commercial license for it.
            I remembered that old issue with Intel intentionally generating code that was slower on AMD processors, and I was wondering what effect these optimizations have on Intel CPUs. Of course, that was something else (run-time feature detection using the CPU manufacturer, IIRC). But yeah, this wouldn't be too interesting if the optimizations only kick in for znver1.
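
            For reference, the distinction is between dispatching on the CPU vendor string and dispatching on the feature bits the CPU actually reports. A small sketch of the two policies using GCC's x86 builtins (purely illustrative; this is not Intel's actual dispatcher code):

            #include <stdio.h>

            /* Compile on x86 with GCC or Clang, e.g.: cc -O2 dispatch.c */
            int main(void)
            {
                __builtin_cpu_init();

                /* Vendor-based dispatch: takes the slow path on AMD even when
                 * the CPU supports the needed instructions; this is the
                 * behavior the old Intel runtime was criticized for. */
                if (__builtin_cpu_is("intel") && __builtin_cpu_supports("avx2"))
                    puts("vendor check: AVX2 kernel");
                else
                    puts("vendor check: generic kernel");

                /* Feature-based dispatch: uses the fast path on any CPU that
                 * reports AVX2, regardless of manufacturer. */
                if (__builtin_cpu_supports("avx2"))
                    puts("feature check: AVX2 kernel");
                else
                    puts("feature check: generic kernel");

                return 0;
            }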



            • #7
              Originally posted by Michael:

              They upstream the patches where appropriate for upstreaming.

              AOCC isn't designed to be run on anything but Znver1. There are GCC vs. Clang Intel benchmarks regularly on Phoronix, including earlier this month. No ICC benchmarks, if that is what you mean, since I don't have a commercial license for it.
              I wonder how much they would charge you for it? Have you ever looked into it? They might be willing to show off their tech and give you the hookup. Would anybody here be willing to donate the funds to Michael for such a cause?



              • #8
                Originally posted by Michael:

                They upstream the patches where appropriate for upstreaming.

                AOCC isn't designed to be run on anything but Znver1. There are GCC vs. Clang Intel benchmarks regularly on Phoronix, including earlier this month. No ICC benchmarks, if that is what you mean, since I don't have a commercial license for it.
                Intel should sponsor an ICC license for you! And I'd love to see a compiler battle on your server hardware (I hear ICC is popular with the HPC crowd).



                • #9
                  ICC is popular, but in the cases I have seen it only provides minimal improvements or sometimes even generates slower code than GCC/Clang, which have improved a lot in recent years. What is really important in HPC is an optimized BLAS library (just as cuDNN is crucial for deep learning).

                  Thus, another interesting benchmark would be a BLAS benchmark. This could be done in a "hacky" way by using the R and Python (pybench) benchmarks with OpenBLAS, ATLAS and the Intel MKL (these can be swapped at the dynamic-linking level with update-alternatives on Ubuntu/Debian, see [0]) on Ryzen and Intel CPUs; a rough sketch of the idea follows after the links below. This would show which vendor is better for number crunching. All the results I have seen so far show that AMD is unfortunately significantly slower due to weaker AVX support, but this should improve with Zen 2.

                  The Intel MKL is available as a .deb with an apt source entry for Debian/Ubuntu (see the tutorial [0] and the official Intel website [1]).

                  [0] https://github.com/eddelbuettel/mkl4deb
                  [1] https://software.intel.com/en-us/art...ython-apt-repo
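
                  Since the linked setup just swaps what libblas.so points at, even a small C program against the CBLAS interface will exercise whichever backend (OpenBLAS, ATLAS, MKL) update-alternatives currently selects. A rough sketch, with an assumed problem size and build line:

                  #include <cblas.h>
                  #include <stdio.h>
                  #include <stdlib.h>
                  #include <time.h>

                  /* Build (illustrative): cc -O2 dgemm_bench.c -lcblas -lblas */
                  int main(void)
                  {
                      const int n = 2048;  /* illustrative size, adjust to taste */
                      double *a = malloc(sizeof(double) * n * n);
                      double *b = malloc(sizeof(double) * n * n);
                      double *c = malloc(sizeof(double) * n * n);
                      if (!a || !b || !c)
                          return 1;

                      for (int i = 0; i < n * n; i++) {
                          a[i] = (double)rand() / RAND_MAX;
                          b[i] = (double)rand() / RAND_MAX;
                          c[i] = 0.0;
                      }

                      struct timespec t0, t1;
                      clock_gettime(CLOCK_MONOTONIC, &t0);
                      /* C = 1.0 * A * B + 0.0 * C, all n x n, row-major. */
                      cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                                  n, n, n, 1.0, a, n, b, n, 0.0, c, n);
                      clock_gettime(CLOCK_MONOTONIC, &t1);

                      double secs = (t1.tv_sec - t0.tv_sec)
                                  + (t1.tv_nsec - t0.tv_nsec) / 1e9;
                      printf("n=%d: %.3f s, %.1f GFLOP/s\n",
                             n, secs, 2.0 * n * n * n / secs / 1e9);

                      free(a); free(b); free(c);
                      return 0;
                  }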



                  • #10
                    Originally posted by zaphod_:
                    ICC is popular, but in the cases I have seen it only provides minimal improvements or sometimes even generates slower code than GCC/Clang, which have improved a lot in recent years. What is really important in HPC is an optimized BLAS library (just as cuDNN is crucial for deep learning).
                    I have the same experience: I try ICC every few years on my code only to find it brings no speed improvement. OTOH my code is mostly integer code; for vectorizable FP code I guess the outcome wouldn't be the same.

                    Also worth noting is that ICC is good at cheating at SPEC 2006, and that's why all the entries on spec.org use it. I expect SPEC 2017 is the same, though they haven't broken any of the subtests... yet.

