LLVM Clang Shows Off Great Performance Advantage On NVIDIA GH200's Neoverse-V2 Cores

Written by Michael Larabel in Software on 18 March 2024 at 11:20 AM EDT. Page 4 of 4. 14 Comments.
Graph captions from this page's benchmark results:
- SecureMark (SecureMark-TLS): Clang 17 was the fastest.
- Liquid-DSP (Threads: 1, Buffer Length: 256, Filter Length: 32): Clang 17 was the fastest.
- Liquid-DSP (Threads: 1, Buffer Length: 256, Filter Length: 57): Clang 17 was the fastest.
- Liquid-DSP (Threads: 1, Buffer Length: 256, Filter Length: 512): Clang 17 was the fastest.
- Liquid-DSP (Threads: 32, Buffer Length: 256, Filter Length: 32): Clang 17 was the fastest.
- Liquid-DSP (Threads: 72, Buffer Length: 256, Filter Length: 32): Clang 17 was the fastest.
- Liquid-DSP (Threads: 72, Buffer Length: 256, Filter Length: 57): Clang 17 was the fastest.
- Stress-NG (Test: Matrix Math): Clang 17 was the fastest.
- Stress-NG (Test: Vector Math): Clang 17 was the fastest.
- Stress-NG (Test: Floating Point): GCC 13 was the fastest.
- Stress-NG (Test: Fused Multiply-Add): GCC 13 was the fastest.
- Stress-NG (Test: Vector Floating Point): Clang 17 was the fastest.

Clang typically yielded the fastest binaries on this NVIDIA GH200 server compared to GCC 13.2 as shipped by Ubuntu 23.10 for ARM64.

Geometric Mean Of All Test Results (Result Composite, NVIDIA GH200 Compilers): Clang 17 was the fastest.

Across nearly five dozen benchmarks, the Clang 17 binaries were around 9% faster than the GCC 13 binaries on average in this Ubuntu Linux AArch64 testing on the GH200. The performance advantage of Clang varies a lot by workload, though, so it's ultimately important to test your own particular workload(s) to evaluate whether Clang offers any benefit. There are also other factors to consider when choosing between compilers besides just the performance of the generated binaries. In any event, for those always curious about GCC vs. Clang performance, these numbers from the NVIDIA GH200 with its 72 Neoverse-V2 CPU cores were quite interesting and some of the most advantageous yet for the LLVM stack. Then again, given the popularity of Clang on mobile devices, its role as Apple's default compiler, and the heavy ARM focus brought by Apple Silicon, these results shouldn't come as too much of a shocker.


