Amazon Graviton3 Compiler Tuning Benchmarks For The Arm Neoverse-V1 Cores
Performance of TNN, Tencent's neural network library, varied with SVE.
Long story short, as is often the case, whether the extra compiler tuning is worthwhile largely comes down to the particular code-bases and workloads you actually run. In some workloads GCC 12 as tested showed some benefit from its SVE auto-vectorization support, in others it made little difference, and in some cases performance regressed. It will be interesting to see the impact of hand-tuned SVE intrinsics in major open-source software as more Arm processors come to market with SVE/SVE2 support, as well as how the never-ending open-source compiler advancements play out.
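For readers who want to try this on their own code, here is a minimal sketch of the kind of loop GCC's auto-vectorizer can target. The function name `saxpy` and the compiler invocations in the comments are illustrative assumptions, not the exact flags or code benchmarked in this article.

```c
#include <stddef.h>

/*
 * Illustrative only: a simple loop that is a typical auto-vectorization candidate.
 *
 * Example invocations (assumptions, not the article's tested configurations):
 *   gcc -O3 -mcpu=neoverse-v1 -fopt-info-vec -c saxpy.c   # SVE available to the vectorizer
 *   gcc -O3 -march=armv8.4-a  -fopt-info-vec -c saxpy.c   # no SVE, NEON only
 *
 * -fopt-info-vec asks GCC to report which loops it vectorized.
 */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* restrict promises that x and y do not overlap, which removes an
       aliasing concern that can otherwise hold back vectorization. */
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Timing a representative workload built both ways is the only reliable way to tell whether the SVE build helps; as the results here show, it can go in either direction.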
Those wanting to see all of the tested compiler flag comparison benchmarks for the Graviton3 instance can see this result page.