Intel Advanced Matrix Extensions [AMX] Performance With Xeon Scalable Sapphire Rapids

Written by Michael Larabel in Processors on 16 January 2023 at 04:00 PM EST. Page 6 of 6. 20 Comments.

Keep in mind that the Xeon Platinum 8490H Linux review / benchmarks from last week were already running oneDNN and OpenVINO in their out-of-the-box state, which does leverage Advanced Matrix Extensions on supported CPUs. That's also why the oneDNN and OpenVINO results were so favorable in that comparison against AMD EPYC Genoa processors and prior-generation Ice Lake processors. AMX has a huge benefit for INT8/BF16 matrix multiplication performance with the AMX TMUL unit. With Xeon Scalable "Granite Rapids" there is also AMX-FP16 support in the pipe, and it will be interesting to see what other AMX tiles may be introduced by Intel in future processors.
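
For readers wanting to reproduce an AMX versus AVX-512-only comparison, here is a minimal Python sketch of one way to cap the instruction set that oneDNN dispatches to, using its ONEDNN_MAX_CPU_ISA environment variable. The helper name env_with_isa_cap and the choice of AVX512_CORE_BF16 as the "no AMX" level are assumptions for illustration, and whether a given OpenVINO or oneDNN build honors the variable depends on how it was compiled.

```python
import os

# Subset of the ISA caps accepted by oneDNN's ONEDNN_MAX_CPU_ISA variable.
AMX_ISA = "AVX512_CORE_AMX"          # allow AMX (tile + TMUL) kernels
AVX512_ONLY_ISA = "AVX512_CORE_BF16"  # assumed "no AMX" level for this sketch


def env_with_isa_cap(isa, base_env=None):
    """Return a copy of the environment with the oneDNN ISA cap set.

    Hypothetical helper: the base environment is copied, never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ONEDNN_MAX_CPU_ISA"] = isa
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run() so that the same benchmark command is launched twice, once per ISA cap, with nothing else changed.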

The last of today's Sapphire Rapids AMX testing is a look, on the same Ubuntu installation, at how the AMD EPYC 9654 2P performance compares to the OpenVINO results with and without AMX on the Xeon Platinum 8490H "Sapphire Rapids".

Without AMX, the Xeon Platinum 8490H performance with OpenVINO loses out to the flagship AMD EPYC 9654 processors across the board, while even with AMX enabled the EPYC 9654 2P still edges out a win for some OpenVINO models.

The AMD EPYC 9654 processors do consume less power than the Xeon Platinum 8490H for this OpenVINO benchmarking, regardless of whether the Xeon is running with AMX or with AVX-512 only.

The AMX benchmarks show the huge potential of Advanced Matrix Extensions for better performance in the areas of AI software. There were cases of 2x to 4x better performance while the Xeon Platinum 8490H processors were consuming less power than when just engaging AVX-512. Intel's Sapphire Rapids launch slide deck had 8x and 10x references for AMX performance, but those were also comparing FP32 against BF16 rather than maintaining the same data types, as done for my benchmarking, where only the ISA level changed and everything else was kept the same. In any event, these numbers showcase the significant potential for Sapphire Rapids with AI / machine learning workloads.
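
To make the arithmetic behind such comparisons explicit, here is a small Python sketch that computes a speedup factor and performance-per-Watt from two runs of the same workload and data type. The function names are hypothetical, and the figures in the example are made-up placeholders rather than results from this article.

```python
def speedup(baseline_throughput, accelerated_throughput):
    """Ratio of accelerated to baseline throughput (same model, same data type)."""
    if baseline_throughput <= 0:
        raise ValueError("baseline throughput must be positive")
    return accelerated_throughput / baseline_throughput


def perf_per_watt(throughput, average_power_watts):
    """Throughput delivered per Watt of average CPU power draw."""
    if average_power_watts <= 0:
        raise ValueError("average power must be positive")
    return throughput / average_power_watts


# Placeholder numbers only: an AVX-512-only run and an AMX run of one model.
avx512_fps, avx512_watts = 1000.0, 500.0
amx_fps, amx_watts = 3000.0, 450.0

amx_gain = speedup(avx512_fps, amx_fps)  # 3.0 with these placeholders
efficiency_gain = (perf_per_watt(amx_fps, amx_watts)
                   / perf_per_watt(avx512_fps, avx512_watts))
```

A speedup that comes with lower average power draw, as in this placeholder case, compounds into an even larger efficiency gain than the raw throughput ratio suggests.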

AMX also allows Sapphire Rapids to compete with the AVX-512-enabled AMD 4th Gen EPYC processors that otherwise would have outpaced this flagship 4th Gen Xeon Scalable SKU for OpenVINO / AI if not for the Advanced Matrix Extensions. AMX is to be thanked for really making the Sapphire Rapids AI performance competitive with the AMD EPYC 4th Gen "Genoa" processors. With Zen 4 processors adding their efficient AVX-512 implementation, AMD really upped the game for AVX-512-relevant workloads over prior-generation EPYC processors.

All 4th Gen Xeon Scalable "Sapphire Rapids" processors offer AMX functionality, which is great to see rather than it being artificially segmented. I continue to work on exploring and benchmarking the other accelerators with Sapphire Rapids. Still showing extremely promising prospects in the Sapphire Rapids line-up is the Intel Xeon CPU Max Series with 64GB of HBM2e memory, given the significant performance uplift it should provide in memory bandwidth intensive applications; paired with AMX, it could be a real powerhouse for AI systems. The flagship Xeon CPU Max Series 9480 at $12,980 is also more comparable to the listed EPYC 9654 pricing, especially if you are able to forgo some DDR5 memory expenses as a result of the onboard HBM2e memory acting alone or in combination with some DDR5 system memory. The Xeon CPU Max Series 9480 has 56 cores / 112 threads, compared to the Xeon Platinum 8490H as tested, which tops out Intel's current core offerings with 60 cores / 120 threads per processor.
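
For checking whether a given Linux system exposes AMX, here is a minimal Python sketch that looks for the amx_tile, amx_int8, and amx_bf16 feature flags as reported by recent kernels in /proc/cpuinfo. The function names are hypothetical, and the sketch assumes a kernel new enough to enumerate AMX; an older kernel would report no AMX flags even on capable hardware.

```python
# AMX feature flags as they appear in the "flags" line of /proc/cpuinfo.
AMX_FLAGS = ("amx_tile", "amx_int8", "amx_bf16")


def amx_features(cpuinfo_text):
    """Map each AMX flag to True/False based on the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())
            return {flag: flag in present for flag in AMX_FLAGS}
    # No flags line at all (e.g. non-x86 system): report nothing as present.
    return {flag: False for flag in AMX_FLAGS}


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description; Linux-only."""
    with open(path) as handle:
        return handle.read()
```

On a Linux machine, amx_features(read_cpuinfo()) returns a dictionary such as {"amx_tile": ..., "amx_int8": ..., "amx_bf16": ...} reflecting what the kernel reports for the first listed CPU.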

About The Author
Michael Larabel

Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed via Twitter, LinkedIn, or contacted via MichaelLarabel.com.