Intel Advanced Matrix Extensions [AMX] Performance With Xeon Scalable Sapphire Rapids

Written by Michael Larabel in Processors on 16 January 2023 at 04:00 PM EST.

Right away when pulling back the maximum CPU ISA level on the Sapphire Rapids server, there was a clear difference in performance when making use of AMX versus restricting oneDNN to AVX-512. Some operations with oneDNN were more than three and a half times faster thanks to AMX! It was also interesting with oneDNN 3.0 to see the difference when restricting to base AVX-512, where the AMX implementation was more than five times faster.
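
For reference, oneDNN exposes a runtime control for capping which ISA its kernels dispatch to, which is the mechanism behind this kind of AMX-versus-AVX-512 comparison. Below is a minimal sketch using the oneDNN 3.0 C++ API; the same cap can be applied without a code change via the ONEDNN_MAX_CPU_ISA environment variable (e.g. ONEDNN_MAX_CPU_ISA=AVX512_CORE), assuming oneDNN was built with the ISA-hints support left at its default (enabled):

```cpp
// Sketch: capping oneDNN's CPU ISA dispatch at runtime.
// The cap must be set before the first primitive is created;
// after that it is locked in for the process.
#include <iostream>
#include <dnnl.hpp>

int main() {
    // Restrict dispatch to plain AVX-512 (no AMX, no BF16 extensions).
    // Swap in dnnl::cpu_isa::avx512_core_amx to allow the AMX kernels.
    dnnl::status st = dnnl::set_max_cpu_isa(dnnl::cpu_isa::avx512_core);
    if (st != dnnl::status::success)
        std::cerr << "set_max_cpu_isa failed (already locked in?)\n";

    // ... create engines/primitives as usual; kernels above the cap
    // will simply never be selected.
    dnnl::engine eng(dnnl::engine::kind::cpu, 0);
    (void)eng;
    return 0;
}
```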

While AVX-512 on earlier generations of Intel processors was known for severe downclocking due to the thermal/power impact of the 512-bit Advanced Vector Extensions, that was thankfully not the case here: the oneDNN 3.0 testing observed no major peak frequency differences when AMX was utilized versus restricting to the different AVX-512 levels.

It was wonderful to see, though, that when making use of AMX, the recorded CPU power consumption for the two Xeon Platinum 8490H processors was lower than the AVX-512-only numbers. For the AMX benchmark run, the combined 8490H power consumption averaged 533 Watts compared to 585~595 Watts for the AVX-512-only runs. The peak CPU power consumption was also lower at 637 Watts compared to 722~744 Watts with a max ISA level of AVX-512.
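
For those wanting to do this kind of power monitoring themselves, Linux exposes the processors' RAPL energy counters under sysfs. Here is a rough sketch of sampling package power that way; the intel-rapl:0 path is an assumption that varies per system (multi-socket servers expose one domain per package), and this is not necessarily how the figures in these graphs were recorded:

```cpp
// Rough sketch: sampling CPU package power from the Linux RAPL
// sysfs interface. Reading energy_uj typically requires root.
#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>

static unsigned long long read_energy_uj(const std::string &path) {
    std::ifstream f(path);
    unsigned long long uj = 0;
    f >> uj;
    return uj;
}

int main() {
    // Package 0 domain; a second socket would appear as intel-rapl:1.
    const std::string path = "/sys/class/powercap/intel-rapl:0/energy_uj";
    unsigned long long e0 = read_energy_uj(path);
    std::this_thread::sleep_for(std::chrono::seconds(1));
    unsigned long long e1 = read_energy_uj(path);
    // The counter is cumulative microjoules and wraps at
    // max_energy_range_uj, which a real tool should handle.
    double watts = (e1 - e0) / 1e6; // energy delta over a 1 second window
    std::cout << "package-0 power: " << watts << " W\n";
    return 0;
}
```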

The lower CPU power consumption did translate into slightly lower CPU core temperatures when leveraging AMX. After the era when AVX-512 would send CPU core temperatures soaring, it's wonderful seeing AMX provide better thermal and power efficiency.

Across other oneDNN benchmark runs making use of BF16, it was nice seeing Intel AMX consistently yield better performance than when restricted to AVX-512 alone.
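
To give an idea of what these BF16 code paths look like from the application side, here is a hedged sketch of a BF16 matrix multiply through the oneDNN 3.0 C++ API. The dimensions and the f32 output choice are illustrative assumptions rather than the benchmark's exact configuration, and the primitive will only be implemented on hardware with BF16 support:

```cpp
// Sketch: a BF16 matmul primitive with the oneDNN 3.0 C++ API.
// On Sapphire Rapids, oneDNN dispatches this to the AMX tile
// kernels unless the max CPU ISA has been capped lower.
#include <dnnl.hpp>

int main() {
    using namespace dnnl;
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    const memory::dim M = 128, K = 256, N = 512; // illustrative sizes
    auto src_md = memory::desc({M, K}, memory::data_type::bf16,
                               memory::format_tag::ab);
    auto wei_md = memory::desc({K, N}, memory::data_type::bf16,
                               memory::format_tag::ab);
    auto dst_md = memory::desc({M, N}, memory::data_type::f32,
                               memory::format_tag::ab);

    // oneDNN 3.0 constructs primitive descriptors directly from the engine.
    auto pd = matmul::primitive_desc(eng, src_md, wei_md, dst_md);
    auto prim = matmul(pd);

    memory src(src_md, eng), wei(wei_md, eng), dst(dst_md, eng);
    prim.execute(strm, {{DNNL_ARG_SRC, src},
                        {DNNL_ARG_WEIGHTS, wei},
                        {DNNL_ARG_DST, dst}});
    strm.wait();
    return 0;
}
```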

In some runs the performance advantage wasn't as significant, but there was still a clear power-savings benefit from AMX acceleration.

With Intel's oneAPI Deep Neural Network Library being used by the likes of PyTorch, TensorFlow, ONNX Runtime, MXNet, MATLAB, and others, it's great to see the INT8/BF16 matrix multiplication (MATMUL) speed-ups possible with AMX, and that it even yields better power efficiency than AVX-512 alone. It's a very promising start for AMX performance, and hopefully the Advanced Matrix Extensions will see more use in AI/ML software not leveraging the oneDNN building blocks.
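
For readers wanting to check whether their own processor exposes these instructions, the Linux kernel reports the relevant feature flags in /proc/cpuinfo. A small Linux-specific sketch that scans for them:

```cpp
// Sketch: checking for AMX support on Linux by scanning the CPU
// feature flags (amx_tile / amx_bf16 / amx_int8) that the kernel
// reports in /proc/cpuinfo.
#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream cpuinfo("/proc/cpuinfo");
    std::string line;
    while (std::getline(cpuinfo, line)) {
        if (line.rfind("flags", 0) == 0) { // first "flags" line suffices
            bool tile = line.find("amx_tile") != std::string::npos;
            bool bf16 = line.find("amx_bf16") != std::string::npos;
            bool int8 = line.find("amx_int8") != std::string::npos;
            std::cout << "AMX-TILE: " << (tile ? "yes" : "no") << "\n"
                      << "AMX-BF16: " << (bf16 ? "yes" : "no") << "\n"
                      << "AMX-INT8: " << (int8 ? "yes" : "no") << "\n";
            break;
        }
    }
    return 0;
}
```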

