Harnessing Incredible AI Compute Power Atop Open-Source Software: 8 x AMD MI300X Accelerators On Linux

Written by Michael Larabel in Graphics Cards on 14 March 2024 at 03:00 PM EDT. Page 1 of 1. 26 Comments.

A few days ago I had the chance to indulge in an incredible compute nirvana: eight AMD Instinct MI300X accelerators at my disposal for some testing, albeit brief. Not only was it fantastic for the sheer compute performance, but for Phoronix fans it's all the more exciting knowing it sits atop a fully open-source software stack, from the kernel driver up through the various user-space libraries (well, sans the GPU microcode). This first encounter with the AMD MI300 series was eye-opening in showing how far the ROCm software stack has come and the increased challenges NVIDIA faces going forward from the rising competitiveness of AMD's hardware and software efforts.


On short notice, AMD gave me gratis access to an AMD Accelerator Cloud instance with eight MI300X accelerators. The instance had three days free between customer uses, which let me experiment over the past weekend with the MI300X hardware and AMD's latest open-source Linux software stack.

I had unfettered access to the MI300X hardware to experiment. AMD did set up some Docker containers in advance to make it easy to try out the Llama 2 large language model and the like, alongside the other ROCm models and various AI workloads. The AMD Accelerator Cloud was running ROCm 6.0 and also offered the ability to try out ROCm 6.1, which is still undergoing work ahead of its official release in the coming weeks. For ROCm 6.1, AMD is preparing a number of important optimizations for the MI300X series including vLLM support, greater quantization support, HIP Graph support, and more. AMD also announced today that OpenAI has merged Triton AMD GPU support upstream for OpenAI Triton 3.0.

Considering that I've been covering the AMD Linux graphics driver scene for twenty years now, going back to the years before they had an open-source driver strategy and were starting out with the notorious "fglrx" proprietary driver, this AMD MI300X experience was an incredible reflection of how far AMD's open-source software support has come. Llama 2 and other AI workloads were running with great speed and the software support was behaving well atop the Ubuntu 22.04 LTS installation.

From the open-source perspective, the only other competition at this point would be Intel with their Habana Labs Gaudi2 hardware, which is supported by the upstream Linux kernel and the accompanying SynapseAI user-space software, though I haven't yet been able to test out that software experience myself. Over the past two years NVIDIA's open kernel modules have come about to at least provide open-source kernel drivers for their hardware, but CUDA and all of their other user-space drivers like OpenGL, OpenCL, and Vulkan remain closed-source. There are no signs of NVIDIA having any plans to make CUDA more open-source friendly. That the NVIDIA open kernel modules are not upstream in the mainline Linux kernel is a further hindrance to adoption and to open-source/Linux ideals.

This was also my first time trying out the AMD Accelerator Cloud "AAC" after previously trying out their former AMD Cloud Platform back in 2022. AAC was easy to deal with and trouble-free in my brief encounter. Intel on the other side of the table has their Developer Cloud "DevCloud", but it hasn't yet been tested at Phoronix.

AMD slide: more background information on the Instinct MI300 series via AMD's December AI event.

Each AMD Instinct MI300X is rated for 750 Watts, and indeed under test I was able to push all eight Instinct MI300X accelerators to that 750 Watt rating, as reported via the sensors exposed by rocm-smi.
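
As a rough illustration of how per-accelerator power readings like these could be checked against the 750 Watt rating, here is a minimal Python sketch. It is not the method used for this article. It assumes the readings arrive as a JSON object keyed by device, with a field whose name contains "Power (W)"; the function names, the 5% tolerance, and the sample values are hypothetical, and the real field names differ between ROCm releases.

```python
import json

RATED_WATTS = 750.0  # per-accelerator rating quoted in the article


def power_by_device(smi_json: str) -> dict[str, float]:
    """Map each device to its first power reading in watts.

    Assumption: the input is a JSON object of the form
    {"card0": {"<some power field> (W)": "<number>"}, ...}.
    """
    report = json.loads(smi_json)
    readings: dict[str, float] = {}
    for device, fields in report.items():
        if not isinstance(fields, dict):
            continue  # skip any non-device entries
        for key, value in fields.items():
            name = key.lower()
            if "power" in name and "(w)" in name:
                try:
                    readings[device] = float(value)
                except (TypeError, ValueError):
                    continue  # unparsable value, try the next field
                break  # keep the first usable reading per device
    return readings


def devices_at_rating(readings, rated=RATED_WATTS, tolerance=0.05):
    """Return devices drawing at least (1 - tolerance) of the rated power."""
    floor = rated * (1.0 - tolerance)
    return [dev for dev, watts in readings.items() if watts >= floor]


# Made-up sample data for illustration only; not measured output.
SAMPLE = json.dumps({
    "card0": {"Average Graphics Package Power (W)": "748.0"},
    "card1": {"Average Graphics Package Power (W)": "312.0"},
})

if __name__ == "__main__":
    sample_readings = power_by_device(SAMPLE)
    print(devices_at_rating(sample_readings))
```

On a real system the JSON string would come from the monitoring tool rather than from a hard-coded sample. Recent rocm-smi releases appear to offer JSON output for power readings, but the exact flags and field names should be checked against the installed version.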

Due to the short notice on AMD Instinct MI300X availability, the very brief time to "kick the tires" with the AAC access, and not having a similarly configured server locally with different GPUs/accelerators, this article serves just as an overview of my initial experience. I hope to have longer access to AMD MI300 series hardware soon, at which point I will focus more on performance benchmarking. But for now I will say this initial testing was very positive and exceeded my expectations. It's amazing how far the AMD open-source compute support has come in getting Llama 2 and other AI workloads up and running, as AMD software engineers feverishly tackle more software improvements for ROCm in 2024. There is still some catching up to do with NVIDIA on the software side, but as I had not had the time to experiment much with ROCm 6.0 previously, it was excellent to see the recent progress achieved.

So for now that's the brief summary of my first rodeo with the AMD Instinct MI300 series, but stay tuned for more testing (hopefully) soon. It's been a great joy to see just how far AMD Linux driver support has come over the past two decades of closely covering it on Phoronix. Thanks to AMD for offering the gratis access for some preliminary tests of the Instinct MI300X on the AMD Accelerator Cloud.
