AMD EDAC/RAS Code Adds GPU/Accelerator Support In Linux 6.5

Written by Michael Larabel in AMD on 27 June 2023 at 06:08 AM EDT.
In addition to yesterday's EDAC support for AMD Zen 4 client CPUs, the set of RAS (Reliability, Availability and Serviceability) updates for the Linux 6.5 kernel has separately brought initial GPU/accelerator support.

This is the code that has been in the works for the past few months to extend the Linux EDAC driver to data center GPUs, in particular getting the AMD64 Error Detection and Correction (amd64_edac) driver working for AMD Instinct MI200 GPUs with HBM.


The RAS pull request sent out yesterday for Linux 6.5 explains:
"Add initial support for RAS hardware found on AMD server GPUs (MI200). Those GPUs and CPUs are connected together through the coherent fabric and the GPU memory controllers report errors through x86's MCA so EDAC needs to support them. The amd64_edac driver supports now HBM (High Bandwidth Memory) and thus such heterogeneous memory controller systems."

That code has now been merged for Linux 6.5. While the initial focus is on the MI200 series, it will also be important for the forthcoming AMD Instinct MI300 series.
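
For those wanting to see what EDAC reports on a running system, memory controllers registered with the subsystem are exposed under /sys/devices/system/edac/mc/ with per-controller correctable and uncorrectable error counters. Below is a minimal Python sketch that reads those counters, assuming the standard EDAC sysfs layout (mcN directories with ce_count, ue_count and mc_name files). The helper names read_attr and edac_summary are made up for illustration, and whether MI200 HBM controllers show up as additional mcN entries on a given system is an assumption here, not something the pull request text spells out.

```python
from pathlib import Path

# Standard EDAC sysfs location for memory controllers.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def to_int(value):
    """Convert a sysfs counter string to int, or None if missing/non-numeric."""
    return int(value) if value is not None and value.isdigit() else None


def edac_summary(root=EDAC_MC_ROOT):
    """Collect name and error counters for each mcN directory under root.

    Hypothetical helper for illustration; returns an empty list when the
    EDAC sysfs tree is absent (no EDAC driver loaded, or not Linux).
    """
    summary = []
    if not root.is_dir():
        return summary
    for mc in sorted(root.glob("mc[0-9]*")):
        summary.append({
            "controller": mc.name,
            "name": read_attr(mc / "mc_name"),
            "ce_count": to_int(read_attr(mc / "ce_count")),  # correctable errors
            "ue_count": to_int(read_attr(mc / "ue_count")),  # uncorrectable errors
        })
    return summary


if __name__ == "__main__":
    for entry in edac_summary():
        print(f"{entry['controller']} ({entry['name']}): "
              f"CE={entry['ce_count']} UE={entry['ue_count']}")
```

On a system without an EDAC driver loaded the script simply prints nothing, since the sysfs directory will not exist.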