Linux 6.6 Will Avoid Unnecessary Kernel Panics On AMD Zen Systems

Written by Michael Larabel in AMD on 28 August 2023 at 06:18 AM EDT.
Among the Reliability, Availability and Serviceability (RAS) updates submitted today for the Linux 6.6 kernel is a quirk/workaround for current AMD Zen systems, where a processor bug could lead to erroneously increased error severity and unneeded kernel panics.

The bug and the resulting quirk are explained in the patch itself:
"The Instruction Fetch (IF) units on current AMD Zen-based systems do not guarantee a synchronous #MC is delivered for poison consumption errors. Therefore, MCG_STATUS[EIPV|RIPV] will not be set. However, the microarchitecture does guarantee that the exception is delivered within the same context. In other words, the exact rIP is not known, but the context is known to not have changed.

There is no architecturally-defined method to determine this behavior.

The Code Segment (CS) register is always valid on AMD Zen-based IF unit poison errors regardless of the value of MCG_STATUS[EIPV|RIPV].

Add a quirk to save the CS register for poison consumption from the IF unit banks.

Restrict this quirk to only the affected CPU families.

This is needed to properly determine the context of the error. Otherwise, the severity grading function will assume the context is IN_KERNEL due to the m->cs value being 0 (the initialized value). This leads to unnecessary kernel panics on data poison errors due to the kernel believing the poison consumption occurred in kernel context."

This quirk for AMD Zen systems, which avoids the erroneously increased error severity and unneeded kernel panics, has been submitted as one of two RAS/core changes for Linux 6.6. The patch is also marked for back-porting, so it should soon work its way into the stable Linux kernel series.
About The Author
Michael Larabel

Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed via Twitter or LinkedIn, or contacted via MichaelLarabel.com.

Popular News This Week