That Linux 5.12 Severe Data Corruption Bug Hits Intel CI Systems - Issue Caused By Swap File

Written by Michael Larabel in Linux Kernel on 2 March 2021 at 01:43 PM EST.
Last week I issued a warning of possible data loss with the early Linux 5.12 kernel code, which was reliably leaving my test systems severely corrupted. It turns out Intel's internal graphics test systems have now been bitten by the same significant file-system corruption, and as such they've been quick to jump on the issue. There's now an idea of what's causing the nasty bug, and a workaround of reverting select patches.

As reported last week, on my test systems with the Linux 5.12 kernel I have been suffering from significant data corruption during benchmarking. Running e2fsck on the EXT4 file-systems would yield a plethora of errors and the file-systems were ultimately not recoverable. Besides having to either recover from a backup image or reinstall from scratch each time, what made it more complex was seeing this behavior even before the EXT4 file-system changes were merged for the 5.12 cycle, and those changes tended to be on the mundane side anyhow. That likely indicates a problem elsewhere in the kernel and not something specific to EXT4, just that many of my test systems are using EXT4.

It's been a slow process bisecting it, given that each time either a block-level backup needs to be restored or Ubuntu re-installed from scratch, making it much more time consuming than bisecting a "simple" performance regression of the kernel. Plus with everything else on my plate, it's been a rough week dealing with Linux 5.12. At the same time I was left wondering why more folks aren't hitting this nasty bug and screaming about it. But today, Intel has joined the chorus.


Phew, others are now seeing this issue too... Thanks to it impacting Intel and their greater resources, the issue should be buttoned up much quicker.


It turns out Intel's graphics continuous integration systems have been impacted by this issue, which fortunately led Intel engineers to look into it. A notice regarding the issue was sent out today.


Intel's Tomi Sarvela noted, "Hitting the bug corrupts the underlying filesystem very thoroughly, wiping out large amount of data from the beginning of the partition which leaves fsck sad with thousands of items lost. Bisection of the IGT testlist was done with two root filesystems, where testable kernel booted from 2. partition, and copy of the 2. partition was stored on 1. partition and could be restored at will."

Intel's analysis found the corruption happening during their testing, but the important discovery is that it appears to be related to systems with an active swapfile, rather than a swap partition or no swap at all. With the current Linux code, the file-system left trashed is the one containing the swapfile.
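For those wanting to check whether a test system falls into the affected category, here is a minimal sketch in Python. It assumes the usual `/proc/swaps` layout on Linux (a header line followed by one line per active swap area, with the second column reading `file` or `partition`). The function names are my own and purely illustrative; this is not something from Intel's report.

```python
from pathlib import Path


def parse_proc_swaps(text):
    """Return (path, type) pairs from /proc/swaps-style text.

    Assumes the first line is the column header and that each
    following line starts with the path and the swap type.
    """
    entries = []
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1]))
    return entries


def swapfiles_in_use(text):
    """Return the paths of active swap areas that are files, not partitions."""
    return [path for path, kind in parse_proc_swaps(text) if kind == "file"]


if __name__ == "__main__":
    proc_swaps = Path("/proc/swaps")
    if proc_swaps.exists():
        active = swapfiles_in_use(proc_swaps.read_text())
        if active:
            print("Active swapfile(s):", ", ".join(active))
        else:
            print("No active swapfile found.")
```

If this reports an active swapfile, the file-system holding that file is the one at risk under the current Linux 5.12 Git code, going by the behavior described above.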

Intel's Chris Wilson was able to bisect the issue on their three systems and found three patches to the Linux kernel's memory management code touching the swapfile handling. With those three patches reverted, they are no longer seeing this severe file-system corruption.

So hopefully after further testing the reverts will be upstreamed, or a more adequate fix to those swapfile patches will be carried out. I'll be testing on my end now thanks to the Intel discoveries. As it stands right now, Linux 5.12 Git is still vulnerable to this nasty issue, so for those relying on a swapfile I would certainly recommend avoiding this development kernel until a fix for or revert of those patches has landed.

UPDATE: A fix has landed for Linux 5.12 on 3 March.