The Linux 4.0 EXT4 RAID Corruption Bug Has Been Uncovered

Written by Michael Larabel in Linux Storage on 21 May 2015 at 08:32 AM EDT.
A few days ago we reported on an EXT4 file-system corruption issue discovered in the stable Linux 4.0 kernel series. The good news is that the cause has been identified and a patch is available, but it could still be a few days before the fix starts shipping in stable updates.

In the original article it was mentioned that this looked to be an EXT4 RAID issue, and that indeed turned out to be the case. The issue was caused by an MD commit late in the Linux 4.0 kernel cycle: commit 47d68979cc968535cb87f3e5f2e6a3533ea48fbd, titled "md/raid0: fix bug with chunksize not a power of 2." In the commit message, SUSE's Neil Brown explained, "Since commit 20d0189b1012a37d2533a87fb451f7852f2418d1 in v3.14-rc1 RAID0 has performed incorrect calculations when the chunksize is not a power of 2. This happens because sector_div() modifies its first argument, but this wasn't taken into account in the patch. So restore that first arg before re-using the variable."
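The pitfall described in that commit message can be illustrated with a small user-space sketch. This is not the actual md/raid0 code: `sector_div_sketch()` is a hypothetical stand-in that mimics the in-place behavior of the kernel's `sector_div()` (it overwrites its first argument with the quotient and returns the remainder), and the two `chunk_start_*` helpers are invented here purely to show what goes wrong when the variable is reused without being restored.

```c
#include <stdint.h>

/*
 * Hypothetical user-space stand-in for the kernel's sector_div():
 * divides *n by base IN PLACE (so *n becomes the quotient) and
 * returns the remainder.
 */
static uint32_t sector_div_sketch(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);
    *n /= base;
    return rem;
}

/*
 * Buggy variant: after the call, `sector` no longer holds the original
 * sector number (it holds the chunk index), yet it is reused as if it
 * were unchanged.
 */
static uint64_t chunk_start_buggy(uint64_t sector, uint32_t chunk_sects)
{
    uint32_t offset = sector_div_sketch(&sector, chunk_sects);
    return sector - offset; /* wrong: quotient minus remainder */
}

/*
 * Fixed variant: save the original value and restore it before the
 * variable is used again.
 */
static uint64_t chunk_start_fixed(uint64_t sector, uint32_t chunk_sects)
{
    uint64_t orig = sector;
    uint32_t offset = sector_div_sketch(&sector, chunk_sects);
    sector = orig;          /* restore the first argument */
    return sector - offset; /* first sector of the containing chunk */
}
```

With a non-power-of-2 chunk size of 96 sectors and sector 10000, the fixed helper returns 9984 (the start of chunk 104), while the buggy helper returns 88, because it subtracts the remainder 16 from the quotient 104 rather than from the original sector number.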


It turns out that this "fix" to an issue present since Linux 3.14-rc1 is what's causing the EXT4 RAID corruption problems on Linux 4.0.x. Eric Work has devised a small fix to address the corruption problem, but for now it's only present within the MD Git tree. Neil Brown commented, "The patch was only added to my tree today. I will send to Linus tomorrow so it should appear in the next -rc. Any -stable kernel released since mid-April probably has the bug. Once the fix gets into Linus' tree, it should get into subsequent -stable releases."

Thus, for now, all EXT4 RAID0 users on the Linux 4.0.x kernel or the current Linux 4.1 Git code are advised to downgrade until the next 4.1 release candidate or 4.0.x stable release; otherwise you stand a good chance of hosing your file-system. It also looks like dropping the discard mount option avoids this serious issue. This isn't a problem for Linux users on distributions like RHEL, Ubuntu, and other fixed-release distributions that don't tend to update major versions of their kernel post-release, but this corruption issue has already become a problem for Arch Linux and other rolling-release distributions with users who quickly jump to new versions of upstream software.