Systemd/Microsoft Effort For A Global Counter On Block/Disk Changes Coming To Linux 5.15

Written by Michael Larabel in Linux Storage on 29 July 2021 at 03:00 AM EDT.
Last month I wrote about a possible global counter for block/disk changes on Linux, discussed by Microsoft and systemd developers, that would track changes with a system-wide, monotonically increasing number as an alternative to the existing per-disk tracking. That functionality is now queued up as part of the block subsystem changes ahead of the Linux 5.15 merge window, which opens in a few weeks.

This global counter for block device changes is meant to make it easier to correlate events for device names that get reused, commonly /dev/sda or /dev/loop0, where a device is detached and a different device is later attached under the same name. User-space software like systemd can use such a system-wide numbering scheme to avoid confusion over reused device names and to cope with events arriving in user space out of order.

The patches by Microsoft's Matteo Croce that provide this global counter for block device changes were queued on Thursday to the block subsystem's "for-5.15" Git branch.


The main commit further sums up the motivation:
Associating uevents with block devices in userspace is difficult and racy: the uevent netlink socket is lossy, and on slow and overloaded systems has a very high latency. Block devices do not have exclusive owners in userspace, any process can set one up (e.g. loop devices). Moreover, device names can be reused (e.g. loop0 can be reused again and again). A userspace process setting up a block device and watching for its events cannot thus reliably tell whether an event relates to the device it just set up or another earlier instance with the same name.

Being able to set a UUID on a loop device would solve the race conditions. But it does not allow to derive orderings from uevents: if you see a uevent with a UUID that does not match the device you are waiting for, you cannot tell whether it's because the right uevent has not arrived yet, or it was already sent and you missed it. So you cannot tell whether you should wait for it or not.

Associating a unique, monotonically increasing sequential number to the lifetime of each block device, which can be retrieved with an ioctl immediately upon setting it up, allows to solve the race conditions with uevents, and also allows userspace processes to know whether they should wait for the uevent they need or if it was dropped and thus they should move on.

Additionally, increment the disk sequence number when the media change, i.e. on DISK_EVENT_MEDIA_CHANGE event.
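The ordering argument in that commit message can be sketched in a few lines of C. This is an illustration, not code from the patch set: it assumes the number arrives as a DISKSEQ=<n> entry in the NUL-separated KEY=VALUE payload of a kernel uevent, that the caller already has its own device's number (for example from the new ioctl), and that uevents for one device are received in the order the kernel emitted them. The helper names uevent_diskseq and classify_uevent are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* How a received uevent relates to the device instance we set up. */
enum diskseq_verdict {
    DISKSEQ_MATCH,  /* the event belongs to our instance */
    DISKSEQ_STALE,  /* from an earlier instance with the same name: keep waiting */
    DISKSEQ_MISSED  /* a later instance or media change already emitted: ours will not arrive */
};

/*
 * Hypothetical helper: find "DISKSEQ=<n>" in a uevent payload made of
 * NUL-separated KEY=VALUE strings (the key name is an assumption).
 * Returns 0 and stores the value on success, -1 if absent or malformed.
 */
int uevent_diskseq(const char *buf, size_t len, uint64_t *out)
{
    static const char key[] = "DISKSEQ=";
    const size_t klen = sizeof(key) - 1;
    size_t off = 0;

    assert(buf != NULL && out != NULL);
    while (off < len) {
        const char *entry = buf + off;
        const char *nul = memchr(entry, '\0', len - off);
        size_t n = nul ? (size_t)(nul - entry) : len - off;

        if (n > klen && memcmp(entry, key, klen) == 0) {
            char tmp[24];              /* room for any 64-bit decimal value */
            size_t vlen = n - klen;
            char *end;
            unsigned long long v;

            if (vlen >= sizeof(tmp) || entry[klen] < '0' || entry[klen] > '9')
                return -1;
            memcpy(tmp, entry + klen, vlen);
            tmp[vlen] = '\0';
            errno = 0;
            v = strtoull(tmp, &end, 10);
            if (errno == ERANGE || *end != '\0')
                return -1;
            *out = (uint64_t)v;
            return 0;
        }
        off += n + 1;                  /* skip the entry and its terminating NUL */
    }
    return -1;
}

/*
 * Hypothetical helper: compare the number carried by an event for the SAME
 * device name against the one obtained when we set the device up.
 * Assumes events for that device are received in emission order.
 */
enum diskseq_verdict classify_uevent(uint64_t ours, uint64_t seen)
{
    if (seen == ours)
        return DISKSEQ_MATCH;
    return seen < ours ? DISKSEQ_STALE : DISKSEQ_MISSED;
}
```

Because the counter is global, events for other device names carry unrelated numbers, so a caller would filter on DEVNAME before classifying. The point of the three-way result is the one the commit makes: a UUID alone only says "not mine", while an ordered number also says whether to keep waiting or move on.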

The disk sequence number is exported via uevents and sysfs, and it can also be queried with a new BLKGETDISKSEQ ioctl. Assuming no last-minute design objections, this code is slated to land in Linux 5.15 as part of the pending block subsystem for-5.15 material.
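For illustration, here is a minimal C sketch of reading the number back through two of those interfaces. It assumes BLKGETDISKSEQ is declared in <linux/fs.h> as _IOR(0x12, 128, __u64) (the fallback definition below encodes that assumption for older headers) and that the sysfs attribute lives at /sys/block/<disk>/diskseq. The helper names diskseq_from_ioctl and diskseq_from_sysfs are hypothetical. On kernels without the feature the ioctl is expected to fail, typically with ENOTTY.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>
#include <linux/types.h>

/* Fallback for pre-5.15 headers; assumed to match the <linux/fs.h> definition. */
#ifndef BLKGETDISKSEQ
#define BLKGETDISKSEQ _IOR(0x12, 128, __u64)
#endif

/* Hypothetical helper: ask the kernel for the sequence number of a block
 * device node such as "/dev/loop0". Returns 0 on success, -1 with errno set
 * otherwise (e.g. ENOENT for a missing node, typically ENOTTY if unsupported). */
int diskseq_from_ioctl(const char *devnode, uint64_t *out)
{
    __u64 seq = 0;
    int fd, saved;

    assert(devnode != NULL && out != NULL);
    fd = open(devnode, O_RDONLY);
    if (fd < 0)
        return -1;
    if (ioctl(fd, BLKGETDISKSEQ, &seq) < 0) {
        saved = errno;                 /* close() may clobber errno */
        close(fd);
        errno = saved;
        return -1;
    }
    close(fd);
    *out = (uint64_t)seq;
    return 0;
}

/* Hypothetical helper: read the same value from sysfs, e.g. name = "loop0"
 * reads /sys/block/loop0/diskseq (path layout assumed). */
int diskseq_from_sysfs(const char *name, uint64_t *out)
{
    char path[256];
    unsigned long long v;
    FILE *f;
    int n;

    assert(name != NULL && out != NULL);
    n = snprintf(path, sizeof(path), "/sys/block/%s/diskseq", name);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%llu", &v) != 1) {
        fclose(f);
        errno = EINVAL;
        return -1;
    }
    fclose(f);
    *out = (uint64_t)v;
    return 0;
}
```

A tool that sets up a loop device would call something like diskseq_from_ioctl("/dev/loop0", &seq) immediately after setup, then compare the DISKSEQ value of incoming uevents for loop0 against seq. The ioctl is the race-free path the commit message describes; the sysfs read is convenient for scripts and debugging.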
About The Author
Michael Larabel

Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed via Twitter, LinkedIn, or contacted via MichaelLarabel.com.
