~5 Minutes Of Coding Yields A 6%+ Boost To Linux I/O Performance

Written by Michael Larabel in Linux Storage on 16 January 2024 at 06:50 AM EST.
IO_uring creator and Linux block subsystem maintainer Jens Axboe spent about five minutes writing two patches that cache issue-side time queries in the block layer, and the result is an I/O performance improvement of 6% or more.

Axboe shared about his latest interesting Linux I/O performance optimization, "Something I've had in the back of my mind for years, and finally did it today. Which is kind of sad, since it was literally a 5 min job, yielding a more than 6% improvement. Would likely be even larger on a full scale distro style kernel config."

Axboe explained that he typically disables iostats when testing because of the overhead of the time queries it performs. With some basic caching of the issue-side time queries, he is seeing around a 6% IOPS boost, and on a more bloated Linux distribution vendor kernel the gains are likely larger.


He detailed in the RFC patch series:
"Querying the current time is the most costly thing we do in the block layer per IO, and depending on kernel config settings, we may do it many times per IO.

None of the callers actually need nsec granularity. Take advantage of that by caching the current time in the plug, with the assumption here being that any time checking will be temporally close enough that the slight loss of precision doesn't matter.

If the block plug gets flushed, eg on preempt or schedule out, then we invalidate the cached clock.
...
which is more than a 6% improvement in performance. Looking at perf diff, we can see a huge reduction in time overhead:

10.55% -9.88% [kernel.vmlinux] [k] read_tsc
1.31% -1.22% [kernel.vmlinux] [k] ktime_get

Note that since this relies on blk_plug for the caching, it's only applicable to the issue side. But this is where most of the time calls happen anyway. It's also worth noting that the above testing doesn't enable any of the higher cost CPU items on the block layer side, like wbt, cgroups, iocost, etc, which all would add additional time querying. IOW, results would likely look even better in comparison with those enabled, as distros would do."
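To make the mechanism concrete, below is a minimal userspace sketch in C under a simplified model. The `struct plug`, `plug_time_ns()` and `plug_flush()` names are hypothetical and are not the kernel's actual block-layer API, and `clock_gettime(CLOCK_MONOTONIC)` stands in for the kernel's `ktime_get()`. The first time query within a plug reads the clock and caches the result, later queries reuse it, and flushing the plug invalidates the cache.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the kernel's per-task block plug. */
struct plug {
    uint64_t cached_ns;         /* 0 means "no cached time" */
    unsigned long clock_reads;  /* how often the real clock was queried */
};

/* Expensive path: actually read the monotonic clock. */
static uint64_t read_clock_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Return the current time, reusing the plug's cached value when present.
 * Without a plug (p == NULL) there is nowhere to cache, so read directly. */
static uint64_t plug_time_ns(struct plug *p)
{
    if (!p)
        return read_clock_ns();
    if (p->cached_ns == 0) {
        p->cached_ns = read_clock_ns();
        p->clock_reads++;
    }
    return p->cached_ns;
}

/* Flushing the plug (e.g. on preempt or schedule out) drops the cached
 * time so the next query reads the clock again. */
static void plug_flush(struct plug *p)
{
    assert(p != NULL);
    /* ...queued requests would be submitted here... */
    p->cached_ns = 0;
}
```

Within one plug, a batch of N submissions costs a single clock read instead of N. The trade-off is that they all share a slightly stale timestamp, which the patch description argues is acceptable because no caller needs nanosecond granularity.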

It's a nice win, and hopefully it continues to pan out and is upstreamed during the Linux v6.9 cycle in a few months.
About The Author
Michael Larabel

Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed via Twitter, LinkedIn, or contacted via MichaelLarabel.com.
