Some AMD CPUs To Benefit From New Micro-Optimization In Linux 6.6

Written by Michael Larabel in AMD on 5 September 2023 at 05:50 AM EDT.
One of the patches to be picked up by the Linux 6.6 kernel this week brings back REP MOVSQ for user access on CPUs without Enhanced REP MOVSB (ERMS) support. In turn, this can yield some performance benefits on AMD CPUs lacking ERMS.

While Intel CPUs going back to Ivy Bridge have offered Enhanced REP MOVSB (ERMS), some AMD CPUs, even relatively recent models, have lacked it (you can check for "erms" in your /proc/cpuinfo flags to see if your hardware is impacted). ERMS can lead to more efficient memory copies, while the kernel change for Linux 6.6 brings back the REP MOVSQ sequence for CPUs lacking it.
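For those who would rather script the /proc/cpuinfo check than eyeball it, here is a minimal Python sketch. The helper names `has_cpu_flag` and `read_cpuinfo` are made up for illustration, and it assumes the x86 layout where feature bits are listed on lines starting with "flags".

```python
def has_cpu_flag(cpuinfo_text: str, flag: str = "erms") -> bool:
    """Return True if `flag` is listed on any 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only look at the x86 'flags' lines; ignore model name, bugs, etc.
        if sep and key.strip() == "flags":
            if flag in value.split():
                return True
    return False


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the cpuinfo file; the default path is the usual Linux location."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read()


if __name__ == "__main__":
    try:
        text = read_cpuinfo()
    except OSError:
        print("could not read /proc/cpuinfo (not a Linux system?)")
    else:
        print("erms present" if has_cpu_flag(text) else "erms not reported")
```

Matching whole whitespace-separated tokens avoids false positives from other flags that merely contain the same letters. Note that this only reports what the kernel sees; as the patch message below points out, a hypervisor may hide the bit.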

Mateusz Guzik explained in the patch tweaking the kernel's hand-tuned Assembly code:
x86: bring back rep movsq for user access on CPUs without ERMS

Intel CPUs ship with ERMS for over a decade, but this is not true for AMD. In particular one reasonably recent uarch (EPYC 7R13) does not have it (or at least the bit is inactive when running on the Amazon EC2 cloud -- I found rather conflicting information about AMD CPUs vs the extension).

Hand-rolled mov loops executing in this case are quite pessimal compared to rep movsq for bigger sizes. While the upper limit depends on uarch, everyone is well south of 1KB AFAICS and sizes bigger than that are common.

While technically ancient CPUs may be suffering from rep usage, gcc has been emitting it for years all over kernel code, so I don't think this is a legitimate concern.

Sample result from read1_processes from will-it-scale (4KB reads/s):

before: 1507021
after: 1721828 (+14%)

Note that the cutoff point for rep usage is set to 64 bytes, which is way too conservative but I'm sticking to what was done in 47ee3f1dd93b ("x86: re-introduce support for ERMS copies for user space accesses"). That is to say *some* copies will now go slower, which is fixable but beyond the scope of this patch.
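To make the size-based selection described in the patch message more concrete, here is a deliberately simplified Python model. It is not the kernel's code: the function names are hypothetical, the cutoff is assumed to be inclusive (64 bytes and up take the REP path), and the real Assembly handles details such as fault handling that are ignored here.

```python
from typing import Tuple

REP_CUTOFF = 64  # bytes; the cutoff quoted in the patch message


def pick_copy_strategy(size: int, has_erms: bool) -> str:
    """Toy model of which copy sequence a user-access copy of `size` bytes takes."""
    if size < REP_CUTOFF:
        # Small copies stay on the hand-rolled mov loops.
        return "mov loop"
    # Larger copies: rep movsb with ERMS, otherwise rep movsq after this patch.
    return "rep movsb" if has_erms else "rep movsq"


def movsq_split(size: int) -> Tuple[int, int]:
    """Split a byte count into (8-byte quadwords for rep movsq, leftover tail bytes)."""
    return divmod(size, 8)


if __name__ == "__main__":
    for size in (32, 64, 70, 4096):
        print(size, pick_copy_strategy(size, has_erms=False), movsq_split(size))
```

Under this model, a 4KB read like the one in the will-it-scale test above would be copied as 512 quadwords with no tail on a CPU without ERMS, rather than going through the mov loop.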

The patch in Linux 6.6 mainline adjusts a few dozen lines of Assembly for this performance win on select AMD CPUs like the common EPYC 7R13 found in Amazon's cloud.
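As a quick sanity check on the figures quoted in the patch message, the relative gain can be recomputed from the two will-it-scale numbers. This is just arithmetic on the published values; the `percent_change` helper is a name chosen here for illustration.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from `old` to `new`, expressed in percent."""
    return (new - old) / old * 100.0


# 4KB reads/s from read1_processes, as quoted in the patch message.
before = 1_507_021
after = 1_721_828

gain = percent_change(before, after)
print(f"{gain:.1f}% more reads per second")
```

The result comes out a little above 14%, in line with the "+14%" reported in the commit.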