LLVM Merges Machine Function Splitter For ~32% Reduction In TLB Misses

Written by Michael Larabel in LLVM on 31 August 2020 at 05:29 PM EDT.
At the beginning of August, we reported on Google engineers proposing the Machine Function Splitter for LLVM. This code-generation optimization pass splits functions into hot and cold portions and can make binaries up to a few percent faster. That work has now been merged for LLVM 12.0, with very promising results.

The LLVM Machine Function Splitter was merged before the weekend into the Git codebase for what will become LLVM 12.0 early next year. The pass keeps the hot code paths together so they are more likely to stay resident in the CPU caches, while the rarely executed cold paths are moved out of the way and given lower priority for cache space.

The Google engineers found a 2.33% runtime improvement along with a ~32% reduction in iTLB and sTLB misses. L1 iCache misses were down by 9.5%, while L2 instruction misses dropped by 20%. For SPECInt, Clang performance improved by 0.6 to 1.6%.

The code is now merged, and I'll be running some Machine Function Splitter benchmarks soon. Note that the Machine Function Splitter relies on profile information to determine the hot and cold paths of the program.
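To illustrate the general idea of profile-driven hot/cold splitting, here is a minimal Python sketch. It is not LLVM's implementation: the `BasicBlock` type, the `split_function` helper, and the `cold_threshold` parameter are all hypothetical, and the real pass makes its decisions inside LLVM's code generator using profile summary data. The sketch only shows the core decision: blocks that the profile says are rarely or never executed go into a separate cold part, the entry block always stays hot, and without profile data the function is left unsplit.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BasicBlock:
    name: str
    # Execution count from profile data; None means no profile is available.
    # (Hypothetical representation, for illustration only.)
    profile_count: Optional[int]


def split_function(
    blocks: List[BasicBlock], cold_threshold: int = 1
) -> Tuple[List[BasicBlock], List[BasicBlock]]:
    """Partition a function's blocks into (hot, cold), preserving block order.

    Hypothetical helper: blocks executed fewer than `cold_threshold` times
    are treated as cold.
    """
    # Without profile data there is no basis for calling any block cold,
    # so the function is left intact.
    if any(bb.profile_count is None for bb in blocks):
        return list(blocks), []

    hot: List[BasicBlock] = []
    cold: List[BasicBlock] = []
    for i, bb in enumerate(blocks):
        # The entry block (index 0) always stays in the hot part so the
        # function is still entered at its original location.
        if i == 0 or bb.profile_count >= cold_threshold:
            hot.append(bb)
        else:
            cold.append(bb)
    return hot, cold


blocks = [
    BasicBlock("entry", 1000),
    BasicBlock("loop", 50000),
    BasicBlock("error_path", 0),
    BasicBlock("exit", 1000),
]
hot, cold = split_function(blocks)
print([b.name for b in hot])   # ['entry', 'loop', 'exit']
print([b.name for b in cold])  # ['error_path']
```

In a real compiler, the cold blocks would be emitted into a separate section away from the hot code. That placement is what leaves the frequently executed instructions packed into fewer cache lines and pages, which is where the reported iCache and TLB miss reductions come from.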
About The Author
Michael Larabel

Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed via Twitter, LinkedIn, or contacted via MichaelLarabel.com.
