Initial Retpoline Support Added To LLVM For Spectre v2 Mitigation
The Retpoline x86 mitigation technique for Spectre Variant 2 has been merged into mainline LLVM. It will be back-ported to LLVM 6.0 and also LLVM 5.0, with an immediate point release expected so that the patched compiler gets out into the wild.
The compiler-side work, similar to GCC's Retpoline code, avoids generating code in which an indirect branch could have its prediction poisoned by a rogue actor. With Retpoline enabled, indirect calls are emitted in a non-speculatable form.
The LLVM switches differ slightly from GCC's -mindirect-branch options: LLVM opts for -mretpoline, plus -mretpoline-external-thunk for those wanting to use a custom thunk.
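To make the switch concrete, here is a minimal sketch of the kind of code it affects. The names `handler_fn`, `add_one`, `twice` and `apply` are hypothetical and exist only for illustration. The call through `fn` is an indirect branch, which is what -mretpoline is meant to rewrite on x86.

```c
/* Hypothetical example: none of these names come from LLVM or the patch. */
typedef int (*handler_fn)(int);

int add_one(int x) { return x + 1; }
int twice(int x)   { return x * 2; }

/*
 * The target of this call is only known at run time, so the compiler
 * normally emits an indirect call or jump here. Building with, for example,
 *
 *     clang -O2 -mretpoline -c apply.c
 *
 * asks Clang to route such branches through a retpoline thunk instead.
 */
int apply(handler_fn fn, int x)
{
    return fn(x);
}
```

Calling `apply(add_one, 41)` behaves the same with or without the flag; only the generated branch sequence is expected to differ, which can be checked by comparing the assembly output of both builds.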
About the Retpoline'd LLVM performance impact:
When manually applying similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel.
When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%.
However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we *strongly* recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline.
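The promotion of hot indirect calls to direct calls mentioned above can be approximated by hand. The following is only a sketch of the idea, not what PGO or ThinLTO literally emit: `op_fn`, `square`, `negate`, `call_indirect` and `call_promoted` are hypothetical names, and it assumes a profile has shown `square` to be the dominant target.

```c
/* Hypothetical example: all names here are illustrative assumptions. */
typedef int (*op_fn)(int);

int square(int x) { return x * x; }
int negate(int x) { return -x; }

/* Baseline: every call goes through an indirect branch. */
int call_indirect(op_fn fn, int x)
{
    return fn(x);
}

/*
 * Hand-written approximation of speculative promotion, assuming `square`
 * is the hot target. The pointer comparison is an ordinary conditional
 * branch; the hot path becomes a direct (and inlinable) call, and only
 * the cold fallback still needs an indirect branch.
 */
int call_promoted(op_fn fn, int x)
{
    if (fn == square)
        return square(x);   /* direct call on the hot path */
    return fn(x);           /* cold fallback, still indirect */
}
```

Both functions return the same result for any target; the difference is that the promoted version only pays the retpoline cost when the guess about the hot target is wrong.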
The code is currently in LLVM 7.0 Git/SVN and will be back-ported to LLVM 6.0, which is being prepared for release next month. It is also working its way into the LLVM 5.0 code-base. Since 5.0 is the current stable series, LLVM developers plan to issue a new point release as soon as possible once the code is in that branch, to make this -mretpoline support more widely available.