**SofS** · 25 September 2021, 11:19 AM

I understand where this is coming from, but isn't it too risky to automatically optimize the kernel to such an extent? There may be unintended side effects. This is similar to why -O3 is not recommended.

**Jannik2099** · 25 September 2021, 11:43 AM

Originally posted by SofS:

> I understand where this is coming from, but isn't it too risky to automatically optimize the kernel to such an extent? There may be unintended side effects. This is similar to why -O3 is not recommended.

-O3 is mainly not recommended because GCC likes to replace UB with catastrophically failing code, and the kernel has more UB than we'd all like it to (not saying you can have a kernel without UB, but definitely less than Linux has right now). Also, historically, GCC's vectorizer used to have *interesting* miscompilations every other month - this has been fixed, and it has been very reliable for the past few major releases, though.

BOLT isn't exactly a risky optimization. It reorders (and potentially deduplicates) code so that "hot" sections sit close together and make better use of cache lines. Since most of the kernel is built as PIC (position-independent code), moving code around is not an issue.
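To make the "hot sections close together" idea concrete, here is a toy sketch. It is not how BOLT actually works internally; the function names, sizes, and sample counts are all made up, and it only counts how many 64-byte cache lines the hot functions touch before and after sorting by profile hotness.

```python
# Toy model of profile-guided function layout; not BOLT's real algorithm.
# Function names, sizes, and sample counts are invented for illustration.

CACHE_LINE = 64  # bytes, assumed cache line size


def layout(functions, order):
    """Assign consecutive start offsets to functions in the given order."""
    offsets, cursor = {}, 0
    for name in order:
        offsets[name] = cursor
        cursor += functions[name]["size"]
    return offsets


def hot_cache_lines(functions, offsets, hot_threshold):
    """Count distinct cache lines touched by functions at or above the threshold."""
    lines = set()
    for name, info in functions.items():
        if info["samples"] >= hot_threshold:
            start = offsets[name]
            end = start + info["size"] - 1
            lines.update(range(start // CACHE_LINE, end // CACHE_LINE + 1))
    return len(lines)


functions = {
    "cold_init":   {"size": 200, "samples": 1},
    "hot_syscall": {"size": 40,  "samples": 9000},
    "cold_error":  {"size": 300, "samples": 0},
    "hot_sched":   {"size": 50,  "samples": 7000},
    "cold_debug":  {"size": 150, "samples": 2},
    "hot_irq":     {"size": 30,  "samples": 8000},
}

# Layout as the functions happen to appear, versus hottest-first.
source_order = list(functions)
profiled_order = sorted(functions, key=lambda n: functions[n]["samples"], reverse=True)

before = hot_cache_lines(functions, layout(functions, source_order), hot_threshold=1000)
after = hot_cache_lines(functions, layout(functions, profiled_order), hot_threshold=1000)
print(f"cache lines holding hot code: {before} before, {after} after")
```

The code itself is unchanged in both layouts; only where it sits changes, which is why this kind of optimization is low risk.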

**thechef** · 25 September 2021, 12:32 PM

Once profiled? So optimized toward a particular way of using the kernel?

**CommunityMember** · 25 September 2021, 02:23 PM

Originally posted by Jannik2099:

> BOLT isn't exactly a risky optimization. It reorders (and potentially deduplicates) code so that "hot" sections fit together / are close, to optimize cache lines.

In addition to the CPU cache line improvements, it can also improve TLB efficiency and branch prediction. As with all else, YMWV.

**CommunityMember** · 25 September 2021, 02:33 PM

Originally posted by thechef:

> Once profiled? So optimized toward a particular way of using the kernel?

Yes, but for the hyperscalers (such as Facebook, or Google), there are many many thousands of systems doing specific things in specific ways, and if they can reduce the number of those systems by improving efficiency they can save many millions of dollars in both system acquisition and ongoing operational costs. So, one can imagine, a BOLT'd kernel for thousands of systems doing their file systems, and a different BOLT'd kernel for their thousands of systems running their web application services, etc.

**microcode** · 25 September 2021, 03:46 PM

Originally posted by SofS:

> I understand where this is coming from, but isn't it too risky to automatically optimize the kernel to such an extent? There may be unintended side effects. This is similar to why -O3 is not recommended.

Binary layout optimization is pretty safe: if your regions don't overlap and the linker is linking the right symbols (both are easy to check), then it's working fine. Unlike fine-grained optimizations, the risk of layout optimizations affecting the functioning of the kernel is virtually zero.
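As a rough idea of how simple the "regions don't overlap" check is, here is a minimal sketch. The `(name, start, size)` tuples are hypothetical; a real check would read them from the linked binary's symbol or section table.

```python
# Hypothetical post-layout sanity check; the region data below is invented.

def find_overlaps(regions):
    """Return neighbouring pairs (sorted by start address) whose ranges overlap.

    Each region is (name, start, size) and covers [start, start + size).
    An empty result means no two regions overlap at all, because any
    overlap implies an overlap between some pair of neighbours.
    """
    overlaps = []
    ordered = sorted(regions, key=lambda r: r[1])
    for (name_a, start_a, size_a), (name_b, start_b, _) in zip(ordered, ordered[1:]):
        if start_a + size_a > start_b:
            overlaps.append((name_a, name_b))
    return overlaps


good = [("func_a", 0, 16), ("func_b", 16, 32), ("func_c", 48, 8)]
bad = [("func_a", 0, 20), ("func_b", 16, 32)]

print(find_overlaps(good))
print(find_overlaps(bad))
```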

**sophisticles** · 25 September 2021, 03:51 PM

Great, now Facebook can take your data and sell it even faster.

**microcode** · 25 September 2021, 03:51 PM

Originally posted by CommunityMember:

> Yes, but for the hyperscalers (such as Facebook, or Google), there are many many thousands of systems doing specific things in specific ways, and if they can reduce the number of those systems by improving efficiency they can save many millions of dollars in both system acquisition and ongoing operational costs. So, one can imagine, a BOLT'd kernel for thousands of systems doing their file systems, and a different BOLT'd kernel for their thousands of systems running their web application services, etc.

And beyond this, you can score layouts by different metrics as well. If you measured desktop latency on an AMD APU with GNOME, conceivably you could find binary layouts for the kernel and modules that have smaller latency spikes when interacting with desktop GUI apps; or maybe instead you do it for games. Maybe Valve can find binary layouts for the kernel, with static drivers for their hardware, that minimize frame time spikes in selected performance-critical games.
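A toy sketch of what "scoring layouts by different metrics" could look like. The layout names and frame times are invented, and `best_layout` is a hypothetical helper, not part of any real tool. The point is that the winner depends on whether you score by average or by worst-case spike.

```python
# Toy comparison of candidate layouts under different metrics.
# All measurements are invented frame times in milliseconds.
import statistics

measurements = {
    "layout_a": [4.0, 4.1, 4.0, 4.2, 6.5],  # fast on average, one spike
    "layout_b": [4.6, 4.7, 4.6, 4.8, 4.9],  # slower on average, no spikes
}


def best_layout(measurements, metric):
    """Pick the layout whose samples give the lowest score under `metric`."""
    return min(measurements, key=lambda name: metric(measurements[name]))


print("best by mean frame time:", best_layout(measurements, statistics.mean))
print("best by worst spike:", best_layout(measurements, max))
```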

**cb88** · 25 September 2021, 04:42 PM

Originally posted by thechef:

> Once profiled? So optimized toward a particular way of using the kernel?

Actually, no. It wouldn't change *how* it is using the kernel at all... it only reorders the program so that it stays in cache more often.


Facebook Has Been Working On BOLT'ing The Linux Kernel For Greater Performance
