Facebook Has Been Working On BOLT'ing The Linux Kernel For Greater Performance


  • Facebook Has Been Working On BOLT'ing The Linux Kernel For Greater Performance

    Phoronix: Facebook Has Been Working On BOLT'ing The Linux Kernel For Greater Performance

    For several years now Facebook engineers have been working on BOLT as a way to speed up Linux/ELF binaries. This "Binary Optimization and Layout Tool" is able to re-arrange executables once profiled to generate even faster performance than what can be achieved by a compiler's LTO and PGO optimizations. One of the latest BOLT efforts has been on optimizing the Linux kernel image...


  • #2
    I understand where this is coming from, but is it not too risky to automatically optimize to such extent the Kernel? There may be unintended side effects. This is similar to why -O3 is not recommended.


    • #3
      Originally posted by SofS
      I understand where this is coming from, but is it not too risky to automatically optimize to such extent the Kernel? There may be unintended side effects. This is similar to why -O3 is not recommended.
      -O3 is mainly not recommended because GCC likes to replace UB with catastrophically failing code, and the kernel has more UB than we'd all like it to (not saying you can have a kernel without UB, but definitely less than Linux has right now). Also, historically, GCC's vectorizer used to have *interesting* miscompilations every other month - this has been fixed, though, and it has been very reliable for the past few major releases.

      BOLT isn't exactly a risky optimization. It reorders (and potentially deduplicates) code so that "hot" sections fit together / are close, to optimize cache lines. Since most of the kernel is built as PIC (position independent code) this is not an issue.
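
To make the reordering idea concrete, here is a toy sketch in Python. It is not how BOLT is implemented: the function names, sizes, sample counts and the 64-byte cache line are all made-up assumptions, and real layout tools work on basic blocks and call graphs rather than a simple sort. It only shows why packing hot code together touches fewer cache lines.

```python
from dataclasses import dataclass

CACHE_LINE = 64  # bytes; assumed cache line size


@dataclass(frozen=True)
class Func:
    name: str
    size: int     # bytes of machine code (hypothetical)
    samples: int  # profile hits (hypothetical)


def layout(funcs):
    """Place functions back to back in the given order; return start offsets."""
    offsets = {}
    pos = 0
    for f in funcs:
        offsets[f.name] = pos
        pos += f.size
    return offsets


def hot_lines(funcs, offsets, min_samples):
    """Count distinct cache lines covered by functions with >= min_samples hits."""
    lines = set()
    for f in funcs:
        if f.samples >= min_samples:
            start = offsets[f.name]
            end = start + f.size - 1
            lines.update(range(start // CACHE_LINE, end // CACHE_LINE + 1))
    return len(lines)


def reorder_hot_first(funcs):
    """Naive stand-in for a layout pass: hottest functions first."""
    return sorted(funcs, key=lambda f: f.samples, reverse=True)


# Hypothetical image: hot and cold functions interleaved.
FUNCS = [
    Func("hot_a", 32, 900),
    Func("cold_x", 200, 1),
    Func("hot_b", 32, 800),
    Func("cold_y", 200, 0),
    Func("hot_c", 32, 700),
]

before = hot_lines(FUNCS, layout(FUNCS), min_samples=100)
reordered = reorder_hot_first(FUNCS)
after = hot_lines(reordered, layout(reordered), min_samples=100)
```

In this made-up example the hot functions span fewer cache lines after reordering, while the set of functions and their contents are unchanged.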


      • #4
        Once profiled? So optimized toward a particular way of using the kernel?


        • #5
          Originally posted by Jannik2099
          BOLT isn't exactly a risky optimization. It reorders (and potentially deduplicates) code so that "hot" sections fit together / are close, to optimize cache lines.
          In addition to the CPU cache line improvements, it can also improve TLB efficiency and branch prediction. As with all else, YMWV.


          • #6
            Originally posted by thechef
            Once profiled? So optimized toward a particular way of using the kernel?
            Yes, but for the hyperscalers (such as Facebook, or Google), there are many many thousands of systems doing specific things in specific ways, and if they can reduce the number of those systems by improving efficiency they can save many millions of dollars in both system acquisition and ongoing operational costs. So, one can imagine, a BOLT'd kernel for thousands of systems doing their file systems, and a different BOLT'd kernel for their thousands of systems running their web application services, etc.


            • #7
              Originally posted by SofS
              I understand where this is coming from, but is it not too risky to automatically optimize to such extent the Kernel? There may be unintended side effects. This is similar to why -O3 is not recommended.
              Binary layout optimization is pretty safe, because if your regions don't overlap, and the linker is linking the right symbols (these are easy to check), then it's working fine. Unlike fine-grained optimizations, the risk of layout optimizations affecting the functioning of the kernel is virtually zero.
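
A rough sketch of the kind of check described above, in Python. The `(name, start, size)` tuples are a hypothetical stand-in for a symbol table, not the output format of any real tool, and comparing symbol names only is a simplification.

```python
def find_overlaps(regions):
    """Return (earlier, later) name pairs where a region starts before
    the furthest-reaching earlier region has ended.

    regions: iterable of (name, start, size) tuples (hypothetical format).
    """
    overlaps = []
    furthest_end = 0
    furthest_name = None
    for name, start, size in sorted(regions, key=lambda r: r[1]):
        if furthest_name is not None and start < furthest_end:
            overlaps.append((furthest_name, name))
        if furthest_name is None or start + size > furthest_end:
            furthest_end = start + size
            furthest_name = name
    return overlaps


def same_symbols(before, after):
    """True if both layouts contain exactly the same symbol names."""
    return {name for name, _, _ in before} == {name for name, _, _ in after}
```

For example, `find_overlaps([("a", 0, 16), ("b", 16, 16)])` returns an empty list, since `b` starts exactly where `a` ends.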


              • #8

                Great, now Facebook can take your data and sell it even faster.
                Last edited by sophisticles; 25 September 2021, 04:15 PM.


                • #9
                  Originally posted by CommunityMember

                  Yes, but for the hyperscalers (such as Facebook, or Google), there are many many thousands of systems doing specific things in specific ways, and if they can reduce the number of those systems by improving efficiency they can save many millions of dollars in both system acquisition and ongoing operational costs. So, one can imagine, a BOLT'd kernel for thousands of systems doing their file systems, and a different BOLT'd kernel for their thousands of systems running their web application services, etc.
                  And beyond this, you can score layouts by different metrics as well. If you measured desktop latency on an AMD APU with GNOME, conceivably you could find binary layouts for the kernel and modules that have smaller latency spikes when interacting with desktop GUI apps; or maybe instead you do it for games. Maybe Valve can find binary layouts for the kernel, with static drivers for their hardware, that minimize frame time spikes in selected performance-critical games.
                  Last edited by microcode; 25 September 2021, 03:54 PM.
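
Scoring layouts by a different metric could look something like the Python sketch below. The layout names and latency numbers are invented for illustration, and the nearest-rank percentile is just one simple way to measure spikes; nothing here comes from BOLT or from any real benchmark.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def best_layout(measurements, score):
    """Pick the layout whose samples give the lowest score.

    measurements: {layout_name: [latency_ms, ...]} (hypothetical data)
    score: function mapping a list of samples to a number, lower is better
    """
    return min(measurements, key=lambda name: score(measurements[name]))


# Made-up frame times in milliseconds for two candidate layouts.
MEASUREMENTS = {
    "layout_a": [10] * 19 + [40],  # fast on average, one big spike
    "layout_b": [12] * 20,         # slightly slower, no spikes
}

by_mean = best_layout(MEASUREMENTS, mean)
by_p99 = best_layout(MEASUREMENTS, lambda s: percentile(s, 99))
```

With this invented data the two metrics disagree: scoring by mean favors `layout_a`, while scoring by the 99th percentile favors `layout_b`, which is the point about optimizing for latency spikes rather than throughput.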


                  • #10
                    Originally posted by thechef
                    Once profiled? So optimized toward a particular way of using the kernel?
                    Actually no. It wouldn't change *how* it is using the kernel at all... it only reorders the program so that it stays in cache more often.
