LLVM Now Using PGO For Building x86_64 Windows Release Binaries: ~22% Faster Builds


  • Phoronix: LLVM Now Using PGO For Building x86_64 Windows Release Binaries: ~22% Faster Builds

    The LLVM project now uses profile-guided optimization (PGO) when building its x86_64 Microsoft Windows release packages. With PGO, their Clang build is a stunning 22% faster...


  • #2
    CachyOS and GentooLTO must be proud



    • #3
      Originally posted by Kjell
      CachyOS and GentooLTO must be proud
      Nah, that's just LTO. PGO is a different beast entirely.

      I just checked on Gentoo, and LLVM does not have an option to build it with PGO. GCC does though.



      • #4
        Sorry, jargon confusion. Is this making the Windows operating system 20% faster, or some Windows apps?



        • #5
          Originally posted by down1
          Sorry, jargon confusion. Is this making the Windows operating system 20% faster, or some Windows apps?
          I read it as the Clang compiler being 22% faster at compiling other programs, not that the programs being compiled are 22% faster.



          • #6
            Originally posted by down1
            Sorry, jargon confusion. Is this making the Windows operating system 20% faster, or some Windows apps?
            The compiler and toolchain would be 22% faster.



            • #7
              Many more PGO benchmarks for different applications are collected here: https://github.com/zamazan4ik/awesome-pgo/



              • #8
                Originally posted by Kjell
                CachyOS and GentooLTO must be proud
                CachyOS does use PGO for GCC, but as far as I know not for the distro's LLVM/Clang compiler. However, CachyOS provides an llvm-bolt build that you can download from https://aur.cachyos.org/. It is even more optimized, since you get a BOLTed Clang binary on top of LTO+PGO, but it comes with some restrictions. It is not a system-compiler replacement: you have to select it via an environment variable in the terminal before you can use it. It is also only usable where you don't need the llvm-libs. Thankfully that still covers a lot of projects, but for compiling Mesa you'd need to revert MR 25042 locally and keep the default system LLVM compiler for the llvm-libs. I haven't yet tried to get rid of that dependency by solely using ACO for radeonsi.
                Last edited by ms178; 19 November 2023, 03:24 PM.



                • #9
                  I wish there was some kind of public pool for profiling data. It's possible to merge profiling data across different runs, so, with some caveats, a database where users publish their profiles to get them merged would be doable. The big problem with PGO is getting representative profiles. Some projects (GCC, Python, probably LLVM too) just use their test suite to generate them, but that is not necessarily optimal. With a public database we'd have profiling data that reflects a wider spectrum of real-world use cases. With that, compiling everything with PGO would be rather simple and take no more time than a normal non-PGO build. (Maybe FDO would be more appropriate, since it seems to deal better with less-than-perfect profiles, but that's more of a nitpick. The results are comparable.)
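To make the merging idea concrete, here is a toy sketch in Python. It assumes a profile is nothing more than a mapping from function name to execution count, which is a big simplification of real profile formats; the function names and counts are made up for illustration. For LLVM profiles the actual merging is done by `llvm-profdata merge`, which also supports weighted inputs.

```python
from collections import Counter


def merge_profiles(profiles, weights=None):
    """Sum per-function execution counts across runs, optionally weighted.

    `profiles` is a list of dicts mapping a function name to a count.
    This is a simplified stand-in for a real profile format.
    """
    if weights is None:
        weights = [1] * len(profiles)
    if len(weights) != len(profiles):
        raise ValueError("need exactly one weight per profile")

    merged = Counter()
    for profile, weight in zip(profiles, weights):
        for function, count in profile.items():
            merged[function] += count * weight
    return dict(merged)


# Two hypothetical user submissions for the same binary.
user_a = {"parse": 900, "codegen": 100}
user_b = {"parse": 200, "codegen": 800, "lto": 50}

merged = merge_profiles([user_a, user_b])
print(merged)
```

A shared pool would additionally need to check that all submitted profiles come from the same build of the same binary, which this sketch does not attempt.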
                  Last edited by binarybanana; 20 November 2023, 06:08 AM.



                  • #10
                    Originally posted by binarybanana
                    I wish there was some kind of public pool for profiling data. It's possible to merge profiling data across different runs, so, with some caveats, a database where users publish their profiles to get them merged would be doable. The big problem with PGO is getting representative profiles. Some projects (GCC, Python, probably LLVM too) just use their test suite to generate them, but that is not necessarily optimal. With a public database we'd have profiling data that reflects a wider spectrum of real-world use cases. With that, compiling everything with PGO would be rather simple and take no more time than a normal non-PGO build. (Maybe FDO would be more appropriate, since it seems to deal better with less-than-perfect profiles, but that's more of a nitpick. The results are comparable.)
                    I agree with your proposal for a shared profile pool. It would be very beneficial for all PGO users and package maintainers. However, many things need to be considered here, such as privacy issues, the infrastructure around profile collection, and mitigations against potentially malicious PGO profiles. It's all solvable IMO, but right now there is unfortunately no ready-to-use infrastructure for this. Maybe one day there will be.

                    Regarding the currently used profiles: AFAIK, only CPython uses its test suite as a training workload. Both GCC and LLVM use compiling themselves with the instrumented compiler as the PGO training workload. (More precisely Clang, since PGO in LLVM is right now enabled only for Clang; for the other LLVM subprojects there is an open issue: https://github.com/llvm/llvm-project/issues/63486.) This is done via multi-stage builds: the instrumented GCC compiles GCC itself to collect the profiles, and the instrumented Clang compiles Clang. Of course, maintainers can decide to choose another training workload.
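For readers new to the workflow, the instrument, train, merge, rebuild cycle can be sketched for a single small program. This is not the LLVM or GCC multi-stage release build, only a minimal illustration using Clang's `-fprofile-generate` and `-fprofile-use` flags and `llvm-profdata merge`. The file names (`app-instrumented`, `app.profdata`, `pgo-raw`) and the helper `pgo_build_plan` are hypothetical, and the script only prints the commands rather than running them.

```python
import shlex


def pgo_build_plan(source, training_args, profile_dir="pgo-raw", merged="app.profdata"):
    """Return the commands for an instrumentation-based PGO build with Clang.

    All output names are placeholders chosen for this sketch.
    """
    return [
        # Stage 1: build an instrumented binary that writes raw profiles on exit.
        ["clang", "-O2", f"-fprofile-generate={profile_dir}", source, "-o", "app-instrumented"],
        # Stage 2: run the training workload with the instrumented binary.
        ["./app-instrumented", *training_args],
        # Stage 3: merge the raw profiles into one indexed profile.
        ["llvm-profdata", "merge", "-o", merged, profile_dir],
        # Stage 4: rebuild, letting the optimizer use the merged profile.
        ["clang", "-O2", f"-fprofile-use={merged}", source, "-o", "app-optimized"],
    ]


plan = pgo_build_plan("main.c", ["input.txt"])
for command in plan:
    print(shlex.join(command))
```

In the compiler case described above, the "training workload" in stage 2 is the compiler building its own source tree, which is why the release builds need multiple stages.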

                    Sampling PGO (also known as AutoFDO: https://github.com/google/autofdo) is an interesting approach to doing PGO in practice that aims to reduce the overhead of PGO instrumentation. It also has some limitations, though, such as requiring LBR/BRS support in your hardware and weak tooling support from Google's side (you can check the AutoFDO issue tracker for examples). Things need to be improved here as well.
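For comparison, a sampling-based flow for a single program might look roughly like the sketch below. It assumes an Intel-style setup where `perf record -b` can capture branch records (LBR), and uses AutoFDO's `create_llvm_prof` converter plus Clang's `-fprofile-sample-use` flag. The binary and profile names and the helper `autofdo_build_plan` are hypothetical, other hardware may need different perf options, and the script only prints the commands.

```python
import shlex


def autofdo_build_plan(source, training_args, sample_profile="app.prof"):
    """Return the commands for a sampling-based (AutoFDO-style) PGO build.

    No instrumented binary is needed: a normal optimized build is sampled.
    All output names are placeholders chosen for this sketch.
    """
    return [
        # Stage 1: ordinary optimized build with line tables so samples map to source.
        ["clang", "-O2", "-gline-tables-only", source, "-o", "app"],
        # Stage 2: sample the running program, recording taken branches (-b).
        ["perf", "record", "-b", "-o", "perf.data", "./app", *training_args],
        # Stage 3: convert the perf data into a profile Clang can read.
        ["create_llvm_prof", "--binary=./app", "--profile=perf.data", f"--out={sample_profile}"],
        # Stage 4: rebuild using the sampled profile.
        ["clang", "-O2", "-gline-tables-only", f"-fprofile-sample-use={sample_profile}",
         source, "-o", "app-optimized"],
    ]


plan = autofdo_build_plan("main.c", ["input.txt"])
for command in plan:
    print(shlex.join(command))
```

The practical difference from the instrumented flow is stage 2: the sampled binary runs at close to normal speed, so profiles can in principle be collected from production-like workloads.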

