**Kjell** · 18 November 2023, 02:27 PM

CachyOS and GentooLTO must be proud

**RealNC** · 18 November 2023, 10:16 PM

Originally posted by Kjell

CachyOS and GentooLTO must be proud

Nah, that's just LTO. PGO is a different beast entirely.

I just checked on Gentoo, and LLVM does not have an option to build it with PGO. GCC does though.

**down1** · 19 November 2023, 03:52 AM

Sorry, jargon confusion. Is this making the windows operating system 20% faster or some windows apps?

**ochita** · 19 November 2023, 05:33 AM

Originally posted by down1

Sorry, jargon confusion. Is this making the windows operating system 20% faster or some windows apps?

I read it as the Clang compiler being 22% faster at compiling other programs, not that the compiled programs are 22% faster.

**brad0** · 19 November 2023, 06:14 AM

Originally posted by down1

Sorry, jargon confusion. Is this making the windows operating system 20% faster or some windows apps?

The compiler and toolchain would be 22% faster.

**zamazan4ik** · 19 November 2023, 09:23 AM

Much more about PGO benchmarks for different applications can be found here: https://github.com/zamazan4ik/awesome-pgo/

**ms178** · 19 November 2023, 03:21 PM

Originally posted by Kjell

CachyOS and GentooLTO must be proud

CachyOS does use PGO on GCC, but not on the LLVM/Clang distro compiler as far as I know. However, CachyOS provides an llvm-bolt build which you can download from https://aur.cachyos.org/. It is even more optimized, as you get a BOLTed Clang binary on top of LTO+PGO, but it comes with some restrictions. It is not a system-compiler replacement: it needs to get called via an environment variable in the terminal before you can use it. It is also only usable where you don't need the llvm-libs. Thankfully that still covers a lot of projects, but for compiling Mesa you'd need to revert MR 25042 locally and keep the default system LLVM compiler for the llvm-libs. I haven't yet tried to get rid of that dependency by solely using ACO for radeonsi.

**binarybanana** · 20 November 2023, 06:05 AM

I wish there was some kind of public pool for profiling data. It's possible to merge profiling data across different runs, so with some caveats a database where users publish their profiles to get them merged would be doable. The big problem with PGO is getting representative profiles. Some projects (gcc, python, probably llvm also) just use their test suite to generate them, but that is not necessarily optimal. With a public database we'd have profiling data that reflects use across a wider spectrum of real world use cases. With that, compiling everything with PGO would be rather simple and take no more time than a normal non-PGO build. (Maybe FDO would be more appropriate since it seems to deal with less than perfect profiles better, but that's more of a nitpick. The results are comparable.)
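To make the merging idea concrete, here is a rough Python sketch. It assumes a profile can be reduced to a mapping from a function name to an execution count, which is a big simplification of the real profile formats. The names `merge_profiles`, `user_a` and `user_b` and all the counts are made up for illustration. With LLVM the actual merging is done by `llvm-profdata merge`, not by anything like this.

```python
from collections import Counter


def merge_profiles(profiles, weights=None):
    """Sum per-function execution counts across profiles.

    `weights` optionally scales each profile, e.g. to stop one very
    long run from dominating the merged result.
    """
    if weights is None:
        weights = [1] * len(profiles)
    if len(weights) != len(profiles):
        raise ValueError("need exactly one weight per profile")

    merged = Counter()
    for profile, weight in zip(profiles, weights):
        for name, count in profile.items():
            merged[name] += count * weight
    return dict(merged)


# Hypothetical counts from two users with different workloads.
user_a = {"parse": 900, "codegen": 100}
user_b = {"parse": 200, "codegen": 800, "lto": 50}

merged = merge_profiles([user_a, user_b])
print(merged)
```

Neither user's profile alone says that both `parse` and `codegen` are hot, but the merged one does, which is the whole point of pooling.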

**zamazan4ik** · 20 November 2023, 11:49 AM

Originally posted by binarybanana

I wish there was some kind of public pool for profiling data. It's possible to merge profiling data across different runs, so with some caveats a database where users publish their profiles to get them merged would be doable. The big problem with PGO is getting representative profiles. Some projects (gcc, python, probably llvm also) just use their test suite to generate them, but that is not necessarily optimal. With a public database we'd have profiling data that reflects use across a wider spectrum of real world use cases. With that, compiling everything with PGO would be rather simple and take no more time than a normal non-PGO build. (Maybe FDO would be more appropriate since it seems to deal with less than perfect profiles better, but that's more of a nitpick. The results are comparable.)

Agree with your proposal about the shared profiles pool. It would be very beneficial for all PGO users/package maintainers. However, many things should be considered here, like privacy issues, infrastructure around profile collection, some mitigations against potentially malicious PGO profiles, etc. It's all solvable IMO, but right now unfortunately there is no ready-to-use infra for this. Maybe one day it will be fixed.

Regarding the currently used profiles: AFAIK, only CPython uses its test suite as a training workload. Both GCC and LLVM use compiling themselves with the instrumented compiler as the PGO training workload. More precisely, for LLVM that means Clang: PGO in LLVM is currently enabled only for Clang, and for the other LLVM subprojects there is an open issue (https://github.com/llvm/llvm-project/issues/63486). It's done via multi-stage builds, so instrumented GCC compiles GCC itself to collect the profiles, and instrumented Clang compiles Clang. Of course, maintainers can decide to choose another training workload.
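For anyone who has not done an instrumentation PGO build before, the stages look roughly like the sketch below. It only builds and prints the commands, it does not run them. The Clang flags (`-fprofile-instr-generate`, `-fprofile-instr-use=`), the `LLVM_PROFILE_FILE` variable and `llvm-profdata merge` are the real ones, but `app.c`, `app`, `--bench` and the `Step`/`pgo_steps` helpers are placeholders I made up. In the compiler case the "training workload" stage is simply compiling the compiler's own sources with the instrumented compiler.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    description: str
    argv: list
    env: dict = field(default_factory=dict)


def pgo_steps(source="app.c", binary="app", training_args=()):
    """Return the four stages of an instrumentation-based PGO build."""
    raw = f"{binary}.profraw"        # raw counters written by the training run
    merged = f"{binary}.profdata"    # indexed profile consumed by the compiler
    instrumented = f"{binary}-instrumented"
    return [
        Step("build with instrumentation",
             ["clang", "-O2", "-fprofile-instr-generate", source,
              "-o", instrumented]),
        Step("run the training workload",
             [f"./{instrumented}", *training_args],
             env={"LLVM_PROFILE_FILE": raw}),
        Step("merge the raw profile(s)",
             ["llvm-profdata", "merge", f"-output={merged}", raw]),
        Step("rebuild using the profile",
             ["clang", "-O2", f"-fprofile-instr-use={merged}", source,
              "-o", binary]),
    ]


# Print the commands instead of executing them.
for step in pgo_steps(training_args=["--bench"]):
    env_prefix = " ".join(f"{k}={v}" for k, v in step.env.items())
    print(f"# {step.description}")
    print(" ".join(filter(None, [env_prefix, *step.argv])))
```

The quality of the result depends almost entirely on the second stage: if the training run does not resemble real usage, the profile will not either.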

Sampling PGO (also known as AutoFDO: https://github.com/google/autofdo) is an interesting approach to performing PGO in practice that aims to reduce the PGO instrumentation overhead. It also has some limitations, though, like requiring LBR/BRS support in your hardware, weak tooling support from the Google side (you can check the AutoFDO issue tracker for examples), etc. Things here also need to be improved.

LLVM Now Using PGO For Building x86_64 Windows Release Binaries: ~22% Faster Builds