Ubuntu Praises 5~7% PGO Compiler Optimization Performance Benefits
Over the past year we have seen Canonical engineers focus more on optimizing the performance of Ubuntu Linux. With Ubuntu 25.04 they are now using the -O3 compiler optimization level by default, and there have been other efforts like better performance tooling on Ubuntu and frame pointers by default. Another area they have been exploring is Profile Guided Optimization (PGO) for faster performance in certain scenarios.
Profile Guided Optimization and related profile-based techniques are much harder to pull off at scale for Linux distributions because they depend on accurate profiles that are representative of real-world use. Without accurate profiles, PGO and related techniques like AutoFDO give the compiler less to go on when making optimization decisions. Sergio Durigan Junior has recently been exploring PGO in the context of RISC-V builds running under QEMU emulation on AMD Ryzen hardware.
Sergio explored the PGO performance benefits when building OpenSSL, GDB, Emacs, and Python under RISC-V-on-x86_64 emulation with QEMU, given that this is how the Ubuntu build farm is set up for RISC-V packages. The perf-based profiles were generated from the QEMU process itself, to see what build speed improvements PGO could deliver.
Today's Ubuntu Blog post concludes with:
"Overall, we can conclude that using PGO did make a difference in the performance achieved by the modified QEMU package. We saw average CPU utilization and build time improvements of around 5-7%, which is significant especially when we consider that Launchpad performs several builds per day. A quick, back-of-the-envelope calculation shows that if we are able to save 5 minutes on a 1 hour build, and if we perform one build after another, over the course of a day we would have saved 2 hours, or two full builds. Conversely, when it comes to CPU utilization, improving the numbers might translate to energy savings or even the possibility of running more parallel QEMU processes.
...
Another important scenario where PGO might be helpful is when a developer is trying to manually optimize a piece of software (using regular profiling tools, like perf top or perf record), but the program is so complex that it is hard to pinpoint what to actually improve. It is important to analyze the constraints of your particular scenario and decide whether using PGO makes sense.
Nonetheless, as we have seen above, Profile-Guided Optimization is a great technique when it comes to increasing the performance of a software, and we hope that this blog post and our documentation can paint a better picture of how you can implement it in your stack."
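The back-of-the-envelope arithmetic in the quote can be written out as a short sketch. The function name and parameters below are hypothetical, chosen only for illustration; the inputs (a 60-minute build, 5 minutes saved, builds running back to back for 24 hours) are the ones assumed in the quote.

```python
# Rough estimate of daily time saved by a faster build.
# Assumptions taken from the quoted example: a 60-minute baseline build,
# 5 minutes saved per build, and builds running back to back all day.

def daily_savings(baseline_min: float, saved_min: float, hours_per_day: float = 24.0):
    """Return (builds per day at the baseline pace, minutes saved per day)."""
    builds = hours_per_day * 60 / baseline_min
    return builds, builds * saved_min

builds, saved = daily_savings(baseline_min=60, saved_min=5)
print(f"{builds:.0f} builds/day, {saved / 60:.1f} hours saved")
# prints "24 builds/day, 2.0 hours saved"
```

Note that 5 minutes out of 60 is roughly 8%, slightly above the measured range. Plugging in 5-7% of a one-hour build (3 to 4.2 minutes) would give roughly 72 to 101 minutes saved per day under the same assumptions.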
Here's their PGO Python 3.12 build comparison as an example:
So while Canonical/Ubuntu has no plans to use PGO at scale to optimize all Ubuntu packages, given the dependence on profiles, this narrower exploration of speeding up RISC-V package compilation on their x86_64 build farm is showing beneficial 5~7% performance improvements thanks to Profile Guided Optimization.