LTO'ing Mesa Is Getting Discussed For Performance & Binary Size Reasons
Enabling compiler Link-Time Optimizations (LTO) by default for Mesa in non-debug builds is being discussed in the name of performance and binary size.
Following an unsolicited patch enabling the -flto flag for Mesa builds, the discussion thread moved on to whether LTO'ing Mesa was appropriate and would make any meaningful difference.
The initial performance change reported from LTO'ing Mesa was "about 1% for glxgears." Of course, enabling Link-Time Optimizations would only be appropriate for production builds, since LTO'ed binaries are much harder, if not impossible, to fully debug.
An Intel developer commented that LTO'ing Mesa did yield some performance improvements in previous tests. For games and graphics applications that are more CPU-bound, LTO'ing Mesa has the potential for performance wins, especially when using the modern GCC 5 or GCC 6 compiler stack.
Besides performance, a developer from Kitware reported big size wins for the resulting binaries once passed through the LTO process. "With gcc 5.3.1 I end up with lib{GL,OSMesa}.so @ 44M and libswrAVX{,2}.so @ 70M. With flto turned on it drops WAY down to lib{GL,OSMesa}.so @ 13M and libswrAVX{,2}.so @ 18M."
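To put those quoted sizes in perspective, here is a quick back-of-the-envelope sketch of the relative reduction. The sizes are the ones from the quote above; the helper name reduction_percent is just made up for illustration and isn't from the mailing list thread.

```python
def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Return the size reduction as a percentage of the original size."""
    return (before_mb - after_mb) / before_mb * 100.0


# Sizes in MB as quoted above (gcc 5.3.1, without vs. with LTO).
sizes = {
    "lib{GL,OSMesa}.so": (44, 13),
    "libswrAVX{,2}.so": (70, 18),
}

for name, (before, after) in sizes.items():
    pct = reduction_percent(before, after)
    print(f"{name}: {before}M -> {after}M ({pct:.1f}% smaller)")
```

Going by those numbers, both sets of libraries end up at well under a third of their original size.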
Hopefully this discussion will prove useful and we'll see LTO turned on by default in Mesa. I'll work on some fresh benchmarks of real-world OpenGL workloads with optimized vs. unoptimized Mesa stacks for some fun. Of course, outside of Mesa you can see my many other past LTO compiler benchmarks.