XZ 5.6 Released: Sandboxing Improvements, Prefers -O2 Instead Of -O3

    Phoronix: XZ 5.6 Released: Sandboxing Improvements, Prefers -O2 Instead Of -O3

    XZ Utils 5.6 was released today for this general purpose data compression library that also provides the common XZ command-line utilities for .xz format handling...


  • #2
    In my experience -O2 -flto produces faster and smaller binaries than -O3. And PGO adds performance on top of that (but PGO is not always viable and some applications don't support it, e.g. Wine).

    • #3
      GCC enabled auto-vectorization (and maybe other things) at -O2 recently, so it's not that surprising that in some (many/most?) cases there is no longer a benefit to going with -O3.

      • #4
        Originally posted by avis:
        In my experience -O2 -flto produces faster and smaller binaries than -O3. And PGO adds performance on top of that (but PGO is not always viable and some applications don't support it, e.g. Wine).
        Wine doesn't support LTO either, I think.

        • #5
          Originally posted by avis:
          In my experience -O2 -flto produces faster and smaller binaries than -O3. And PGO adds performance on top of that (but PGO is not always viable and some applications don't support it, e.g. Wine).
          It highly depends on your code base. When doing number crunching or simulations, -O3 can help a lot, mostly due to certain floating-point tricks that are not standards-conforming.

          In my experience LTO usually doesn't yield faster binaries, though it does yield smaller ones. That being said, since the result was never slower in my testing, I always use LTO for release builds.
          Last edited by oleid; 25 February 2024, 02:28 AM. Reason: part about release builds added

          • #6
            I recently switched to zstd. Slightly bigger compressed size, but way faster and fully parallelized.
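A quick sketch of that trade-off, assuming both xz and zstd are installed (sample data and level choices are invented; timings and ratios vary a lot with real input):

```shell
# Make a compressible sample file.
seq 1 200000 > sample.txt

time xz   -k -6     sample.txt   # writes sample.txt.xz; add -T0 for threading
time zstd -k -3 -T0 sample.txt   # writes sample.txt.zst; multithreaded

ls -l sample.txt.xz sample.txt.zst   # xz is usually smaller, zstd much faster
```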

            • #7
              Originally posted by avis:
              In my experience -O2 -flto produces faster and smaller binaries than -O3. And PGO adds performance on top of that (but PGO is not always viable and some applications don't support it, e.g. Wine).
              I think you could do PGO with Wine; the problem would be how to properly exercise it. It has the same problems as PGO in the Linux kernel, which I think Google has also experimented with but ultimately rejected upstream as not viable. It's not hard to imagine devising a set of profiles that favours performance in some scenarios over others, doesn't properly exercise everything, and sometimes ends up reducing performance as a result.

              • #8
                During my experiments I found that -O3 with -ffunction-sections -fdata-sections and -Wl,--gc-sections produces the same or even smaller binaries than -O2 while being almost always faster. Even Mesa works with those flags, though I won't even try them on kernel builds. Of course I use LTO whenever supported.

                • #9
                  Originally posted by V1tol:
                  During my experiments I found that -O3 with -ffunction-sections -fdata-sections and -Wl,--gc-sections produces the same or even smaller binaries than -O2 while being almost always faster. Even Mesa works with those flags, though I won't even try them on kernel builds. Of course I use LTO whenever supported.
                  Will try your flags on my system, thanks. Currently I'm using "vanilla" gentooLTO flags + -falign-functions=64 (Intel CPU); the latter might interact with -ffunction-sections, need to check that.
                  Last edited by binarybanana; 26 February 2024, 05:06 AM.

                  • #10
                    Originally posted by V1tol:
                    During my experiments I found that -O3 with -ffunction-sections -fdata-sections and -Wl,--gc-sections produces the same or even smaller binaries than -O2 while being almost always faster. Even Mesa works with those flags, though I won't even try them on kernel builds. Of course I use LTO whenever supported.
                    -O3 leading to a smaller binary seems wrong to me. The main thing -O3 does is more unrolling, so it should always lead to bigger binaries, or the same size if -O2 already unrolled everything.

                    Whether -O3 is faster depends on your code and the CPU you use. Older or low-end CPUs typically suffer more from cache misses as your binaries get larger.

                    I'm toying around with some Rust, and my own code runs faster with -O2 while the image crate that saves to PNG is faster with -O3.
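On the Rust side, that split can be expressed directly: Cargo lets you set the opt level per profile and even per dependency. A hypothetical Cargo.toml fragment (the package name `image` is from the comment above; everything else is illustrative):

```toml
[profile.release]
opt-level = 2      # rustc/LLVM analogue of -O2
lto = "thin"       # cross-crate link-time optimization

# Override just a hot dependency, e.g. the image crate:
[profile.release.package.image]
opt-level = 3
```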
