XZ 5.6 Released: Sandboxing Improvements, Prefers -O2 Instead Of -O3

    Phoronix: XZ 5.6 Released: Sandboxing Improvements, Prefers -O2 Instead Of -O3

    XZ Utils 5.6 was released today for this general purpose data compression library that also provides the common XZ command-line utilities for .xz format handling...


  • #2
    In my experience -O2 -flto produces faster and smaller binaries than -O3. And PGO adds performance on top of that (but PGO is not always viable and some applications don't support it, e.g. Wine).

    • #3
      GCC enabled auto-vectorization (and maybe other things) at -O2 recently, so it's not that surprising that in some (many/most?) cases there is no longer a benefit to going with -O3.

      • #4
        Originally posted by avis:
        In my experience -O2 -flto produces faster and smaller binaries than -O3. And PGO adds performance on top of that (but PGO is not always viable and some applications don't support it, e.g. Wine).
        Wine doesn't support LTO either, I think.

        • #5
          Originally posted by avis:
          In my experience -O2 -flto produces faster and smaller binaries than -O3. And PGO adds performance on top of that (but PGO is not always viable and some applications don't support it, e.g. Wine).
          It highly depends on your code base. When doing number crunching or simulations, -O3 can help a lot, mostly due to certain floating-point tricks that are not standards-conforming.

          In my experience LTO usually doesn't yield faster binaries, though it does yield smaller ones. That being said, since the result was never slower in my testing, I always use LTO for release builds.
          Last edited by oleid; 25 February 2024, 02:28 AM. Reason: part about release builds added

          • #6
            I recently switched to zstd. Slightly bigger compressed size, but way faster and fully parallelized.
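A quick sketch of that trade-off, assuming both xz and zstd are installed (sample data and level choices are invented; timings and ratios vary a lot with real input):

```shell
# Make a compressible sample file.
seq 1 200000 > sample.txt

time xz   -k -6     sample.txt   # writes sample.txt.xz; add -T0 for threading
time zstd -k -3 -T0 sample.txt   # writes sample.txt.zst; multithreaded

ls -l sample.txt.xz sample.txt.zst   # xz is usually smaller, zstd much faster
```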

            • #7
              Originally posted by avis:
              In my experience -O2 -flto produces faster and smaller binaries than -O3. And PGO adds performance on top of that (but PGO is not always viable and some applications don't support it, e.g. Wine).
              I think you could do PGO with Wine; the problem would be how to properly exercise it. It has the same problems as PGO in the Linux kernel, which I think Google has also experimented with but ultimately rejected upstream as not viable. It's not hard to imagine devising a set of profiles that favours performance in some scenarios over others, doesn't properly exercise everything, and sometimes ends up reducing performance as a result.

              • #8
                During my experiments I found that -O3 with -ffunction-sections -fdata-sections and -Wl,--gc-sections produces the same or even smaller binaries than -O2 while being almost always faster. Even Mesa works with those flags, though I won't even try them on kernel builds. Of course I use LTO whenever supported.

                • #9
                  Originally posted by V1tol:
                  During my experiments I found that -O3 with -ffunction-sections -fdata-sections and -Wl,--gc-sections produces the same or even smaller binaries than -O2 while being almost always faster. Even Mesa works with those flags, though I won't even try them on kernel builds. Of course I use LTO whenever supported.
                  Will try your flags on my system, thanks. Currently I'm using "vanilla" gentooLTO flags + -falign-functions=64 (Intel CPU); the latter might interact with -ffunction-sections, need to check that.
                  Last edited by binarybanana; 26 February 2024, 05:06 AM.

                  • #10
                    Originally posted by V1tol:
                    During my experiments I found that -O3 with -ffunction-sections -fdata-sections and -Wl,--gc-sections produces the same or even smaller binaries than -O2 while being almost always faster. Even Mesa works with those flags, though I won't even try them on kernel builds. Of course I use LTO whenever supported.
                    -O3 leading to a smaller binary seems wrong to me. The main thing -O3 does is more unrolling, so it should always lead to bigger binaries, or the same size if -O2 already unrolled everything.

                    Whether -O3 is faster depends on your code and the CPU you use. Older or low-end CPUs typically suffer more from cache misses as your binaries get larger.

                    I'm toying around with some Rust, and my own code runs faster with -O2 while the image crate that saves to PNG is faster with -O3.
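On the Rust side, that split can be expressed directly: Cargo lets you set the opt level per profile and even per dependency. A hypothetical Cargo.toml fragment (the package name `image` is from the comment above; everything else is illustrative):

```toml
[profile.release]
opt-level = 2      # rustc/LLVM analogue of -O2
lto = "thin"       # cross-crate link-time optimization

# Override just a hot dependency, e.g. the image crate:
[profile.release.package.image]
opt-level = 3
```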
