GCC 11 Compiler Performance Benchmarks With Various Optimization Levels, LTO

    Phoronix: GCC 11 Compiler Performance Benchmarks With Various Optimization Levels, LTO

    Given the recent forum discussion stemming from -O3 still being considered too unsafe for the Linux kernel (in part due to older, buggy compilers), and some users wondering about the current impact of -O2 versus -O3, here is a fresh round of reference benchmarks using GCC 11.1 on Fedora Workstation 33. It looks at various optimization levels and optimizations across dozens of different application benchmarks to see the overall impact on performance.

  • #2
    I wonder if we can use this to characterize apps by the kinds of execution they perform: mostly floating point (-Ofast), mostly integer (-O2 -march=native), lots of branching (-flto)...

    Reminds me of how SPARC was designed: they mostly analyzed the X server they were using, which did a lot of nested function calls, so they decided that what really mattered was handling function calls quickly. That led them to the register-windows design, which ultimately killed them. Sadly, it became less of a negative once highly parallel superscalar implementations came about, since register windows are just a small lump in the full-up register renaming that happens in high-performance designs. Good times.

    • #3
      So as I see it, the best method is to run your own test case and then choose the appropriate optimization flags for your own case.

      • #4
        Thanks for the tests. I'm really looking forward to seeing the impact of PGO with and without LTO.

        Would it be possible to add some common server applications like Apache, PHP, MariaDB and Redis?

        Thanks!

        • #5
          There is really something wrong with -flto slowing the benchmarks down on average. Even if they are all pre-optimized, it should at least be no worse.

          • #6
            Originally posted by carewolf:
            There is really something wrong with -flto slowing the benchmarks down on average. Even if they are all preoptimized it should be at least no worse.
            Yeah, all cases where -flto produces a slower binary must be seen as a bug in the compiler. And really, the same should apply where -march=native produces slower code.

            • #7
              Wow, I knew these damn flags mattered.

              After the last kernel thread I went back to -O2 -march=native from -O3 -march=native, but it looks like I'm switching to -Ofast -march=native, as it looks to be a safer, speedier alternative to -O3.

              Huge thanks for these tests Michael!!

              • #8
                Originally posted by F.Ultra:

                Yeah all cases where -flto is producing a slower binary must be seen as a bug in the compiler. And really the same should be seen where -march=native produces slower code.
                One place where it happens is fine. Sometimes things that should be faster in general are slower in specific cases. It is only a problem when it happens in multiple places.

                • #9
                  Originally posted by perpetually high:
                  After the last kernel thread I went back to -O2 -march=native from -O3 -march=native but looks like I'm switching to -Ofast -march=native as it looks to be a safer, speedier alternative to O3
                  Allow me to quote the manual for you:
                  ...
                  -Ofast
                  Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens.
                  ...
                  Last edited by sdack; 14 June 2021, 03:56 PM.

                  • #10
                    Originally posted by perpetually high:
                    Wow I knew these damn flags mattered.

                    After the last kernel thread I went back to -O2 -march=native from -O3 -march=native but looks like I'm switching to -Ofast -march=native as it looks to be a safer, speedier alternative to O3

                    Huge thanks for these tests Michael!!
                    I hope you are joking. Well, at least the kernel shouldn't be using floating point, so -Ofast probably won't break anything there. But you shouldn't use -Ofast on any project that isn't verified for it (meaning it does not rely on specific FP behavior and doesn't mind losing one bit of accuracy). For instance, compiling a browser with -Ofast would make the JS engine fail the Number parts of every JS test suite, though the accuracy is generally fine, as it has always varied across hardware between 32-bit, 64-bit, or even 80-bit (x87) FP precision.
