New Ruby Benchmarks On GCC vs. LLVM Clang Compilers

  • New Ruby Benchmarks On GCC vs. LLVM Clang Compilers

    Phoronix: New Ruby Benchmarks On GCC vs. LLVM Clang Compilers

    Earlier this month, independent benchmark results showed Ruby built with Clang running faster than Ruby built with GCC, after a developer running Debian did some basic compiler performance tests. Now another developer has done more extensive Ruby benchmarks on varying versions of GCC and Clang...


  • #2
    Originally posted by phoronix View Post
    Phoronix: New Ruby Benchmarks On GCC vs. LLVM Clang Compilers

    Earlier this month, independent benchmark results showed Ruby built with Clang running faster than Ruby built with GCC, after a developer running Debian did some basic compiler performance tests. Now another developer has done more extensive Ruby benchmarks on varying versions of GCC and Clang...

    http://www.phoronix.com/vr.php?view=MTg2NTc
    Interesting, still it would be nice to see data with LTO and FDO.


    • #3
      Originally posted by hubicka View Post
      Interesting, still it would be nice to see data with LTO and FDO.
      I have deep respect for your work on LTO, but I still think that many distros would have to enable LTO first, at least for some select packages...

      I remember I just made some patches to a small(er) OSS project which "cannot be migrated to use C++11 because I want to support platform X ... which has GCC 4.4".

      Ruby should work by default with most compilers under the sun, and the provided makefile does not add "-flto" by default because of these issues, so in the end everyone misses out on the optimization.


      • #4
        Mike, please don't post junk like that.
        1. Contrary to the stated conclusions, the benchmark says nothing about how much of a speed advantage compiler X has over compiler Y, because winning by being 0.00000001% faster gives exactly the same score as winning by being 90% faster.
        2. It's stacked against clang: if all versions of clang do a thing better than all versions of gcc, the best gcc variant gets a 3 point penalty. If gcc is better, the best clang variant gets a 7 point penalty.
        3. Why did the author skip O3 and O4 in clang?

        Overall, the idea of creating such a benchmark is good, but the author should have used a geometric mean instead of a custom metric, and the highest -O flags. Right now the benchmark is useless.
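
To illustrate point 1, here is a minimal Python sketch that contrasts a win-count score with a geometric mean of per-test time ratios. The timings and the names `times_x` and `times_y` are made up for illustration; they are not taken from the benchmark under discussion, and the win count here is a simplified stand-in for the author's custom metric, not a reimplementation of it.

```python
from math import prod


def win_count(times_a, times_b):
    """Count how many tests each side wins; the margin of a win is ignored."""
    wins_a = sum(1 for a, b in zip(times_a, times_b) if a < b)
    wins_b = sum(1 for a, b in zip(times_a, times_b) if b < a)
    return wins_a, wins_b


def geometric_mean(ratios):
    """Geometric mean of positive ratios: the n-th root of their product."""
    return prod(ratios) ** (1.0 / len(ratios))


# Hypothetical run times in seconds for three tests (lower is better).
times_x = [1.00, 2.00, 0.50]  # compiler X: wins two tests by about 1%
times_y = [1.01, 2.02, 0.25]  # compiler Y: wins one test by a factor of 2

# Per-test ratio of Y's time to X's time; below 1.0 means Y is faster.
ratios_y_over_x = [y / x for x, y in zip(times_x, times_y)]

print(win_count(times_x, times_y))      # X leads on wins: (2, 1)
print(geometric_mean(ratios_y_over_x))  # about 0.80, so Y is faster overall
```

With these made-up numbers the win count favours X, while the geometric mean shows Y taking roughly 20% less time overall, which is the kind of information a pure win/loss score throws away.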


        • #5
          Originally posted by hubicka View Post
          Interesting, still it would be nice to see data with LTO and FDO.
          LTO is often impossible or impractical on large projects, and if the code is fairly well designed it doesn't gain much.


          • #6
            Originally posted by carewolf View Post
            LTO is often impossible or impractical on large projects, and if the code is fairly well designed it doesn't gain much.
            Well, Ruby is not big enough to run into LTO scalability issues. I work on GCC's implementation of LTO and regularly test on Firefox and LibreOffice:
            This time I will write about building Firefox with LTO. Firefox is one of largest binaries on my hard drive (beaten only by Chromium) so it ...

            After Firefox , I decided to look into LibreOffice performance with link time optimizations (LTO) and profile feedback (FDO). This post was ...


            While waiting 6 minutes for your final link of Firefox is not fun, it is no longer completely impossible or impractical.
            I agree that there is not much LTO can do that you can't do by structuring your code, but in combination with profile feedback it is a different story.
            Especially for scripting languages like Ruby, FDO is easy to use and tends to give interesting speedups.

            I tried to build Ruby and reported some numbers at http://www.phoronix.com/forums/showt...aster-Than-GCC (using geomavg). I know nothing about Ruby benchmarking. I tried the benchmark quoted in this article, and the performance difference from -O3 to -O3 -flto -fprofile-use was about 1.4%, which was a bit less than I expected from my bash/Python experience. But without a bit of investigation I have no idea what I really measured.
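
For readers unfamiliar with the LTO+FDO workflow mentioned above, the following Python sketch only assembles the usual three steps with GCC: build with instrumentation, run a training workload, then rebuild using the collected profile. The flags `-flto=n`, `-fprofile-generate` and `-fprofile-use` are real GCC options; the file names `interp.c`, `interp` and `bench.rb` and the helper `fdo_lto_commands` are hypothetical placeholders, and this is not how the Ruby build was actually driven in the measurements above.

```python
import os
import shlex


def fdo_lto_commands(sources, output, train_cmd, jobs=None):
    """Return the command lines for a profile-guided LTO build with GCC.

    Nothing is executed here; the caller decides how to run the commands.
    """
    # Default the LTO parallelism to the number of CPUs, if it is known.
    jobs = jobs or os.cpu_count() or 1
    common = ["gcc", "-O3", f"-flto={jobs}"]
    # Step 1: instrumented build that writes profile data when it runs.
    instrumented = common + ["-fprofile-generate", *sources, "-o", output]
    # Step 2: training run that exercises representative workloads.
    training = list(train_cmd)
    # Step 3: final build that reads the profile collected in step 2.
    optimized = common + ["-fprofile-use", *sources, "-o", output]
    return [instrumented, training, optimized]


if __name__ == "__main__":
    # Hypothetical single-file project and training workload.
    for cmd in fdo_lto_commands(["interp.c"], "interp", ["./interp", "bench.rb"]):
        print(shlex.join(cmd))
```

The training workload matters: the profile only helps code paths that the training run actually exercises, so it should resemble real use.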


            • #7
              I re-ran the benchmark suite with clang version 3.6.0 (trunk 219925) (llvm/trunk 219924) and GCC 4.9.1. The following are results normalized to the GCC -O3 build, sorted by clang's relative performance from best to worst. Clang seems 3.67% slower in geometric average than GCC; GCC with LTO+FDO seems 1.78% faster than GCC with default flags alone.


              • #8
                Originally posted by hubicka View Post
                Well, Ruby is not big enough to run into LTO scalability issues. I work on GCC's implementation of LTO and regularly test on Firefox and LibreOffice:
                This time I will write about building Firefox with LTO. Firefox is one of largest binaries on my hard drive (beaten only by Chromium) so it ...

                After Firefox , I decided to look into LibreOffice performance with link time optimizations (LTO) and profile feedback (FDO). This post was ...


                While waiting 6 minutes for your final link of Firefox is not fun, it is no longer completely impossible or impractical.
                I agree that there is not much LTO can do that you can't do by structuring your code, but in combination with profile feedback it is a different story.
                Especially for scripting languages like Ruby, FDO is easy to use and tends to give interesting speedups.

                I tried to build Ruby and reported some numbers at http://www.phoronix.com/forums/showt...aster-Than-GCC (using geomavg). I know nothing about Ruby benchmarking. I tried the benchmark quoted in this article, and the performance difference from -O3 to -O3 -flto -fprofile-use was about 1.4%, which was a bit less than I expected from my bash/Python experience. But without a bit of investigation I have no idea what I really measured.
                Well, I am a QtWebKit developer. Just linking QtBase with LTO takes about half an hour, and I have never successfully managed to link QtWebKit with it. Compared to normally being able to build all of qtbase in 5 minutes (with icecream), half an hour of linking is a long time. Plus, it gained me no practical speedup last I tried.


                • #9
                  I've never dared to try LTOing webkit, the 30min builds are long enough as is.

                  If only GNU ld would implement gold's ICF + gc-sections for C++ classes using interfaces (I have a feature request bug open for the latter with a testcase), I'd be a happy camper.


                  • #10
                    Originally posted by curaga View Post
                    I've never dared to try LTOing webkit, the 30min builds are long enough as is.

                    If only GNU ld would implement gold's ICF + gc-sections for C++ classes using interfaces (I have a feature request bug open for the latter with a testcase), I'd be a happy camper.
                    Both GNU ld and Gold work with LTO (GCC or LLVM). GCC 5.0 will have its own ICF at LTO time. It is not a 100% replacement for Gold's, because it finds a different set of equivalences: some functions are equivalent before later optimization, but others cannot be unified because their in-compiler representation, such as alias sets, differs.

                    I built Qt's WebKit with LTO around the time GCC 4.8 and 4.9 were released, and I plan to retry with 5.0 soon. I never had luck with Chromium because I cannot get past the build system downloading a binary of LLVM that is linked with the wrong version of libstdc++ and crashes. I did not try terribly hard, though. It is also something I want to retry.

                    For large projects, LTO takes a bit of effort to set up right.

                    One way to cut build time is to use a newer compiler (4.7/4.8/4.9 all significantly sped up LTO between releases; see the Firefox blog post I linked above). 5.0 ought to be somewhat faster again, but I have not run full benchmarks yet because it is too early in stage3. For a parallel build you should use -flto=n, where n is the number of your CPUs. It makes a lot of difference.

                    My devel box links Firefox in about 5-6 minutes with 24 threads (AMD Opteron(TM) Processor 6272, 2.1 GHz); my notebook needs about 12 minutes with 4 threads for Firefox (1.9 GHz). Chromium, according to Martin's results, takes a bit longer, but not by much. This may have changed, as libxul in Firefox got bigger over time, perhaps faster than Chromium grew.

                    As for runtime improvements, you see the most gains in code size and in the speed of code that is not already tightly optimized into one function. For example, most JavaScript benchmarks spend most of their time in JIT-generated code and in tight loops comparing strings, handling Unicode, etc. There is not much to speed up there. Benchmarks like SVG rendering, DOM handling or window opening benefit more.

                    If you point me to a way to benchmark WebKit, I can give it a try this release period.
