Thread: Link-Time Optimizations With GCC 4.8

  1. #11
    Join Date
    Nov 2012
    Posts
    157

    Quote Originally Posted by curaga
    That depends on your toolchain - IIRC non-fat lto requires gold instead of the usual GNU ld.
    But don't you need to link using GCC or gold anyway to get LTO? The fat object files just make it possible to link without using LTO. So non-fat objects also help to check that you really are getting LTO and not just the old code through the fallback.

  2. #12
    Join Date
    Oct 2009
    Posts
    845

    Quote Originally Posted by carewolf
    No, then you get -O0 optimizations. LTO means link-time optimizations, which means the linker does the optimizations, which again means the linker needs the optimization flags, but the compiler does not.

    So
    CXXFLAGS = -flto
    LDFLAGS = -O3 -march=native -flto -fwhole-program
    Ah, yes, that makes much better sense. I just noticed that unless I passed the optimization options to the linker flags I got poor optimization (likely -O0).

    Anyway, as I said earlier, I think this is the cause of the regressions in Michael's tests. I doubt he passed the optimization options in LDFLAGS in the tests where these regressions occur. LTO often doesn't yield any 'worthwhile' gains in my benchmarks, but it also hasn't caused any worse performance for me. The overall benefit I've noticed is that the binaries pretty much always end up quite a bit smaller (likely due to dead/duplicate code removal, more efficient code reordering etc).

    Yes that pretty much sums it up, good pointer.
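
    For anyone following along, here is a rough sketch of the kind of setup being discussed. It is not from the thread: the file names helper.c and main.c, the functions square() and sum_of_squares(), and the output name demo are all made up for illustration. The build commands in the comment repeat the optimization options at both compile time and link time, which is what the GCC manual recommends for -flto builds.

```c
#include <assert.h>

/*
 * Hypothetical two-file layout, shown as one listing for brevity:
 *   helper.c  defines square()
 *   main.c    defines sum_of_squares() (a main() that calls it is assumed)
 *
 * One possible build, with the same options at compile and link time:
 *   gcc -O3 -march=native -flto -c helper.c
 *   gcc -O3 -march=native -flto -c main.c
 *   gcc -O3 -march=native -flto helper.o main.o -o demo
 */

/* helper.c: without LTO, this body is invisible while main.c is compiled. */
int square(int v)
{
    return v * v;
}

/* main.c: with LTO, the call to square() becomes a candidate for
 * inlining at link time, because the linker stage sees both files. */
int sum_of_squares(int n)
{
    assert(n >= 0); /* precondition for this sketch only */
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += square(i);
    return total;
}
```

    Whether the call actually gets inlined depends on the compiler version and the options that reach the link step, so treat this as an experiment to try rather than a guaranteed result.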

  3. #13
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,029

    Quote Originally Posted by carewolf
    But don't you need to link using GCC or gold anyway to get LTO? The fat object files just make it possible to link without using LTO. So non-fat objects also help to check that you really are getting LTO and not just the old code through the fallback.
    I hit that with my toolchain - I couldn't use non-fat LTO, but I could use fat LTO. I definitely got the benefits (10% smaller binaries).

    Quote from the gcc manual:
    -ffat-lto-objects
    Fat LTO objects are object files that contain both the intermediate language and the object code. This makes them usable for both LTO linking and normal linking. This option is effective only when compiling with -flto and is ignored at link time.

    -fno-fat-lto-objects improves compilation time over plain LTO, but requires the complete toolchain to be aware of LTO. It requires a linker with linker plugin support for basic functionality. Additionally, nm, ar and ranlib need to support linker plugins to allow a full-featured build environment (capable of building static libraries etc).
    (emphasis mine)

  4. #14
    Join Date
    Feb 2013
    Posts
    2

    Right, and this speeds up compilation, so the "time to compile" benchmarks are completely messed up.

  5. #15
    Join Date
    Feb 2013
    Posts
    2

    The dhrystone benchmark is crap.

    That benchmark is a derivative of the original 1988 dry.c, which consisted of two separate .c files.
    Those two files were kept separate explicitly to prevent the compiler from inlining functions.

    For example, suppose we write a tool to benchmark integer math and split it into mul.c and div.c with these functions:

    Code:
    int mul(int a, int b)
    {
        return a * b;
    }
    Code:
    int div(int a, int b)
    {
        return a / b;
    }
    and from our main file we call:

    Code:
    int test(int x)
    {
        int sum = 0;
        for (int i = 1; i < x; i++)
            sum += div(mul(i, 100), 25);
        return sum;
    }
    Inlining those two functions will generate something like:

    Code:
    int test(int x)
    {
        int sum = 0;
        for (int i = 1; i < x; i++)
            sum += (i * 100) / 25;
        return sum;
    }
    which the compiler can then optimize to:

    Code:
    int test(int x)
    {
        int sum = 0;
        for (int i = 1; i < x; i++)
            sum += i * 4;
        return sum;
    }
    With a huge performance gain.
    While LTO is good in real use, it can skew many benchmarks, so I'll use it only in real-world scenarios.
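
    One way a benchmark could try to keep the calls from being folded away under LTO is sketched below. This is only an illustration under assumptions: mul_int(), div_int(), bench_checksum() and folded_checksum() are made-up names, and __attribute__((noinline)) is a GCC/Clang extension rather than standard C. It only blocks inlining, so other interprocedural optimizations may still specialize the functions.

```c
#include <assert.h>

/* GCC/Clang extension (not ISO C): ask the compiler not to inline these,
 * even when LTO makes their bodies visible across files. */
__attribute__((noinline)) int mul_int(int a, int b)
{
    return a * b;
}

__attribute__((noinline)) int div_int(int a, int b)
{
    assert(b != 0); /* guard against division by zero in this sketch */
    return a / b;
}

/* Benchmark kernel: returns a checksum so the loop has an observable
 * result and cannot simply be discarded as dead code. */
long bench_checksum(int x)
{
    long sum = 0;
    for (int i = 1; i < x; i++)
        sum += div_int(mul_int(i, 100), 25);
    return sum;
}

/* Reference version: the folded form an optimizer could reach after inlining. */
long folded_checksum(int x)
{
    long sum = 0;
    for (int i = 1; i < x; i++)
        sum += (long)i * 4;
    return sum;
}
```

    Both functions return the same checksum as long as i * 100 stays within the range of int, which is exactly why the optimizer is allowed to replace one with the other once it can see through the calls.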
