Thread: Link-Time Optimizations With GCC 4.8

  1. #1
    Join Date: Jan 2007
    Posts: 15,130

    Phoronix: Link-Time Optimizations With GCC 4.8

    GCC 4.8 will feature a few improvements to LTO (Link-Time Optimization), but will they translate into better performance for the resulting binaries?

    http://www.phoronix.com/vr.php?view=MTI5ODE

  2. #2
    Join Date: Feb 2012
    Posts: 70

    I saw all the linked results as well.
    Basically, a few percent improvement. About 4-5% over stock.

    Free performance is always good, but maybe not at the cost of 3x the build time and 2.5x the RAM use.

  3. #3
    Join Date: Mar 2011
    Location: Canada
    Posts: 97

    Quote, originally posted by mayankleoboy1:
        I saw all the linked results as well.
        Basically, a few percent improvement. About 4-5% over stock.

        Free performance is always good, but maybe not at the cost of 3x the build time and 2.5x the RAM use.

    When it becomes a more consistent win, it will make sense to use it in release builds of binaries that are redistributed. I think it's definitely worth having packaging take 3x longer if it makes the resulting binary 5% faster and strips out lots of dead code too.

  4. #4
    Join Date: Oct 2009
    Posts: 845

    I seriously doubt Michael is using LTO correctly.

    When you compile with a single command, such as gcc -march=native -O3 -flto -fwhole-program ..., it works fine. But when you use a makefile with separate C(XX)FLAGS and LDFLAGS, you need to pass the C(XX)FLAGS along to the LDFLAGS, or the optimization will suffer greatly. So you should do something like this:

    CXXFLAGS = -O3 -march=native -flto -fwhole-program
    LDFLAGS = $(CXXFLAGS) -Wall

    I've done many LTO comparisons, and there isn't always a gain (a lot of the benefits of LTO can be had by just defining functions as static when appropriate), but I've never come across regressions like the ones shown in Michael's tests. Hence I suspect he is not passing the C(XX)FLAGS along to the linker through LDFLAGS in the tests that use a makefile with separate C(XX)FLAGS/LDFLAGS, which in turn means the C(XX)FLAGS optimizations aren't being used when generating the final binary.

  5. #5
    Join Date: Mar 2010
    Location: Slovenia
    Posts: 390

    Quote, originally posted by XorEaxEax:
        it works fine, but when you use a makefile with separate C(XX)FLAGS and LDFLAGS you need to pass the C(XX)FLAGS along to the LDFLAGS, else the optimization will suffer greatly. So you should do something like this:

        CXXFLAGS = -O3 -march=native -flto -fwhole-program
        LDFLAGS = $(CXXFLAGS) -Wall

    Is this enough?
    CXXFLAGS = -O3 -march=native -flto -fwhole-program
    LDFLAGS = -flto -Wall


  6. #6
    Join Date: Oct 2009
    Posts: 845

    Quote, originally posted by LightBit:
        Is this enough?
        CXXFLAGS = -O3 -march=native -flto -fwhole-program
        LDFLAGS = -flto -Wall

    AFAIK you need to pass the optimization flags as well; at least I recall having to do so the last time I benchmarked LTO (which was on 4.7, not 4.8), so:

    CXXFLAGS = -O3 -march=native -flto -fwhole-program
    LDFLAGS = -O3 -march=native -flto -fwhole-program -Wall (... and whatever other linker options you have)

    or just reference the CXXFLAGS variable as I did above:
    LDFLAGS = $(CXXFLAGS) -Wall

    I believe this is necessary because LTO can be used on object files written in different languages, but I may be wrong. I haven't really dived into LTO, as I haven't gotten any major gains from it for my own code, particularly when compared to PGO, which pretty much always yields gains, often significant ones.

  7. #7
    Join Date: Jan 2012
    Posts: 151

    Quote, originally posted by XorEaxEax:
        I believe this is necessary because LTO can be used on object files written in different languages, but I may be wrong. I haven't really dived into LTO, as I haven't gotten any major gains from it for my own code, particularly when compared to PGO, which pretty much always yields gains, often significant ones.

    I'd never heard of PGO until now, but I would love to see some recent benchmarks. Most of the articles I saw reported gains of up to ~10%.

    Also, from man gcc:
    Code:
    To use the link-time optimizer, -flto needs to be specified at compile time and during the final link.

  8. #8
    Join Date: Nov 2012
    Posts: 183

    Quote, originally posted by LightBit:
        Is this enough?
        CXXFLAGS = -O3 -march=native -flto -fwhole-program
        LDFLAGS = -flto -Wall

    No, then you get -O0 optimizations. LTO means link-time optimizations, which means the linker does the optimizations, which again means the linker needs the optimization flags, but the compiler does not.

    So:
    CXXFLAGS = -flto
    LDFLAGS = -O3 -march=native -flto -fwhole-program

    would work, but your example would not.

    Note that you can also speed up compilation even more by disabling fat object files. By default, GCC produces object files that contain both the code for LTO linking and traditional object code; the latter is not needed if you are going to use LTO on the final link anyway. Edit: use -fno-fat-lto-objects as a compile-time flag.
    Last edited by carewolf; 02-10-2013 at 02:21 PM.

  9. #9
    Join Date: Feb 2008
    Location: Linuxland
    Posts: 5,187

    Quote, originally posted by carewolf:
        Note that you can also speed up compilation even more by disabling fat object files. By default, GCC produces object files that contain both the code for LTO linking and traditional object code; the latter is not needed if you are going to use LTO on the final link anyway. Edit: use -fno-fat-lto-objects as a compile-time flag.

    That depends on your toolchain - IIRC non-fat LTO requires gold instead of the usual GNU ld.

  10. #10
    Join Date: Mar 2010
    Location: Slovenia
    Posts: 390

    Additionally, the optimization flags used to compile individual files are not necessarily related to those used at link time. For instance,

    gcc -c -O0 -flto foo.c
    gcc -c -O0 -flto bar.c
    gcc -o myprog -flto -O3 foo.o bar.o


    This produces individual object files with unoptimized assembler code, but the resulting binary myprog is optimized at -O3. If, instead, the final binary is generated without -flto, then myprog is not optimized.
    http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
