Quote: Originally posted by ssam
agreed. I am impressed that its only 3 times longer. if there are 100 source files then the optimiser has 100 times as much stuff to think about at once. I guess the real slow down is when that makes you hit swap.
Sorry, your math doesn't add up. Let's say you have 100 files that take 100 seconds to compile without LTO and a theoretical 300 seconds with LTO.
So how many times slower is it really?
Within those 100 seconds, the C/C++ preprocessor has to expand the includes, the scanner has to tokenize the file, and the parser generates an AST (a tree that describes the source code). The AST is visited and lowered to GIMPLE. The GCC optimizer optimizes the GIMPLE representation (each file individually), then performs register allocation and generates the .o (object file) for every source file. At the end, the linking step is done.
All of this adds up to 100 seconds.
In LTO mode, things happen a bit differently: the compiler writes out GIMPLE, and the optimization is not done upfront but later, in the linking step.
Header expansion, template expansion, parsing and linking happen in both cases. Let's say that optimizing the 100 files individually (from GIMPLE to .o files) takes 50 seconds, so the non-optimizing steps take the other 50 seconds.
With LTO, the optimization then takes the 300 seconds minus the parsing, expanding, etc. (50 seconds) = 250 seconds.
So with LTO the optimizer basically runs 5 times slower (250 seconds versus 50; the numbers are fictitious but close to reality).
Anyway, why just 5 times as slow and not 100? Because LTO is not a naive implementation: before optimizing, it is very easy to generate a call graph from GIMPLE (both the LTO and the non-LTO optimizer do this). This call graph determines how many files LTO has to "know about" at the same time. So if a function does its own math and at the end calls printf, the optimizer doesn't need to think about any external function other than printf. Another point is that LTO has more information for inlining, so it will inline on a much larger scale. It can also consolidate more functions (like static constructors), because it knows which constructors exist and what their dependency graph is.
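To make the call-graph point concrete, here is a toy sketch in Python. It is only an illustration, not how GCC stores or walks its call graph, and every function name in it is made up. It shows that, starting from main, only the functions actually reachable through calls need to be looked at together, and that everything else is a candidate for removal.

```python
from collections import deque

# Hypothetical call graph: each function maps to the functions it calls.
# All names are invented for illustration.
CALL_GRAPH = {
    "main": ["compute", "printf"],
    "compute": ["square"],
    "square": [],
    "printf": [],                      # external library function
    "unused_helper": ["square"],
    "legacy_init": ["unused_helper"],
}


def reachable(graph, root):
    """Breadth-first walk returning every function reachable from root."""
    seen = {root}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in graph.get(caller, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


live = reachable(CALL_GRAPH, "main")
dead = sorted(set(CALL_GRAPH) - live)
print("needed from main:", sorted(live))
print("never called:", dead)
```

In this toy graph, unused_helper and legacy_init are never reached from main, which is the kind of code a whole-program view could drop.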
So LTO is a few times slower than the entire non-LTO compilation process. I think optimization takes around 30% of the time, but I don't have real numbers. If it were 30%, the LTO optimizer would be about 7.7 times slower (230 seconds versus 30), sometimes with great effect, sometimes not so much.
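The estimates in this post all come from the same subtraction, so here is a small Python sketch of it. The 100 and 300 second totals are the fictitious numbers used here, the optimizer share is a guess, and the function name is just something I made up for the example.

```python
def optimizer_slowdown(plain_total, lto_total, optimizer_share):
    """Return how many times slower the optimizer runs under LTO.

    plain_total     -- total build time without LTO, in seconds
    lto_total       -- total build time with LTO, in seconds
    optimizer_share -- assumed fraction of plain_total spent optimizing
    """
    optimizing = plain_total * optimizer_share   # per-file optimization time
    front_end = plain_total - optimizing         # parsing, expansion, linking
    lto_optimizing = lto_total - front_end       # what is left for the LTO optimizer
    return lto_optimizing / optimizing


# Fictitious totals: 100 s without LTO, 300 s with LTO.
for share in (0.5, 0.3):
    factor = optimizer_slowdown(100, 300, share)
    print(f"optimizer share {share:.0%}: LTO optimizer is {factor:.1f}x slower")
```

With a 50% share this gives 5.0, and with a 30% share about 7.7, so the smaller the optimizer's share of a normal build, the bigger the relative slowdown for the same total build time.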
In the end, I think LTO has great value for desktop applications: they carry a lot of unused code, because most big applications load a lot of frameworks at startup. I hope LTO will reduce startup time, since it can prove that parts of a function are never used and can safely be removed, and it can combine the static constructors mentioned earlier. Performance-wise it is hard to say, unless the developer is not aware of how C++ works. Most of the time the code uses templates (like the STL), and this expanded code is already well inlined, so LTO has no more information to offer than the already specialized templates do.