It's a lot easier to improve when you start out with nothing. There is absolutely no reason to suspect that improving from completely unoptimized, barely-working codegen (which is, necessarily, how LLVM/Clang or any compiler starts out its life), to matching 75-80% of gcc's performance, is an indicator that they'll achieve the last 20-25% delta and even pull ahead.
In all of the benchmarks I've seen so far, all of LLVM's so-called "wins" are statistically insignificant, i.e. within 5%. Furthermore, not all of gcc's wins are due solely to the absence of OpenMP on LLVM; some of the benchmarked programs don't use OpenMP at all, yet still exhibit a 20% delta favoring gcc. Clearly, llvm has a long way to go. The problem is that the last mile is 10 times harder than the previous hundred miles; the last 10 feet are 10 times harder than the last mile; the last inch is 10 times harder than the last foot; and so on (the analogy works better with the metric system but I'm too lazy to edit what I typed).
Unless llvm literally copies and pastes much of the micro-optimization stuff from gcc, there is no reason to think that they will implement those expensive optimizations in any sort of reasonable timeframe. Look how long it took gcc to develop them. Moreover, there wasn't really any other good open source competition to gcc at the time that the gcc devs were developing those optimizations, so they pretty much had to do them from scratch. Now, you might say that llvm developers could just study the overall algorithmic approach that gcc takes to optimizing so well, and base their own optimizations on that. But by your own admission, the internal architectures of gcc and llvm are wildly different. So llvm will not be able to easily copy and paste from gcc, even if they were permitted to do so by the license, because of the difference in architecture. In short, llvm will have to do things "mostly" from scratch, whereas gcc had to invent optimizations "entirely" from scratch.
I can perhaps understand Google working with GCC's internals. But why in God's name would an application development company like Facebook have to even look at the GCC source code? I mean, come on -- as the author of C, C++, Vala, C# and Java codebases spanning upwards of 50k SLOC, I've never once encountered some programming problem or compiler error within my program and thought "gee, I'd better try and hack on the compiler". You have to be writing some pretty edge-case, non-portable code to even reach that point. Stick with well-supported constructs and paradigms (design patterns) and you can type out hundreds of lines of code at a time with an extremely small warning/error rate, without even using an IDE. It's not rocket science.
Anyway, the whole point of a compiler is that it is a tool; it's not the end product in itself. Having complex internals vs. well-documented and elegant internals does not add any sort of value to the end product. Having a compiler that produces fast code (or small code, depending on your needs) is a value-added attribute. Having a compiler that is simply well-designed is not, by itself, a value-added attribute. If forced to choose between a compiler that has more/better value-added attributes versus one that does not, it should be a no-brainer for anybody who has ever taken a business class, or even anyone whose goal is to deliver high quality products to whoever their customers are.
I'm a big fan of developer tools and anything that enhances productivity. But I've seen some pretty incredible stuff with Eclipse CDT's gcc integration, and even more incredible stuff with Visual Studio's integration with the Microsoft C++ compiler. Yet neither gcc nor the Microsoft C++ compiler has "tool support" in the same way that Clang/LLVM does. So how is it possible that both the open source community and a proprietary company have used inferior toolchains to produce superior IDEs? Maybe I'm missing something, but I can refactor C++ code pretty damn well with Eclipse CDT or Visual Studio.
I recall Michael posting an article/video a few months ago about some developer working on advanced tooling using LLVM, and I remember being pretty impressed. I mean, if you take this kind of thing to its logical extreme, C++ could almost start to approach the maintainability and productivity of Java, which is a huge feat for such a terrible language. So why don't we invest all those man-hours into the slow LLVM/Clang to make it as easy to develop with as Java, just so we can say C++ is the best? Meanwhile our "C++" will be almost as slow as Java because we aren't using a compiler that has fully explored runtime performance optimization.
For certain classes of program, sure, 10% performance doesn't matter. Common desktop software is to the point that it barely scratches the surface of what a modern CPU is capable of. But anything computationally-intensive is going to care a lot about even a 1% performance delta, to say nothing of 10%. You may not be able to notice the difference between Clang and gcc for Firefox or GNOME, but everything from video processing to scientific applications is extremely performance-sensitive, and slow-performing code can impact schedules by hours, days or even weeks, or require hardware upgrades that wouldn't be necessary otherwise.
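To put rough numbers on that, here is a quick back-of-the-envelope sketch. The job names and durations are made up purely for illustration, not taken from any benchmark; the only thing it shows is how a percentage slowdown scales with the length of the job.

```python
def extra_hours(baseline_hours: float, slowdown_pct: float) -> float:
    """Extra wall-clock hours if the same job runs slowdown_pct percent slower."""
    return baseline_hours * slowdown_pct / 100.0


# Hypothetical workloads (invented durations, for illustration only).
jobs = {
    "overnight video encode": 8.0,          # hours
    "two-week simulation run": 14 * 24.0,   # hours
}

for name, hours in jobs.items():
    for pct in (1, 10, 20):
        extra = extra_hours(hours, pct)
        print(f"{name}: {pct}% slower costs {extra:.1f} extra hours")
```

With these assumed numbers, a 10% slowdown on the two-week run works out to about 33.6 extra hours, i.e. roughly a day and a half of schedule.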
"Having both is good" is the first sane thing you've said. And of course good error messages matter for large scale applications where triaging a problem is prohibitively difficult. But you're overlooking one thing.
Now, more than ever, gcc is poised to be able to produce better diagnostics than it has in the past. With the introduction of C++ into gcc's source code, internal APIs are being rewritten in an object-oriented manner, replacing old spaghetti code with a layered architecture that at least belongs in the same discussion as LLVM's architecture, even if LLVM is "even more layered" or "even more well-designed".
The point is, while LLVM is trying to catch up to gcc's performance, gcc is trying to catch up to LLVM's usefulness to developers. People are working on both sides to make both compilers better in their weaknesses. Claiming that gcc's complex/unmaintainable internals make it unable to match LLVM in the long run is plain wrong, if for no other reason than the fact that gcc's internals are actively being rewritten as we speak. But (hopefully) they'll be maintaining all of the optimizations they have today, just sitting them on top of more object boxes, separating it out into more shared libraries, etc. to make the code more maintainable.
I disagree, however, that the point of a compiler is to provide good diagnostics. I would rather have the compiler focus on what it does best -- compiling -- and run a separate tool that tells me why my code is bad.
Actually, I'm a huge fan of clang analyzer. That is exactly the kind of application that LLVM is best at. All hail the open source equivalent to Coverity, which I hope will in time produce even better diagnostics than Coverity itself, and then some.
In my perfect world, clang would -- as you said somewhere above -- be unable to actually codegen a built and linked binary. Its sole focus would be on helping developers improve their code by eliminating incorrect code (compiler errors, inadvisable practices, slow code, non-standard-compliant code, and so on). Clang could very easily fit into the open source ecosystem this way.
If Clang DOES fully catch up with gcc on performance of compiled code, that's great -- but I think it unlikely because clang is much more valuable if efforts are concentrated on its diagnostics, which you yourself admitted are the most important aspect of a developer tool (not to be confused with a compiler). Clang seems more focused on being a developer tool from the get-go, so why not just push that angle and leave the release builds to gcc?
This is an honest question, because I don't know the answer for certain; but what in the world does Clang have to do with the Linux graphics stack? AFAIK only the core LLVM libraries are used. Sure, you can compile Mesa with clang; but it'll be needlessly slower than compiling it with gcc. I swear I've built and linked many a build of r600g and nouveau with the LLVM libraries installed but without the clang binary installed on my system. Maybe you're alluding to the fact that developers are using clang analyzer to help them diagnose their code? That's fine, for a developer tool, but that still doesn't make it a good compiler for the purpose of fast codegen.
Okay, smartass. Insulting everyone who uses gcc is just as stupid as insulting everyone who uses LLVM. As it stands right now, they serve vastly different purposes, which doesn't make them incompatible, and there's no reason not to use both of them as the situation demands. The truly intelligent engineer won't just blindly migrate to the latest fad project because it's new and cool; they will use whatever best fits the situation at hand. So let's TL;DR this whole discussion:
First, as a premise: the year is 2012. Let's not get ahead of ourselves here.
- Are you compiling a release build? Then use GCC!
- Are you already familiar with GCC's error messages and know what to do when you see one? Then use GCC!
- Are you stumped by an error, or looking to improve the quality of your code? Then use clang or clang-analyzer!
- Are you writing a graphics driver and don't have the manpower / time to develop your own optimizing shader compiler? Then use LLVM!
Now, the year is 2014. Oh, wait -- I was about to make a list similar to the one above, but for what the situation will be in 2014, and then I realized that I misplaced my crystal ball. Maybe you have it over there and can find it for me, elanthis?





