You've pointed out the issues exactly: AVX only helps in the spots of code that can use its double-wide registers. Also, at least AMD has said that first-gen AVX will be implemented internally as two SSE-width operations, and I do not have any info about how Intel did it, so even code that hits the AVX optimizations will probably not show dramatic gains.
Originally Posted by elanthis
In the end, I just hope the benchmarks will focus more on workloads that can actually exploit those gains to the maximum.
For example, FFmpeg can be compiled with its hand-written ASM disabled, and if the C code then matches some of the compiler's autovectorization patterns, it will likely get some speedup. The same goes for a renderer or scientific code.
As Phoronix tests on Linux, I think the speedup is unlikely to be noticed on the desktop as a whole: it works fine with just the SSE2 that even an Atom CPU supports, and some components are written in Python and so on.
Also, as the results are getting fairly predictable, it would be better to benchmark only when something really changes, for example when a kernel picks up a new scheduling strategy (as BFS was). Otherwise most of those results will be just noise, and by and large I personally think that does a disservice to the compiler and to the hard work of the GCC team.
Lately I have found it much more fun to test the JS performance of Firefox for myself than to read those benchmarks. And many more people are affected by how a real browser performs.
Mono has LLVM JIT support. How much is the start-up time of a big app (MonoDevelop comes to mind) affected? What about testing its raw number-crunching performance against a GCC-compiled C++ port of the same code, or something along those lines?
"GCC 4.6 also can be built with the --with-fpmath=avx flag, which will allow the GNU compiler to use AVX floating-point arithmetic."
IIRC, binaries made by a compiler that was itself compiled by a new compiler benefit too.
The idea is as outlined below.
Binary B1 made by compiler C1 itself compiled with C1 gives performance P1.
Binary B2 made by compiler C2 itself compiled with C1 gives performance P2, where P2 > P1.
Binary B3 made by compiler C2 itself compiled with C2 gives performance P3, where P3 > P2 > P1.
Is that true, or is P2 = P3?
Thanks for any insightful comments!
You are "quote mining": Michael's statement was not about performance (or about whether the compiler supports AVX or not). It simply says that if you set this flag at the configure step of the usual configure, make, make install sequence, the resulting compiler can enable AVX instruction generation.
Originally Posted by sabriah
It also does not state anything about performance: you can cross-compile, so you could supposedly build on an Atom, a 486, or a PowerPC CPU and still get the same performing binary.
Also, I think you are mixing up two things: the performance of the final binary, and what the benefit is of the compiler using some instruction or not. Mostly the compiler struggles to make your binary use as few registers as possible and to keep the code small enough to fit in the L1 cache. Those are the main things the compiler's output benefits from.
AVX instructions (like their predecessors going back to the MMX era) are SIMD instructions, which means that if you have data that can be processed in parallel in blocks, for example in a matrix multiplication, you can get benefits there. So if you feed the compiler code in which it can recognize those patterns, the resulting instructions will exploit this explicit parallelism, and that is where the gains are. These optimizations mostly combine with loop unrolling.
Also, most of those instructions pay off in floating-point code, which is an interesting point too, because regular applications already get fairly good performance from their final binaries without any AVX.
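To make the "patterns" point a bit more concrete, here is a minimal sketch in C of the kind of loops an autovectorizer can usually recognize. The function names `saxpy` and `matmul` are made up for illustration, and whether a given GCC version actually vectorizes them depends on the flags and the target, so treat it as an assumption rather than a guarantee.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]
 * 'restrict' promises the compiler that x and y do not overlap,
 * which removes one common obstacle to vectorizing the loop. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* c = a * b for n x n row-major matrices.
 * The i-k-j loop order keeps the innermost loop walking b and c
 * with unit stride, which is the access pattern SIMD loads prefer. */
void matmul(size_t n, const float *restrict a, const float *restrict b,
            float *restrict c)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j)
            c[i * n + j] = 0.0f;
        for (size_t k = 0; k < n; ++k) {
            const float aik = a[i * n + k];
            for (size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

Something like `gcc -std=c99 -O3 -mavx` is the usual way to ask for this; `-ftree-vectorizer-verbose=1` on older GCC releases (or `-fopt-info-vec` on newer ones) should report which loops were vectorized. Code full of branches, pointer chasing, or integer bookkeeping gives the compiler nothing like this to work with, which is why most desktop software sees little change.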
That wasn't the intention, but I get the point. Thanks for the explanation!
Originally Posted by ciplogic
GCC is built in a three-stage bootstrap, meaning the compiler that gets installed is _always_ compiled with itself.
Originally Posted by sabriah
Let's say you have 4.5.2 installed and are building 4.6.0. In the first stage, the 4.6.0 compiler is built with GCC 4.5.2. The second stage then rebuilds 4.6.0 with the compiler that was just built in stage 1. Finally, the third stage uses the compiler built in stage 2 to build itself one more time. The stage 2 and stage 3 compilers are then compared to ensure they are identical.
Thanks for that!
Originally Posted by dirtyepic
Lots of regressions, but it has big potential for providing some really good improvements in some areas. Even 10% in some tests, which is a very good result!