Okay, if you are testing -O3 then you need to update the captions below the charts. The current captions describe some g++ flags.
Also, why did you choose these workloads? I am puzzled by the odd benchmark selection. When you benchmarked the auto-vectorizer you decided to run some database program, and here you decided to focus on cryptography (John and Botan). You should standardize your workload set and keep it the same across all of your reviews.
Usually the set of tests is the same, but only the interesting changes get shown in the articles. For the full set, there's usually the OpenBenchmarking link.
One thing I would like to see, with all these benchmarks, is some analysis of why we are seeing what we are seeing.
For example, for the dramatic improvements we have seen, is there reason to believe the results are
- because of auto-vectorization? OR
- because of AVX (i.e. the 3.2 vectorizer worked well, it just didn't target AVX)? OR
- recognition of idioms and automatic use of the Intel crypto instructions from C? OR
- polyhedral optimization (automatically refactoring array code for better memory usage; it has been demoed for a while, but I don't know whether it has been mainlined yet)? OR
- automatic use of OpenMP? (This has been an ongoing difference between GCC and LLVM, to the extent that many of the benchmarks we have seen are worthless if you want to UNDERSTAND the compilers, as opposed to simply shooting your mouth off.) I also don't know whether automatic use of OpenMP has made it into mainline LLVM 3.3.
The point is, with these sorts of details, these articles could become so much more valuable.