Weird, why would PGO help more with the asm version? It should be the exact opposite imo, since the compiler can't optimize that hand-written assembly in any way, but it should at least be able to do some optimizing with the C code. Anyway, I can understand why you wouldn't want to redo the tests, since the argument was about hand-optimized assembly vs compiler-generated code. Maybe I'll give it a shot myself, since I'm curious as to what Shikari said. Again, thanks for the benchmarks!
I believe these tests need to be rebuilt with recent gcc and clang versions, like
gcc4.7 vs clang3.2
gcc4.8svn vs clang3.3svn
I wonder what the results would be.