
Originally Posted by highlandsun
Speaking of which - I'm always annoyed by the lack of analysis in these articles. "We ran foo and it yielded this number X. Next."
Articles like this teach readers next to nothing; they add pretty much zero to anyone's understanding.
I would look at the results for, e.g., GraphicsMagick, and ask myself "why aren't the BD-specific optimizations helping?"
And then I'd re-run the test under valgrind's cachegrind tool to see what the compiler-generated code is actually doing, in an instruction-level profile, and look at the cache hits and misses. (Of course, this assumes that you've built a new enough valgrind that has already been updated to support the new AVX etc. instructions.)
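
For anyone who wants to try that kind of re-run, here is a rough sketch of what it could look like, not a recipe taken from the article. It only builds the valgrind command line and shows how a miss rate falls out of the raw counts; the binary name `./my_benchmark`, the output file name, and both helper functions are made up for illustration. `--tool=cachegrind`, `--cache-sim=yes`, `--cachegrind-out-file` and `cg_annotate` are real valgrind options and tools.

```python
import shlex


def cachegrind_command(program, args=(), out_file="cachegrind.out.bench"):
    """Build (but do not run) a valgrind command line for a cachegrind run.

    `program`, `args` and `out_file` are placeholders chosen by the caller.
    """
    return [
        "valgrind",
        "--tool=cachegrind",
        "--cache-sim=yes",  # ask explicitly for cache hit/miss simulation
        f"--cachegrind-out-file={out_file}",
        program,
        *args,
    ]


def miss_rate(misses, refs):
    """Return the miss rate in percent, or 0.0 if there were no references."""
    return 100.0 * misses / refs if refs else 0.0


if __name__ == "__main__":
    # Hypothetical benchmark binary; substitute the real test program.
    cmd = cachegrind_command("./my_benchmark")
    print(shlex.join(cmd))
    # After the run, cg_annotate breaks the counts down per function/line.
    print(shlex.join(["cg_annotate", "cachegrind.out.bench"]))
```

Running the printed commands once for the generic build and once for the BD-tuned build, then comparing the annotated counts side by side, is the sort of analysis the articles never get to.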