The 2x difference doesn't surprise me, although some of the 10x differences do, since IIRC the corresponding number on discrete GPUs is more like 3-5x, not 10x. The difference may be memory bandwidth sensitivity (optimizing to reduce bandwidth consumption is one of the more complex aspects of driver optimization), or just that the default clock state for APUs is even lower than I remembered.
As others have said, it would be good to keep in mind which performance features in the HW are enabled and which are still being worked on. It looks like the article ran with driver defaults - Michael, is that correct? Any idea what clocks were being used (ie what the VBIOS power state called for)?
Testing with defaults (ie ignoring some WIP improvements which are still off by default) seems reasonable to me, as long as you aren't drawing "it's gonna be years" conclusions from the results.
MLAA. The hand-optimized TGSI can be found in any current Mesa. And yes, it's a big shader (three passes, the second pass being the biggest). What was your shader? My experiment showed almost no win, but I didn't use a big shader (even Doom 3 doesn't have shaders that big).
So we all agree that the very poor benchmarks were due to the clocks being low by default on Fusion?
Then what's left to be desired is proper 'idiot-proof' dynamic power management, enabled by default, so Phoronix won't draw the wrong conclusions after benchmarking.
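For anyone wanting to record what clock state the driver actually defaulted to during a benchmark run, a minimal sketch along these lines could help; it assumes the radeon KMS driver's "profile" power management sysfs interface (power_profile), and the card index and availability of that file will vary by system, so it falls back gracefully:

```python
from pathlib import Path

def read_power_profile(card: str = "card0") -> str:
    """Return the radeon driver's current power profile, or 'unknown'.

    Reads the sysfs entry exposed by the radeon KMS driver's profile-based
    power management. On machines without a radeon GPU (or with dynpm
    active instead), the file may be missing, so we don't raise.
    """
    path = Path(f"/sys/class/drm/{card}/device/power_profile")
    try:
        return path.read_text().strip()  # e.g. "default", "low", "high"
    except OSError:
        return "unknown"  # no such sysfs entry on this machine

if __name__ == "__main__":
    print(read_power_profile())
```

Logging this alongside each benchmark result would at least make it obvious when the numbers were produced at the lowest clock state.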