Here are my 2 cents:

First, Michael, could you please also create normalized versions of the graphs on pages 4 and 5 showing how the panda board cluster scales? This would be most helpful, since the cluster's parallelization would be more apparent in that format.

Second, looking at the fusion benchmarks (this is just the CPU I happened to choose to do a quick analysis for), while the panda board cluster is indeed ~3x faster, I think there is more to the story. What if we had a similar cluster composed of 4 amd fusion systems?

Looking at the costs on eBay, we could build a bare-bones cluster from nodes like this:
- amd fusion E-350 + Asus E35M1 PRO motherboard: $120 (there's a $20 rebate, but I'm leaving this out, so it could potentially be $100)
- 4 gb ram: $50
- 64gb ocz ssd: $60

TOTAL: $230 per node. Four of these would cost $920 and put the cluster's throughput above the panda board cluster. However, let's suppose that the parallelization is quite poor and only scales to the exact same throughput as the panda board cluster (this makes the following calculations easier).

The difference in cost between the two systems is $280, and for the NAS parallel EPC benchmark the amd systems would be at 180W while the panda board cluster was at 30W, a difference of 150W. How long would you have to run these systems continuously before it makes sense to buy the panda board cluster (supposing 20c/kWh)?

280 * 100 * 1000 / (20 * 150) = 9333 hrs ~ 389 days.
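
For anyone who wants to plug in their own prices, here is a rough Python sketch of the same break-even arithmetic. The numbers are the ones quoted above; the function name and variable names are just my own illustration, and it assumes both clusters run flat out at the benchmark power draw the whole time.

```python
def break_even_hours(extra_cost_usd, power_saving_w, price_cents_per_kwh):
    """Hours of continuous running until energy savings repay the extra purchase cost."""
    # Convert W -> kW and cents -> dollars to get dollars saved per hour.
    saving_usd_per_hour = (power_saving_w / 1000) * (price_cents_per_kwh / 100)
    return extra_cost_usd / saving_usd_per_hour

# Per-node parts list from above: CPU + motherboard, RAM, SSD.
node_cost = 120 + 50 + 60      # $230
cluster_cost = 4 * node_cost   # $920 for four nodes

# $280 price difference, 180 W vs 30 W, 20 c/kWh (all assumed as quoted above).
hours = break_even_hours(280, 180 - 30, 20)
print(f"{hours:.0f} hours, about {hours / 24:.0f} days")
```

Dropping the electricity price to 10c/kWh doubles the break-even time, so the answer is quite sensitive to what you actually pay for power.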

I'm not trying to say that the panda board is better/worse than the other systems. I'm really only trying to show that in some cases the cost to become efficient outweighs the gains from the efficiency, and this, for me, is also a very important quantity.

I don't know how the PowerVR graphics compares to the fusion graphics, but if you were doing OpenCL/GPU computations, this would be another factor in which system you would go for.