There are other issues to solve, certainly. Memory bandwidth is one of the big ones, and it isn't getting solved particularly quickly. Throwing more processing into a package that is already starved for data, or bottlenecked on writing data back out, is not going to help.
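To put rough numbers on the bandwidth point, here is a minimal roofline-style sketch. The function name and every figure in it are made up for illustration, not measurements of any real APU; it only assumes that throughput is capped by whichever runs out first, compute or memory traffic.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline-style bound: the lower of the compute limit and the memory limit."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Hypothetical numbers, chosen only for illustration.
bandwidth = 25.0  # GB/s of memory bandwidth
intensity = 4.0   # FLOPs performed per byte moved to or from memory

# Double the compute twice while leaving the memory system alone.
for peak in (100.0, 200.0, 400.0):
    print(peak, attainable_gflops(peak, bandwidth, intensity))
```

In this toy setup the memory limit is 25 * 4 = 100 GFLOPS, so every step past 100 GFLOPS of peak compute buys nothing until the bandwidth goes up too.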
Also, your assumption that you can just plug in extra APUs for more power doesn't seem like a good solution to me. Look at CrossFire and SLI: even with 2 GPUs it doesn't always scale very well. Stick in 4 GPUs and watch scaling go way down. Switching to APUs won't fix the scaling problem by itself, at least not all the way.
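One way to see why adding chips gives diminishing returns is Amdahl's law. To be clear, this is a generic textbook model I'm swapping in, not a description of how CrossFire or SLI actually divide up work, and the 90% parallel fraction below is an assumption picked for the example.

```python
def speedup(n_chips, parallel_fraction):
    """Amdahl's law: the serial part of the work does not get faster with more chips."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_chips)


def efficiency(n_chips, parallel_fraction):
    """Fraction of ideal (linear) scaling that is actually achieved."""
    return speedup(n_chips, parallel_fraction) / n_chips


# Assumed: 90% of the frame time parallelizes perfectly (hypothetical value).
p = 0.9
for n in (1, 2, 4):
    print(n, round(speedup(n, p), 2), round(efficiency(n, p), 2))
```

Under that assumption, per-chip efficiency drops as you go from 2 chips to 4, and real setups also pay for synchronization and data sharing between chips, which this sketch ignores.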