Quote Originally Posted by Dresdenboy
Uh, I see. This has been linked one page before.

Anyway - recompiled code seems to show a significant advantage for BD. Same in HT4U's cray test (German).

BD has strong rules for instruction grouping, which might lower decode bandwidth to 1-2 inst/cycle, thus not only limiting performance of one thread but indirectly reducing decode throughput of the second thread.

If AMD includes their "Branch Redirect Recovery Cache" (a µop buffer) in Trinity, this might even help legacy code, since it would get them past the decode-stage bottleneck.
If 4-core CPUs had come out in 2003 they would have flopped just as badly. The problem is that keeping 8 cores out of lock contention isn't twice as hard as keeping 4 out; it's orders of magnitude harder. It'll eventually get better, and Bulldozer could in theory run 70 percent faster than 4 cores, but will most likely manage only 50 percent. Everyone on this thread will be dead before 16-core desktops actually work well in general computing environments.
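To make the lock problem concrete, here is a minimal sketch of my own, nothing AMD- or Bulldozer-specific: a bunch of threads all bumping one shared counter behind one mutex. The thread counts, workload size, and every name in it are made up for illustration; the point is just that adding threads doesn't make it finish faster, because every core queues up on the same lock.

// Toy contention demo: N threads split one workload, but share one mutex.
// More threads != more speed, because the lock serializes the real work.
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    const long kTotalIncrements = 8000000;  // arbitrary workload size
    for (int threads = 1; threads <= 8; threads *= 2) {
        long counter = 0;
        std::mutex m;
        std::vector<std::thread> pool;
        auto start = std::chrono::steady_clock::now();
        for (int t = 0; t < threads; ++t) {
            pool.emplace_back([&, threads] {
                for (long i = 0; i < kTotalIncrements / threads; ++i) {
                    std::lock_guard<std::mutex> lock(m);  // every core waits its turn here
                    ++counter;
                }
            });
        }
        for (auto& th : pool) th.join();
        long long ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                           std::chrono::steady_clock::now() - start).count();
        std::printf("%d thread(s): %lld ms (counter=%ld)\n", threads, ms, counter);
    }
    return 0;
}

Run it and the 8-thread case typically comes out no faster, often slower, than the 1-thread case; that "orders of magnitude harder" part is the work of restructuring code so it doesn't need the shared lock in the first place.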
Look at GPUs. As their core counts go up, the industry keeps simultaneously pushing higher and higher resolution screens, because they can't improve performance much at normal resolutions, but they can give the GPUs more to do on each frame.
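Rough arithmetic to illustrate that, with 1920x1080 standing in for a "normal" screen (my example, not from the post): 2560x1600 is about 4.1 million pixels versus about 2.1 million at 1920x1080, so the per-frame pixel work roughly doubles before any 3D overhead is added on top.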

Computer manufacturers are stuck between two strategies: doing what is feasible and workable and hoping buyers accept it, or working toward nearly unachievable goals and simply lying about progress along the way, with customers buying in and giving a "son, I am disappoint" reaction at every stage. So it's either 2560x1600 3D screens that give you headaches, or nihilism and converting your early adopters into wait-and-see buyers over time.

The problem is they went to many cores to get around the clock speed wall. Now they need clock speed to get around the many-cores problem, because only raw speed will bring the buses out of lockout faster. So you need the thing you can't get in order to fix the thing you got because you couldn't get it.
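This is essentially Amdahl's law (my framing, not the poster's words): with a parallel fraction p and n cores, speedup is 1 / ((1 - p) + p / n), and only a faster clock helps the serial (1 - p) slice. A quick sketch with an assumed p = 0.9, purely illustrative:

// Amdahl's law: speedup tops out at 1/(1-p) no matter how many cores you add.
#include <cstdio>

int main() {
    const double p = 0.9;  // assumed parallel fraction, chosen for illustration
    for (int n = 1; n <= 16; n *= 2) {
        double speedup = 1.0 / ((1.0 - p) + p / n);
        std::printf("%2d cores: %.2fx\n", n, speedup);
    }
    return 0;  // 16 cores give only ~6.4x here; the serial 10% caps everything
}

With that assumed 90 percent parallel fraction, 8 cores give about 4.7x and 16 about 6.4x, which is roughly the "more cores stop paying off" effect described above; the only lever left for the remaining serial slice is clock speed.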

http://www.youtube.com/watch?v=FTeWGD4Q9T4