Thread: Multi-Core Scaling Performance Of AMD's Bulldozer

  1. #11
    Join Date
    Sep 2009
    Posts
    357

    This isn't really that bad.

    Some of the initial performance reports were very negative, but this looks very good to me for a first-generation processor. There is now good reason to save a few bucks by going AMD, as the performance penalty isn't overwhelming.

    It will be especially interesting to see more testing with different configurations of the processors. We simply don't have the experience to infer anything about cache trade-offs, as the architecture is so new.

    The other thing to realize is that there have likely been few Bulldozer-specific optimizations in these tests, meaning compiler switches. Even though some OS optimization has been done, that doesn't preclude other improvements specific to Bulldozer. In the end AMD could be sitting pretty with a hardware revision and some optimized compiler technologies.

  2. #12
    Join Date
    Oct 2008
    Posts
    3,135

    Quote Originally Posted by rohcQaH View Post
    In any case, a direct comparison of "4 threads across 4 modules" against "4 threads crammed into 2 modules" might be interesting to see how much Bulldozer's modules actually lose over discrete cores by sharing certain parts of the CPU pipeline.
    Yes, that's all I was getting at. Maybe scaling would be worse, or maybe better, but it would have been an interesting test to see exactly what does happen. It might even tell us something about the BD architecture.
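
For anyone who wants to try the "4 threads across 4 modules" versus "4 threads crammed into 2 modules" comparison, here is a minimal sketch of how the two placements could be chosen on Linux. The helper names are hypothetical, and the numbering is an assumption: it supposes that consecutive logical CPU ids (0 and 1, 2 and 3, ...) belong to the same module, which may not match what a given kernel reports, so check the topology first (for example with `lscpu -e`).

```python
import os
from typing import List


def cpus_for_placement(threads: int, cores_per_module: int = 2,
                       packed: bool = False) -> List[int]:
    """Return logical CPU ids to pin `threads` threads to.

    ASSUMPTION: CPU ids are numbered so that consecutive ids share a
    module (ids 0,1 -> module 0; ids 2,3 -> module 1; and so on).
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if packed:
        # Fill both cores of each module before moving to the next one.
        return list(range(threads))
    # Use only the first core of each module, one thread per module.
    return [i * cores_per_module for i in range(threads)]


def pin_current_process(cpus: List[int]) -> None:
    """Restrict the calling process to the given CPUs (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    os.sched_setaffinity(0, set(cpus))


spread = cpus_for_placement(4)               # one thread per module
packed = cpus_for_placement(4, packed=True)  # two threads per module
print("spread:", spread)
print("packed:", packed)
```

Running the same 4-thread benchmark once after `pin_current_process(spread)` and once after `pin_current_process(packed)` would give the two numbers to compare. This only controls placement; it does not measure anything by itself.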

  3. #13
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,105

    I see my div suggestion was applied. Nice to have a scrollable specs table.

  4. #14
    Join Date
    Dec 2008
    Location
    San Bernardino, CA
    Posts
    232

    Quote Originally Posted by nepwk View Post
    Good article. You did a great job of showing the difference between 8 semi-real cores vs. hyperthreading.
    Agreed. Thank you Michael, this was a very informative article!

  5. #15
    Join Date
    Nov 2008
    Location
    Germany
    Posts
    5,411

    People are wondering about the bad scaling at 8 threads...

    but remember, the Bulldozer is a 4-core CPU, not an 8-core...

    Its speed compared with a 4-core CPU running 4 threads is really high.

    And no, a 4-core + CMP isn't an 8-core CPU.

    It's a 4-core with CMP (like 4-core + Hyper-Threading on the Intel side).

  6. #16
    Join Date
    Feb 2010
    Posts
    519

    Quote Originally Posted by Qaridarium View Post
    People are wondering about the bad scaling at 8 threads...

    but remember, the Bulldozer is a 4-core CPU, not an 8-core...

    Its speed compared with a 4-core CPU running 4 threads is really high.

    And no, a 4-core + CMP isn't an 8-core CPU.

    It's a 4-core with CMP (like 4-core + Hyper-Threading on the Intel side).
    Oh f*ck me... Not again!?!?

    In any case... Nice article, though the CLOMP results for BD keep bugging me. Can anyone try to explain what's going on and what the implications are?

  7. #17
    Join Date
    Nov 2008
    Location
    Germany
    Posts
    5,411

    Quote Originally Posted by PsynoKhi0 View Post
    Oh f*ck me... Not again!?!?
    CMP is the stupidest idea AMD ever brought to life, but only because humanity isn't ready for it yet.
    They would do better to build native 8-core CPUs like the AMD Opteron 6128 (8x 2.00GHz) for the AM3 socket.
    Humans are not ready yet for the truth about SMP vs. CMP vs. HT.
    In my view, humans are not ready for out-of-order CPUs.
    We need a law to force "in-order" CPU architectures to stop the stupidity of humans.

    Very sad.

  8. #18
    Join Date
    Oct 2011
    Posts
    2

    Scaling with more threads than actual cores

    Sometimes one can squeeze out more performance by running the test with more threads than the processor has actual cores.

    Here is an example:
    http://openbenchmarking.org/result/1...IV-1090TX26430

    X264 performance peaked at 18 threads for my 1090T.

    I think thread count should be a value determined both by the actual test (some other tests have no justification for such manipulation) AND by the processor, since some processors improve their results when loaded with more threads and others don't.
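
A rough sketch of what "thread count determined by the test and the processor" could look like in practice: run the same test at several thread counts and keep the best one. Everything here is hypothetical. `sweep_thread_counts` is an illustrative helper, not part of any benchmark suite, and `run` stands in for whatever launches the real test and returns a higher-is-better score (frames per second, for instance).

```python
from typing import Callable, Dict, Iterable, Tuple


def sweep_thread_counts(run: Callable[[int], float],
                        counts: Iterable[int]) -> Tuple[int, Dict[int, float]]:
    """Run the benchmark once per thread count.

    `run(n)` must return a higher-is-better score for n threads.
    Returns the best-scoring thread count and all collected scores.
    On a tie, the first count that reached the top score wins.
    """
    scores = {n: run(n) for n in counts}
    if not scores:
        raise ValueError("no thread counts given")
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Placeholder scorer with MADE-UP behaviour, only to show the call shape.
    # It is not measured data from any processor.
    def fake_run(n: int) -> float:
        return -float((n - 18) ** 2)

    best, scores = sweep_thread_counts(fake_run, range(1, 25))
    print("best thread count:", best)
```

A single run per thread count is noisy, so in a real sweep each count would probably need several runs and some averaging before the best value means much.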

  9. #19
    Join Date
    Jan 2008
    Posts
    187

    Quote Originally Posted by EyalBD View Post
    Sometimes one can squeeze out more performance by running the test with more threads than the processor has actual cores.

    Here is an example:
    http://openbenchmarking.org/result/1...IV-1090TX26430

    X264 performance peaked at 18 threads for my 1090T.

    I think thread count should be a value determined both by the actual test (some other tests have no justification for such manipulation) AND by the processor, since some processors improve their results when loaded with more threads and others don't.
    Whenever you see results like that, it means there is a problem with the benchmark. Either the code is written poorly, or the threads are blocked by I/O. In the latter case, changing the disk subsystem will obviously change the result, and then you no longer have a meaningful benchmark of CPU performance. In the former case, the threads are blocking on some other shared resource. It might still be valid as a benchmark if the contended resource is the same on all test platforms.

  10. #20
    Join Date
    Oct 2010
    Posts
    311

    There is an error in the article

    Last sentence on page 3 says:
    "With eight threads (fully utilizing the FX-8150), the improvement was 6.05x over the single-core result while the Opteron 2394 was at 8.02x and the Core i7 990X at 6.11x."

    However, 6.05x is the improvement for 6 threads; the improvement for all 8 threads was 7.44x.
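
To put the eight-thread figures side by side, one option is to divide each speedup by the thread count, which gives a parallel efficiency where 1.0 means perfect scaling. The sketch below uses the 7.44x figure from the correction above and the other two figures from the quoted sentence; the function name is just an illustrative choice.

```python
def parallel_efficiency(speedup: float, threads: int) -> float:
    """Speedup over the single-thread result divided by the thread count."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return speedup / threads


# Eight-thread speedups as quoted in this post (FX-8150 uses the corrected 7.44x).
eight_thread_speedups = {
    "FX-8150": 7.44,
    "Opteron 2394": 8.02,
    "Core i7 990X": 6.11,
}

for name, speedup in eight_thread_speedups.items():
    print(f"{name}: {parallel_efficiency(speedup, 8):.2f}")
```

With the corrected number, the FX-8150 comes out at 7.44 / 8 = 0.93, rather than the roughly 0.76 that the 6.05x figure would suggest.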
