Thread: AMD Opteron 2356 Dual Quad-Core

  1. #1
    Join Date
    Jan 2007
    Posts
    14,321

    AMD Opteron 2356 Dual Quad-Core

    Phoronix: AMD Opteron 2356 Dual Quad-Core

    When looking at the AMD Phenom 9500 under Linux, we found that this processor posed a number of issues, from kernel panics to other troubles, when running Ubuntu 7.10 with the Linux 2.6.22 kernel. Once we upgraded to Ubuntu 8.04 with the Linux 2.6.24 kernel, however, these problems vanished and we were pleased with this native quad-core desktop processor from AMD. Released a month prior to the first Phenom desktop CPUs were the quad-core Opteron 2300 "Barcelona" processors. We hadn't looked at any AMD Barcelona processors at that time, but today we finally have our hands on two of the new AMD Opteron 2356 server/workstation processors. The Opteron 2356 is clocked at 2.30GHz and is a revision B3 Opteron, meaning that it has a proper fix for the TLB erratum; this model was introduced only earlier this month. We have benchmarked the new Opteron 2356 in both single and dual CPU configurations and have compared the results, under Linux, against two of Intel's quad-core Xeon processors.

    http://www.phoronix.com/vr.php?view=12208

  2. #2
    Join Date
    Sep 2007
    Posts
    63

    Hmm, seems like AMD should go more for pure CPU power now, not tech, because they already BASH Intel at tech.

  3. #3
    Join Date
    Feb 2008
    Posts
    88

    As for the lower Nexuiz score with two CPUs: I think that's mostly due to NUMA (Non-Uniform Memory Access). In theory the system should allocate memory in the memory area attached to the memory controller built into the CPU the thread/process is supposed to run on (AMD CPUs have an integrated memory controller, just as a reminder). However, if the memory gets allocated on CPU A but the thread is moved to a core on CPU B, all memory accesses have to pass through the HyperTransport connection to CPU A, adding latency and reducing bandwidth (you can see the effect in the memory benchmarks, too).

    The scheduler should (if possible) take care to not move threads to a CPU with only remote memory access. Is the Ubuntu 8.04 standard kernel NUMA-aware?
    Last edited by SavageX; 04-15-2008 at 09:28 AM.
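On a running system, one rough way to answer the NUMA-awareness question is to count the NUMA nodes the kernel exposes in sysfs. A minimal sketch, assuming the usual layout of one `nodeN` directory per node under `/sys/devices/system/node`; `list_numa_nodes` is a hypothetical helper name. With node interleaving disabled in the BIOS, a NUMA-enabled kernel on a dual-socket Opteron board would normally show two nodes; a single node (or no directory at all) means the kernel is not treating the machine as NUMA.

```python
import os
import re

# Assumption: standard sysfs layout with one nodeN directory per NUMA node
NODE_ROOT = "/sys/devices/system/node"


def list_numa_nodes(root: str = NODE_ROOT) -> list[int]:
    """Return the sorted NUMA node ids found under `root` (empty if missing)."""
    if not os.path.isdir(root):
        return []
    nodes = []
    for name in os.listdir(root):
        match = re.fullmatch(r"node(\d+)", name)
        if match and os.path.isdir(os.path.join(root, name)):
            nodes.append(int(match.group(1)))
    return sorted(nodes)


if __name__ == "__main__":
    nodes = list_numa_nodes()
    if len(nodes) > 1:
        print(f"{len(nodes)} NUMA nodes visible: {nodes}")
    else:
        print("at most one node visible: no NUMA topology exposed")
```

`numactl --hardware` reports the same node list, plus per-node memory and distances, if the numactl package is installed.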

  4. #4
    Join Date
    Sep 2007
    Location
    Europe
    Posts
    4

    Sad that we don't get to see a

    cat /proc/cpuinfo

    and a

    cat /proc/interrupts

    of this baby!
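For anyone who wants to see how interrupts spread across the eight cores, a small sketch that sums each CPU column of `/proc/interrupts`. It assumes the usual layout (a header row of `CPUn` names followed by one row per IRQ); rows such as `ERR`/`MIS`, which carry a single global count rather than one per CPU, are skipped. `per_cpu_interrupt_totals` is a hypothetical helper name.

```python
import os


def per_cpu_interrupt_totals(text: str) -> dict[str, int]:
    """Sum the per-CPU columns of /proc/interrupts-style text."""
    lines = text.strip("\n").splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    totals = {cpu: 0 for cpu in cpus}
    for line in lines[1:]:
        _, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()[: len(cpus)]
        # Rows such as ERR/MIS carry one global count, not one per CPU: skip them
        if len(fields) < len(cpus) or not all(f.isdigit() for f in fields):
            continue
        for cpu, count in zip(cpus, fields):
            totals[cpu] += int(count)
    return totals


if __name__ == "__main__":
    if os.path.exists("/proc/interrupts"):
        with open("/proc/interrupts") as f:
            for cpu, count in per_cpu_interrupt_totals(f.read()).items():
                print(f"{cpu}: {count}")
```

A heavily lopsided total (most interrupts landing on CPU0) is common when no IRQ balancing is running.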

  5. #5

    processor : 0
    vendor_id : AuthenticAMD
    cpu family : 16
    model : 2
    model name : Quad-Core AMD Opteron(tm) Processor 2356
    stepping : 3
    cpu MHz : 2300.093
    cache size : 512 KB
    physical id : 0
    siblings : 4
    core id : 0
    cpu cores : 4
    fpu : yes
    fpu_exception : yes
    cpuid level : 5
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
    bogomips : 4609.12
    TLB size : 1024 4K pages
    clflush size : 64
    cache_alignment : 64
    address sizes : 48 bits physical, 48 bits virtual
    power management: ts ttp tm stc 100mhzsteps hwpstate

    processor : 1
    vendor_id : AuthenticAMD
    cpu family : 16
    model : 2
    model name : Quad-Core AMD Opteron(tm) Processor 2356
    stepping : 3
    cpu MHz : 2300.093
    cache size : 512 KB
    physical id : 0
    siblings : 4
    core id : 1
    cpu cores : 4
    fpu : yes
    fpu_exception : yes
    cpuid level : 5
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
    bogomips : 4605.75
    TLB size : 1024 4K pages
    clflush size : 64
    cache_alignment : 64
    address sizes : 48 bits physical, 48 bits virtual
    power management: ts ttp tm stc 100mhzsteps hwpstate

    processor : 2
    vendor_id : AuthenticAMD
    cpu family : 16
    model : 2
    model name : Quad-Core AMD Opteron(tm) Processor 2356
    stepping : 3
    cpu MHz : 2300.093
    cache size : 512 KB
    physical id : 0
    siblings : 4
    core id : 2
    cpu cores : 4
    fpu : yes
    fpu_exception : yes
    cpuid level : 5
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
    bogomips : 4600.42
    TLB size : 1024 4K pages
    clflush size : 64
    cache_alignment : 64
    address sizes : 48 bits physical, 48 bits virtual
    power management: ts ttp tm stc 100mhzsteps hwpstate

    processor : 3
    vendor_id : AuthenticAMD
    cpu family : 16
    model : 2
    model name : Quad-Core AMD Opteron(tm) Processor 2356
    stepping : 3
    cpu MHz : 2300.093
    cache size : 512 KB
    physical id : 0
    siblings : 4
    core id : 3
    cpu cores : 4
    fpu : yes
    fpu_exception : yes
    cpuid level : 5
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
    bogomips : 4600.44
    TLB size : 1024 4K pages
    clflush size : 64
    cache_alignment : 64
    address sizes : 48 bits physical, 48 bits virtual
    power management: ts ttp tm stc 100mhzsteps hwpstate

    processor : 4
    vendor_id : AuthenticAMD
    cpu family : 16
    model : 2
    model name : Quad-Core AMD Opteron(tm) Processor 2356
    stepping : 3
    cpu MHz : 2300.093
    cache size : 512 KB
    physical id : 1
    siblings : 4
    core id : 0
    cpu cores : 4
    fpu : yes
    fpu_exception : yes
    cpuid level : 5
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
    bogomips : 4600.31
    TLB size : 1024 4K pages
    clflush size : 64
    cache_alignment : 64
    address sizes : 48 bits physical, 48 bits virtual
    power management: ts ttp tm stc 100mhzsteps hwpstate

    processor : 5
    vendor_id : AuthenticAMD
    cpu family : 16
    model : 2
    model name : Quad-Core AMD Opteron(tm) Processor 2356
    stepping : 3
    cpu MHz : 2300.093
    cache size : 512 KB
    physical id : 1
    siblings : 4
    core id : 1
    cpu cores : 4
    fpu : yes
    fpu_exception : yes
    cpuid level : 5
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
    bogomips : 4600.32
    TLB size : 1024 4K pages
    clflush size : 64
    cache_alignment : 64
    address sizes : 48 bits physical, 48 bits virtual
    power management: ts ttp tm stc 100mhzsteps hwpstate

    processor : 6
    vendor_id : AuthenticAMD
    cpu family : 16
    model : 2
    model name : Quad-Core AMD Opteron(tm) Processor 2356
    stepping : 3
    cpu MHz : 2300.093
    cache size : 512 KB
    physical id : 1
    siblings : 4
    core id : 2
    cpu cores : 4
    fpu : yes
    fpu_exception : yes
    cpuid level : 5
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
    bogomips : 4600.33
    TLB size : 1024 4K pages
    clflush size : 64
    cache_alignment : 64
    address sizes : 48 bits physical, 48 bits virtual
    power management: ts ttp tm stc 100mhzsteps hwpstate

    processor : 7
    vendor_id : AuthenticAMD
    cpu family : 16
    model : 2
    model name : Quad-Core AMD Opteron(tm) Processor 2356
    stepping : 3
    cpu MHz : 2300.093
    cache size : 512 KB
    physical id : 1
    siblings : 4
    core id : 3
    cpu cores : 4
    fpu : yes
    fpu_exception : yes
    cpuid level : 5
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
    bogomips : 4600.32
    TLB size : 1024 4K pages
    clflush size : 64
    cache_alignment : 64
    address sizes : 48 bits physical, 48 bits virtual
    power management: ts ttp tm stc 100mhzsteps hwpstate
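The socket layout is easier to read when the dump is condensed. A minimal sketch that groups the `core id` values of `/proc/cpuinfo` by `physical id` (socket); it assumes the standard format of one `key : value` record per logical CPU, with records separated by a blank line. `socket_topology` is a hypothetical helper name.

```python
import os
from collections import defaultdict


def socket_topology(cpuinfo: str) -> dict[int, list[int]]:
    """Map each 'physical id' (socket) to the sorted 'core id' values found on it."""
    sockets: dict[int, set[int]] = defaultdict(set)
    # One record per logical CPU, records separated by a blank line
    for record in cpuinfo.strip().split("\n\n"):
        fields = {}
        for line in record.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "physical id" in fields and "core id" in fields:
            sockets[int(fields["physical id"])].add(int(fields["core id"]))
    return {pid: sorted(cores) for pid, cores in sorted(sockets.items())}


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print(socket_topology(f.read()))
```

Applied to the dump above, this should yield `{0: [0, 1, 2, 3], 1: [0, 1, 2, 3]}`: two sockets with four cores each, with logical CPUs 0-3 on socket 0 and 4-7 on socket 1.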

  6. #6

    Quote Originally Posted by SavageX
    As for the lower Nexuiz score with two CPUs: I think that's mostly due to NUMA (Non-Uniform Memory Access). In theory the system should allocate memory in the memory area attached to the memory controller built into the CPU the thread/process is supposed to run on (AMD CPUs have an integrated memory controller, just as a reminder). However, if the memory gets allocated on CPU A but the thread is moved to a core on CPU B, all memory accesses have to pass through the HyperTransport connection to CPU A, adding latency and reducing bandwidth (you can see the effect in the memory benchmarks, too).

    The scheduler should (if possible) take care to not move threads to a CPU with only remote memory access. Is the Ubuntu 8.04 standard kernel NUMA-aware?
    I'm wondering how many threads Nexuiz has. Does it spawn as many threads as the system has processing cores?

    Michael, you could probably verify it by disabling 2 cores on each CPU, then comparing the result with 1 CPU running 4 cores. If 2+2 cores is still slower than 4+0, then I think it might be NUMA. (You could probably also check whether NUMA support is enabled in the kernel's make menuconfig?)
    Last edited by davidletterboyz; 04-16-2008 at 01:32 PM.
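The 2+2 versus 4+0 experiment could be set up with Linux CPU hotplug, which takes a core offline when `0` is written to `/sys/devices/system/cpu/cpuN/online` (root required; on x86, CPU0 generally cannot be taken offline, so the sketch always keeps it). The socket mapping is copied from the `physical id` fields in the cpuinfo dump above; `cpus_to_offline` and `offline_cpus` are hypothetical helper names, and by default the script only prints the commands instead of running them.

```python
# Logical CPU -> socket ("physical id"), as listed in the /proc/cpuinfo dump above
TOPOLOGY = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}


def cpus_to_offline(topology: dict[int, int], keep: dict[int, int]) -> list[int]:
    """Pick logical CPUs to offline so socket s keeps only keep[s] cores online.

    CPUs are kept in ascending order, so CPU0 stays online as long as socket 0
    keeps at least one core.
    """
    kept: dict[int, int] = {}
    offline = []
    for cpu in sorted(topology):
        socket = topology[cpu]
        if kept.get(socket, 0) < keep.get(socket, 0):
            kept[socket] = kept.get(socket, 0) + 1
        else:
            offline.append(cpu)
    return offline


def offline_cpus(cpus: list[int], dry_run: bool = True) -> None:
    """Write 0 to each CPU's hotplug 'online' file (needs root unless dry_run)."""
    for cpu in cpus:
        path = f"/sys/devices/system/cpu/cpu{cpu}/online"
        if dry_run:
            print(f"echo 0 > {path}")
        else:
            with open(path, "w") as f:
                f.write("0")


if __name__ == "__main__":
    for label, keep in (("4+0", {0: 4, 1: 0}), ("2+2", {0: 2, 1: 2})):
        print(f"# {label} configuration")
        offline_cpus(cpus_to_offline(TOPOLOGY, keep))
```

Writing `1` to the same files brings the cores back online between runs.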

  7. #7
    Join Date
    Feb 2008
    Posts
    88

    Quote Originally Posted by davidletterboyz
    I'm wondering how many threads Nexuiz has. Does it spawn as many threads as the system has processing cores?
    It spawns exactly 1 (one) thread. There may be a few tasks during rendering which could potentially be moved into parallel threads, but as of now nothing of that sort has materialized.

    Upside of that: On an eight core system power management can put seven cores to sleep. Green open-source gaming fun!
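The thread count of any running process can be checked directly: the kernel reports it in the `Threads:` field of `/proc/<pid>/status` (each thread also appears under `/proc/<pid>/task`). A minimal sketch; pass the PID of the running game as the first argument. `thread_count` and `threads_of` are hypothetical helper names.

```python
import os
import sys


def thread_count(status_text: str) -> int:
    """Return the value of the 'Threads:' field from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Threads":
            return int(value.strip())
    raise ValueError("no 'Threads:' field found")


def threads_of(pid: int) -> int:
    """Read /proc/<pid>/status for a live process and return its thread count."""
    with open(f"/proc/{pid}/status") as f:
        return thread_count(f.read())


if __name__ == "__main__":
    # Pass the game's PID as the first argument; defaults to this script itself
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(f"PID {pid}: {threads_of(pid)} thread(s)")
```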

  8. #8

    Quote Originally Posted by SavageX
    It spawns exactly 1 (one) thread. There may be a few tasks during rendering which could potentially be moved into parallel threads, but as of now nothing of that sort has materialized.

    Upside of that: On an eight core system power management can put seven cores to sleep. Green open-source gaming fun!
    Oh I see. But then, it still could be the load balancing penalty that caused the 8 cores system to slow down a bit.
    Last edited by davidletterboyz; 04-17-2008 at 12:31 PM.
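One way to separate a load-balancing penalty from other effects is to take migration out of the picture by pinning the single-threaded game to the cores of one socket. A minimal sketch using Python's `os.sched_setaffinity` (Linux only); the CPU set comes from the `physical id : 0` entries in the cpuinfo dump above, and `pin_to` is a hypothetical helper name. Affinity alone does not decide which node the memory is allocated on; `numactl --cpunodebind=0 --membind=0 <command>` binds both, and `taskset -c 0-3 <command>` sets only the CPU affinity.

```python
import os

# Logical CPUs 0-3 share "physical id : 0" in the cpuinfo dump above (socket 0)
SOCKET0_CPUS = {0, 1, 2, 3}


def pin_to(cpus: set[int], pid: int = 0) -> set[int]:
    """Restrict `pid` (0 = calling process) to `cpus`; return the resulting mask."""
    # Only request CPUs the process is currently allowed to use
    allowed = cpus & os.sched_getaffinity(pid)
    if not allowed:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(pid, allowed)
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    try:
        print("now restricted to CPUs", sorted(pin_to(SOCKET0_CPUS)))
    except ValueError as err:
        print("could not pin:", err)
```

Child processes started after the call inherit the mask, so a launcher script could pin itself and then start the game.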

  9. #9
    Join Date
    Apr 2008
    Location
    /dev/random
    Posts
    218

    fair???

    I don't think this is a fair test. Ubuntu is binary-based; for a truly fair test, a source-based distro (e.g. Gentoo/LFS) should be used to get optimized performance.

  10. #10
    Join Date
    Feb 2008
    Posts
    88

    Quote Originally Posted by some-guy
    I don't think this is a fair test. Ubuntu is binary-based; for a truly fair test, a source-based distro (e.g. Gentoo/LFS) should be used to get optimized performance.
    Depending on your environment, your CPU may never see "optimized" code during its entire lifetime. Thus testing plain-vanilla code is relevant for many users.
