Benchmarking Numascale's NumaConnect: 256 AMD Cores Connected

Written by Michael Larabel in Hardware on 29 March 2014 at 11:36 AM EDT.

Numascale's NumaConnect technology allows organizations to build scalable, shared-memory systems from standard AMD servers using the Norwegian company's high-speed interconnect interface.

Numascale develops solutions for building shared-memory super-computers around AMD Opteron servers with HyperTransport technology. NumaConnect is primarily focused on Linux support but also works with Windows and Solaris systems.

As explained at Numascale.com, "The big differentiator for NumaChip compared to other high-speed interconnect technologies is the shared memory and cache coherency mechanisms. These features allow programs to access any memory location and any memory mapped I/O device in a multiprocessor system with high degree of efficiency. It provides scalable systems with a unified programming model that stays the same from the small multi-core machines used in laptops and desktops to the largest imaginable single system image machines that may contain thousands of processors. The architecture is commonly classified as ccNuma or Numa but the interconnect system can alternatively be used as low latency clustering interconnect."

NumaConnect is advertised as costing a fraction of what other high-speed interconnects for super-computing cost while being just as versatile. For a while I had access to a NumaConnect super-computer made up of 32 AMD Opteron processors exposing 256 x86_64 CPU cores. The access was granted thanks to Phoronix Test Suite work being funded by Numascale, primarily around HPC Challenge and Open Porous Media automated testing work. I was recently told that I'm able to publicly share the results I gathered while trying out the NumaConnect test system, so here they are.

1312315-SO-HPCCHALLE47 - Here are some HPC Challenge results from the 32 AMD processor system via NumaConnect, with both a "normal run" and a run using just one process per node.

1401114-SO-NUMAHPC6216 - Some more tests -- both of a standard configuration and then one process per node -- when running with the ACML 5.3 math library and then the CentOS 6.5 ATLAS library. ACML ends up being a big performance win.

There are also some older Numascale performance results generated in a fully automated and reproducible manner using the open-source Phoronix Test Suite. From the OpenBenchmarking.org data you can see all of the system details and the results, and you can reproduce the tests on your own using our open-source, automated benchmarking software. Thanks to Numascale for permitting these results to be shared and for their engineering engagement with the Phoronix Test Suite.
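For anyone wanting to try reproducing the tests, here is a minimal sketch of how the result IDs above could be turned into something runnable. It assumes the usual OpenBenchmarking.org result URL pattern and that the Phoronix Test Suite's `benchmark` sub-command is given a public result ID to re-run the same tests for comparison. The helper names `result_url` and `compare_command` are made up for illustration, and the script only prints the commands rather than executing them.

```python
import shlex

# Result IDs quoted in this article.
RESULT_IDS = ["1312315-SO-HPCCHALLE47", "1401114-SO-NUMAHPC6216"]


def result_url(result_id):
    """Public result page for an ID (URL pattern is an assumption)."""
    return "https://openbenchmarking.org/result/" + result_id


def compare_command(result_id):
    """Argument list for re-running the tests of a public result.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    return ["phoronix-test-suite", "benchmark", result_id]


if __name__ == "__main__":
    for rid in RESULT_IDS:
        print(result_url(rid))
        # shlex.join quotes the arguments so the line can be pasted into a shell.
        print(shlex.join(compare_command(rid)))
```

Pasting one of the printed commands into a shell on a machine with the Phoronix Test Suite installed should fetch the referenced result and run the same tests locally, though results from ordinary hardware will obviously not be comparable to a 256-core shared-memory system.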

More details on this high-speed interconnect technology for AMD Opteron servers can be found at Numascale.com.
