SNC/NPS Tuning For Ryzen Threadripper 7000 Series To Further Boost Performance
The AMD Ryzen Threadripper 7000 series offers great performance out-of-the-box for Linux desktop/workstation users, as shown in my Ryzen Threadripper 7970X and 7980X benchmarks along with the Threadripper PRO 7995WX. While it's a more common tunable on the EPYC side, the Threadripper 7000 series can also benefit from Nodes Per Socket (NPS) / Sub-NUMA Clustering (SNC) tuning to enhance the performance of some workloads. This article looks at dozens of benchmarks to gauge the performance impact of SNC2/SNC4 adjustments for the Zen 4 Threadripper.
The SNC2/SNC4 NUMA settings can be adjusted from the system BIOS, though most workstations/motherboards likely default to having it off. The HP Z6 G5 A workstation tested with the Ryzen Threadripper PRO 7995WX uses the Intel terminology of Sub-NUMA Clustering with SNC2/SNC4 rather than the AMD Nodes Per Socket terminology of NPS2/NPS4.
By default the feature is disabled and the entire CPU is presented as a single NUMA domain. Moving to NPS2/SNC2 splits the CPU into two NUMA domains, each with half the cores and half the memory, with memory interleaved across the half of the memory channels that belong to that domain. SNC4 (NPS4) meanwhile makes each quadrant its own NUMA domain, with memory interleaved across two memory channels per quadrant when running Threadripper PRO with eight memory channels.
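To make the split concrete, here is a minimal sketch of the per-domain arithmetic described above. The defaults mirror the system tested in this article (96 cores, eight memory channels, 16GB DIMMs), and the sketch assumes one DIMM per channel. The `NumaLayout` and `numa_layout` names are hypothetical helpers for illustration only, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumaLayout:
    """Resources visible to each NUMA domain for a given NPS/SNC mode."""
    domains: int
    cores_per_domain: int
    channels_per_domain: int
    memory_gb_per_domain: int


def numa_layout(domains, total_cores=96, memory_channels=8, gb_per_channel=16):
    """Evenly divide cores and memory channels across NUMA domains.

    Defaults are assumptions matching the article's test system:
    96 cores and 8 x 16GB DIMMs, taken here as one DIMM per channel.
    """
    if domains not in (1, 2, 4):
        raise ValueError("only 1 (disabled), 2 (NPS2/SNC2) or 4 (NPS4/SNC4) are modeled")
    if total_cores % domains or memory_channels % domains:
        raise ValueError("cores and memory channels must divide evenly across domains")
    channels = memory_channels // domains
    return NumaLayout(
        domains=domains,
        cores_per_domain=total_cores // domains,
        channels_per_domain=channels,
        memory_gb_per_domain=channels * gb_per_channel,
    )


if __name__ == "__main__":
    for mode, domains in (("Disabled", 1), ("SNC2/NPS2", 2), ("SNC4/NPS4", 4)):
        print(mode, numa_layout(domains))
```

Under these assumptions, SNC4 leaves each domain with 24 cores and two memory channels, which is why software that is not NUMA-aware can end up reaching across domains for memory.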
Which NPS/SNC setting is best depends upon the workload and how NUMA-aware or optimized the software is for the different NUMA topologies. For this article I ran dozens of benchmarks to provide a reference for those wondering about the performance impact, given that all the benchmarks to date have been at the default (disabled) setting and that NPS/SNC benchmarks and information for the Ryzen Threadripper aren't as common as on the EPYC server side, where it's a more well-known feature.
The HP Z6 G5 A currently being reviewed at Phoronix was used for this testing with the AMD Ryzen Threadripper PRO 7995WX. This 96-core Zen 4 workstation processor with 8 x 16GB DDR5-5200 memory was tested in its default state, and then all of the benchmarks were repeated with SNC2 and then SNC4 set from the HP BIOS. No other changes to the BIOS or system software configuration were made besides adjusting the SNC tunable within the BIOS.
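After changing the BIOS setting, it can be worth confirming that Linux actually sees the expected number of NUMA nodes. The sketch below is one way to do that by reading the kernel's sysfs node directories. It was not part of the testing procedure for this article, and the `parse_cpulist` and `read_numa_nodes` helpers are hypothetical names.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as '0-3,8' into a list of CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # memory-only nodes can have an empty cpulist
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_numa_nodes(base="/sys/devices/system/node"):
    """Return {node_id: [logical CPU numbers]} for each NUMA node under `base`."""
    nodes = {}
    for path in glob.glob(os.path.join(base, "node[0-9]*")):
        match = re.fullmatch(r"node(\d+)", os.path.basename(path))
        if not match:
            continue
        with open(os.path.join(path, "cpulist")) as handle:
            nodes[int(match.group(1))] = parse_cpulist(handle.read())
    return dict(sorted(nodes.items()))


if __name__ == "__main__":
    for node, cpus in read_numa_nodes().items():
        print(f"node{node}: {len(cpus)} logical CPUs")
```

With the feature disabled this should report a single node, while SNC2 and SNC4 would be expected to report two and four nodes respectively.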