The Performance Impact Of Genoa-X's 3D V-Cache With The AMD EPYC 9684X

Written by Michael Larabel in Processors on 24 July 2023 at 01:30 PM EDT.
OpenFOAM benchmark with settings of Input: drivaerFastback, Medium Mesh Size, Execution Time. Default was the fastest.

One of the real-world workloads with the most significant performance impact from AMD 3D V-Cache with Genoa-X is, of course, the OpenFOAM computational fluid dynamics software. The sheer difference from toggling 3D V-Cache for OpenFOAM was profound and speaks to the possibilities of 3D V-Cache. Other CFD software can benefit from 3D V-Cache too, though my benchmarking is focused on open-source workloads.

OpenFOAM benchmark with settings of Input: drivaerFastback, Medium Mesh Size, Execution Time. Default was the fastest.

Having 3D V-Cache enabled (the default) increased CPU power consumption by around 10 Watts on average, or about 3%, which isn't bad at all considering the significant time savings.
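
To put that trade-off in numbers: energy to solution is average power multiplied by run time, so a roughly 3% power increase pays off whenever the run time drops by more than about 3%. Below is a minimal sketch of that arithmetic. The 10 Watt and 3% figures come from the text above; the run times are hypothetical placeholders chosen only for illustration, not measured results.

```python
def implied_baseline_watts(delta_watts, delta_fraction):
    """Average power without the increase, implied by an absolute and a relative delta."""
    return delta_watts / delta_fraction


def break_even_runtime_ratio(delta_fraction):
    """Run-time ratio (new / old) at which total energy is unchanged."""
    return 1.0 / (1.0 + delta_fraction)


def energy_joules(avg_watts, seconds):
    """Energy to solution: average power times run time."""
    return avg_watts * seconds


# Figures quoted in the article: about +10 W, about +3%.
baseline_w = implied_baseline_watts(10.0, 0.03)
vcache_w = baseline_w + 10.0

# HYPOTHETICAL run times, for illustration only (not benchmark results).
t_without_s = 100.0
t_with_s = 80.0

print(f"implied baseline power: {baseline_w:.0f} W")
print(f"break-even run-time ratio: {break_even_runtime_ratio(0.03):.3f}")
print(f"energy without V-Cache: {energy_joules(baseline_w, t_without_s):.0f} J")
print(f"energy with V-Cache:    {energy_joules(vcache_w, t_with_s):.0f} J")
```

Since both quoted figures are rounded, the implied baseline of roughly 333 Watts is only a ballpark value.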

OpenFOAM benchmark with settings of Input: drivaerFastback, Large Mesh Size, Mesh Time. Default was the fastest.
OpenFOAM benchmark with settings of Input: drivaerFastback, Large Mesh Size, Execution Time. Default was the fastest.

With the large mesh size, OpenFOAM can likewise see significant time savings from 3D V-Cache.

libxsmm benchmark with settings of M N K: 64. Default was the fastest.

The libxsmm library for specialized dense and sparse matrix operations and deep learning primitives also saw a huge uplift from AMD 3D V-Cache, similar to the nice improvements observed during the Intel Xeon Max testing with HBM2e.

libxsmm benchmark with settings of M N K: 64. Default was the fastest.

The AMD EPYC 9684X CPU power consumption was also slightly lower when leveraging the 3D V-Cache for this benchmark.

libxsmm benchmark with settings of M N K: 32. Default was the fastest.
libxsmm benchmark with settings of M N K: 128. Default was the fastest.

For the other data-set sizes tested, libxsmm continued to show nice benefits from 3D V-Cache.
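
For a sense of scale, the M N K settings of 32, 64 and 128 are the dimensions of a single small matrix multiply (an M x K matrix times a K x N matrix). The sketch below works out the floating-point operations and the operand footprint of one such multiply. It assumes 8-byte double-precision elements, which the benchmark settings above do not state, and it says nothing about how many multiplies the benchmark streams through or how it lays them out in memory.

```python
def gemm_flops(m, n, k):
    """Floating-point operations for one C += A * B (one multiply and one add per term)."""
    return 2 * m * n * k


def gemm_bytes(m, n, k, elem_bytes=8):
    """Bytes occupied by A (m x k), B (k x n) and C (m x n).

    elem_bytes=8 is an ASSUMPTION (double precision), not taken from the article.
    """
    return (m * k + k * n + m * n) * elem_bytes


for size in (32, 64, 128):
    flops = gemm_flops(size, size, size)
    nbytes = gemm_bytes(size, size, size)
    print(f"M=N=K={size}: {flops} FLOPs, {nbytes / 1024:.0f} KiB of operands per multiply")
```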

HeFFTe - Highly Efficient FFT for Exascale benchmark with settings of Test: c2c, Backend: FFTW, Precision: float, X Y Z: 512. Default was the fastest.
HeFFTe - Highly Efficient FFT for Exascale benchmark with settings of Test: r2c, Backend: FFTW, Precision: float, X Y Z: 512. Default was the fastest.
HeFFTe - Highly Efficient FFT for Exascale benchmark with settings of Test: c2c, Backend: FFTW, Precision: double, X Y Z: 256. Default was the fastest.
HeFFTe - Highly Efficient FFT for Exascale benchmark with settings of Test: r2c, Backend: FFTW, Precision: double, X Y Z: 512. Default was the fastest.

The HeFFTe library also showed a very significant difference from toggling 3D V-Cache.
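
The size of these FFT problems can be estimated from the settings listed above. As a rough sketch, the helper below computes the memory taken by one input grid of X = Y = Z points, treating c2c input as complex and r2c input as real. The function name `grid_bytes` is made up for this illustration, and the estimate ignores the output array, FFTW work buffers, and any splitting of the grid across MPI ranks.

```python
def grid_bytes(n, precision, complex_valued=True):
    """Bytes for one n x n x n input grid.

    precision: "float" (4-byte reals) or "double" (8-byte reals).
    complex_valued: True for c2c input, False for the real input of r2c.
    """
    real_bytes = {"float": 4, "double": 8}[precision]
    per_element = real_bytes * (2 if complex_valued else 1)
    return n ** 3 * per_element


MIB = 1024 ** 2

# The four configurations named in the benchmark settings above.
configs = [
    ("c2c", "float", 512),
    ("r2c", "float", 512),
    ("c2c", "double", 256),
    ("r2c", "double", 512),
]
for test, precision, n in configs:
    size = grid_bytes(n, precision, complex_valued=(test == "c2c"))
    print(f"{test} {precision} {n}^3: {size // MIB} MiB input grid")
```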

