OpenBLAS 0.3.14 Released With Performance Improvements For AMD Ryzen, POWER10
OpenBLAS 0.3.14 is out today as the newest version of this open-source BLAS (Basic Linear Algebra Subprograms) library that continues to work on maximizing the performance for x86_64 and other architectures.
On x86_64, OpenBLAS 0.3.14 adds an optimized BFloat16 GEMM kernel for Intel Cooper Lake processors and auto-detection for Rocket Lake and Tiger Lake, while AMD Ryzen processors see improved performance for the SASUM / DASUM / SROT / DROT kernels. The x86_64 code also fixes its detection of AMD's Clang-based AOCC compiler and brings support for BLAS/CBLAS tests on Windows, along with other fixes.
Outside of x86_64, on the POWER front there are now optimized POWER10 kernels for SSCAL / DSCAL / CSCAL / ZSCAL / SROT / DROT / CDOT / SASUM / DASUM. Performance has also improved for other existing kernels on IBM POWER10. The POWER code can now also be compiled with NVIDIA's HPC compiler.
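For readers less familiar with the routine names, here is a minimal sketch in plain Python of what the Level 1 BLAS operations behind several of these kernels compute. It is only a reference for the math, not OpenBLAS code: the function names `asum`, `scal`, and `rot` are illustrative, and the real routines work in place on strided arrays rather than returning new lists.

```python
def asum(x):
    """SASUM / DASUM semantics: sum of the absolute values of x."""
    return sum(abs(v) for v in x)


def scal(alpha, x):
    """SSCAL / DSCAL semantics: scale every element of x by alpha."""
    return [alpha * v for v in x]


def rot(x, y, c, s):
    """SROT / DROT semantics: apply a plane rotation to the pairs (x[i], y[i]).

    x'[i] = c * x[i] + s * y[i]
    y'[i] = c * y[i] - s * x[i]
    """
    new_x = [c * xi + s * yi for xi, yi in zip(x, y)]
    new_y = [c * yi - s * xi for xi, yi in zip(x, y)]
    return new_x, new_y


if __name__ == "__main__":
    print(asum([1.0, -2.0, 3.0]))
    print(scal(2.0, [1.0, -2.0]))
    print(rot([1.0], [0.0], 0.0, 1.0))
```

Each of these is a single pass over the input vectors, so throughput depends heavily on how well the loop is vectorized for a given CPU.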
On the ARM64 front there is support for compiling with the NVIDIA HPC and NAG Fortran compilers. A RISC-V compilation fix, several new CBLAS interfaces (CROTG, ZROTG, CSROT, and ZDROT), and various other compiler fixes round out this release.
More details and downloads for the OpenBLAS 0.3.14 release are available via GitHub.