OpenBLAS 0.3.10 Released With Initial BFloat16 Support, x86_64 Optimizations

Written by Michael Larabel in Programming on 15 June 2020 at 12:32 AM EDT.
A new feature release is now available for this leading open-source BLAS linear algebra library.

Sunday's release of OpenBLAS 0.3.10 brings initial BFloat16 (BF16) support with a first implementation in SHGEMM, various imported LAPACK bug fixes, thread locking improvements, an API for setting thread affinity on Linux via OpenBLAS, CMake build system improvements, support for MIPS 24K/24KE processors based on the P5600 kernels, an optimized SGEMM kernel for Cortex-A53, improved ThunderX2 performance, various performance improvements for recent x86_64 CPUs, AVX-512 fixes, and assorted other fixes and optimizations.

From our perspective, the most exciting changes are the initial BFloat16 support, given the Intel and Arm CPUs coming to market with support for this 16-bit floating-point format, and the x86_64 optimizations. BFloat16 is important for machine learning / AI, and we anticipate more OpenBLAS BF16 support moving forward. The x86_64 optimizations bring better DGEMM performance on Skylake-X, better STRSM performance on Haswell / Skylake-X / Ryzen, and other fixes and improvements.
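
For readers unfamiliar with the format, BFloat16 keeps the sign bit and all 8 exponent bits of a 32-bit float but only the top 7 mantissa bits, so it is effectively the upper half of a float32. The Python sketch below illustrates that layout with a round-to-nearest-even conversion. It is only an illustration of the number format, not OpenBLAS code, and the function names `float32_to_bf16_bits` and `bf16_bits_to_float32` are hypothetical helpers made up for this example.

```python
import struct


def float32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for x (hypothetical helper, not an OpenBLAS API)."""
    # Reinterpret the value as its 32-bit IEEE 754 single-precision pattern.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # NaN: keep the upper half and force a mantissa bit so the result stays a NaN.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040
    # Round to nearest, ties to even, on the 16 bits that are about to be dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float32(b: int) -> float:
    """Expand a 16-bit BF16 pattern back to a float by zero-filling the low 16 bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


if __name__ == "__main__":
    for value in (1.0, -2.0, 3.14159):
        b = float32_to_bf16_bits(value)
        print(f"{value!r} -> 0x{b:04X} -> {bf16_bits_to_float32(b)!r}")
```

Running it shows the trade-off: the range of float32 is preserved, but a value such as 3.14159 comes back as 3.140625 because only about two to three decimal digits of precision survive.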

The full list of OpenBLAS 0.3.10 changes is available via the project's GitHub.