OpenJDK Merges Intel's x86-simd-sort For Speeding Up Data Sorting 7~15x

Written by Michael Larabel in Intel on 6 October 2023 at 05:10 PM EDT.
Earlier this year Intel posted x86-simd-sort as a blazing-fast sorting library that makes use of AVX-512. When the popular NumPy library began using it, its developers found sorts up to 10~17x faster for 16-bit to 64-bit data types. Today Intel software engineers released x86-simd-sort 3.0, and the release comes minutes after OpenJDK merged a modified version of this speedy sorting code into the reference JDK codebase.

x86-simd-sort 3.0 adds a new "avx512_argselect" method to compute the arg nth_element: it returns an array of indices that would partition the data array. The x86-simd-sort 3.0 release also improves its benchmarks, now uses __builtin_cpu_supports rather than querying cpuinfo, and includes various other changes.
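To make the "arg nth_element" idea concrete, here is a minimal pure-Python sketch of the contract such a routine fulfils: it returns indices, not values, arranged so that position k points at the k-th smallest element, with nothing larger before it and nothing smaller after it. The function name `argselect` and the scalar quickselect used here are illustrative assumptions only; this is not the library's AVX-512 implementation or its API.

```python
import random

def argselect(data, k):
    """Hypothetical reference sketch: return a list of indices idx such that
    data[idx[k]] is the k-th smallest value (0-based), every index before
    position k refers to a value <= it, and every index after refers to a
    value >= it. The input list itself is left untouched."""
    idx = list(range(len(data)))
    lo, hi = 0, len(idx) - 1
    while lo < hi:
        # Pick a random pivot value from the active range.
        pivot = data[idx[random.randint(lo, hi)]]
        i, j = lo, hi
        # Hoare-style partition of the index array around the pivot value.
        while i <= j:
            while data[idx[i]] < pivot:
                i += 1
            while data[idx[j]] > pivot:
                j -= 1
            if i <= j:
                idx[i], idx[j] = idx[j], idx[i]
                i += 1
                j -= 1
        # Continue only in the side that still contains position k.
        if k <= j:
            hi = j
        elif k >= i:
            lo = i
        else:
            break  # position k already holds a value equal to the pivot
    return idx

data = [50, 10, 40, 20, 30]
k = 2
idx = argselect(data, k)
kth_value = data[idx[k]]  # the k-th smallest value of data
```

Only position k is guaranteed to be in its final sorted place; the order of the indices on either side is unspecified, which is what lets a selection run faster than a full sort.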

With x86-simd-sort 3.0 in NumPy, np.partition is seeing speed-ups of up to 25x for 16-bit, 17x for 32-bit, and 8x for 64-bit data types. NumPy's np.argpartition is up to 6.5x faster with the new avx512_argselect method.
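For readers unfamiliar with the two NumPy calls mentioned here, the short sketch below shows how they are used. The array size, dtype, and seed are arbitrary choices for illustration, and whether an AVX-512 code path is actually taken depends on the NumPy build and the CPU it runs on; the calls behave the same either way.

```python
import numpy as np

# Arbitrary illustrative input: 1,000 random 32-bit integers.
rng = np.random.default_rng(0)
values = rng.integers(0, 1000, size=1_000, dtype=np.int32)
k = 100

# np.partition returns a rearranged copy: the element at position k is the
# one a full sort would put there, smaller-or-equal values come before it,
# larger-or-equal values come after it.
part = np.partition(values, k)

# np.argpartition returns the indices that would produce such a partition.
idx = np.argpartition(values, k)

# A full sort is used here only as a reference to check the k-th value.
kth = np.sort(values)[k]
```

Code that only needs the k smallest items can slice `part[:k]` or `values[idx[:k]]` and skip a full sort entirely.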

[Image: Intel Sapphire Rapids CPUs]

Meanwhile, a slightly modified version of x86-simd-sort was merged into OpenJDK this afternoon. With this sorting code merged, 32-bit data sorting is up to 15x faster and 64-bit data sorting is around 7x faster.

More details on x86-simd-sort 3.0 for speedy AVX-512 sorting are available via GitHub.
