**ctlansdown** · 12 October 2021, 03:13 PM

Why would it provide such a massive speedup over using AVX2? Or did Numpy not use any SIMD instructions at all?

**jabl** · 12 October 2021, 03:26 PM

Originally posted by ctlansdown

> Why would it provide such a massive speedup over using AVX2? Or did Numpy not use any SIMD instructions at all?

It's about using SVML, which provides very fast and vectorized (including avx-512 it appears) versions of math functions like sin(), cos(), log(), gamma() etc. etc.

Unfortunately SVML isn't open source, so you're not gonna see this performance with the out of the box numpy on your Linux distro.

**Jannik2099** · 12 October 2021, 03:32 PM

Originally posted by jabl

> Unfortunately SVML isn't open source, so you're not gonna see this performance with the out of the box numpy on your Linux distro.

> This open-source AVX-512 code originates from the Intel Short Vector Math Library (SVML) that they open-sourced the code from
Read again? This is fully in upstream numpy.

**Jannik2099** · 12 October 2021, 03:33 PM

Originally posted by ctlansdown

> Why would it provide such a massive speedup over using AVX2? Or did Numpy not use any SIMD instructions at all?

AVX-512 is not just twice as wide as AVX2. It also introduces some really cool bitmask operations, and it lets you fold control flow decisions into the AVX-512 operations themselves (to some degree). This means you have to leave the FPU less often, which can provide significant speedups.
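To make the masking idea concrete, here is a minimal pure-Python sketch of how a per-element `if` can become a compare that produces a bitmask, followed by a masked operation. This is only a model of the semantics, not real SIMD code and not what numpy does internally; the helper names `compare_gt`, `masked_add` and `double_positives` are made up for illustration.

```python
# Illustrative model of mask-based predication. Plain Python lists stand in
# for vector registers; nothing here actually executes SIMD instructions.

def compare_gt(a, b):
    """Return an integer bitmask: bit i is set when a[i] > b[i]."""
    mask = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x > y:
            mask |= 1 << i
    return mask


def masked_add(src, mask, a, b):
    """Merge masking: lane i becomes a[i] + b[i] if mask bit i is set,
    otherwise it keeps the value from src[i]."""
    return [a[i] + b[i] if (mask >> i) & 1 else src[i]
            for i in range(len(src))]


def double_positives(x):
    """The scalar logic 'if x[i] > 0: x[i] += x[i]' written as
    one compare plus one masked add over all lanes."""
    zeros = [0.0] * len(x)
    mask = compare_gt(x, zeros)
    return masked_add(x, mask, x, x)
```

In real hardware the compare and the masked add would each be a single instruction covering all lanes, so the per-element branch disappears from the loop body.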

**smitty3268** · 12 October 2021, 03:34 PM

Originally posted by jabl

> It's about using SVML, which provides very fast and vectorized (including avx-512 it appears) versions of math functions like sin(), cos(), log(), gamma() etc. etc.
>
> Unfortunately SVML isn't open source, so you're not gonna see this performance with the out of the box numpy on your Linux distro.

It looks like the code this is using is BSD licensed, available here: https://github.com/numpy/svml

Anyway, I believe ctlansdown is right and numpy must not have been using vectorized instructions at all on certain operations in order to get this kind of speedup.

**RedEyed** · 12 October 2021, 03:36 PM

Unfortunately, those who buy hardware explicitly for computation tasks choose Intel.
It would be nice if AMD contributed to compute libraries such as numpy, at least.

**coder** · 12 October 2021, 03:51 PM

If Zen 4 has AVX-512 as rumored, AMD should send Intel a "thank you" cake. Considering Alder Lake removed AVX-512, this is a double-win for AMD.

**numacross** · 12 October 2021, 03:54 PM

Originally posted by RedEyed

> Unfortunately, those who buy hardware explicitly for computation tasks choose Intel.

Don't you mean Nvidia?

**RedEyed** · 12 October 2021, 04:03 PM

Originally posted by numacross

> Don't you mean Nvidia?

I was talking about CPU

Intel Contributes AVX-512 Optimizations To Numpy, Yields Massive Speedups
