Google Engineer Introduces "Light AVX" Support Within LLVM


  • Google Engineer Introduces "Light AVX" Support Within LLVM

    Phoronix: Google Engineer Introduces "Light AVX" Support Within LLVM

    Google engineer Ilya Tocar has introduced the notion of "light" AVX support within the LLVM compiler infrastructure for utilizing some benefits of Advanced Vector Extensions (AVX) but trying to avoid the power/frequency impact that AVX-512 use has on older generations of Intel processors...
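
    As a rough illustration of the trade-off being targeted (a hypothetical sketch, not the actual knob added by the patch): Clang's existing -mprefer-vector-width option can already cap how wide the auto-vectorizer goes while still keeping the VEX encodings, which is roughly the spirit of "light AVX".

    Code:
    /* saxpy.c - a loop the auto-vectorizer will happily widen.
     * Illustrative build lines using existing Clang flags, not the new "light AVX" tuning:
     *   clang -O3 -mavx2 saxpy.c                            # may use 256-bit ymm operations
     *   clang -O3 -mavx2 -mprefer-vector-width=128 saxpy.c  # prefer 128-bit vectors, keep VEX encodings
     */
    #include <stdio.h>

    #define N 1024

    void saxpy(float *restrict y, const float *restrict x, float a, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];   /* candidate for vectorized multiply-add */
    }

    int main(void)
    {
        static float x[N], y[N];
        for (int i = 0; i < N; i++) { x[i] = (float)i; y[i] = 1.0f; }
        saxpy(y, x, 2.0f, N);
        printf("y[10] = %f\n", y[10]);   /* expect 21.0 */
        return 0;
    }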


  • #2
    I understand avoiding AVX-512, but is the frequency drop from using 256-bit AVX really so significant that anyone would prefer 128-bit?



    • #3
      Well, just the AVX three-argument versions of the SSE instructions speed FP calculations up by 5-10%, and that is nothing more than a different instruction encoding.
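
      A concrete (hand-written, illustrative) case of what that buys: the legacy SSE encoding of addps overwrites its first source operand, so keeping the input alive typically costs an extra register copy, while the non-destructive three-operand VEX form writes a separate destination. Something like the snippet below makes the difference visible in the generated assembly (compare clang -O2 -msse2 -S against clang -O2 -mavx -S).

      Code:
      /* vex3op.c - same 128-bit math in SSE vs. VEX (AVX) encoding */
      #include <immintrin.h>
      #include <stdio.h>

      __m128 add_and_keep(__m128 a, __m128 b, __m128 *saved_a)
      {
          *saved_a = a;              /* forces `a` to stay live past the add        */
          return _mm_add_ps(a, b);   /* SSE: destructive addps; AVX: vaddps dst,a,b */
      }

      int main(void)
      {
          __m128 a = _mm_set1_ps(1.0f), b = _mm_set1_ps(2.0f), keep;
          float out[4];
          _mm_storeu_ps(out, add_and_keep(a, b, &keep));
          printf("%f\n", out[0]);    /* prints 3.000000 */
          return 0;
      }
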
      Last edited by carewolf; 26 January 2023, 10:09 PM.



      • #4
        The compiler flag enables certain 256-bit AVX functionality on top of 128-bit AVX. There was no frequency drop with "light AVX" enabled.

        Yeah. As Linus said about AVX-512, "people use it for memcpy". The x86_64 instruction set is a mess: https://m.youtube.com/watch?v=g9_FYRAfyqQ
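
        For what it's worth, a toy version of that "memcpy with wide registers" pattern, using plain unaligned 256-bit AVX loads and stores (nothing AVX-512 here; a real memcpy would of course handle tails, alignment, and size thresholds):

        Code:
        /* avx_copy.c - toy 256-bit copy loop; build with e.g. clang -O2 -mavx avx_copy.c */
        #include <immintrin.h>
        #include <stddef.h>
        #include <stdio.h>

        /* Copies n bytes; n is assumed to be a multiple of 32 for brevity. */
        static void copy256(void *dst, const void *src, size_t n)
        {
            unsigned char *d = dst;
            const unsigned char *s = src;
            for (size_t i = 0; i < n; i += 32) {
                __m256i v = _mm256_loadu_si256((const __m256i *)(s + i));
                _mm256_storeu_si256((__m256i *)(d + i), v);
            }
        }

        int main(void)
        {
            char src[64] = "hello, light AVX";
            char dst[64] = {0};
            copy256(dst, src, sizeof src);
            printf("%s\n", dst);   /* prints: hello, light AVX */
            return 0;
        }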



        • #5
          What about AMD?



          • #6
            Originally posted by Azrael5 View Post
            What about AMD?
            AMD started supporting AVX-512 in Zen 4, and there is no such frequency drop as with Intel processors. AMD uses dual issue internally for most operations (2x 256-bit wide operations), but it's still faster than Intel's implementation up to Ice Lake.
            Compared to recent Intel designs, you have to look at the instruction level; for example, the shuffle instructions are really fast on AMD because they spent a lot of silicon on them (the shuffle unit is natively 512 bits wide and can run most shuffle instructions in 1 cycle).
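
            As an example of the kind of shuffle-heavy code where that matters, here is a minimal full-width 512-bit cross-lane permutation (vpermd) using AVX-512F intrinsics; it needs a Zen 4 or another AVX-512-capable CPU to actually run.

            Code:
            /* shuffle512.c - reverse 16 ints with one vpermd; build with clang -O2 -mavx512f shuffle512.c */
            #include <immintrin.h>
            #include <stdio.h>

            int main(void)
            {
                int src[16], out[16];
                for (int i = 0; i < 16; i++) src[i] = i;

                __m512i v   = _mm512_loadu_si512(src);
                /* _mm512_set_epi32 takes arguments high-to-low, so this index vector reverses the lanes */
                __m512i idx = _mm512_set_epi32(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
                __m512i r   = _mm512_permutexvar_epi32(idx, v);   /* cross-lane shuffle, one instruction */

                _mm512_storeu_si512(out, r);
                for (int i = 0; i < 16; i++) printf("%d ", out[i]);   /* 15 14 13 ... 0 */
                putchar('\n');
                return 0;
            }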



            • #7
              Originally posted by Spacefish View Post

              AMD started supporting AVX-512 in Zen 4, and there is no such frequency drop as with Intel processors. AMD uses dual issue internally for most operations (2x 256-bit wide operations), but it's still faster than Intel's implementation up to Ice Lake.
              Compared to recent Intel designs, you have to look at the instruction level; for example, the shuffle instructions are really fast on AMD because they spent a lot of silicon on them (the shuffle unit is natively 512 bits wide and can run most shuffle instructions in 1 cycle).
              Many thanks for the answer, very kind of you.



              • #8
                I really like the idea here. I also wonder whether the set of "free" instructions extends to trivial operations, like bitwise arithmetic, and whether the developer is looking only at Skylake CPUs or also at newer AMD designs, like Zen 2 and later.



                • #9
                  Originally posted by Spacefish View Post
                  AMD started supporting AVX-512
                  The article & LLVM work seems focused on plain AVX, not AVX-512.

                  That also makes more sense for LLVM to exploit, since a lot more CPUs in the wild support AVX than AVX-512.

                  Originally posted by Spacefish View Post
                  in Zen 4 and there is no such frequency drop as with intel processors.
                  I'm sure there is. It's just not as prominent. Michael doesn't publish enough data for us to rule it out, as the only clock speed data he shows is that of the highest-clocked core.

                  Originally posted by Spacefish View Post
                  AMD uses a dual-issue internally for most operations (2x 256bit wide operations) but it´s still faster than intels implementation up to icelake.
                  Zen 4 can only issue one AVX-512 FMA per cycle, whereas Intel was shipping dual-FMA cores even as far back as Skylake SP. So, for FMA-heavy workloads, Intel's implementations still have higher theoretical throughput.
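
                  Back-of-the-envelope numbers for that (per core, FP32, counting an FMA as two FLOPs; the issue rates are the ones stated above, so treat them as assumptions):

                  Code:
                  /* fma_peak.c - rough per-core FP32 peak from the assumed FMA issue rates above */
                  #include <stdio.h>

                  int main(void)
                  {
                      const int lanes_fp32      = 512 / 32;   /* 16 FP32 lanes in a 512-bit FMA            */
                      const int zen4_fma_issue  = 1;          /* one 512-bit FMA per cycle (double-pumped) */
                      const int sklsp_fma_issue = 2;          /* two 512-bit FMA ports on Skylake-SP       */

                      printf("Zen 4:      %d FLOP/cycle\n", zen4_fma_issue  * lanes_fp32 * 2);   /* 32 */
                      printf("Skylake-SP: %d FLOP/cycle\n", sklsp_fma_issue * lanes_fp32 * 2);   /* 64 */
                      return 0;
                  }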

                  However, if by "faster" you don't just mean in terms of clock cycles, but actually realtime performance, then obviously AMD wins against everything but Sapphire Rapids (assuming comparable core counts), due to the aforementioned clock speed penalties affecting the older Intel CPUs. That said, now you're comparing CPUs of very different vintages and manufacturing tech, so we can't simply chalk this up to AMD having a superior design (although that could be part of it).

