**coder** · 17 August 2023, 06:30 AM

Someone please correct me if I'm wrong, but I did actually try looking at the AVX10.1 Architecture Specification and it seemed to me that AVX10/512 is the same as legacy AVX-512. It's only AVX10/256 where the opmask register got shortened from 64-bit to 32-bit (and obviously you can't use 512-bit operands). Is that right?

So, the only other thing I believe is in AVX10.1 is just the CPUID extension. Correct?

**user556** · 17 August 2023, 06:51 AM

Yeah, call it what it really is - rebranding ... with more fragmentation added.

**chuckula** · 17 August 2023, 08:45 AM

Originally posted by coder

Someone please correct me if I'm wrong, but I did actually try looking at the AVX10.1 Architecture Specification and it seemed to me that AVX10/512 is the same as legacy AVX-512. It's only AVX10/256 where the opmask register got shortened from 64-bit to 32-bit (and obviously you can't use 512-bit operands). Is that right?

So, the only other thing I believe is in AVX10.1 is just the CPUID extension. Correct?

I wouldn't call that "legacy". I'd call it a major update to 256-bit AVX, since AVX-512 includes a large number of powerful instructions that were only available in 512-bit versions before now.

https://news.ycombinator.com/item?id=36396891

**coder** · 17 August 2023, 10:00 AM

Originally posted by chuckula

I wouldn't call that "legacy". I'd call it a major update to 256-bit AVX, since AVX-512 includes a large number of powerful instructions that were only available in 512-bit versions before now.

https://news.ycombinator.com/item?id=36396891

No, you're conflating two different things. AVX/AVX2 is limited to 256-bit vectors, but AVX-512 could already support 256-bit operands. AVX10.1/256 doesn't implement any operations on 256-bit vectors that you couldn't already do on 256-bit operands using the latest CPUs supporting AVX-512.

AVX-512 can operate on 128-bit, 256-bit, and 512-bit operands. With AVX10/256, all they're doing is removing support for the 512-bit length.

**AdrianBc** · 18 August 2023, 03:02 AM

Originally posted by coder

Someone please correct me if I'm wrong, but I did actually try looking at the AVX10.1 Architecture Specification and it seemed to me that AVX10/512 is the same as legacy AVX-512. It's only AVX10/256 where the opmask register got shortened from 64-bit to 32-bit (and obviously you can't use 512-bit operands). Is that right?

So, the only other thing I believe is in AVX10.1 is just the CPUID extension. Correct?

AVX10 is a rebranding of AVX-512, created to enable support for a 256-bit subset of AVX-512 on the Intel E-cores and consumer P-cores, in the models that will be launched in 2025.

Intel has used this opportunity to simplify the identification of the AVX-512 features with CPUID. Currently there are a huge number of different feature flags that must be checked. In the future only the version number of AVX10 will need to be checked, because it will be guaranteed that later versions will include all the features of earlier versions. So AVX10.1 will be the first version, AVX10.2 will be the second version, AVX10.3 will be the third version and so on.

Besides checking the AVX10 version number, an application will have to check whether 512-bit instructions are supported. Instructions with 256-bit or shorter operands will be supported on all cores, while 512-bit instructions will be supported only on certain server CPU models having only P-cores.
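The two checks described above (version number, then 512-bit support) can be sketched roughly as follows. This is only an illustration, not Intel's reference code. It assumes the enumeration described in the AVX10.1 specification as I read it: the AVX10 feature bit at CPUID.(EAX=07H,ECX=01H):EDX[19], the version in CPUID.(EAX=24H,ECX=00H):EBX[7:0], and the 256-bit and 512-bit length flags in EBX bits 17 and 18. The register values are passed in as plain integers, since Python cannot execute CPUID itself. `decode_avx10` and `can_run` are hypothetical names, and the bit positions should be checked against the current specification.

```python
from typing import NamedTuple

# Bit positions assumed from the AVX10.1 specification; verify before relying on them.
AVX10_FEATURE_BIT = 19   # CPUID.(EAX=07H,ECX=01H):EDX[19]
VERSION_MASK = 0xFF      # CPUID.(EAX=24H,ECX=00H):EBX[7:0]
VL256_BIT = 17           # CPUID.(EAX=24H,ECX=00H):EBX[17]
VL512_BIT = 18           # CPUID.(EAX=24H,ECX=00H):EBX[18]


class Avx10Info(NamedTuple):
    supported: bool
    version: int
    max_vector_bits: int


def decode_avx10(leaf7_sub1_edx: int, leaf24_sub0_ebx: int) -> Avx10Info:
    """Decode AVX10 support from two raw CPUID register values."""
    if not (leaf7_sub1_edx >> AVX10_FEATURE_BIT) & 1:
        return Avx10Info(False, 0, 0)
    version = leaf24_sub0_ebx & VERSION_MASK
    if (leaf24_sub0_ebx >> VL512_BIT) & 1:
        width = 512
    elif (leaf24_sub0_ebx >> VL256_BIT) & 1:
        width = 256
    else:
        width = 128
    return Avx10Info(True, version, width)


def can_run(info: Avx10Info, required_version: int, required_bits: int) -> bool:
    """Later versions include all earlier features, so a >= comparison is enough."""
    return (info.supported
            and info.version >= required_version
            and info.max_vector_bits >= required_bits)
```

With this shape, a 512-bit code path built for AVX10.1 would ask `can_run(info, 1, 512)`, and a 256-bit path would ask `can_run(info, 1, 256)`, instead of testing a long list of separate AVX-512 feature flags.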

AVX10.1 will be the AVX-512 version supported by the Intel Granite Rapids server CPUs, which will be launched in 2024. They will support the full 512-bit instruction set.

AVX10.2 will be supported by the next generation of products expected in 2025, most of which will support only the 256-bit subset of it (i.e. only the successor of Granite Rapids will support 512-bit instructions).

All the Intel products that include E-cores and that are expected in 2024, i.e. Arrow Lake, Arrow Lake S, Lunar Lake, Sierra Forest and Grand Ridge, will support only AVX2 augmented with a few AVX-512 instructions re-encoded with the VEX instruction prefix of AVX.

**coder** · 18 August 2023, 03:07 AM

Thanks. That all basically aligns with my take, as well.

Originally posted by AdrianBc

AVX10.1 will be the AVX-512 version supported by the Intel Granite Rapids server CPUs, which will be launched in 2024. They will support the full 512-bit instruction set.

Are we sure no client processor, like Lunar Lake, will support AVX10.1/256?

**AdrianBc** · 18 August 2023, 04:24 AM

Originally posted by coder

Thanks. That all basically aligns with my take, as well.

Are we sure no client processor, like Lunar Lake, will support AVX10.1/256?

If we trust what Intel says, we are certain.

The instructions supported by all the Intel products that will be launched in 2024 are listed in "Intel® Architecture Instruction Set Extensions and Future Features":

https://cdrdv2-public.intel.com/782879/architecture-instruction-set-extensions-programming-reference.pdf

This document was released in June, before the AVX10 document, so this name is not mentioned in it.

In the AVX10 document, it is stated that AVX10.1 is the same as the AVX-512 of Granite Rapids.

Moreover, it is impossible for AVX10.1 to implement only the 256-bit subset.

In the original AVX-512 instruction format, when 128-bit and 256-bit instructions were added in Skylake Server, Intel encoded the operand size by reusing a couple of bits that previously encoded the operand type for certain kinds of conversion instructions.

Because of this dual use of those instruction bits, these conversion instructions could be encoded only for 512-bit registers.

So to allow the removal of the 512-bit instructions without also removing these conversion instructions, a new encoding for them was required. This new encoding has been added in AVX10.2, so that is the first version of AVX-512 where it is possible to enable only the 256-bit or shorter instructions, without the 512-bit instructions.

**coder** · 18 August 2023, 07:25 AM

Originally posted by AdrianBc

Moreover, it is impossible for AVX10.1 to implement only the 256-bit subset.

In the original AVX-512 instruction format, when 128-bit and 256-bit instructions were added in Skylake Server, Intel encoded the operand size by reusing a couple of bits that previously encoded the operand type for certain kinds of conversion instructions.

Because of this dual use of those instruction bits, these conversion instructions could be encoded only for 512-bit registers.

So to allow the removal of the 512-bit instructions without also removing these conversion instructions, a new encoding for them was required. This new encoding has been added in AVX10.2, so that is the first version of AVX-512 where it is possible to enable only the 256-bit or shorter instructions, without the 512-bit instructions.

Wow! I didn't see that! I just saw a statement that AVX-512 instructions with 256-bit operands would be incompatible if they used 64-bit opmask operands (instead of 32-bit ones). But, that doesn't rule out what you said.

So, answer me this: should we not expect to see 512-bit code become scarce? If someone is writing code meant to run on client processors, as well as servers, why would they bother to maintain a separate 512-bit codepath, unless they absolutely needed every last ounce of performance a server CPU could provide? The disincentive becomes even greater, if AMD holds their implementation at 256-bits, since it means 512-bit operands would only get you more actual throughput on Intel Xeons.

**AdrianBc** · 19 August 2023, 01:51 AM

Originally posted by coder

Wow! I didn't see that! I just saw a statement that AVX-512 instructions with 256-bit operands would be incompatible if they used 64-bit opmask operands (instead of 32-bit ones). But, that doesn't rule out what you said.

So, answer me this: should we not expect to see 512-bit code become scarce? If someone is writing code meant to run on client processors, as well as servers, why would they bother to maintain a separate 512-bit codepath, unless they absolutely needed every last ounce of performance a server CPU could provide? The disincentive becomes even greater, if AMD holds their implementation at 256-bits, since it means 512-bit operands would only get you more actual throughput on Intel Xeons.

As I said, the encoding problem that is solved only in AVX10.2 concerns two bits in the EVEX prefix, EVEX.L’L. In the previous versions of AVX-512 they can be interpreted either as the vector length or as the rounding control, and the latter interpretation is possible only when the instruction uses 512-bit registers. Disabling the 512-bit instructions therefore also disables the embedded rounding control.

Whether 512-bit code will become scarce depends on how AMD reacts to AVX10.
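To make the dual use of EVEX.L’L concrete, here is a rough sketch of how those two bits are read for packed instructions in pre-AVX10.2 AVX-512. It relies on the EVEX layout as I understand it from Intel's manuals: in the third payload byte, bits 6:5 are L’L and bit 4 is EVEX.b, and when EVEX.b is set on a register-to-register form of an instruction that permits embedded rounding, L’L holds the rounding mode and the length is implied to be 512 bits. The function name and its two boolean parameters are hypothetical simplifications, since a real decoder derives them from ModRM and the opcode.

```python
ROUNDING_MODES = {
    0b00: "round-to-nearest-even",
    0b01: "round-down",
    0b10: "round-up",
    0b11: "round-toward-zero",
}
VECTOR_LENGTHS = {0b00: 128, 0b01: 256, 0b10: 512}


def interpret_evex_ll(p2: int, register_form: bool, allows_rounding: bool) -> dict:
    """Interpret EVEX.L'L from the third EVEX payload byte (packed forms only)."""
    ll = (p2 >> 5) & 0b11   # EVEX.L'L
    b = (p2 >> 4) & 1       # EVEX.b
    if b and register_form and allows_rounding:
        # L'L is consumed as rounding control, so no bits are left to encode
        # a length and 512-bit operation is implied.
        return {"vector_bits": 512, "rounding": ROUNDING_MODES[ll]}
    if ll not in VECTOR_LENGTHS:
        raise ValueError("reserved L'L value 0b11 without embedded rounding")
    return {"vector_bits": VECTOR_LENGTHS[ll], "rounding": None}
```

The same bit pattern L’L = 01 therefore means "256-bit vector" when EVEX.b is clear and "round down, 512-bit vector" when EVEX.b selects embedded rounding, which is why a 256-bit-only implementation had no way to express static rounding in the old encoding.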

There is no need for the width of the SIMD registers to match the size of the SIMD execution units, and in most older CPUs and GPUs they did not match. Most AMD GPUs have registers that are wider than the execution units, so AMD had experience with this when they designed Zen 4, and they also had the experience of designing Zen 1, which likewise had different sizes for the SIMD registers and SIMD execution units.

So if AMD continues to implement the complete 512-bit AVX10 ISA in all their cores, but with 256-bit execution units as in Zen 4, their design costs will remain lower than Intel's, since Intel must design both 256-bit E-cores and 512-bit P-cores for server CPUs. AMD's production costs will be only very slightly higher than Intel's, due to a slightly more complex instruction decoder, and AMD will have a significant competitive advantage over Intel.

AMD's strategy of having E-cores that are logically identical to the P-cores, and which differ only by having a smaller cache memory and a different physical design with a lower clock frequency target, is a much better choice than Intel's, both for AMD due to lower design costs, and for its customers, due to better performance of the E-cores and lower software market fragmentation.

So now the fate of 512-bit code depends on AMD. Zen 4 has excellent gains from 512-bit code, because its performance is usually limited by instruction fetching and decoding (up to 4 IPC), or by micro-operation dispatching (up to 6 IPC), not by the execution units. Future AMD CPUs may have wider front-ends, so they may have smaller gains from 512-bit code, but nonetheless, wider front-ends are very expensive in die area, so it is likely that 512-bit code will always be a better choice for improved performance.
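The front-end argument can be illustrated with a deliberately crude toy model. It is not a measurement of Zen 4 or any other real core. Assume each cycle the front end can deliver a fixed number of SIMD instructions (the rest of its slots going to loads, stores and loop overhead), and the execution units can process a fixed number of bits per cycle. The achievable throughput is then the smaller of the two bounds. All names and numbers below are hypothetical.

```python
def simd_bytes_per_cycle(vector_bits: int,
                         simd_instr_per_cycle: int,
                         exec_bits_per_cycle: int) -> int:
    """Throughput of a toy core limited by either its front end or its execution units."""
    frontend_bound = simd_instr_per_cycle * vector_bits // 8
    execution_bound = exec_bits_per_cycle // 8
    return min(frontend_bound, execution_bound)


def gain_from_512(simd_instr_per_cycle: int, exec_bits_per_cycle: int) -> float:
    """Speedup of 512-bit code over 256-bit code in the toy model."""
    wide = simd_bytes_per_cycle(512, simd_instr_per_cycle, exec_bits_per_cycle)
    narrow = simd_bytes_per_cycle(256, simd_instr_per_cycle, exec_bits_per_cycle)
    return wide / narrow
```

With one SIMD instruction per cycle and 512 bits of execution width per cycle, `gain_from_512(1, 512)` is 2.0: the 256-bit code is front-end bound and leaves half the execution width idle. Doubling the front end to two SIMD instructions per cycle gives `gain_from_512(2, 512) == 1.0`, which is the "wider front-ends may have smaller gains" case.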

So I believe that AMD will keep implementing the 512-bit ISA, in which case software developers will be unable to ignore this option, and all code where performance is important will need to be able to use 512-bit instructions where available.
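"Use 512-bit instructions where available" usually comes down to run-time dispatch over several builds of the same kernel. A minimal sketch, assuming the widest supported vector length has already been detected elsewhere; the kernel names and the `pick_kernel` helper are hypothetical:

```python
from typing import Mapping, Optional

# Hypothetical table of kernel variants a project ships, keyed by vector width in bits.
KERNELS = {128: "kernel_128", 256: "kernel_256", 512: "kernel_512"}


def pick_kernel(max_vector_bits: int,
                available: Optional[Mapping[int, str]] = None) -> str:
    """Pick the widest shipped kernel that the CPU can run, else a scalar fallback."""
    table = KERNELS if available is None else available
    usable = [width for width in table if width <= max_vector_bits]
    if not usable:
        return "kernel_scalar"
    return table[max(usable)]
```

A project that drops its 512-bit path would just ship a smaller table, for example `pick_kernel(512, {128: "kernel_128", 256: "kernel_256"})` returns `"kernel_256"` even on a 512-bit-capable server CPU. That lost performance is what the question about 512-bit code becoming scarce is really asking about.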

Initial AVX10.1 Support Merged Into The GCC Compiler
