**brucethemoose** · 08 June 2022, 03:08 PM

As I've been saying, SVE2's and TME's ubiquity is going to give ARM a huge edge over x86, which is more or less stuck at an AVX2 baseline (and nothing like TSX) for the foreseeable future.

**jacob** · 09 June 2022, 12:04 AM

Originally posted by brucethemoose

As I've been saying, SVE2's and TME's ubiquity is going to give ARM a huge edge over x86, which is more or less stuck at an AVX2 baseline (and nothing like TSX) for the foreseeable future.

It remains to be seen how much of an edge it's really going to be. There are still no ARM cores that can even compete with x86 in absolute performance.

**carewolf** · 09 June 2022, 09:04 AM

Originally posted by brucethemoose

As I've been saying, SVE2's and TME's ubiquity is going to give ARM a huge edge over x86, which is more or less stuck at an AVX2 baseline (and nothing like TSX) for the foreseeable future.

This implementation copies 32 bytes at a time, so 32 * 8 = 256 bits, similar to AVX2.

**coder** · 09 June 2022, 12:19 PM

Originally posted by brucethemoose

... and TME's ubiquity is going to give ARM a huge edge over x86, ... (and nothing like TSX) for the foreseeable future.

I'm trying to understand your point about TSX. So, you agree that it was comparable to TME, but you're concerned that it's gone and doesn't appear to be coming back?

**WorBlux** · 09 June 2022, 12:22 PM

Originally posted by atomsymbol

Zen 4 (to be released this year) with AVX-512 support will most likely force Intel to add AVX-512 to most of their desktop CPUs. Given the fact that operating systems (Linux, Windows) are incapable of adding any kind of good support for hetero-ISA CPUs, it is probable that even Intel's E-cores will start having AVX-512 sometime in the future (maybe implemented using 256-bit FMA ALUs). According to my short communication with Torvalds on RWT forums, he firmly believes the invalid idea that supporting hetero-ISA CPUs in Linux can be implemented by something close to a "single-line patch" somewhere in the kernel and puts the blame for Linux not being able to support hetero-ISA CPUs solely on Intel.

Researchers have built Linux systems with completely heterogeneous cores (MIPS, ARM, and x86); they worked fine and were more power-efficient than the homogeneous systems they were tested against. I agree with Linus that slightly different ISAs in the same family are barely an inconvenience. Not a single-line patch, but still well within the expertise and ability of the Linux maintainers. But Intel released microcode that shut it off before anyone really had a chance to play with it.

**coder** · 09 June 2022, 12:23 PM

Originally posted by carewolf

This implementation copies 32 bytes at a time, so 32 * 8 = 256 bits, similar to AVX2.

You mean the Neoverse V1 cores? Yes, I was surprised they went with only 256-bit. Fujitsu's A64FX, of 3 years ago, already used SVE @ 512-bit.

**coder** · 09 June 2022, 12:29 PM

Originally posted by atomsymbol

According to my short communication with Torvalds on RWT forums, he firmly believes the invalid idea that supporting hetero-ISA CPUs in Linux can be implemented by something close to a "single-line patch" somewhere in the kernel

It could be made to work, but it would perform like garbage on hetero-naive software. And for software that is aware of it, it puts a lot of complexity on applications that they really shouldn't have to manage.

**coder** · 09 June 2022, 12:34 PM

Originally posted by WorBlux

I agree with Linus that slightly different ISAs in the same family are barely an inconvenience. Not a single-line patch, but still well within the expertise and ability of the Linux maintainers.

I disagree. You & Linus aren't thinking hard enough about the practical realities of software, on such a system. What's going to happen is that software will spawn too many threads, they'll get faulted off of the weak cores, and will simply contend for time on the more capable cores.

Often, apps are ignorant of what ISA extensions the libraries they're using even employ. So, putting the burden on the app developer to manage threads and affinities based on core capabilities is unreasonable and unrealistic.
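To put rough numbers on the contention argument, here is a toy model in Python. It is only a sketch: the function name `makespan` and the core counts are made up for illustration, every job is assumed to take one time unit, all cores are assumed equally fast, and any thread that touches the wide-vector extension is assumed to end up confined to the capable ("big") cores.

```python
import math


def makespan(n_threads: int, big_cores: int, little_cores: int,
             needs_extension: bool) -> int:
    """Time units needed to finish n_threads equal one-unit jobs.

    Toy assumptions (not how any real scheduler works in detail):
    - a thread that needs the extension can only run on big cores;
    - all cores run at the same speed;
    - scheduling and migration overhead is ignored.
    """
    usable = big_cores if needs_extension else big_cores + little_cores
    if usable <= 0:
        raise ValueError("no core can run these threads")
    return math.ceil(n_threads / usable)


if __name__ == "__main__":
    # Hypothetical chip: 8 capable cores, 8 weaker cores, app spawns 16 threads.
    print("ISA-uniform work:", makespan(16, 8, 8, needs_extension=False))
    print("extension-using work:", makespan(16, 8, 8, needs_extension=True))
```

Under these assumptions the app that sized its thread pool to the total core count takes twice as long once its threads are pushed onto the capable cores, and the weaker cores sit idle for that work.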

**carewolf** · 09 June 2022, 12:51 PM

Originally posted by coder

You mean the Neoverse V1 cores? Yes, I was surprised they went with only 256-bit. Fujitsu's A64FX, of 3 years ago, already used SVE @ 512-bit.

No, I mean the code this story is about. It operates on 32 bytes at a time, i.e. 256 bits, so it won't run faster on a 512-bit implementation (well, at least according to this summary).
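A quick way to see the point is to count loop iterations. The Python below is a toy model, not glibc's code: the function names are hypothetical, and iteration count is only a rough proxy for speed. It compares a loop that always moves a fixed 32-byte chunk with a vector-length-agnostic loop that moves one full hardware vector per step.

```python
import math


def fixed_chunk_iterations(n_bytes: int, chunk_bytes: int = 32) -> int:
    """Iterations when the code always moves chunk_bytes per step,
    regardless of how wide the hardware vectors are."""
    return math.ceil(n_bytes / chunk_bytes)


def vla_iterations(n_bytes: int, vector_bits: int) -> int:
    """Iterations when each step moves one full hardware vector."""
    return math.ceil(n_bytes / (vector_bits // 8))


if __name__ == "__main__":
    n = 4096  # bytes to copy
    for bits in (256, 512):
        print(bits, fixed_chunk_iterations(n), vla_iterations(n, bits))
```

In this model the fixed-chunk loop needs 128 iterations for 4096 bytes on either implementation, while the vector-length-agnostic loop drops from 128 to 64 when the vectors go from 256 to 512 bits.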

Glibc Adds Arm SVE-Optimized Memory Copy - Can "Significantly" Help Performance
