Glibc Adds Arm SVE-Optimized Memory Copy - Can "Significantly" Help Performance


  • Glibc Adds Arm SVE-Optimized Memory Copy - Can "Significantly" Help Performance

    Phoronix: Glibc Adds Arm SVE-Optimized Memory Copy - Can "Significantly" Help Performance

    The GNU C Library (Glibc) now has a memory copy (memcpy) implementation optimized for Arm's Scalable Vector Extension (SVE) that can "significantly" improve performance...

  • #2
    As I've been saying, SVE2's and TME's ubiquity is going to give ARM a huge edge over x86, which is more or less stuck at an AVX2 baseline (and nothing like TSX) for the foreseeable future.


    • #3
      Originally posted by brucethemoose
      As I've been saying, SVE2's and TME's ubiquity is going to give ARM a huge edge over x86, which is more or less stuck at an AVX2 baseline (and nothing like TSX) for the foreseeable future.
      It remains to be seen how much of an edge it's really going to be. There are still no ARM cores that can even compete with x86 in absolute performance.


      • #4
        Originally posted by brucethemoose
        As I've been saying, SVE2's and TME's ubiquity is going to give ARM a huge edge over x86, which is more or less stuck at an AVX2 baseline (and nothing like TSX) for the foreseeable future.
        This implementation copies 32 bytes at a time, so 32 * 8 = 256 bits, which is similar to AVX2.


        • #5
          Originally posted by brucethemoose
          ... and TME's ubiquity is going to give ARM a huge edge over x86, ... (and nothing like TSX) for the foreseeable future.
          I'm trying to understand your point about TSX. So, you agree that it was comparable to TME, but you're concerned that it's gone and doesn't appear to be coming back?


          • #6
            Originally posted by atomsymbol

            Zen 4 (to be released this year) with AVX-512 support will most likely force Intel to add AVX-512 to most of their desktop CPUs. Given the fact that operating systems (Linux, Windows) are incapable of adding any kind of good support for hetero-ISA CPUs, it is probable that even Intel's E-cores will start having AVX-512 sometime in the future (maybe implemented using 256-bit FMA ALUs). According to my short communication with Torvalds on RWT forums, he firmly believes the invalid idea that supporting hetero-ISA CPUs in Linux can be implemented by something close to a "single-line patch" somewhere in the kernel and puts the blame for Linux not being able to support hetero-ISA CPUs solely on Intel.
            Researchers have built Linux systems with completely heterogeneous cores (MIPS, ARM, and x86), and they worked fine and were more power-efficient than the homogeneous systems they were tested against. I agree with Linus that slightly different ISAs in the same family are barely an inconvenience. Not a single-line patch, but still well within the expertise and ability of the Linux maintainers. But Intel released microcode that shut it off before anyone really had a chance to play with it.


            • #7
              Originally posted by carewolf
              This implementation copies 32 bytes at a time, so 32 * 8 = 256 bits, which is similar to AVX2.
              You mean the Neoverse V1 cores? Yes, I was surprised they went with only 256-bit. Fujitsu's A64FX, of 3 years ago, already used SVE @ 512-bit.


              • #8
                Originally posted by atomsymbol
                According to my short communication with Torvalds on RWT forums, he firmly believes the invalid idea that supporting hetero-ISA CPUs in Linux can be implemented by something close to a "single-line patch" somewhere in the kernel
                It could be made to work, but would perform like garbage on hetero-naive software. And for software not naive of it, it puts a lot of complexity on applications that they really shouldn't have to manage.


                • #9
                  Originally posted by WorBlux
                  I agree with Linus that slightly different ISAs in the same family are barely an inconvenience. Not a single-line patch, but still well within the expertise and ability of the Linux maintainers.
                  I disagree. You & Linus aren't thinking hard enough about the practical realities of software, on such a system. What's going to happen is that software will spawn too many threads, they'll get faulted off of the weak cores, and will simply contend for time on the more capable cores.

                  Often, apps are ignorant of what ISA extensions the libraries they're using even employ. So, putting the burden on the app developer to manage threads and affinities based on core capabilities is unreasonable and unrealistic.
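A minimal sketch of the contention argument above, as a toy model rather than a measurement. The core counts, the thread count, and both function names are hypothetical; the only assumption taken from the post is that a thread touching an extension the weak cores lack ends up confined to the capable cores.

```python
def usable_cores(capable: int, weak: int, needs_extension: bool) -> int:
    """Cores a thread may run on; weak cores drop out if the extension is used."""
    return capable if needs_extension else capable + weak


def threads_per_busy_core(n_threads: int, capable: int, weak: int,
                          needs_extension: bool) -> float:
    """Average number of runnable threads sharing each busy core."""
    if n_threads < 1 or capable < 1:
        raise ValueError("need at least one thread and one capable core")
    busy = min(n_threads, usable_cores(capable, weak, needs_extension))
    return n_threads / busy


# Hypothetical machine: 8 capable cores, 8 weak cores, and an app that
# spawns one thread per core it can see (16 threads).
print(threads_per_busy_core(16, 8, 8, needs_extension=False))  # 1.0
print(threads_per_busy_core(16, 8, 8, needs_extension=True))   # 2.0
```

In this model the app sized its thread pool for 16 cores, but once a library call pulls in the extension, every thread competes for the 8 capable cores. Avoiding that would mean the app pinning threads itself (on Linux, through something like `sched_setaffinity`), which is the burden the post calls unreasonable.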


                  • #10
                    Originally posted by coder
                    You mean the Neoverse V1 cores? Yes, I was surprised they went with only 256-bit. Fujitsu's A64FX, of 3 years ago, already used SVE @ 512-bit.
                    No, I mean the code this story is about. It operates on 32 bytes at a time, aka 256 bits, so it won't run faster on a 512-bit implementation (well, at least according to this summary).
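The arithmetic behind that point can be sketched as follows. This takes the post's summary at face value (a fixed 32 bytes per loop iteration) and compares it against a hypothetical vector-length-agnostic loop that moves one full hardware vector per iteration. Both function names are made up for illustration, and iteration count is only a rough proxy for speed.

```python
import math


def iterations_fixed_chunk(n_bytes: int, chunk_bytes: int = 32) -> int:
    """Loop trips when every iteration moves a fixed chunk (32 bytes = 256 bits)."""
    return math.ceil(n_bytes / chunk_bytes)


def iterations_full_vector(n_bytes: int, vector_bits: int) -> int:
    """Loop trips for a hypothetical loop that moves one hardware vector per iteration."""
    return math.ceil(n_bytes / (vector_bits // 8))


n = 4096  # bytes to copy
for bits in (256, 512):
    # The fixed-chunk count does not depend on the hardware vector width.
    print(bits, iterations_fixed_chunk(n), iterations_full_vector(n, bits))
```

For a 4096-byte copy the fixed-chunk loop needs 128 iterations on either machine, while the full-vector loop would drop from 128 to 64 on 512-bit hardware. That gap is what the post means by the code not benefiting from a wider SVE implementation.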
