Google Engineer Introduces "Light AVX" Support Within LLVM


  • Google Engineer Introduces "Light AVX" Support Within LLVM

    Phoronix: Google Engineer Introduces "Light AVX" Support Within LLVM

    Google engineer Ilya Tocar has introduced the notion of "light" AVX support within the LLVM compiler infrastructure for utilizing some benefits of Advanced Vector Extensions (AVX) but trying to avoid the power/frequency impact that AVX-512 use has on older generations of Intel processors...
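
    As a rough illustration of the trade-off being targeted (a hypothetical sketch, not the actual knob added by the patch): Clang's existing -mprefer-vector-width option can already cap how wide the auto-vectorizer goes while still keeping the VEX encodings, which is roughly the spirit of "light AVX".

    Code:
    /* saxpy.c - a loop the auto-vectorizer will happily widen.
     * Illustrative build lines using existing Clang flags, not the new "light AVX" tuning:
     *   clang -O3 -mavx2 saxpy.c                            # may use 256-bit ymm operations
     *   clang -O3 -mavx2 -mprefer-vector-width=128 saxpy.c  # prefer 128-bit vectors, keep VEX encodings
     */
    #include <stdio.h>

    #define N 1024

    void saxpy(float *restrict y, const float *restrict x, float a, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];   /* candidate for vectorized multiply-add */
    }

    int main(void)
    {
        static float x[N], y[N];
        for (int i = 0; i < N; i++) { x[i] = (float)i; y[i] = 1.0f; }
        saxpy(y, x, 2.0f, N);
        printf("y[10] = %f\n", y[10]);   /* expect 21.0 */
        return 0;
    }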


  • #2
    I understand avoiding AVX-512, but is the frequency drop from using 256-bit AVX really so significant that anyone would prefer 128-bit?



    • #3
      Well, just the AVX three-argument versions of the SSE instructions speed FP calculations up by 5-10%, and that is nothing more than a different instruction encoding.
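
      A concrete (hand-written, illustrative) case of what that buys: the legacy SSE encoding of addps overwrites its first source operand, so keeping the input alive typically costs an extra register copy, while the non-destructive three-operand VEX form writes a separate destination. Something like the snippet below makes the difference visible in the generated assembly (compare clang -O2 -msse2 -S against clang -O2 -mavx -S).

      Code:
      /* vex3op.c - same 128-bit math in SSE vs. VEX (AVX) encoding */
      #include <immintrin.h>
      #include <stdio.h>

      __m128 add_and_keep(__m128 a, __m128 b, __m128 *saved_a)
      {
          *saved_a = a;              /* forces `a` to stay live past the add        */
          return _mm_add_ps(a, b);   /* SSE: destructive addps; AVX: vaddps dst,a,b */
      }

      int main(void)
      {
          __m128 a = _mm_set1_ps(1.0f), b = _mm_set1_ps(2.0f), keep;
          float out[4];
          _mm_storeu_ps(out, add_and_keep(a, b, &keep));
          printf("%f\n", out[0]);    /* prints 3.000000 */
          return 0;
      }
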
      Last edited by carewolf; 26 January 2023, 10:09 PM.



      • #4
        The compiler flag enables certain 256-bit AVX functionality on top of 128-bit AVX. There was no frequency drop with "light AVX" enabled.

        Yeah. As Linus said about AVX-512, "people use it for memcpy". The x86_64 instruction set is a mess: https://m.youtube.com/watch?v=g9_FYRAfyqQ
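
        For what it's worth, a toy version of that "memcpy with wide registers" pattern, using plain unaligned 256-bit AVX loads and stores (nothing AVX-512 here; a real memcpy would of course handle tails, alignment, and size thresholds):

        Code:
        /* avx_copy.c - toy 256-bit copy loop; build with e.g. clang -O2 -mavx avx_copy.c */
        #include <immintrin.h>
        #include <stddef.h>
        #include <stdio.h>

        /* Copies n bytes; n is assumed to be a multiple of 32 for brevity. */
        static void copy256(void *dst, const void *src, size_t n)
        {
            unsigned char *d = dst;
            const unsigned char *s = src;
            for (size_t i = 0; i < n; i += 32) {
                __m256i v = _mm256_loadu_si256((const __m256i *)(s + i));
                _mm256_storeu_si256((__m256i *)(d + i), v);
            }
        }

        int main(void)
        {
            char src[64] = "hello, light AVX";
            char dst[64] = {0};
            copy256(dst, src, sizeof src);
            printf("%s\n", dst);   /* prints: hello, light AVX */
            return 0;
        }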



        • #5
          What about AMD?



          • #6
            Originally posted by Azrael5 View Post
            What about AMD?
            AMD started supporting AVX-512 in Zen 4, and there is no such frequency drop as with Intel processors. AMD uses dual issue internally for most operations (2x 256-bit wide operations), but it's still faster than Intel's implementation up to Ice Lake.
            Compared to recent Intel designs, you have to look at the instruction level; for example, the shuffle instructions are really fast on AMD because they spent a lot of silicon on them (the shuffle unit is natively 512 bits wide and can run most shuffle instructions in 1 cycle).
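
            As an example of the kind of shuffle-heavy code where that matters, here is a minimal full-width 512-bit cross-lane permutation (vpermd) using AVX-512F intrinsics; it needs a Zen 4 or another AVX-512-capable CPU to actually run.

            Code:
            /* shuffle512.c - reverse 16 ints with one vpermd; build with clang -O2 -mavx512f shuffle512.c */
            #include <immintrin.h>
            #include <stdio.h>

            int main(void)
            {
                int src[16], out[16];
                for (int i = 0; i < 16; i++) src[i] = i;

                __m512i v   = _mm512_loadu_si512(src);
                /* _mm512_set_epi32 takes arguments high-to-low, so this index vector reverses the lanes */
                __m512i idx = _mm512_set_epi32(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
                __m512i r   = _mm512_permutexvar_epi32(idx, v);   /* cross-lane shuffle, one instruction */

                _mm512_storeu_si512(out, r);
                for (int i = 0; i < 16; i++) printf("%d ", out[i]);   /* 15 14 13 ... 0 */
                putchar('\n');
                return 0;
            }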



            • #7
              Originally posted by Spacefish View Post

              AMD started supporting AVX-512 in Zen 4, and there is no such frequency drop as with Intel processors. AMD uses dual issue internally for most operations (2x 256-bit wide operations), but it's still faster than Intel's implementation up to Ice Lake.
              Compared to recent Intel designs, you have to look at the instruction level; for example, the shuffle instructions are really fast on AMD because they spent a lot of silicon on them (the shuffle unit is natively 512 bits wide and can run most shuffle instructions in 1 cycle).
              Many thanks for the answer, very kind of you.



              • #8
                I really like the idea here. I also wonder whether the set of "free" instructions extends to trivial operations, like bitwise arithmetic, and whether the developer is looking only at Skylake CPUs or also at newer AMD designs, like Zen 2 and later.



                • #9
                  Originally posted by Spacefish View Post
                  AMD started supporting AVX-512
                  The article & LLVM work seems focused on plain AVX, not AVX-512.

                  That also makes more sense for LLVM to exploit, since a lot more CPUs in the wild support AVX than AVX-512.

                  Originally posted by Spacefish View Post
                  in Zen 4 and there is no such frequency drop as with intel processors.
                  I'm sure there is. It's just not as prominent. Michael doesn't publish enough data for us to rule it out, as the only clock speed data he shows is that of the highest-clocked core.

                  Originally posted by Spacefish View Post
                  AMD uses a dual-issue internally for most operations (2x 256bit wide operations) but it´s still faster than intels implementation up to icelake.
                  Zen 4 can only issue one AVX-512 FMA per cycle, whereas Intel was shipping dual-FMA cores even as far back as Skylake SP. So, for FMA-heavy workloads, Intel's implementations still have higher theoretical throughput.
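
                  Back-of-the-envelope numbers for that (per core, FP32, counting an FMA as two FLOPs; the issue rates are the ones stated above, so treat them as assumptions):

                  Code:
                  /* fma_peak.c - rough per-core FP32 peak from the assumed FMA issue rates above */
                  #include <stdio.h>

                  int main(void)
                  {
                      const int lanes_fp32      = 512 / 32;   /* 16 FP32 lanes in a 512-bit FMA            */
                      const int zen4_fma_issue  = 1;          /* one 512-bit FMA per cycle (double-pumped) */
                      const int sklsp_fma_issue = 2;          /* two 512-bit FMA ports on Skylake-SP       */

                      printf("Zen 4:      %d FLOP/cycle\n", zen4_fma_issue  * lanes_fp32 * 2);   /* 32 */
                      printf("Skylake-SP: %d FLOP/cycle\n", sklsp_fma_issue * lanes_fp32 * 2);   /* 64 */
                      return 0;
                  }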

                  However, if by "faster" you don't just mean in terms of clock cycles, but actually realtime performance, then obviously AMD wins against everything but Sapphire Rapids (assuming comparable core counts), due to the aforementioned clock speed penalties affecting the older Intel CPUs. That said, now you're comparing CPUs of very different vintages and manufacturing tech, so we can't simply chalk this up to AMD having a superior design (although that could be part of it).

