AMD Radeon "Aldebaran" GPU Support Published For Next-Gen CDNA

  • AMD Radeon "Aldebaran" GPU Support Published For Next-Gen CDNA

    Phoronix: AMD Radeon "Aldebaran" GPU Support Published For Next-Gen CDNA

    Last week I noted "GFX90A" appearing in the AMD LLVM back-end, and now the AMDGPU Linux kernel driver patches have appeared for "Aldebaran", which appears to be the codename for the next-generation CDNA part making use of GFX90A...


  • #2
    Full rate FP64 is not an exciting feature. It means that 32(/16/8)-bit ALU are halved. By GPU nature (data pump) a data processing rate depends on operand size.

    • #3
      Originally posted by AlB80
      Full rate FP64 is not an exciting feature. It means that 32(/16/8)-bit ALU are halved. By GPU nature (data pump) a data processing rate depends on operand size.
      What do you specifically mean? According to a spec sheet, the 32-bit floating-point rate is still double the 64-bit rate...
      Or are you talking about integer?


      Originally posted by atomsymbol
      Computation MI100 GPU (Peak TFLOPS)
      Matrix FP16 184.6
      Matrix bf16 92.3
      Matrix FP32 46.1
      Vector FP32 23.1
      Vector FP64 11.5
      https://www.techpowerup.com/gpu-spec...chitecture.pdf
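As a quick check of the ratios in the figures quoted above, here is a minimal Python sketch. The numbers are copied from the quoted MI100 table; the dictionary and function names are made up for illustration only.

```python
# Peak TFLOPS for the MI100, copied from the table quoted above.
MI100_PEAK_TFLOPS = {
    "Matrix FP16": 184.6,
    "Matrix bf16": 92.3,
    "Matrix FP32": 46.1,
    "Vector FP32": 23.1,
    "Vector FP64": 11.5,
}


def rate_ratio(faster, slower):
    """Return how many times higher the peak rate of `faster` is than `slower`."""
    return MI100_PEAK_TFLOPS[faster] / MI100_PEAK_TFLOPS[slower]


if __name__ == "__main__":
    # Hypothetical helper names; only the table values come from the thread.
    print(f"Vector FP32 / Vector FP64: {rate_ratio('Vector FP32', 'Vector FP64'):.2f}")
    print(f"Matrix FP16 / Matrix bf16: {rate_ratio('Matrix FP16', 'Matrix bf16'):.2f}")
```

On these figures the vector FP32 rate is about twice the vector FP64 rate, which is the 2:1 ratio the reply refers to.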

      • #4
        Originally posted by AlB80
        Full rate FP64 is not an exciting feature. It means that 32(/16/8)-bit ALU are halved. By GPU nature (data pump) a data processing rate depends on operand size.
        If only it were so simple: https://stackoverflow.com/q/29344800/122792.

        Yes, if you build in 64-bit floats, they run 2x slower than 32-bit ones. But if you don't build them, going back the other way (getting 64-bit results out of 32-bit hardware) is not possible at 1/2 speed. The relationship is not symmetric.
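To illustrate why going the other way is costly, here is a rough Python sketch of Knuth's TwoSum, one building block of software "double-single" arithmetic. It is an illustration under stated assumptions, not how any particular GPU or library does it: fp32 is emulated by rounding through `struct`, and the helper names `f32` and `two_sum_f32` are made up for this example.

```python
import struct


def f32(x):
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum_f32(a, b):
    """Knuth's TwoSum using only emulated fp32 additions and subtractions.

    Assumes a and b are already fp32 values. Returns (s, err), where s is
    the fp32-rounded sum and err is the rounding error that s lost.
    This takes six fp32 operations for a single error-free addition.
    """
    s = f32(a + b)
    b_virtual = f32(s - a)
    a_virtual = f32(s - b_virtual)
    b_err = f32(b - b_virtual)
    a_err = f32(a - a_virtual)
    err = f32(a_err + b_err)
    return s, err


if __name__ == "__main__":
    # 2**-30 is too small to survive an fp32 add to 1.0, so it ends up in err.
    print(two_sum_f32(1.0, 2.0 ** -30))
```

A complete double-single addition is built from several such steps plus renormalisation, so the cost is well above two fp32 operations per emulated operation.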

        So the GPU maker is giving something here that's not completely trivial. For him it is, for you it's not. The GPU maker can arguably rightly charge more given this asymmetric power relationship in his favour. The real problem is, this has historically been used to rip the customer off and is an example of market failure.

        So basically, given the market situation we find ourselves in, depending on pricing, this might indeed be an exciting feature.
        Last edited by vegabook; 24 February 2021, 10:09 PM.

        • #5
          per-die basis on aldebaran
          MCM confirmed. CDNA GPUs will be composed of chiplets.

          • #6
            Originally posted by Filiprino
            MCM confirmed. CDNA GPUs will be composed of chiplets.
            Yeah, it's pretty much been a given that CDNA would get chiplets first, since it's easier to implement for compute architectures.

            • #7
              Originally posted by Filiprino
              MCM confirmed. CDNA GPUs will be composed of chiplets.
              Oh yes... I like the chiplet design... this will totally rip Nvidia to pieces.

              • #8
                Originally posted by Qaridarium
                Oh yes... I like the chiplet design... this will totally rip Nvidia to pieces.
                1.5 years ago, Nvidia already presented a prototype AI accelerator built with chiplets:



                And I would remind you that while AMD was first to use HBM (in Fury), Nvidia had a lot more financial success with their P100.

                • #9
                  Originally posted by AlB80
                  Full rate FP64 is not an exciting feature. It means that 32(/16/8)-bit ALU are halved. By GPU nature (data pump) a data processing rate depends on operand size.
                  How do you figure that? Sure, if you're reusing the same vector registers, you can pack in 2x as many 32-bit or 4x as many 16-bit values, but maybe they increased the width of each SIMD lane to 64 bits, without adding the ability to bifurcate them to hold 2x as many fp32? It's possible!

                  If the demand from HPC customers is for fp64, then more fp32 throughput could just go to waste! Keep in mind that GPUs generally don't implement denormals, so fp32 doesn't get you as far as on something like an x87 FPU. That strongly pushes anyone doing things like serious simulation or modelling to use fp64.

                  Heck, even doing matrix inversions, you're likely to encounter numerical stability problems with non-denormal fp32.
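To make the denormal point concrete, here is a small Python sketch of what flushing subnormal results to zero does to fp32 arithmetic. It assumes a flush-to-zero mode as described in the post; whether a given GPU actually behaves this way depends on the hardware and compiler settings. fp32 is emulated via `struct`, and the names `f32` and `f32_ftz` are hypothetical helpers.

```python
import struct

FLT_MIN = 2.0 ** -126  # smallest normal fp32 magnitude


def f32(x):
    """Round a Python float (fp64) to the nearest fp32 value, subnormals kept."""
    return struct.unpack("f", struct.pack("f", x))[0]


def f32_ftz(x):
    """Round to fp32, then flush subnormal results to zero (assumed FTZ mode).

    The sign of a flushed result is dropped here for simplicity.
    """
    y = f32(x)
    return 0.0 if 0.0 < abs(y) < FLT_MIN else y


x = FLT_MIN
y = 1.5 * FLT_MIN

diff_ieee = f32(y - x)     # a subnormal: the difference is preserved
diff_ftz = f32_ftz(y - x)  # flushed: x != y, yet y - x comes out as zero

if __name__ == "__main__":
    print("with subnormals:", diff_ieee)
    print("flush-to-zero:  ", diff_ftz)
```

With flushing, two distinct values can have a computed difference of zero, which is the kind of effect that causes trouble in code that later divides by that difference.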

                  • #10
                    Originally posted by vegabook
                    So the GPU maker is giving something here that's not completely trivial. For him it is, for you it's not. The GPU maker can arguably rightly charge more given this asymmetric power relationship in his favour. The real problem is, this has historically been used to rip the customer off and is an example of market failure.
                    Woah, chill out, bruh. Cool yer nerd rage. There are reasons for the 2:1 ratio, a lot of it owing to vector-packing and the fact that register bits are historically expensive.

                    But, you know what's really expensive? fp64 multipliers. That's because the requisite silicon area increases as a square of the significand size (the same reason BFloat16 is preferred over IEEE 754 fp16) and with fp32 -> fp64, you're going from 24 to 53 bits.
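The scaling argument can be put into numbers with a short Python sketch. It takes the post's rule of thumb (multiplier area grows with the square of the significand width) at face value; real multiplier designs differ, so treat the output as a rough estimate. The significand widths include the implicit leading bit, and the function name is made up for this example.

```python
# Significand widths in bits, including the implicit leading 1.
SIGNIFICAND_BITS = {"bf16": 8, "fp16": 11, "fp32": 24, "fp64": 53}


def relative_multiplier_area(fmt, baseline):
    """Rough area ratio, assuming area scales as significand_bits ** 2."""
    return (SIGNIFICAND_BITS[fmt] / SIGNIFICAND_BITS[baseline]) ** 2


if __name__ == "__main__":
    # fp64 vs fp32: (53 / 24) ** 2, a bit under 5x
    print(f"fp64 vs fp32: {relative_multiplier_area('fp64', 'fp32'):.2f}x")
    # bf16 vs fp16: (8 / 11) ** 2, roughly half
    print(f"bf16 vs fp16: {relative_multiplier_area('bf16', 'fp16'):.2f}x")
```

Under this assumption an fp64 multiplier costs nearly five times the area of an fp32 one, not twice, while a bf16 multiplier needs about half the area of an fp16 one.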

                    So, I really don't see it as greedy hardware manufacturers holding out on us, by only building vectors with half as many fp64 elements. It just made sense, and was expensive to try and push beyond. The tech had to progress to the point where it made sense to expand fp64 beyond just scaling up the number of CUs/EUs/SMs and the market had to be there for an even higher fp64-demand than fp32, which is something that can happen with the CDNA architecture now so completely severed from the graphics world.
