**AlB80** · 24 February 2021, 08:04 PM

Full-rate FP64 is not an exciting feature. It means that the 32(/16/8)-bit ALU rates are halved. A GPU is by nature a data pump, so its data processing rate depends on operand size.

**tildearrow** · 24 February 2021, 08:40 PM

Originally posted by AlB80

Full-rate FP64 is not an exciting feature. It means that the 32(/16/8)-bit ALU rates are halved. A GPU is by nature a data pump, so its data processing rate depends on operand size.

What specifically do you mean? According to the spec sheet, the 32-bit floating-point rate is still double the 64-bit rate...
Or are you talking about integer?

Originally posted by atomsymbol

| Computation | MI100 GPU (Peak TFLOPS) |
|---|---|
| Matrix FP16 | 184.6 |
| Matrix bf16 | 92.3 |
| Matrix FP32 | 46.1 |
| Vector FP32 | 23.1 |
| Vector FP64 | 11.5 |

https://www.techpowerup.com/gpu-spec...chitecture.pdf
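
For anyone who wants to check the ratios being argued about, here is a quick Python sketch. The dictionary simply restates the MI100 numbers quoted above; the variable names are made up for illustration.

```python
# Peak TFLOPS figures copied from the MI100 table quoted above.
peak_tflops = {
    "Matrix FP16": 184.6,
    "Matrix bf16": 92.3,
    "Matrix FP32": 46.1,
    "Vector FP32": 23.1,
    "Vector FP64": 11.5,
}

# Express every rate as a multiple of the vector FP64 rate.
baseline = peak_tflops["Vector FP64"]
ratios = {name: rate / baseline for name, rate in peak_tflops.items()}

for name, ratio in ratios.items():
    print(f"{name:12s} {ratio:5.1f}x Vector FP64")
```

On these figures, vector FP32 comes out at roughly twice the vector FP64 rate, which is the 2:1 relationship the question refers to.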

**vegabook** · 24 February 2021, 09:41 PM

Originally posted by AlB80

Full-rate FP64 is not an exciting feature. It means that the 32(/16/8)-bit ALU rates are halved. A GPU is by nature a data pump, so its data processing rate depends on operand size.

If only it were so simple: https://stackoverflow.com/q/29344800/122792.

Yes, if you build in 64-bit floats, they run 2x slower than 32-bit ones. But if you don't build them in, going back the other way (getting 64-bit results out of 32-bit hardware) is not possible at 1/2 speed. The trade-off is not symmetric.

So the GPU maker is giving something here that's not completely trivial. For him it is, for you it's not. The GPU maker can arguably rightly charge more given this asymmetric power relationship in his favour. The real problem is, this has historically been used to rip the customer off and is an example of market failure.

So basically, given the market situation we find ourselves in, depending on pricing, this might indeed be an exciting feature.
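
To put a rough number on "going back the other way", here is a minimal Python sketch of one common software workaround: representing a value as an unevaluated pair of fp32 numbers ("double-single" arithmetic). It is an illustration under stated assumptions, not what any particular GPU library does. The helper names (`f32`, `two_sum`, `split`, `ds_add`) are made up here, and fp32 rounding is emulated on the CPU with `struct`.

```python
import struct

def f32(x):
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def two_sum(a, b):
    """Error-free fp32 addition: returns (s, e) with s + e == a + b exactly.
    Costs 6 fp32 operations for what is a single add in hardware."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e

def split(x):
    """Split an fp64 value into a (hi, lo) pair of fp32 values."""
    hi = f32(x)
    lo = f32(x - hi)
    return hi, lo

def ds_add(x, y):
    """Add two (hi, lo) pairs using only fp32 operations (11 in total)."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))
    hi = f32(s + e)
    lo = f32(e - f32(hi - s))  # renormalise so that hi carries the leading bits
    return hi, lo

# Plain fp32 loses the small term entirely; the pair keeps it in `lo`.
print(f32(1.0 + 2.0 ** -30))                      # 1.0
print(ds_add(split(1.0), split(2.0 ** -30)))
```

Even this simple addition needs about eleven fp32 operations, and the pair still only has roughly 48 significand bits and the fp32 exponent range, so it is neither half speed nor a true fp64.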

**Filiprino** · 24 February 2021, 09:58 PM

per-die basis on aldebaran

MCM confirmed. CDNA GPUs will be composed of chiplets.

**cb88** · 24 February 2021, 10:15 PM

Originally posted by Filiprino

MCM confirmed. CDNA GPUs will be composed of chiplets.

Yeah, it's pretty much been a given that CDNA would get chiplets first, since it's easier to implement for compute architectures.

**qarium** · 24 February 2021, 11:11 PM

Originally posted by Filiprino

MCM confirmed. CDNA GPUs will be composed of chiplets.

Oh yes... I like the chiplet design... this will totally rip Nvidia into pieces.

**coder** · 25 February 2021, 02:09 AM

Originally posted by Qaridarium

Oh yes... I like the chiplet design... this will totally rip Nvidia into pieces.

1.5 years ago, Nvidia already presented a prototype AI accelerator built with chiplets:

Hot Chips 31 Live Blogs: NVIDIA Multi-Chip AI Accelerator at 128 TOPS

https://www.anandtech.com/show/14767/hot-chips-31-live-blogs-nvidia-multichip-ai-accelerator-at-128-tops

And I would remind you that while AMD was first to use HBM (in Fury), Nvidia had a lot more financial success with their P100.

**coder** · 25 February 2021, 02:18 AM

Originally posted by AlB80

Full-rate FP64 is not an exciting feature. It means that the 32(/16/8)-bit ALU rates are halved. A GPU is by nature a data pump, so its data processing rate depends on operand size.

How do you figure that? Sure, if you're reusing the same vector registers, you can pack in 2x as many 32-bit or 4x as many 16-bit values. But maybe they increased the width of each SIMD lane to 64 bits without adding the ability to bifurcate the lanes to hold 2x as many fp32 values? It's possible!

If the demand from HPC customers is for fp64, then more fp32 throughput could just go to waste! Keep in mind that GPUs generally don't implement denormals, so fp32 doesn't get you as far as on something like an x87 FPU. That strongly pushes anyone doing things like serious simulation or modelling to use fp64.

Heck, even doing matrix inversions, you're likely to encounter numerical stability problems with non-denormal fp32.
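
A small Python sketch of why missing denormals hurt. It only models the flush-to-zero behaviour described above; whether a particular GPU actually flushes depends on the hardware and mode settings. The helper names `f32` and `f32_ftz` are made up for illustration.

```python
import struct

FP32_MIN_NORMAL = 2.0 ** -126  # smallest positive normal fp32 value

def f32(x):
    """Round to fp32 with gradual underflow (denormals kept)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def f32_ftz(x):
    """Round to fp32, then flush denormal results to zero.
    This is an assumed model of hardware without denormal support."""
    r = f32(x)
    return 0.0 if abs(r) < FP32_MIN_NORMAL else r

a = 1.5 * 2.0 ** -126
b = 1.0 * 2.0 ** -126

# With denormals, a != b implies a - b != 0. With flush-to-zero it does not.
print(a != b)            # True
print(f32(a - b))        # a tiny denormal, 2**-127
print(f32_ftz(a - b))    # 0.0
```

Two distinct values whose difference silently becomes zero is exactly the kind of thing that trips up divisions and convergence checks in numerical code.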

**coder** · 25 February 2021, 02:38 AM

Originally posted by vegabook

So the GPU maker is giving something here that's not completely trivial. For him it is, for you it's not. The GPU maker can arguably rightly charge more given this asymmetric power relationship in his favour. The real problem is, this has historically been used to rip the customer off and is an example of market failure.

Woah, chill out, bruh. Cool yer nerd rage. There are reasons for the 2:1 ratio, a lot of it owing to vector-packing and the fact that register bits are historically expensive.

But, you know what's really expensive? fp64 multipliers. That's because the requisite silicon area increases as a square of the significand size (the same reason BFloat16 is preferred over IEEE 754 fp16) and with fp32 -> fp64, you're going from 24 to 53 bits.

So, I really don't see it as greedy hardware manufacturers holding out on us by only building vectors with half as many fp64 elements. It just made sense, and it was expensive to push beyond that. The tech had to progress to the point where it made sense to expand fp64 beyond just scaling up the number of CUs/EUs/SMs, and the market had to be there for even higher demand for fp64 than for fp32. That can happen now that the CDNA architecture is so completely severed from the graphics world.
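
The quadratic scaling mentioned above is easy to turn into rough numbers. This Python sketch assumes multiplier area grows exactly with the square of the significand width, counting the implicit leading bit. That is a simplification for illustration only, not a claim about any real design, and the function name is made up.

```python
# Significand widths in bits, including the implicit leading 1.
SIGNIFICAND_BITS = {"bf16": 8, "fp16": 11, "fp32": 24, "fp64": 53}

def relative_multiplier_area(fmt, baseline="fp32"):
    """Multiplier area relative to `baseline`, assuming area ~ bits**2."""
    return (SIGNIFICAND_BITS[fmt] / SIGNIFICAND_BITS[baseline]) ** 2

for fmt in SIGNIFICAND_BITS:
    print(f"{fmt}: {relative_multiplier_area(fmt):.2f}x the fp32 multiplier")
```

Under that assumption an fp64 multiplier costs nearly 5x the area of an fp32 one rather than 2x, and a bf16 multiplier is about half the size of an fp16 one.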

AMD Radeon "Aldebaran" GPU Support Published For Next-Gen CDNA
