AMD's flagship Instinct MI200 is on the verge of launch and it will be the first GPU for the HPC segment to feature an MCM design based on the CDNA 2 architecture. It looks like the GPU will offer some insane performance numbers compared to the existing Instinct MI100 GPU, with a roughly 4x increase in FP64 compute.
AMD Instinct MI200 With CDNA 2 MCM GPU Design Heading To HPC Soon, Features Monstrous Performance Numbers & A 4x Compute Boost Over Instinct MI100
We have gotten to learn the specs of the Instinct MI200 accelerator over time, but its overall performance figures have remained a mystery until now. Twitter insider and leaker ExecutableFix has shared the first performance metrics for AMD's CDNA 2 based MCM GPU accelerator, and it's a beast.
1.7GHz boost clock, like you said: very high 😜
— ExecutableFix (@ExecuFix) October 23, 2021
According to tweets by ExecutableFix, the AMD Instinct MI200 will be rocking a clock speed of up to 1.7 GHz, which is a 13% increase over the Instinct MI100. The CDNA 2 powered MCM GPU also rocks almost twice the number of stream processors at 14,080 cores, packed within 220 Compute Units. While it was expected that the GPU would rock 240 Compute Units with 15,360 cores, that config has been replaced by a cut-down variant due to yields. With that said, it's possible that we may see the full SKU launch in the future, offering even higher performance.
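The clock and core-count figures above can be sanity-checked with a little arithmetic. The sketch below assumes 64 stream processors per compute unit (which is what 14,080 / 220 and 15,360 / 240 both work out to) and takes the MI100's clock as roughly 1,500 MHz, the approximate figure listed in the table at the end of this article. The function names are purely illustrative.

```python
# Back-of-the-envelope check of the leaked MI200 configuration.
# Assumption: 64 stream processors per compute unit, inferred from 14,080 / 220.
SP_PER_CU = 64


def stream_processors(compute_units: int) -> int:
    """Stream processor count for a given number of compute units."""
    return compute_units * SP_PER_CU


def percent_uplift(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


cut_down = stream_processors(220)        # leaked MI200 configuration
full = stream_processors(240)            # previously expected configuration
clock_gain = percent_uplift(1700, 1500)  # MHz; the MI100 clock is approximate

print(cut_down, full, round(clock_gain, 1))
```

With these inputs the cut-down part gives up 1,280 stream processors relative to the 240 CU configuration, and the clock uplift lands at about 13%.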
383 FP16/BF16
— ExecutableFix (@ExecuFix) October 23, 2021
In terms of performance, the AMD Instinct MI200 HPC Accelerator is going to offer almost 50 TFLOPs (47.9 TFLOPs) of FP64 & FP32 compute horsepower. Versus the Instinct MI100, this is a 4.16x increase in the FP64 segment. In fact, the FP64 numbers of the MI200 exceed the FP32 performance of its predecessor. Moving over to the FP16 and BF16 numbers, we're looking at an insane 383 TFLOPs of performance. For perspective, the MI100 only offers 92.3 TFLOPs of peak BFloat16 performance and 184.6 TFLOPs of peak FP16 performance.
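The leaked figures line up with the usual peak-throughput estimate of cores × clock × operations per clock. This is a minimal sketch, assuming each stream processor retires one fused multiply-add (2 FLOPs) per clock for FP64. The 16 FLOPs per clock used for the FP16/BF16 matrix rate is not stated anywhere in the leak; it is simply the factor that reproduces the 383 TFLOPs figure. The MI100 FP64 value of 11.5 TFLOPs is taken from the table at the end of this article.

```python
def peak_tflops(cores: int, clock_ghz: float, flops_per_clock: int) -> float:
    """Theoretical peak in TFLOPs: cores x clock (GHz) x FLOPs per core per clock."""
    return cores * clock_ghz * flops_per_clock / 1000.0


# Assumption: one FMA (2 FLOPs) per stream processor per clock for FP64.
mi200_fp64 = peak_tflops(14_080, 1.7, 2)
# Assumption: 16 FLOPs per clock on the matrix path, chosen to match the leak.
mi200_fp16 = peak_tflops(14_080, 1.7, 16)

# Generational ratios using the MI200 and MI100 figures quoted in the article.
fp64_ratio = 47.9 / 11.5
fp16_ratio = 383 / 184.6
bf16_ratio = 383 / 92.3

print(f"FP64 peak: {mi200_fp64:.1f} TFLOPs, FP16 peak: {mi200_fp16:.0f} TFLOPs")
print(f"FP64 x{fp64_ratio:.2f}, FP16 x{fp16_ratio:.2f}, BF16 x{bf16_ratio:.2f}")
```

On these numbers the roughly 4x generational gain applies to FP64 and BF16, while the FP16 gain comes out closer to 2x.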
As per HPCWire, the AMD Instinct MI200 will be powering three top-tier supercomputers, which include the United States' exascale Frontier system, the European Union's pre-exascale LUMI system, and Australia's petascale Setonix system. The competition includes the A100 80 GB, which offers 19.5 TFLOPs of FP64, 156 TFLOPs of TF32 and 312 TFLOPs of FP16 compute power. But we are likely to hear about NVIDIA's own Hopper MCM GPU next year, so there's going to be heated competition between the two GPU juggernauts in 2022.
Here's What To Expect From AMD Instinct MI200 'CDNA 2' GPU Accelerator
Inside the AMD Instinct MI200 is an Aldebaran GPU featuring two dies, a primary and a secondary, each consisting of 8 shader engines for a total of 16 SEs. Each Shader Engine packs 16 CUs with full-rate FP64, packed FP32 & a 2nd Generation Matrix Engine for FP16 & BF16 operations. Each die, as such, is composed of 128 compute units or 8,192 stream processors, which makes 256 compute units across the two dies. With 220 compute units or 14,080 stream processors reported for the entire chip, not all of those compute units are enabled. The Aldebaran GPU is also powered by a new XGMI interconnect. Each chiplet features a VCN 2.6 engine and the main IO controller.
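The die layout described above multiplies out as follows. This is only a sketch of the arithmetic: the 64 stream processors per compute unit is inferred from 8,192 / 128, and the `GpuLayout` dataclass and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLayout:
    dies: int
    shader_engines_per_die: int
    cus_per_shader_engine: int
    sp_per_cu: int = 64  # assumption: inferred from 8,192 SPs / 128 CUs

    @property
    def cus_per_die(self) -> int:
        return self.shader_engines_per_die * self.cus_per_shader_engine

    @property
    def total_cus(self) -> int:
        return self.dies * self.cus_per_die

    @property
    def total_sps(self) -> int:
        return self.total_cus * self.sp_per_cu


aldebaran = GpuLayout(dies=2, shader_engines_per_die=8, cus_per_shader_engine=16)
enabled_cus = 220  # compute unit count reported for the MI200

print(aldebaran.cus_per_die, aldebaran.total_cus, aldebaran.total_sps)
print(f"enabled fraction: {enabled_cus / aldebaran.total_cus:.1%}")
```

The layout works out to 128 CUs per die and 256 CUs in total, so the reported 220 CU configuration uses roughly 86% of them.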
As for DRAM, AMD has gone with an 8-channel interface consisting of 1024-bit interfaces for an 8192-bit wide bus interface. Each interface can support 2 GB HBM2e DRAM modules. This should give us up to 16 GB of HBM2e memory capacity per stack, and since there are eight stacks in total, the total capacity would be a whopping 128 GB. That's 48 GB more than the A100, which houses 80 GB of HBM2e memory. The full visualization of the Aldebaran GPU on the Instinct MI200 is available here.
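The memory figures follow from simple multiplication. The sketch below uses only the numbers in the paragraph above, with one assumption: reaching 16 GB per stack from 2 GB modules implies eight of them per stack, which the article does not state outright.

```python
STACKS = 8
BITS_PER_STACK = 1024
GB_PER_MODULE = 2
MODULES_PER_STACK = 8  # assumption: 16 GB per stack / 2 GB per module

bus_width_bits = STACKS * BITS_PER_STACK
gb_per_stack = GB_PER_MODULE * MODULES_PER_STACK
total_gb = STACKS * gb_per_stack
margin_over_a100 = total_gb - 80  # the A100 carries 80 GB of HBM2e

print(bus_width_bits, gb_per_stack, total_gb, margin_over_a100)
```

This reproduces the 8192-bit bus, 16 GB per stack, 128 GB total, and the 48 GB advantage over the A100.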
AMD Radeon Instinct Accelerators 2020
Accelerator Name | AMD Instinct MI300 | AMD Instinct MI200 | AMD Instinct MI100 | AMD Radeon Instinct MI60 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI6 |
---|---|---|---|---|---|---|---|---|
GPU Architecture | TBA (CDNA 3) | Aldebaran (CDNA 2) | Arcturus (CDNA 1) | Vega 20 | Vega 20 | Vega 10 | Fiji XT | Polaris 10 |
GPU Process Node | Advanced Process Node | Advanced Process Node | 7nm FinFET | 7nm FinFET | 7nm FinFET | 14nm FinFET | 28nm | 14nm FinFET |
GPU Dies | 4 (MCM)? | 2 (MCM) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) |
GPU Cores | 28,160? | 14,080? | 7680 | 4096 | 3840 | 4096 | 4096 | 2304 |
GPU Clock Speed | TBA | ~1700 MHz | ~1500 MHz | 1800 MHz | 1725 MHz | 1500 MHz | 1000 MHz | 1237 MHz |
FP16 Compute | TBA | 383 TFLOPs | 185 TFLOPs | 29.5 TFLOPs | 26.5 TFLOPs | 24.6 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
FP32 Compute | TBA | 95.8 TFLOPs | 23.1 TFLOPs | 14.7 TFLOPs | 13.3 TFLOPs | 12.3 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
FP64 Compute | TBA | 47.9 TFLOPs | 11.5 TFLOPs | 7.4 TFLOPs | 6.6 TFLOPs | 768 GFLOPs | 512 GFLOPs | 384 GFLOPs |
VRAM | TBA | 64/128 GB HBM2e? | 32 GB HBM2 | 32 GB HBM2 | 16 GB HBM2 | 16 GB HBM2 | 4 GB HBM1 | 16 GB GDDR5 |
Memory Clock | TBA | TBA | 1200 MHz | 1000 MHz | 1000 MHz | 945 MHz | 500 MHz | 1750 MHz |
Memory Bus | TBA | 8192-bit | 4096-bit bus | 4096-bit bus | 4096-bit bus | 2048-bit bus | 4096-bit bus | 256-bit bus |
Memory Bandwidth | TBA | ~2 TB/s? | 1.23 TB/s | 1 TB/s | 1 TB/s | 484 GB/s | 512 GB/s | 224 GB/s |
Form Factor | TBA | Dual Slot, Full Length / OAM | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Half Length | Single Slot, Full Length |
Cooling | TBA | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
TDP | TBA | TBA | 300W | 300W | 300W | 300W | 175W | 150W |
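
The bandwidth row can be reproduced from the memory clock and memory bus rows. This is a minimal sketch, assuming HBM transfers data twice per memory clock and GDDR5 four times, which are the usual multipliers for those memory types; the function name is illustrative. No MI200 value is computed because its memory clock is still listed as TBA.

```python
def bandwidth_gbs(bus_bits: int, clock_mhz: float, transfers_per_clock: int) -> float:
    """Peak memory bandwidth in GB/s from bus width, clock, and data-rate multiplier."""
    bits_per_second = bus_bits * clock_mhz * 1e6 * transfers_per_clock
    return bits_per_second / 8 / 1e9


# (bus width in bits, memory clock in MHz, transfers per clock) from the table.
cards = {
    "MI100": (4096, 1200, 2),  # HBM2
    "MI60": (4096, 1000, 2),   # HBM2
    "MI25": (2048, 945, 2),    # HBM2
    "MI8": (4096, 500, 2),     # HBM1
    "MI6": (256, 1750, 4),     # GDDR5
}

for name, (bus, clock, rate) in cards.items():
    print(f"{name}: {bandwidth_gbs(bus, clock, rate):.1f} GB/s")
```

The results agree with the table once rounded, for example about 1,229 GB/s for the MI100 against the listed 1.23 TB/s.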