Nvidia Revs AI Engine

All about Nvidia's new Blackwell architecture and B200 GPU


Nvidia’s latest chip promises to boost AI’s speed and energy efficiency.

What’s new: The market leader in AI chips announced the B100 and B200 graphics processing units (GPUs) designed to eclipse its in-demand H100 and H200 chips. The company will also offer systems that integrate two, eight, and 72 chips. 

How it works: The new chips are based on Blackwell, an updated chip architecture specialized for training transformer models and running inference on them. Compared to Nvidia’s earlier Hopper architecture, used by the H-series chips, Blackwell features hardware and firmware upgrades intended to cut the energy required for model training and inference.

  • Training a 1.8-trillion-parameter model (the estimated size of OpenAI’s GPT-4 and Beijing Academy of Artificial Intelligence’s WuDao) would require 2,000 Blackwell GPUs using 4 megawatts of electricity, compared to 8,000 Hopper GPUs using 15 megawatts, the company said.
  • Blackwell includes a second-generation Transformer Engine. While the first generation used 8 bits to represent each value in a neural network, the new version can use as few as 4 bits, potentially doubling compute bandwidth (see the sketch after this list).
  • A dedicated engine devoted to reliability, availability, and serviceability monitors the chip to identify potential faults. Nvidia hopes the engine can reduce compute times by minimizing chip downtime.
  • An upgraded version of the NVLink switch, which allows GPUs to communicate with each other, accommodates up to 1.8 terabytes per second of traffic in each direction, compared to Hopper’s maximum of 900 gigabytes per second. The architecture can link up to 576 GPUs, compared to Hopper’s cap of 256.
  • Nvidia doesn’t make it easy to compare the B200 with rival AMD’s top offering, the MI300X. Here are a few comparisons based on specs reported for Nvidia’s eight-GPU system: at 8-bit precision, the B200 processes 4.5 PFLOPS dense (9 PFLOPS sparse), while the MI300X processes 2.61 PFLOPS dense (5.22 PFLOPS sparse). The B200 offers 8 TB/s of peak memory bandwidth and 192 GB of memory; the MI300X offers 5.3 TB/s and 192 GB.
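
To make the Transformer Engine’s precision drop concrete, here’s a rough back-of-the-envelope sketch in Python. It assumes a purely memory-bandwidth-bound workload and reuses the 1.8-trillion-parameter model size and 8 TB/s B200 bandwidth figures cited above; the function and constants are illustrative, not Nvidia’s methodology.

```python
# Back-of-the-envelope sketch (not Nvidia code): halving numeric precision
# halves the bytes a bandwidth-bound GPU must move per parameter, which is
# why 4-bit processing can roughly double effective throughput vs. 8-bit.

PEAK_BANDWIDTH_BPS = 8e12   # B200 peak memory bandwidth: 8 TB/s (cited above)
NUM_PARAMS = 1.8e12         # GPT-4-scale parameter count estimated above

def seconds_per_weight_pass(bits_per_value: float) -> float:
    """Time to stream every parameter from memory once at peak bandwidth."""
    total_bytes = NUM_PARAMS * bits_per_value / 8
    return total_bytes / PEAK_BANDWIDTH_BPS

for bits in (8, 4):
    print(f"{bits}-bit values: {seconds_per_weight_pass(bits):.3f} s per pass")
# 8-bit: 0.225 s per pass; 4-bit: 0.113 s per pass -- about 2x the throughput
```

Real workloads are also bound by compute and interconnect, so the factor of two is an upper bound on the gain from the precision change alone.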

Price and availability: The B200 will cost between $30,000 and $40,000, similar to the going rate for H100s today, Nvidia CEO Jensen Huang told CNBC. Nvidia did not specify when the chip would be available. Google, Amazon, and Microsoft stated intentions to offer Blackwell GPUs to their cloud customers.

Behind the news: Demand for the H100 has been so intense that the chip has been difficult to find, driving some users to adopt alternatives such as AMD’s MI300X. Moreover, in 2022, the U.S. restricted the export of H100s and other advanced chips to China; the B200 falls under the same ban.

Why it matters: Nvidia holds about 80 percent of the market for specialized AI chips. The new chips are primed to enable developers to continue pushing AI’s boundaries, training multi-trillion-parameter models and running more instances at once. 

We’re thinking: In ARK Invest’s “Big Ideas 2024” report, Cathie Wood’s firm estimated that training costs are falling at a rapid 75 percent annually, around half due to algorithmic improvements and half due to better compute hardware. Nvidia’s progress paints an optimistic picture of further gains. It also signals the difficulty of trying to use model training to build a moat around a business. It’s not easy to maintain a lead if you spend $100 million on training and next year a competitor can replicate the effort for $25 million.
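
To see how quickly that compounding erodes a training-cost moat, here’s a tiny hypothetical calculation. It simply extrapolates the 75 percent annual decline cited above; the figures are illustrative, not a forecast.

```python
# Hypothetical illustration: cost to replicate a $100M training run if total
# training costs keep falling 75% per year (the ARK Invest estimate above).
cost = 100e6  # dollars in year 0
for year in range(1, 4):
    cost *= 0.25  # a 75% annual decline leaves 25% of the prior year's cost
    print(f"Year {year}: ${cost / 1e6:.1f}M")
# Year 1: $25.0M   Year 2: $6.2M   Year 3: $1.6M
```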
