WEBVTT

00:00:00.000 --> 00:00:05.440
When you fire up ChatGPT, you're connecting to a big silicon brain that lives somewhere and

00:00:05.440 --> 00:00:12.320
contains something. So what exactly is that thing? What is the hardware that runs the chatbot that

00:00:12.320 --> 00:00:17.040
you've fallen in love with? There's actually a sort of basic building block of the ChatGPT

00:00:17.040 --> 00:00:21.840
infrastructure, the NVIDIA A100 GPU. And if you thought the graphics card in your computer was

00:00:21.840 --> 00:00:28.080
expensive, the A100 goes for around $10,000 a pop, roughly the same as six RTX 4090s.
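A quick back-of-the-envelope check on that price comparison (the 4090's roughly $1,600 launch price is an assumption here, not a figure from the video):

```python
# Rough price math from the figures above.
# Assumption: RTX 4090 launch MSRP of about $1,600.
a100_price = 10_000
rtx_4090_price = 1_600

ratio = a100_price / rtx_4090_price
print(f"One A100 costs about as much as {ratio:.1f} RTX 4090s")
```

So "roughly six 4090s" checks out at these prices.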

00:00:28.080 --> 00:00:33.280
Artificial intelligence applications often utilize GPUs because GPUs are very good at doing lots of

00:00:33.280 --> 00:00:38.560
math calculations at once in parallel. And NVIDIA's newer models have the tensor cores,

00:00:38.560 --> 00:00:43.360
which are good at matrix operations that AIs frequently use. So even though the A100 is called

00:00:43.360 --> 00:00:49.440
a GPU, it's built specifically for AI and analytical applications. And as such, you can't

00:00:49.440 --> 00:00:54.160
realistically game on it. It doesn't even have a display out. Although you can get the A100 in a

00:00:54.240 --> 00:00:58.800
PCI Express version, such as the one Linus is holding up here, it's more common in data centers

00:00:58.800 --> 00:01:05.600
for them to come in this form factor, called SXM4. Unlike a normal graphics card, SXM4 cards

00:01:05.600 --> 00:01:12.240
lie flat and connect to a large, motherboard-like PCB using a pair of sockets, with the connectors

00:01:12.240 --> 00:01:17.280
sitting on the underside of the card. Although SXM is just a connector and data is still carried

00:01:17.280 --> 00:01:23.600
over a PCI Express interface, SXM4 is preferred over the traditional PCIe slot for data centers

00:01:23.600 --> 00:01:28.800
because the socket can handle more electrical power. The PCIe version of the A100 can use a

00:01:28.800 --> 00:01:34.800
maximum of 300 watts, but the SXM4 version handles up to 500 watts, leading to higher

00:01:34.800 --> 00:01:43.120
performance. An SXM4 A100 has 312 teraflops of FP16 processing power. To put that in context,

00:01:43.120 --> 00:01:48.080
that's nearly four times as much as an RTX 4090, the most powerful consumer GPU on the market at

00:01:48.080 --> 00:01:53.280
the time of filming. Additionally, these GPUs are linked up with a high-speed NVLink interconnect

00:01:53.280 --> 00:01:59.680
so that the GPUs that sit on a single board can act like one single, gigantic GPU.
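To sanity-check that "nearly four times" claim, here's the arithmetic, assuming the commonly quoted ~83 TFLOPS of non-tensor FP16 for the RTX 4090 (the 4090 figure is an assumption, not from the video):

```python
# FP16 throughput comparison using the video's A100 figure.
a100_fp16_tflops = 312           # SXM4 A100, from the video
rtx_4090_fp16_tflops = 83        # assumption: ~83 TFLOPS non-tensor FP16

ratio = a100_fp16_tflops / rtx_4090_fp16_tflops
print(f"An SXM4 A100 is about {ratio:.1f}x an RTX 4090 at FP16")
```

That works out to just under 4x, matching the narration.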

00:01:59.680 --> 00:02:04.720
Now that you know what lies at the heart of these GPU servers, exactly how many A100s are

00:02:04.720 --> 00:02:10.160
needed to keep the service running for 100 million users? It turns out you can run ChatGPT

00:02:10.160 --> 00:02:18.080
just fine on a single NVIDIA HGX A100 unit. These units typically contain 8 A100 GPUs

00:02:18.080 --> 00:02:23.680
in one machine powered by a pair of server CPUs that each feature a few dozen cores.
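From the figures above, you can sketch the aggregate FP16 throughput of one of those 8-GPU machines:

```python
# Aggregate FP16 throughput of one HGX A100 node:
# 8 GPUs per node x 312 TFLOPS each (both figures from the video).
gpus_per_node = 8
fp16_tflops_per_gpu = 312

node_tflops = gpus_per_node * fp16_tflops_per_gpu
print(f"One HGX A100 node: {node_tflops} TFLOPS "
      f"(~{node_tflops / 1000:.1f} PFLOPS) of FP16")
```

That's roughly 2.5 petaflops of FP16 in a single box.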

00:02:23.680 --> 00:02:28.640
But the issue is that with so many users, you need a lot more processing power to ensure the

00:02:28.640 --> 00:02:33.840
chatbot can answer queries smoothly for everyone. OpenAI and Microsoft, who are behind the ChatGPT

00:02:33.840 --> 00:02:38.560
project, haven't disclosed exact numbers about their hardware, but given the processing capacity

00:02:38.560 --> 00:02:46.080
of these HGX A100 systems, ChatGPT likely uses somewhere around 30,000 A100s to keep up with

00:02:46.080 --> 00:02:52.160
demand. To put that into context, it's a heck of a lot more than the roughly 4,000 to 5,000 they likely

00:02:52.160 --> 00:02:56.560
needed to train the language model in the first place. Training is the process of feeding the AI

00:02:56.560 --> 00:03:00.960
lots of information in order to build it out before it can be used publicly. Intuitively,

00:03:00.960 --> 00:03:05.520
it might seem like the training process would need more processing power than actually running

00:03:05.520 --> 00:03:12.000
the model. But because of the massive amount of I/O ChatGPT has to handle with 100 million users,

00:03:12.000 --> 00:03:18.080
it ends up actually requiring roughly 6 times more GPUs to run it. And as pricey as these

00:03:18.080 --> 00:03:23.120
systems are, you can bet this meant a massive investment on Microsoft's part. While the

00:03:23.120 --> 00:03:28.240
actual dollar amount hasn't been disclosed, we do know that it was in the hundreds of millions of

00:03:28.240 --> 00:03:34.240
dollars, in addition to several hundred grand a day just to keep the system running. And if you

00:03:34.240 --> 00:03:39.280
think Microsoft is going to stop there, think again: the company is also integrating the newer NVIDIA

00:03:39.280 --> 00:03:46.560
H100 GPUs into its Azure Cloud AI service, which actually dwarf the A100's FP16 performance

00:03:46.560 --> 00:03:53.600
by a factor of 6, while also adding FP8 support, which should prove very useful for AI

00:03:53.600 --> 00:03:58.080
due to how the math calculations involved in running AI models work. Not only will this ensure

00:03:58.080 --> 00:04:04.000
that more people can use ChatGPT and other AI services, but it will also allow Microsoft to train

00:04:04.000 --> 00:04:08.720
more complicated large language models. Maybe soon, you'll be able to completely replace those

00:04:08.720 --> 00:04:13.040
pesky real-life friends of yours. So thanks for watching, guys! If you liked this video, hit like,

00:04:13.040 --> 00:04:17.280
hit subscribe, and hit us up in the comments section with your suggestions for topics that we

00:04:17.280 --> 00:04:18.880
should cover in the future.
