1
00:00:00,000 --> 00:00:05,440
When you fire up ChatGPT, you're connecting to a big silicon brain that lives somewhere and

2
00:00:05,440 --> 00:00:12,320
contains something. So what exactly is that thing? What is the hardware that runs the chatbot that

3
00:00:12,320 --> 00:00:17,040
you've fallen in love with? There's actually a sort of basic building block of the ChatGPT

4
00:00:17,040 --> 00:00:21,840
infrastructure, the NVIDIA A100 GPU. And if you thought the graphics card in your computer was

5
00:00:21,840 --> 00:00:28,080
expensive, the A100 goes for around $10,000 a pop, roughly the same as six RTX 4090s.

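A quick sanity check on that price comparison (the RTX 4090's retail price is an assumption here; the video only gives the A100's price):

```python
# Rough cost comparison. The $10,000 A100 figure is from the video;
# the ~$1,600 RTX 4090 price is an assumed retail figure.
a100_price = 10_000
rtx_4090_price = 1_600  # assumption, not stated in the video

print(6 * rtx_4090_price)  # 9600 -- six 4090s cost roughly one A100
```
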
6
00:00:28,080 --> 00:00:33,280
Artificial intelligence applications often use GPUs because GPUs are very good at doing lots of

7
00:00:33,280 --> 00:00:38,560
math calculations in parallel. And NVIDIA's newer models have Tensor Cores,

8
00:00:38,560 --> 00:00:43,360
which are good at matrix operations that AIs frequently use. So even though the A100 is called

9
00:00:43,360 --> 00:00:49,440
a GPU, it's built specifically for AI and analytical applications. And as such, you can't

10
00:00:49,440 --> 00:00:54,160
realistically game on it. It doesn't even have a display out. Although you can get the A100 in a

11
00:00:54,240 --> 00:00:58,800
PCI Express version, such as the one Linus is holding up here, it's more common in data centers

12
00:00:58,800 --> 00:01:05,600
for them to come in this form factor called SXM4. Unlike a normal graphics card, the SXM4 cards

13
00:01:05,600 --> 00:01:12,240
lie flat and connect to a large, motherboard-like PCB using a pair of sockets, with the connectors

14
00:01:12,240 --> 00:01:17,280
sitting on the underside of the card. Although SXM is just a connector and data is still carried

15
00:01:17,280 --> 00:01:23,600
over a PCI Express interface, SXM4 is preferred over the traditional PCIe slot for data centers

16
00:01:23,600 --> 00:01:28,800
because the socket can handle more electrical power. The PCIe version of the A100 draws a

17
00:01:28,800 --> 00:01:34,800
maximum of 300 watts, but the SXM4 version handles up to 500 watts, leading to higher

18
00:01:34,800 --> 00:01:43,120
performance. An SXM4 A100 has 312 teraflops of FP16 processing power. To put that in context,

19
00:01:43,120 --> 00:01:48,080
that's nearly four times as much as an RTX 4090, the most powerful consumer GPU on the market at

20
00:01:48,080 --> 00:01:53,280
the time of filming. Additionally, these GPUs are linked up with a high-speed NVLink interconnect

21
00:01:53,280 --> 00:01:59,680
so that the GPUs that sit on a single board can act like one single gigantic GPU.

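The "nearly four times" claim a moment ago can be sanity-checked (only the A100's 312 teraflops comes from the video; the RTX 4090's FP16 tensor throughput is an assumed figure used for comparison):

```python
# FP16 throughput comparison. The 312 TFLOPS SXM4 A100 figure is from
# the video; the RTX 4090's ~82.6 TFLOPS dense FP16 tensor throughput
# is an assumption.
a100_fp16_tflops = 312
rtx_4090_fp16_tflops = 82.6  # assumed figure

print(round(a100_fp16_tflops / rtx_4090_fp16_tflops, 1))  # 3.8 -- "nearly four times"
```
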
22
00:01:59,680 --> 00:02:04,720
Now that you know what lies at the heart of the GPU servers, though, exactly how many A100s are

23
00:02:04,720 --> 00:02:10,160
needed to keep the service running for 100 million users? It turns out you can run ChatGPT

24
00:02:10,160 --> 00:02:18,080
just fine on a single NVIDIA HGX A100 unit. These units typically contain 8 A100 GPUs

25
00:02:18,080 --> 00:02:23,680
in one machine powered by a pair of server CPUs that each feature a few dozen cores.

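Using the figures just mentioned, the aggregate throughput of one of these boxes works out like this (a back-of-the-envelope sketch, not an official spec sheet):

```python
# Aggregate FP16 throughput of a single HGX A100 unit, using the
# figures from the video: 8 GPUs, 312 TFLOPS of FP16 each.
gpus_per_unit = 8
fp16_tflops_per_gpu = 312

total = gpus_per_unit * fp16_tflops_per_gpu
print(total)  # 2496 TFLOPS, i.e. about 2.5 petaflops of FP16 per machine
```
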
26
00:02:23,680 --> 00:02:28,640
But the issue is that with so many users, you need a lot more processing power to ensure the

27
00:02:28,640 --> 00:02:33,840
chatbot can answer queries smoothly for everyone. OpenAI and Microsoft, who are behind the ChatGPT

28
00:02:33,840 --> 00:02:38,560
project, haven't disclosed exact numbers about their hardware, but given the processing capacity

29
00:02:38,560 --> 00:02:46,080
of these HGX A100 systems, ChatGPT likely uses somewhere around 30,000 A100s to keep up with

30
00:02:46,080 --> 00:02:52,160
demand. To put that into context, it's a heck of a lot more than the roughly 4,000 or 5,000 they likely

31
00:02:52,160 --> 00:02:56,560
needed to train the language model in the first place. Training is the process of feeding the AI

32
00:02:56,560 --> 00:03:00,960
lots of information in order to build it out before it can be used publicly. Intuitively,

33
00:03:00,960 --> 00:03:05,520
it might seem like the training process would need more processing power than actually running

34
00:03:05,520 --> 00:03:12,000
the model. But because of the massive amount of IO ChatGPT has to handle with 100 million users,

35
00:03:12,000 --> 00:03:18,080
it ends up actually requiring roughly 6 times more GPUs to run it. And as pricey as these

36
00:03:18,080 --> 00:03:23,120
systems are, you can bet this meant a massive investment on Microsoft's part. While the

37
00:03:23,120 --> 00:03:28,240
actual dollar amount hasn't been disclosed, we do know that it was in the hundreds of millions of

38
00:03:28,240 --> 00:03:34,240
dollars, in addition to several hundred grand a day just to keep the system running. But don't

39
00:03:34,240 --> 00:03:39,280
think that Microsoft is going to stop there. The company is also integrating the newer NVIDIA

40
00:03:39,280 --> 00:03:46,560
H100 GPUs into its Azure Cloud AI service, which dwarf the A100's FP16 performance

41
00:03:46,560 --> 00:03:53,600
by a factor of six, in addition to adding FP8 support, which should prove very useful for AI

42
00:03:53,600 --> 00:03:58,080
due to the nature of the math involved in running AI models. Not only will this ensure

43
00:03:58,080 --> 00:04:04,000
that more people can use ChatGPT and other AI services, but it will also allow Microsoft to train

44
00:04:04,000 --> 00:04:08,720
more complicated large language models. Maybe soon, you'll be able to completely replace those

45
00:04:08,720 --> 00:04:13,040
pesky real-life friends of yours. So thanks for watching guys, if you liked this video hit like,

46
00:04:13,040 --> 00:04:17,280
hit subscribe, and hit us up in the comments section with your suggestions for topics that we

47
00:04:17,280 --> 00:04:18,880
should cover in the future.
