{"video_id":"uifoj9iuTDw","title":"The Insane Hardware Behind ChatGPT","channel":"Techquickie","show":"Techquickie","published_at":"2024-05-04T14:58:16Z","duration_s":286,"segments":[{"start_s":0.0,"end_s":5.44,"text":"When you fire up ChatGPT, you're connecting to a big silicon brain that lives somewhere and","speaker":null,"is_sponsor":0},{"start_s":5.44,"end_s":12.32,"text":"contains something. So what exactly is that thing? What is the hardware that runs the chatbot that","speaker":null,"is_sponsor":0},{"start_s":12.32,"end_s":17.04,"text":"you've fallen in love with? There's actually a sort of basic building block of the ChatGPT","speaker":null,"is_sponsor":0},{"start_s":17.04,"end_s":21.84,"text":"infrastructure, the NVIDIA A100 GPU. And if you thought the graphics card in your computer was","speaker":null,"is_sponsor":0},{"start_s":21.84,"end_s":28.08,"text":"expensive, the A100 goes for around $10,000 a pop, roughly the same as six RTX 4090s.","speaker":null,"is_sponsor":0},{"start_s":28.08,"end_s":33.28,"text":"Artificial intelligence applications often utilize GPUs because GPUs are very good at doing lots of","speaker":null,"is_sponsor":0},{"start_s":33.28,"end_s":38.56,"text":"math calculations at once in parallel. And NVIDIA's newer models have tensor cores,","speaker":null,"is_sponsor":0},{"start_s":38.56,"end_s":43.36,"text":"which are good at the matrix operations that AIs frequently use. So even though the A100 is called","speaker":null,"is_sponsor":0},{"start_s":43.36,"end_s":49.44,"text":"a GPU, it's built specifically for AI and analytical applications. And as such, you can't","speaker":null,"is_sponsor":0},{"start_s":49.44,"end_s":54.16,"text":"realistically game on it. It doesn't even have a display out. 
Although you can get the A100 in a","speaker":null,"is_sponsor":0},{"start_s":54.24,"end_s":58.8,"text":"PCI Express version, such as the one Linus is holding up here, it's more common in data centers","speaker":null,"is_sponsor":0},{"start_s":58.8,"end_s":65.6,"text":"for them to come in this form factor called SXM4. Unlike a normal graphics card, SXM4 cards","speaker":null,"is_sponsor":0},{"start_s":65.6,"end_s":72.24,"text":"lie flat and connect to a large, motherboard-like PCB using a pair of sockets, with the connectors","speaker":null,"is_sponsor":0},{"start_s":72.24,"end_s":77.28,"text":"sitting on the underside of the card. Although SXM is just a connector and data is still carried","speaker":null,"is_sponsor":0},{"start_s":77.28,"end_s":83.6,"text":"over a PCI Express interface, SXM4 is preferred over the traditional PCIe slot in data centers","speaker":null,"is_sponsor":0},{"start_s":83.6,"end_s":88.8,"text":"because the socket can handle more electrical power. The PCIe version of the A100 can draw a","speaker":null,"is_sponsor":0},{"start_s":88.8,"end_s":94.8,"text":"maximum of 300 watts, but the SXM4 version handles up to 500 watts, leading to higher","speaker":null,"is_sponsor":0},{"start_s":94.8,"end_s":103.12,"text":"performance. An SXM4 A100 has 312 teraflops of FP16 processing power. To put that in context,","speaker":null,"is_sponsor":0},{"start_s":103.12,"end_s":108.08,"text":"that's nearly four times as much as an RTX 4090, the most powerful consumer GPU on the market at","speaker":null,"is_sponsor":0},{"start_s":108.08,"end_s":113.28,"text":"the time of filming. 
Additionally, these GPUs are linked up with a high-speed NVLink interconnect","speaker":null,"is_sponsor":0},{"start_s":113.28,"end_s":119.68,"text":"so that the GPUs that sit on a single board can act like a single gigantic chungus GPU.","speaker":null,"is_sponsor":0},{"start_s":119.68,"end_s":124.72,"text":"Now that you know what lies at the heart of these GPU servers, though, exactly how many A100s are","speaker":null,"is_sponsor":0},{"start_s":124.72,"end_s":130.16,"text":"needed to keep the service running for 100 million users? It turns out you can run ChatGPT","speaker":null,"is_sponsor":0},{"start_s":130.16,"end_s":138.08,"text":"just fine on a single NVIDIA HGX A100 unit. These units typically contain 8 A100 GPUs","speaker":null,"is_sponsor":0},{"start_s":138.08,"end_s":143.68,"text":"in one machine, powered by a pair of server CPUs that each feature a few dozen cores.","speaker":null,"is_sponsor":0},{"start_s":143.68,"end_s":148.64,"text":"But the issue is that with so many users, you need a lot more processing power to ensure the","speaker":null,"is_sponsor":0},{"start_s":148.64,"end_s":153.84,"text":"chatbot can answer queries smoothly for everyone. OpenAI and Microsoft, who are behind the ChatGPT","speaker":null,"is_sponsor":0},{"start_s":153.84,"end_s":158.56,"text":"project, haven't disclosed exact numbers about their hardware, but given the processing capacity","speaker":null,"is_sponsor":0},{"start_s":158.56,"end_s":166.08,"text":"of these HGX A100 systems, ChatGPT likely uses somewhere around 30,000 A100s to keep up with","speaker":null,"is_sponsor":0},{"start_s":166.08,"end_s":172.16,"text":"demand. To put that into context, that's a heck of a lot more than the roughly 4,000 to 5,000 they likely","speaker":null,"is_sponsor":0},{"start_s":172.16,"end_s":176.56,"text":"needed to train the language model in the first place. 
Training is the process of feeding the AI","speaker":null,"is_sponsor":0},{"start_s":176.56,"end_s":180.96,"text":"lots of information in order to build it out before it can be used publicly. Intuitively,","speaker":null,"is_sponsor":0},{"start_s":180.96,"end_s":185.52,"text":"it might seem like the training process would need more processing power than actually running","speaker":null,"is_sponsor":0},{"start_s":185.52,"end_s":192.0,"text":"the model. But because of the massive amount of I/O ChatGPT has to handle with 100 million users,","speaker":null,"is_sponsor":0},{"start_s":192.0,"end_s":198.08,"text":"it actually ends up requiring roughly 6 times more GPUs to run it. And as pricey as these","speaker":null,"is_sponsor":0},{"start_s":198.08,"end_s":203.12,"text":"systems are, you can bet this meant a massive investment on Microsoft's part. While the","speaker":null,"is_sponsor":0},{"start_s":203.12,"end_s":208.24,"text":"actual dollar amount hasn't been disclosed, we do know that it was in the hundreds of millions of","speaker":null,"is_sponsor":0},{"start_s":208.24,"end_s":214.24,"text":"dollars, in addition to several hundred grand a day just to keep the system running. Lest you","speaker":null,"is_sponsor":0},{"start_s":214.24,"end_s":219.28,"text":"think that Microsoft is going to stop there, the company is also integrating the newer NVIDIA","speaker":null,"is_sponsor":0},{"start_s":219.28,"end_s":226.56,"text":"H100 GPUs into its Azure Cloud AI service, which dwarf the A100's FP16 performance","speaker":null,"is_sponsor":0},{"start_s":226.56,"end_s":233.6,"text":"by a factor of 6, in addition to adding FP8 support, which should prove to be very useful for AI","speaker":null,"is_sponsor":0},{"start_s":233.6,"end_s":238.08,"text":"due to how the math calculations involved in running AI models work. 
Not only will this ensure","speaker":null,"is_sponsor":0},{"start_s":238.08,"end_s":244.0,"text":"that more people can use ChatGPT and other AI services, but it will also allow Microsoft to train","speaker":null,"is_sponsor":0},{"start_s":244.0,"end_s":248.72,"text":"more complicated large language models. Maybe soon, you'll be able to completely replace those","speaker":null,"is_sponsor":0},{"start_s":248.72,"end_s":253.04,"text":"pesky real-life friends of yours. So thanks for watching, guys! If you liked this video, hit like,","speaker":null,"is_sponsor":0},{"start_s":253.04,"end_s":257.28,"text":"hit subscribe, and hit us up in the comments section with your suggestions for topics that we","speaker":null,"is_sponsor":0},{"start_s":257.28,"end_s":258.88,"text":"should cover in the future.","speaker":null,"is_sponsor":0}],"full_text":"When you fire up ChatGPT, you're connecting to a big silicon brain that lives somewhere and contains something. So what exactly is that thing? What is the hardware that runs the chatbot that you've fallen in love with? There's actually a sort of basic building block of the ChatGPT infrastructure, the NVIDIA A100 GPU. And if you thought the graphics card in your computer was expensive, the A100 goes for around $10,000 a pop, roughly the same as six RTX 4090s. Artificial intelligence applications often utilize GPUs because GPUs are very good at doing lots of math calculations at once in parallel. And NVIDIA's newer models have tensor cores, which are good at the matrix operations that AIs frequently use. So even though the A100 is called a GPU, it's built specifically for AI and analytical applications. And as such, you can't realistically game on it. It doesn't even have a display out. Although you can get the A100 in a PCI Express version, such as the one Linus is holding up here, it's more common in data centers for them to come in this form factor called SXM4. 
Unlike a normal graphics card, SXM4 cards lie flat and connect to a large, motherboard-like PCB using a pair of sockets, with the connectors sitting on the underside of the card. Although SXM is just a connector and data is still carried over a PCI Express interface, SXM4 is preferred over the traditional PCIe slot in data centers because the socket can handle more electrical power. The PCIe version of the A100 can draw a maximum of 300 watts, but the SXM4 version handles up to 500 watts, leading to higher performance. An SXM4 A100 has 312 teraflops of FP16 processing power. To put that in context, that's nearly four times as much as an RTX 4090, the most powerful consumer GPU on the market at the time of filming. Additionally, these GPUs are linked up with a high-speed NVLink interconnect so that the GPUs that sit on a single board can act like a single gigantic chungus GPU. Now that you know what lies at the heart of these GPU servers, though, exactly how many A100s are needed to keep the service running for 100 million users? It turns out you can run ChatGPT just fine on a single NVIDIA HGX A100 unit. These units typically contain 8 A100 GPUs in one machine, powered by a pair of server CPUs that each feature a few dozen cores. But the issue is that with so many users, you need a lot more processing power to ensure the chatbot can answer queries smoothly for everyone. OpenAI and Microsoft, who are behind the ChatGPT project, haven't disclosed exact numbers about their hardware, but given the processing capacity of these HGX A100 systems, ChatGPT likely uses somewhere around 30,000 A100s to keep up with demand. To put that into context, that's a heck of a lot more than the roughly 4,000 to 5,000 they likely needed to train the language model in the first place. Training is the process of feeding the AI lots of information in order to build it out before it can be used publicly. 
Intuitively, it might seem like the training process would need more processing power than actually running the model. But because of the massive amount of I/O ChatGPT has to handle with 100 million users, it actually ends up requiring roughly 6 times more GPUs to run it. And as pricey as these systems are, you can bet this meant a massive investment on Microsoft's part. While the actual dollar amount hasn't been disclosed, we do know that it was in the hundreds of millions of dollars, in addition to several hundred grand a day just to keep the system running. Lest you think that Microsoft is going to stop there, the company is also integrating the newer NVIDIA H100 GPUs into its Azure Cloud AI service, which dwarf the A100's FP16 performance by a factor of 6, in addition to adding FP8 support, which should prove to be very useful for AI due to how the math calculations involved in running AI models work. Not only will this ensure that more people can use ChatGPT and other AI services, but it will also allow Microsoft to train more complicated large language models. Maybe soon, you'll be able to completely replace those pesky real-life friends of yours. So thanks for watching, guys! If you liked this video, hit like, hit subscribe, and hit us up in the comments section with your suggestions for topics that we should cover in the future."}