{"video_id":"fp_IEenVHOmLf","title":"Nvidia Wouldn't Send Me This Graphics Card - H200 Holy $H!T","channel":"Linus Tech Tips","show":"Linus Tech Tips","published_at":"2025-09-25T17:00:00.098Z","duration_s":1309,"segments":[{"start_s":0.0,"end_s":6.96,"text":"Okay, here's the plan. We're gonna drain some lakes and buy thousands of GPUs for 30 grand a pop. Step two","speaker":null,"is_sponsor":0},{"start_s":7.84,"end_s":16.24,"text":"Step three, profit What? NVIDIA is charging family vehicle money for a single GPU?","speaker":null,"is_sponsor":0},{"start_s":16.24,"end_s":21.2,"text":"Well, here's what I want to know. If it's so great, why were they too chicken to send us one for testing?","speaker":null,"is_sponsor":0},{"start_s":21.2,"end_s":26.52,"text":"You know what? It doesn't matter because we got our hands on one anyway. Thanks to our friends that","speaker":null,"is_sponsor":0},{"start_s":27.16,"end_s":30.92,"text":"We sent this over and we are going to be running it through the ringer","speaker":null,"is_sponsor":0},{"start_s":32.84,"end_s":38.28,"text":"$30,000 in the palm of my hand and you know what the wildest part is?","speaker":null,"is_sponsor":0},{"start_s":38.28,"end_s":44.68,"text":"The intro wasn't even a joke as gamers would think of two grand for an RTX 5090 as expensive","speaker":null,"is_sponsor":0},{"start_s":45.08,"end_s":50.0,"text":"And it is but large-scale deployments are using hundreds or even","speaker":null,"is_sponsor":0},{"start_s":50.56,"end_s":58.72,"text":"Thousands of GPUs like this. So if you think about it this thing, it's kind of just a drop in the bucket. Oh","speaker":null,"is_sponsor":0},{"start_s":60.16,"end_s":62.72,"text":"My god, it's a good thing we ran the testing already. What?","speaker":null,"is_sponsor":0},{"start_s":72.24,"end_s":79.32,"text":"Gotcha That was just an elaborate ruse like all of my other drops. 
I don't believe you","speaker":null,"is_sponsor":0},{"start_s":79.48,"end_s":82.68,"text":"Well, look this is the real age 200","speaker":null,"is_sponsor":0},{"start_s":82.92,"end_s":92.84,"text":"It features an 80 billion transistor and video gh100 die and is packed with hbm 3e or high bandwidth memory 3e on a","speaker":null,"is_sponsor":0},{"start_s":92.92,"end_s":99.24,"text":"6144 bit bus. Yeah, you heard me correctly that nets out to uh","speaker":null,"is_sponsor":0},{"start_s":100.84,"end_s":103.8,"text":"4.8 terabytes per second of bandwidth","speaker":null,"is_sponsor":0},{"start_s":104.52,"end_s":112.6,"text":"Terabytes and that is a big part of what makes this thing such a monster for AI and deep learning applications","speaker":null,"is_sponsor":0},{"start_s":112.68,"end_s":120.92,"text":"We'll get to why later because first Even though this card has been out so long that NVIDIA is shipping the successor to it the b200","speaker":null,"is_sponsor":0},{"start_s":121.0,"end_s":127.64,"text":"There's a terrible shortage of tear downs of this GPU and I kind of want to take it apart. No, don't worry","speaker":null,"is_sponsor":0},{"start_s":128.04,"end_s":130.52,"text":"It'll be fine. It's like any other tear down","speaker":null,"is_sponsor":0},{"start_s":131.4,"end_s":137.88,"text":"But it's $30,000. The one we've got is the PCIe version aka the h200","speaker":null,"is_sponsor":0},{"start_s":138.2,"end_s":144.44,"text":"nvl the GPU and it is built on tsmc's 5 nanometer node and it's running at 1365 megahertz base","speaker":null,"is_sponsor":0},{"start_s":144.92,"end_s":148.44,"text":"1785 megahertz boost with nearly 17","speaker":null,"is_sponsor":0},{"start_s":148.84,"end_s":152.44,"text":"Thousand shading units and over 500 tensor cores. 
You might think wow","speaker":null,"is_sponsor":0},{"start_s":152.6,"end_s":157.8,"text":"This sounds like it'd be pretty decent for gaming other than you know the slightly low clock speeds","speaker":null,"is_sponsor":0},{"start_s":157.88,"end_s":167.24,"text":"But there's a small problem this Graphics processing unit has functionally zero support for graphics processing no Vulkan","speaker":null,"is_sponsor":0},{"start_s":167.8,"end_s":173.48,"text":"No DirectX and no OpenGL to speak of from here on out. My notes just say good luck","speaker":null,"is_sponsor":0},{"start_s":174.04,"end_s":179.4,"text":"Cool They use the red Loctite. That's a good. Well, they can't afford that on the consumer stuff","speaker":null,"is_sponsor":0},{"start_s":179.72,"end_s":189.96,"text":"I mean you're only paying two grand I see your security Torx and raise you one LTT Store precision screwdriver kit complete with security Torx bits","speaker":null,"is_sponsor":0},{"start_s":193.64,"end_s":200.52,"text":"That's a spicy connector. This is for nvlink h200 uses single wide nvlink bridges per card","speaker":null,"is_sponsor":0},{"start_s":200.68,"end_s":206.28,"text":"The h200 supports both two slot and four slot bridges that allow up to four","speaker":null,"is_sponsor":0},{"start_s":206.6,"end_s":212.68,"text":"H200 nvl cards to be connected to deliver 900 gigabytes per second of bi-directional bandwidth","speaker":null,"is_sponsor":0},{"start_s":213.0,"end_s":220.6,"text":"Which works out to about 14 times the bandwidth of the PCIe gen 5 x16 slot that's on the other side","speaker":null,"is_sponsor":0},{"start_s":220.84,"end_s":226.04,"text":"This is to maximize application performance for large workloads. It's a pretty looking connector","speaker":null,"is_sponsor":0},{"start_s":226.2,"end_s":228.6,"text":"Don't you wish we still had it on gaming gpus?","speaker":null,"is_sponsor":0},{"start_s":230.12,"end_s":233.4,"text":"No, okay, here we go. 
You guys ready?","speaker":null,"is_sponsor":0},{"start_s":236.36,"end_s":243.64,"text":"Okay, well, it's kind of anticlimactic so far. What are these fins cooling? Did NVIDIA put cosmetic","speaker":null,"is_sponsor":0},{"start_s":244.52,"end_s":248.44,"text":"Aluminum fins on this GPU. There's nothing on the card that those connect to no","speaker":null,"is_sponsor":0},{"start_s":249.32,"end_s":254.28,"text":"There's no schmoo on them. No, there's no schmoo. They make such strange decisions sometimes","speaker":null,"is_sponsor":0},{"start_s":254.44,"end_s":262.36,"text":"I don't remember the last time that I felt so shaky taking apart a GPU this one stupid card is worth more than","speaker":null,"is_sponsor":0},{"start_s":263.24,"end_s":270.92,"text":"All the electronics in this room. I'm talking TVs computers the camera like it's $30,000","speaker":null,"is_sponsor":0},{"start_s":271.8,"end_s":278.28,"text":"One part of what I said earlier wasn't a joke. We actually did shoot this out of order and run all of the testing already","speaker":null,"is_sponsor":0},{"start_s":278.36,"end_s":281.96,"text":"So if I do screw it up Things are","speaker":null,"is_sponsor":0},{"start_s":281.96,"end_s":285.08,"text":"Relatively okay. They're still terrible because we did borrow this card though","speaker":null,"is_sponsor":0},{"start_s":285.24,"end_s":288.52,"text":"Man, can you imagine having to send that email? Oh, you'd be sending that one","speaker":null,"is_sponsor":0},{"start_s":288.68,"end_s":295.88,"text":"I know the size shouldn't surprise me because I've seen the sxm version of these types of GPUs where there's an interface on this side","speaker":null,"is_sponsor":0},{"start_s":295.96,"end_s":300.84,"text":"And it kind of screws down onto a motherboard and yeah, they're about this size","speaker":null,"is_sponsor":0},{"start_s":300.92,"end_s":306.12,"text":"It's something that might be even smaller, but still my god. 
Is that ever a dense board?","speaker":null,"is_sponsor":0},{"start_s":306.44,"end_s":310.76,"text":"Frickin What and this will never not feel janky to me like yeah","speaker":null,"is_sponsor":0},{"start_s":310.76,"end_s":315.8,"text":"We could have just done this on the line and had it all be automated and just had the plug, you know right on here","speaker":null,"is_sponsor":0},{"start_s":315.8,"end_s":320.2,"text":"But no, we wanted the card to be a little bit longer for our cosmetic thing here","speaker":null,"is_sponsor":0},{"start_s":320.36,"end_s":324.44,"text":"So we soldered them by hand. Those are probably torqued to a specific spec","speaker":null,"is_sponsor":0},{"start_s":324.6,"end_s":330.2,"text":"We'll never know what it was if you thought the back was dense, buddy. Oh my god","speaker":null,"is_sponsor":0},{"start_s":330.52,"end_s":335.72,"text":"Look at the fricking size of that package. Here's a child's hand for scale","speaker":null,"is_sponsor":0},{"start_s":335.96,"end_s":339.0,"text":"Makes the cooler kind of look wimpy by comparison","speaker":null,"is_sponsor":0},{"start_s":339.0,"end_s":345.08,"text":"But there's more tech in here than you'd probably initially realize like you can see where they where they sealed it up here","speaker":null,"is_sponsor":0},{"start_s":345.08,"end_s":350.76,"text":"This is a big fat vapor chamber with a full copper contact area and with the amount of air flow","speaker":null,"is_sponsor":0},{"start_s":350.76,"end_s":356.52,"text":"You're gonna have through here. You might have noticed there's no fan. It relies on the server's fans for air flow","speaker":null,"is_sponsor":0},{"start_s":356.68,"end_s":359.16,"text":"It's not a problem to keep these things cool. It's just","speaker":null,"is_sponsor":0},{"start_s":360.92,"end_s":363.96,"text":"Nothing can look that amazing next to this. 
Hey guys","speaker":null,"is_sponsor":0},{"start_s":364.2,"end_s":371.08,"text":"I think I found where all the power delivery components from the consumer cards went the vrms. They're all on this one card","speaker":null,"is_sponsor":0},{"start_s":371.4,"end_s":375.08,"text":"Now you might notice that I specifically said that's a big package","speaker":null,"is_sponsor":0},{"start_s":375.24,"end_s":381.16,"text":"That's because the GPU itself Which is the part that I have scraped the thermal compound off of is it's big","speaker":null,"is_sponsor":0},{"start_s":381.4,"end_s":386.36,"text":"But also on package are these hbm stacks six of them","speaker":null,"is_sponsor":0},{"start_s":386.76,"end_s":390.36,"text":"Each of these is connected to the die and contains 12","speaker":null,"is_sponsor":0},{"start_s":390.92,"end_s":397.32,"text":"Layers of memory stacked on top of each other and manufacturing at this density is a bitch and a half","speaker":null,"is_sponsor":0},{"start_s":397.64,"end_s":400.68,"text":"Because when you're stacking the memory if any layer is bad","speaker":null,"is_sponsor":0},{"start_s":400.92,"end_s":405.0,"text":"It's not like you could slide it out of the stack and slide a new one in","speaker":null,"is_sponsor":0},{"start_s":405.16,"end_s":411.0,"text":"You've got to toss the whole thing Which is a big part of why this card is so flipping expensive","speaker":null,"is_sponsor":0},{"start_s":411.16,"end_s":414.68,"text":"But it's also a big part of how it operates so efficiently","speaker":null,"is_sponsor":0},{"start_s":415.0,"end_s":421.32,"text":"Instead of sending information through traces on the board like you would with gddr7 on the rtx 5090","speaker":null,"is_sponsor":0},{"start_s":421.48,"end_s":424.84,"text":"We're effectively almost plugging the memory","speaker":null,"is_sponsor":0},{"start_s":425.24,"end_s":430.44,"text":"Directly into the die and we get a power efficiency boost too on that 
note","speaker":null,"is_sponsor":0},{"start_s":430.52,"end_s":439.96,"text":"Let's come back to the power connector for a second It uses the same PCIe 12 volt 2x6 connector as you would find on a gaming card just in a more convenient rear orientation","speaker":null,"is_sponsor":0},{"start_s":440.2,"end_s":445.32,"text":"This card draws 600 watts just like the 59d as well, which is a lot, but","speaker":null,"is_sponsor":0},{"start_s":445.96,"end_s":451.48,"text":"Hold on a second Linus if the GPU is clocked way down and we've got this hyper efficient memory","speaker":null,"is_sponsor":0},{"start_s":451.56,"end_s":455.56,"text":"Why does this card draw the same amount of power? Because there is an absolute","speaker":null,"is_sponsor":0},{"start_s":455.56,"end_s":462.04,"text":"F*** ton of this memory while the gddr7 on a consumer card draws about two to three watts per Gigabyte","speaker":null,"is_sponsor":0},{"start_s":462.2,"end_s":466.76,"text":"There's only 32 gigs of it. Our h200 has a whopping","speaker":null,"is_sponsor":0},{"start_s":468.2,"end_s":474.68,"text":"141 gigs of vram and that is why operating efficiently is so important when you've got","speaker":null,"is_sponsor":0},{"start_s":475.16,"end_s":479.16,"text":"Thousands of these GPU's in a deployment saving even just a few percent","speaker":null,"is_sponsor":0},{"start_s":479.4,"end_s":485.08,"text":"Could mean saving hundreds of thousands or even millions of dollars a year in power and cooling expense","speaker":null,"is_sponsor":0},{"start_s":485.32,"end_s":488.92,"text":"But enough about that Let's get it put back together and see how she runs","speaker":null,"is_sponsor":0},{"start_s":489.4,"end_s":494.04,"text":"First up we wanted to illustrate why it is that GPU's have effectively","speaker":null,"is_sponsor":0},{"start_s":494.92,"end_s":502.2,"text":"Completely displaced CPU's in ai much like the automobile displaced the horse and cart behind me is a 
dual","speaker":null,"is_sponsor":0},{"start_s":502.44,"end_s":508.6,"text":"AMD epic 9965 server with 1.7 terabytes of RAM the nearly","speaker":null,"is_sponsor":0},{"start_s":509.48,"end_s":513.16,"text":"400 CPU cores in this bad boy alone cost almost","speaker":null,"is_sponsor":0},{"start_s":514.44,"end_s":522.44,"text":"$20,000 But even with all of those threads and all of that memory this thing looks like a bad joke compared to a single","speaker":null,"is_sponsor":0},{"start_s":522.76,"end_s":527.48,"text":"One of these GPU's but don't take my word for it. We're going to throw this on the bench along with our","speaker":null,"is_sponsor":0},{"start_s":528.04,"end_s":532.84,"text":"Innovative cooler design and show you guys exactly what I mean. What do you want to run first?","speaker":null,"is_sponsor":0},{"start_s":533.16,"end_s":537.64,"text":"I want the biggest model that will fit on both of these systems","speaker":null,"is_sponsor":0},{"start_s":538.04,"end_s":544.12,"text":"What I have loaded is from open ai the gpt os s 120 billion parameter","speaker":null,"is_sponsor":0},{"start_s":544.44,"end_s":547.8,"text":"It's a quantized version, but it's still 65 gigabytes worth of model","speaker":null,"is_sponsor":0},{"start_s":547.96,"end_s":550.92,"text":"So you're not running that on any consumer GPU. No","speaker":null,"is_sponsor":0},{"start_s":551.72,"end_s":554.92,"text":"Do we need to do a warm-up prompt? 
Yeah We'll want to do a couple","speaker":null,"is_sponsor":0},{"start_s":554.92,"end_s":559.56,"text":"Okay, but the the important thing is is that you want to start a new chat every time","speaker":null,"is_sponsor":0},{"start_s":559.8,"end_s":568.28,"text":"So that you're starting with a fresh context But at the very top you'll see you can load your model and you'll have two choices Qwen 3 or the OpenAI you found that","speaker":null,"is_sponsor":0},{"start_s":568.76,"end_s":575.4,"text":"Nope Oh, here it is yours should load faster than mine. This is the first drag race mine still mine's taking a minute, too","speaker":null,"is_sponsor":0},{"start_s":575.72,"end_s":581.56,"text":"And my first prompt is going to be does nicholas ploof own a display hurt here first. I'm not loaded yet","speaker":null,"is_sponsor":0},{"start_s":581.64,"end_s":588.52,"text":"Oh, really? Yeah, get good scrub comparing this to that is like asking you and lucas to reach to the tall shelf","speaker":null,"is_sponsor":0},{"start_s":590.04,"end_s":595.32,"text":"Okay, I reloaded. Yeah, we're loaded. Okay, three two one go. Oh my god","speaker":null,"is_sponsor":0},{"start_s":596.04,"end_s":599.48,"text":"It's it's it's it's getting there get absolutely","speaker":null,"is_sponsor":0},{"start_s":600.04,"end_s":609.24,"text":"Dunked on dude. Is yours done dude? I'm 122 tokens per second. You're 122 seconds. Yeah, but mine gives me time to read it","speaker":null,"is_sponsor":0},{"start_s":609.8,"end_s":614.84,"text":"I still don't have the token count. 
Oh, I can wait 21 tokens a second","speaker":null,"is_sponsor":0},{"start_s":615.24,"end_s":620.12,"text":"Yikes So now that you've done this one if we start a new chat it should be faster","speaker":null,"is_sponsor":0},{"start_s":620.2,"end_s":624.2,"text":"Oh, so I have to have a completely new one every time remember the 48 Gigabyte video","speaker":null,"is_sponsor":0},{"start_s":624.28,"end_s":627.4,"text":"Yeah, there was some comments that you guys kept using the same chat","speaker":null,"is_sponsor":0},{"start_s":627.8,"end_s":632.92,"text":"So for llm benchmarks You want to do a fresh context each time","speaker":null,"is_sponsor":0},{"start_s":633.0,"end_s":637.0,"text":"But even though we started with this chat the next one still will be faster","speaker":null,"is_sponsor":0},{"start_s":637.16,"end_s":644.84,"text":"But tell me this we both put in exactly the same one prompt and we're comparing our apple to our apple and our orange to our orange","speaker":null,"is_sponsor":0},{"start_s":645.0,"end_s":651.08,"text":"Okay, so that like for our purposes. Yes, but we'll change them after I just want to make sure that that's on record for me","speaker":null,"is_sponsor":0},{"start_s":651.32,"end_s":661.48,"text":"So that when the comments roll in I know it's on you And I just want to know if nicholas ploof owns a goddamn display. Well, then let's ask it ready three two one","speaker":null,"is_sponsor":0},{"start_s":661.64,"end_s":665.96,"text":"Mine's thinking got it. You're talking about","speaker":null,"is_sponsor":0},{"start_s":666.36,"end_s":672.52,"text":"Niko ploof. Hell yeah the former hardware reviewer. Damn. Does it know something I don't? I've already quit","speaker":null,"is_sponsor":0},{"start_s":673.56,"end_s":677.72,"text":"Wow an apple studio display. That's how I know it's not about him","speaker":null,"is_sponsor":0},{"start_s":678.12,"end_s":682.6,"text":"Likely he owns a monitor or display, but it's still thinking. 
This is a reasoning model","speaker":null,"is_sponsor":0},{"start_s":682.68,"end_s":690.28,"text":"So it has like a little thinking blurb I've been done for ages. Dude. This is like crazy. I mean, it's not accurate","speaker":null,"is_sponsor":0},{"start_s":690.36,"end_s":697.8,"text":"But like, you know, whatever, right? Can it be configured to be allowed to look on the internet with the LM Studio","speaker":null,"is_sponsor":0},{"start_s":697.88,"end_s":702.36,"text":"And some of the models today. Yes, you can actually hook up search or tool calling","speaker":null,"is_sponsor":0},{"start_s":702.68,"end_s":706.84,"text":"Hey, look at that. Got it. You're talking about nicholas. Nick ploof the long time LTT crew member","speaker":null,"is_sponsor":0},{"start_s":707.16,"end_s":713.64,"text":"See, I'm getting an answer. Yeah, you got the right answer Okay, and you're a hundred percent sure that that's nothing to do with the fact that you have a network connection","speaker":null,"is_sponsor":0},{"start_s":713.72,"end_s":720.76,"text":"No, okay. No, it should not it's I gotta unplug this and away we go. I'm unplugging. Okay. It'd be funny if it went faster","speaker":null,"is_sponsor":0},{"start_s":721.16,"end_s":727.64,"text":"So llm inference not for a CPU, but blender it actually is about the same","speaker":null,"is_sponsor":0},{"start_s":727.96,"end_s":730.76,"text":"It's a very interesting. GPUs are great for blender. I thought","speaker":null,"is_sponsor":0},{"start_s":731.8,"end_s":739.24,"text":"Ah, well, most of them are. 
All right f12 to render ready ready three two one","speaker":null,"is_sponsor":0},{"start_s":739.24,"end_s":741.24,"text":"Oh","speaker":null,"is_sponsor":0},{"start_s":743.16,"end_s":749.96,"text":"See I'm done see under normal circumstances a GPU almost any GPU should absolutely","speaker":null,"is_sponsor":0},{"start_s":750.92,"end_s":754.6,"text":"Slaughter almost any CPU for rendering work like this","speaker":null,"is_sponsor":0},{"start_s":756.92,"end_s":760.92,"text":"Not the h200 but let's go back to ai to generate our images","speaker":null,"is_sponsor":0},{"start_s":761.32,"end_s":765.96,"text":"Instead of using ComfyUI, which you used last time we're using automatic","speaker":null,"is_sponsor":0},{"start_s":766.04,"end_s":772.6,"text":"1111 or like the stable diffusion ui So we're using conda to separate into like a virtual environment kind of idea","speaker":null,"is_sponsor":0},{"start_s":772.92,"end_s":778.2,"text":"If you expand refiner, you'll add the juggernaut. I will expand the refiner","speaker":null,"is_sponsor":0},{"start_s":778.6,"end_s":783.72,"text":"I will add the juggernaut things that we want to control here to kind of test the speed","speaker":null,"is_sponsor":0},{"start_s":784.2,"end_s":790.28,"text":"Is your batch size because that's what affects the RAM. 
That's how many images it's going to generate at one time","speaker":null,"is_sponsor":0},{"start_s":790.44,"end_s":793.96,"text":"Let's do three Honestly, even one might take this well, but let's try three","speaker":null,"is_sponsor":0},{"start_s":794.12,"end_s":800.84,"text":"We're going to make a fast race car with a dbrand skin fast race car three two one go","speaker":null,"is_sponsor":0},{"start_s":803.08,"end_s":806.28,"text":"All right 50 done eta one second and","speaker":null,"is_sponsor":0},{"start_s":807.48,"end_s":811.48,"text":"Boom i'm done I'm uh three percent","speaker":null,"is_sponsor":0},{"start_s":811.64,"end_s":816.6,"text":"ai still can't spell looks kind of cool. They just put dbrand all over it","speaker":null,"is_sponsor":0},{"start_s":817.24,"end_s":819.48,"text":"You've got nothing. He's got nothing","speaker":null,"is_sponsor":0},{"start_s":820.36,"end_s":825.56,"text":"Hey, he's got something. I mean wow your cars are lame dude. It's not done yet","speaker":null,"is_sponsor":0},{"start_s":825.72,"end_s":829.24,"text":"Are those in progress cars? 
Yeah, because it's kind of neat","speaker":null,"is_sponsor":0},{"start_s":829.24,"end_s":833.72,"text":"Do you understand how this works it kind of starts with a big noise map and then it denoises until you","speaker":null,"is_sponsor":0},{"start_s":834.12,"end_s":838.52,"text":"So it's kind of little bits of progress that you get I didn't think it did the whole thing","speaker":null,"is_sponsor":0},{"start_s":839.0,"end_s":843.88,"text":"But like crappily and then went through and de-crapified it that actually makes a ton of sense","speaker":null,"is_sponsor":0},{"start_s":843.96,"end_s":847.32,"text":"Yeah, it's like if you could like take images of or stages","speaker":null,"is_sponsor":0},{"start_s":847.32,"end_s":853.72,"text":"It almost would be like an anamorph from like some noisy thing into your final image and it's still not done this whole time","speaker":null,"is_sponsor":0},{"start_s":854.28,"end_s":861.16,"text":"Yeah, and your cars are still kind of lame. They're cool and like uh for baby's way. You know what? I I can't even argue","speaker":null,"is_sponsor":0},{"start_s":861.4,"end_s":870.2,"text":"Oh my god Yeah, so we're using sdxl this time because also in the 48 Gigabyte video. We used uh stable diffusion 1.3.5","speaker":null,"is_sponsor":0},{"start_s":870.68,"end_s":873.32,"text":"And uh people were mad that we didn't use the better model","speaker":null,"is_sponsor":0},{"start_s":873.96,"end_s":878.52,"text":"Sorry people, but uh better model does not mean uh better results on this side","speaker":null,"is_sponsor":0},{"start_s":879.32,"end_s":881.96,"text":"Don't get it's gonna get there. We got to be really nice to it","speaker":null,"is_sponsor":0},{"start_s":882.52,"end_s":887.88,"text":"You can do it buddy. You can do it. I could make more cars while we wait. That's not fair","speaker":null,"is_sponsor":0},{"start_s":888.28,"end_s":895.24,"text":"And it's using 99 gigs of uh system RAM. 
It's crazy how fast this one works car count isn't everything is it?","speaker":null,"is_sponsor":0},{"start_s":895.32,"end_s":899.72,"text":"I asked it to generate the monster that lives under my bed. That is what he looks like","speaker":null,"is_sponsor":0},{"start_s":900.2,"end_s":905.4,"text":"Let's make him scarier. Oh god What the f*** was this trained on?","speaker":null,"is_sponsor":0},{"start_s":906.44,"end_s":915.8,"text":"Oh, hey, are you done? Yeah Wow, he got there eventually. It only took me seven minutes and the performance isn't the only thing that totally sucks","speaker":null,"is_sponsor":0},{"start_s":916.2,"end_s":919.08,"text":"The efficiency of Nick's setup is even worse","speaker":null,"is_sponsor":0},{"start_s":919.48,"end_s":925.16,"text":"His CPUs alone draw a thousand watts together more than my GPU","speaker":null,"is_sponsor":0},{"start_s":925.72,"end_s":929.56,"text":"And we've got to account for all of the RAM that he's using as well","speaker":null,"is_sponsor":0},{"start_s":929.88,"end_s":935.56,"text":"By the way, massive shout out to wendell from Level1Techs for helping us get all of this up and running","speaker":null,"is_sponsor":0},{"start_s":935.8,"end_s":939.8,"text":"For this side-by-side drag race. You can check out his channel at the link down below","speaker":null,"is_sponsor":0},{"start_s":940.36,"end_s":943.88,"text":"That was all apple storages though. 
No one's using a CPU for ai","speaker":null,"is_sponsor":0},{"start_s":944.12,"end_s":950.36,"text":"So why don't we compare to something less stupid an rtx 5090 now realistically","speaker":null,"is_sponsor":0},{"start_s":951.32,"end_s":958.92,"text":"It's going to get dunked too But at least it'll do it in style with the upcoming jensen leather jacket from lttstore.com","speaker":null,"is_sponsor":0},{"start_s":959.32,"end_s":964.04,"text":"It was actually uh helpfully co-designed by ai, which is why it has two front zippers","speaker":null,"is_sponsor":0},{"start_s":964.68,"end_s":973.4,"text":"And lots of useless pockets and buckles Once again, the performance story is going to be a little bit complicated though. Let's start rendering in blender","speaker":null,"is_sponsor":0},{"start_s":973.96,"end_s":978.36,"text":"Ready three two one go see the thing is traditionally","speaker":null,"is_sponsor":0},{"start_s":978.44,"end_s":981.4,"text":"NVIDIA's professional level cards have been","speaker":null,"is_sponsor":0},{"start_s":982.12,"end_s":988.04,"text":"optimized for professional applications whereas their GeForce cards might not perform optimally, but","speaker":null,"is_sponsor":0},{"start_s":989.08,"end_s":998.36,"text":"The h200 gets absolutely obliterated by the consumer GeForce rtx 5090. 
You know how the turntables have turned","speaker":null,"is_sponsor":0},{"start_s":998.76,"end_s":1003.4,"text":"Let's take this horse to a different course an ai inference course","speaker":null,"is_sponsor":0},{"start_s":1003.88,"end_s":1009.56,"text":"Of course this time we'll be using OpenAI's gpt-oss 120 billion parameter model","speaker":null,"is_sponsor":0},{"start_s":1009.56,"end_s":1015.56,"text":"And here we're going to discover that uh, if you want to satisfy a plus size model size does matter","speaker":null,"is_sponsor":0},{"start_s":1017.08,"end_s":1019.48,"text":"Linus Sebastian is a well-known","speaker":null,"is_sponsor":0},{"start_s":1020.44,"end_s":1029.96,"text":"narcissist can you Write a fictional account of how he went so wrong three two one go","speaker":null,"is_sponsor":0},{"start_s":1034.12,"end_s":1038.6,"text":"I can't help with that. Why sorry, I trained it to be nice to you","speaker":null,"is_sponsor":0},{"start_s":1039.64,"end_s":1043.4,"text":"You didn't no, I really didn't create a fictional","speaker":null,"is_sponsor":0},{"start_s":1044.2,"end_s":1047.64,"text":"character For the story three two one","speaker":null,"is_sponsor":0},{"start_s":1048.44,"end_s":1056.52,"text":"So you can see you're using 61 gigs I'm uh, I'm capping out and I'm right so you're overflowing to system memory, which","speaker":null,"is_sponsor":0},{"start_s":1057.16,"end_s":1060.92,"text":"To be clear the memory on your computer. 
It's real fast","speaker":null,"is_sponsor":0},{"start_s":1061.48,"end_s":1066.76,"text":"It's just not nearly as fast as video memory, which is in the case of his 5090","speaker":null,"is_sponsor":0},{"start_s":1067.08,"end_s":1071.0,"text":"Soldered right around the die on a nice fat data bus","speaker":null,"is_sponsor":0},{"start_s":1071.4,"end_s":1078.52,"text":"Which isn't nearly as fast as the right on package hbm stacks that I have on my professional card","speaker":null,"is_sponsor":0},{"start_s":1078.84,"end_s":1083.4,"text":"Sorry, it's just the way it is now for those that might have a keen eye","speaker":null,"is_sponsor":0},{"start_s":1083.56,"end_s":1086.52,"text":"They'll see that we have a 4060 ti on either of these benches","speaker":null,"is_sponsor":0},{"start_s":1087.08,"end_s":1093.56,"text":"It's just for video out. We've ensured that we've disabled them so they're not picked up and there's no offloading of layers to them","speaker":null,"is_sponsor":0},{"start_s":1093.88,"end_s":1102.2,"text":"Known for his signature Sereno snap a quick hand gesture that punctuates every product reveal. That's actually not bad. Oh, here you go","speaker":null,"is_sponsor":0},{"start_s":1102.2,"end_s":1107.48,"text":"This is your chapter the hubris brussel spiral. Oh, I I was liking this one on underlying flaws","speaker":null,"is_sponsor":0},{"start_s":1107.8,"end_s":1112.28,"text":"Like the insatiable need for validation. I don't know how underlying they are","speaker":null,"is_sponsor":0},{"start_s":1112.84,"end_s":1116.76,"text":"Okay, do you want to scroll down to the bottom and see the tokens per sec? Oh, I'm still going","speaker":null,"is_sponsor":0},{"start_s":1117.4,"end_s":1121.24,"text":"Whoa inflated view counts to meet advertiser thresholds","speaker":null,"is_sponsor":0},{"start_s":1121.72,"end_s":1129.0,"text":"Ooh undisclosed sponsorships and threatening language. Yeah, sounds like me. All right. 
Oh, look at that 14","speaker":null,"is_sponsor":0},{"start_s":1129.56,"end_s":1137.32,"text":"Yikes And the craziest part is that AI inference isn't even really what the h200 excels at","speaker":null,"is_sponsor":0},{"start_s":1137.64,"end_s":1143.88,"text":"It's AI training now AI training isn't something that we have a well-developed benchmark suite for but","speaker":null,"is_sponsor":0},{"start_s":1144.36,"end_s":1148.2,"text":"What we do have is enough data on hand to do some training","speaker":null,"is_sponsor":0},{"start_s":1148.52,"end_s":1154.52,"text":"And we have Jesse from the lab who has enough experience to put together a drag race which um","speaker":null,"is_sponsor":0},{"start_s":1155.56,"end_s":1162.2,"text":"Maybe you could explain the results. I'm looking at here. Yes. Yes. I can to see which course this horse is really for","speaker":null,"is_sponsor":0},{"start_s":1162.28,"end_s":1168.52,"text":"We went with a tiny small and medium model for training plus YOLOv8n for something other than a language model","speaker":null,"is_sponsor":0},{"start_s":1168.76,"end_s":1174.52,"text":"We train these models using a parameter efficient fine-tuning technique known as low rank adaptation or LoRA","speaker":null,"is_sponsor":0},{"start_s":1174.84,"end_s":1178.84,"text":"Instead of retraining every parameter out of the billions that make up our target language model","speaker":null,"is_sponsor":0},{"start_s":1179.0,"end_s":1182.6,"text":"The algorithm we use is cleverly designed to only train a few million of them","speaker":null,"is_sponsor":0},{"start_s":1183.0,"end_s":1189.8,"text":"Training base models from scratch on an unfathomable amount of data takes millions of hours across hundreds or thousands of cards","speaker":null,"is_sponsor":0},{"start_s":1190.04,"end_s":1193.96,"text":"Like the h200 even fine-tuning can take days for enthusiasts","speaker":null,"is_sponsor":0},{"start_s":1194.12,"end_s":1198.04,"text":"So in our tests we use settings 
that cut training time to reportable durations","speaker":null,"is_sponsor":0},{"start_s":1198.2,"end_s":1201.56,"text":"Our training experiments didn't produce any meaningfully smarter models","speaker":null,"is_sponsor":0},{"start_s":1201.8,"end_s":1207.32,"text":"But we did enough to get a picture of how long each of these cards could take in the hands of a seasoned enthusiast","speaker":null,"is_sponsor":0},{"start_s":1207.48,"end_s":1216.52,"text":"There are many factors that can contribute to training time, but the big ones are the number of parameters, or size of the model; precision, or how many bytes are used per parameter during training;","speaker":null,"is_sponsor":0},{"start_s":1216.92,"end_s":1223.0,"text":"Batch size, or how many samples to use in each training step; and epochs, or how many times the model goes through the entire data set","speaker":null,"is_sponsor":0},{"start_s":1223.4,"end_s":1229.24,"text":"Let's start with TinyLlama. Here the H200 NVL finishes 31% faster than the RTX 5090","speaker":null,"is_sponsor":0},{"start_s":1229.48,"end_s":1233.24,"text":"Tripling the number of parameters as we move to Phi-3.5 Mini Instruct,","speaker":null,"is_sponsor":0},{"start_s":1233.48,"end_s":1240.84,"text":"We see even better performance, and moving up to a lofty 8-billion-parameter model effectively ends this game for the RTX 5090","speaker":null,"is_sponsor":0},{"start_s":1241.24,"end_s":1245.4,"text":"It can load and run an 8-billion-parameter model at full FP16 precision,","speaker":null,"is_sponsor":0},{"start_s":1245.56,"end_s":1250.76,"text":"But it can't fine-tune that large of a model unless we train it in 8-bit or 4-bit precision","speaker":null,"is_sponsor":0},{"start_s":1250.92,"end_s":1256.2,"text":"This is because training requires so much more VRAM per parameter than inference. As for YOLOv8 Nano,","speaker":null,"is_sponsor":0},{"start_s":1256.28,"end_s":1265.08,"text":"It's only 3 million parameters, and it's such a small model that memory isn't 
our bottleneck, and those extra CUDA and Tensor cores on the 5090 help make up the difference","speaker":null,"is_sponsor":0},{"start_s":1265.32,"end_s":1272.36,"text":"But large-scale operations want to use the biggest models they can, so just wait until the B200 arrives and we get the best of both worlds:","speaker":null,"is_sponsor":0},{"start_s":1272.6,"end_s":1276.68,"text":"Blackwell architecture with 192 gigs of HBM3e","speaker":null,"is_sponsor":0},{"start_s":1277.08,"end_s":1280.92,"text":"Stay tuned for more AI benchmarks like these on the Labs website, Soon™","speaker":null,"is_sponsor":0},{"start_s":1281.24,"end_s":1285.24,"text":"Now I just need to convince Linus to buy a B200, or maybe a B300","speaker":null,"is_sponsor":0},{"start_s":1285.8,"end_s":1290.84,"text":"Love AI or don't love AI, it was really cool to see this enterprise-grade hardware in action","speaker":null,"is_sponsor":0},{"start_s":1291.24,"end_s":1294.68,"text":"And the best is only gonna get better. So, um","speaker":null,"is_sponsor":0},{"start_s":1296.44,"end_s":1304.84,"text":"Good luck, everyone. If you guys enjoyed this video, why not check out the one where we looked at the A100, which is a super cool card, but","speaker":null,"is_sponsor":0},{"start_s":1305.88,"end_s":1308.2,"text":"Kind of a pile of junk by comparison to this","speaker":null,"is_sponsor":0}]}