{"video_id":"fp_WUK3PiL8sQ","title":"An inside look at the FIR Supercomputer","channel":"Linus Tech Tips","show":"Linus Tech Tips","published_at":"2025-10-16T17:00:00.065Z","duration_s":1131,"segments":[{"start_s":0.0,"end_s":8.96,"text":"165,000 CPU cores, 20 million dollars of GPU, and a cool tetebite of RAM.","speaker":null,"is_sponsor":0},{"start_s":8.96,"end_s":16.0,"text":"I wouldn't normally describe myself as a furry, but the new Fur Supercomputer has definitely","speaker":null,"is_sponsor":0},{"start_s":16.0,"end_s":19.84,"text":"awakened some feelings that I can't say I've ever felt before.","speaker":null,"is_sponsor":0},{"start_s":19.84,"end_s":26.96,"text":"Feelings like wanting to go deep inside it to gently remove its panels and maybe some light","speaker":null,"is_sponsor":0},{"start_s":27.04,"end_s":33.12,"text":"screwing? And thanks to our friends here at Simon Fraser University in beautiful British Columbia,","speaker":null,"is_sponsor":0},{"start_s":33.12,"end_s":39.92,"text":"we're going to be doing just that, going deep under the hood of the CPU and GPU compute deployment","speaker":null,"is_sponsor":0},{"start_s":39.92,"end_s":44.8,"text":"that is going to be serving tens of thousands of scientists and researchers in fields all the","speaker":null,"is_sponsor":0},{"start_s":44.8,"end_s":49.04,"text":"way from AI to zoology all over the country for years to come.","speaker":null,"is_sponsor":0},{"start_s":49.04,"end_s":54.96,"text":"This will be our first up close look at a real world deployment that uses direct dye liquid","speaker":null,"is_sponsor":0},{"start_s":54.96,"end_s":60.64,"text":"cooling to increase cooling efficiency from about 30 percent to over 90 percent.","speaker":null,"is_sponsor":0},{"start_s":60.64,"end_s":63.6,"text":"Or at least it'll be the first data center grade deployment.","speaker":null,"is_sponsor":0},{"start_s":64.24,"end_s":67.76,"text":"Mine doesn't count, and it doesn't look nearly as sexy.","speaker":null,"is_sponsor":0},{"start_s":68.48,"end_s":72.72,"text":"But what is sexy is this segue to our sponsor.","speaker":null,"is_sponsor":0},{"start_s":72.72,"end_s":91.36,"text":"In the row behind me is 640 NVIDIA H180 Gigabyte GPUs each with an estimated cost of around","speaker":null,"is_sponsor":0},{"start_s":91.36,"end_s":93.68,"text":"31,000 US dollars.","speaker":null,"is_sponsor":0},{"start_s":94.4,"end_s":101.2,"text":"Even at less than half of the maximum density, just 20 nodes per rack, the team here had to","speaker":null,"is_sponsor":0},{"start_s":101.2,"end_s":107.28,"text":"reroute power from elsewhere in the building and significantly upgrade the building's cooling system","speaker":null,"is_sponsor":0},{"start_s":107.28,"end_s":112.88,"text":"just to accommodate the incredible power requirements of these NVIDIA hoppers.","speaker":null,"is_sponsor":0},{"start_s":112.88,"end_s":117.6,"text":"This is actually a common theme that I hear from basically anyone in the data center space.","speaker":null,"is_sponsor":0},{"start_s":117.6,"end_s":122.16,"text":"I mean we tried to build for the future, but we couldn't have possibly seen this coming,","speaker":null,"is_sponsor":0},{"start_s":122.8,"end_s":126.56,"text":"and there's no sign of things slowing down. We'll get to that later though.","speaker":null,"is_sponsor":0},{"start_s":127.44,"end_s":135.84,"text":"First, the most exciting part of the tour. 
They pulled one of their spares out of the rack for us to crack open and get up close and personal","speaker":null,"is_sponsor":0},{"start_s":135.84,"end_s":140.64,"text":"with, and oh my god, look at this thing.","speaker":null,"is_sponsor":0},{"start_s":141.84,"end_s":150.8,"text":"It's heavy. I guess when you've got this much Hopper in you... Like, wow, it's kind of scary handling it.","speaker":null,"is_sponsor":0},{"start_s":150.8,"end_s":154.48,"text":"I mean, this one, you know, alone is worth more than my car,","speaker":null,"is_sponsor":0},{"start_s":154.48,"end_s":156.96,"text":"and a rack of these is worth more than my house.","speaker":null,"is_sponsor":0},{"start_s":158.32,"end_s":161.52,"text":"It's a little sketchy, but I want you guys to be able to see it.","speaker":null,"is_sponsor":0},{"start_s":161.52,"end_s":169.52,"text":"The CPUs are EPYC Genoa, so last-generation Zen 4 based, but Genoa still supports up to 12","speaker":null,"is_sponsor":0},{"start_s":169.52,"end_s":177.28,"text":"channel DDR5 memory and 128 lanes of PCIe Gen 5, which is plenty to keep these GPU cores fed.","speaker":null,"is_sponsor":0},{"start_s":178.0,"end_s":183.6,"text":"If more CPU compute is needed, clearly there is support for dual CPU sockets,","speaker":null,"is_sponsor":0},{"start_s":183.6,"end_s":190.56,"text":"but the team at SFU found that 8 CPU cores per GPU was plenty for their purposes,","speaker":null,"is_sponsor":0},{"start_s":190.56,"end_s":198.24,"text":"and they opted for a single 48-core CPU and 1.152 terabytes of RAM in each of their nodes.","speaker":null,"is_sponsor":0},{"start_s":199.04,"end_s":206.32,"text":"Now for a closer look at the GPUs. Unfortunately, I'm not allowed to take the coolers off them, but under each of these","speaker":null,"is_sponsor":0},{"start_s":206.32,"end_s":217.12,"text":"four cold plates is an NVIDIA H100 SXM5 80 gig GPU, giving us a total of 320 gigabytes of VRAM per node.","speaker":null,"is_sponsor":0},{"start_s":217.68,"end_s":224.24,"text":"And guys, that's not just any VRAM, that is HBM3 running on a 5120-bit bus","speaker":null,"is_sponsor":0},{"start_s":224.24,"end_s":229.84,"text":"for a total bandwidth per GPU of 3.36 terabytes per second.","speaker":null,"is_sponsor":0},{"start_s":230.4,"end_s":238.0,"text":"For context, a top-of-the-line consumer card, the RTX 5090, achieves just over half of that bandwidth.","speaker":null,"is_sponsor":0},{"start_s":238.72,"end_s":243.52,"text":"This kind of power does come with drawbacks, however, like for example heat.","speaker":null,"is_sponsor":0},{"start_s":244.24,"end_s":250.96,"text":"Each of these is rated for 700 watts of power consumption through the SXM socket that's","speaker":null,"is_sponsor":0},{"start_s":250.96,"end_s":256.4,"text":"underneath them. And that is where the incredible cooling solution in this Lenovo node comes in.","speaker":null,"is_sponsor":0},{"start_s":256.4,"end_s":261.28,"text":"As a liquid cooling nerd, I gotta say guys, this is the coolest part for me.","speaker":null,"is_sponsor":0},{"start_s":261.28,"end_s":266.96,"text":"I mean, did you notice that there isn't a single fan in sight anywhere in this machine?","speaker":null,"is_sponsor":0},{"start_s":267.68,"end_s":277.04,"text":"That is because everything, CPUs, GPUs, VRMs, network interface, SSD caddy, even the system","speaker":null,"is_sponsor":0},{"start_s":277.04,"end_s":282.96,"text":"memory is directly liquid cooled. All of it.
This feels a little bit like doing the maze in","speaker":null,"is_sponsor":0},{"start_s":282.96,"end_s":289.04,"text":"Highlights magazine. So here's our inlet over here, which splits into two main loops that go","speaker":null,"is_sponsor":0},{"start_s":289.04,"end_s":294.4,"text":"through the system. The primary loop, which we can tell because it has a thicker pipe coming off of it,","speaker":null,"is_sponsor":0},{"start_s":294.4,"end_s":300.96,"text":"goes straight to the middle of our four GPUs, where this manifold splits fresh incoming water","speaker":null,"is_sponsor":0},{"start_s":300.96,"end_s":307.84,"text":"out to our four GPUs. Two of them just feed right back into the outlet here, while the other two","speaker":null,"is_sponsor":0},{"start_s":307.92,"end_s":314.48,"text":"run up to this networking board and then consolidate back to the outlet. That's our primary loop.","speaker":null,"is_sponsor":0},{"start_s":314.48,"end_s":320.64,"text":"Our secondary loop comes through here, handling some of the power delivery, and then carries over to...","speaker":null,"is_sponsor":0},{"start_s":321.84,"end_s":330.96,"text":"interesting. It splits out, doing the RAM next. I am not 100% sure what to make of that, because I","speaker":null,"is_sponsor":0},{"start_s":330.96,"end_s":336.88,"text":"would think RAM would be a tertiary priority in terms of cooling. But that's what they've done.","speaker":null,"is_sponsor":0},{"start_s":336.88,"end_s":342.56,"text":"We go through the RAM, splitting into three different tubes that sit between our DIMMs down","speaker":null,"is_sponsor":0},{"start_s":342.56,"end_s":349.6,"text":"both rows. Then one side handles this network caddy here and the other side handles our SSD caddy.","speaker":null,"is_sponsor":0},{"start_s":350.16,"end_s":356.96,"text":"Then each of those comes back to one of the CPUs, which come out into the middle here and then run","speaker":null,"is_sponsor":0},{"start_s":356.96,"end_s":362.56,"text":"back to the outlet here. Maybe not the way I would have laid it out. There's a lot of 90-degree","speaker":null,"is_sponsor":0},{"start_s":362.56,"end_s":367.52,"text":"turns in here, meaning a lot of restriction, but I'm sure the engineers at Lenovo know what","speaker":null,"is_sponsor":0},{"start_s":367.52,"end_s":372.08,"text":"they're doing. There's a ton of other cool stuff to unpack here too. You probably noticed there's","speaker":null,"is_sponsor":0},{"start_s":372.08,"end_s":378.48,"text":"no power supply. That's because it uses these chonky connectors here at the back to plug into a","speaker":null,"is_sponsor":0},{"start_s":378.48,"end_s":385.04,"text":"backplane in the back of the rack. As for the cooling connections, well, according to the manufacturer,","speaker":null,"is_sponsor":0},{"start_s":385.04,"end_s":392.24,"text":"these do have a little bit of natural leakage, but it's on the order of molecules, which is pretty","speaker":null,"is_sponsor":0},{"start_s":392.24,"end_s":398.8,"text":"damn impressive. There are sensors all over the motherboard to detect any kind of leakage,","speaker":null,"is_sponsor":0},{"start_s":398.8,"end_s":405.28,"text":"and there is grounding throughout the system in the form of these little copper, what look like","speaker":null,"is_sponsor":0},{"start_s":405.28,"end_s":411.04,"text":"solder wick pieces.
That's to prevent what's called stray current corrosion, which can be","speaker":null,"is_sponsor":0},{"start_s":411.04,"end_s":417.28,"text":"caused by a current that's accidentally induced in the coolant, which can lead to massive corrosion,","speaker":null,"is_sponsor":0},{"start_s":417.28,"end_s":423.76,"text":"which can lead to leaks, which I know from experience. Now, the team here wasn't sure about","speaker":null,"is_sponsor":0},{"start_s":423.76,"end_s":428.64,"text":"the exact chemistry of the coolant they're using, but they did tell me that it has antimicrobial","speaker":null,"is_sponsor":0},{"start_s":428.64,"end_s":433.44,"text":"properties to prevent anything from growing in the loop. There's some other fun stuff.","speaker":null,"is_sponsor":0},{"start_s":433.44,"end_s":439.68,"text":"There's a little stylus in here. Apparently, this is meant to assist in removing memory,","speaker":null,"is_sponsor":0},{"start_s":439.68,"end_s":442.64,"text":"which is great. I'd actually love to see more gaming motherboards come with that.","speaker":null,"is_sponsor":0},{"start_s":443.12,"end_s":452.08,"text":"I thought this lone 7.68 terabyte NVMe drive was interesting too. I mean, the networking is 400","speaker":null,"is_sponsor":0},{"start_s":452.08,"end_s":459.84,"text":"gigabit per second times two, to the two petabytes of NVMe storage, not to mention 49 petabytes of","speaker":null,"is_sponsor":0},{"start_s":459.84,"end_s":465.84,"text":"spinning rust that's right over there, but according to the team here, occasionally they need node-local","speaker":null,"is_sponsor":0},{"start_s":465.84,"end_s":471.52,"text":"storage to improve GPU performance a little bit. So you'd never boot off of this or anything,","speaker":null,"is_sponsor":0},{"start_s":471.52,"end_s":477.44,"text":"but it's nice to have there as scratch. Also, the button cell in here is mounted in a vertical","speaker":null,"is_sponsor":0},{"start_s":477.44,"end_s":484.0,"text":"caddy because the density is so high in this one, you know, that they just couldn't give up the space","speaker":null,"is_sponsor":0},{"start_s":484.0,"end_s":489.84,"text":"that it would have taken to mount it parallel to the board. I also spotted a micro SD header.","speaker":null,"is_sponsor":0},{"start_s":491.12,"end_s":496.0,"text":"If anyone out there works in a data center and knows what that's for, I haven't seen it before,","speaker":null,"is_sponsor":0},{"start_s":496.0,"end_s":502.0,"text":"and Gem and I just assumed that I typoed. Oh, there was one other thing we wanted to look at,","speaker":null,"is_sponsor":0},{"start_s":502.0,"end_s":508.72,"text":"these big power bad boys. We couldn't see them until we got that shroud off. So these,","speaker":null,"is_sponsor":0},{"start_s":508.72,"end_s":514.0,"text":"they're just bus bars. They're going from the power supply here, which is a DC-to-DC power supply.","speaker":null,"is_sponsor":0},{"start_s":515.52,"end_s":521.76,"text":"And they're going over to our GPUs. Damn.
What's interesting to me, which I just noticed,","speaker":null,"is_sponsor":0},{"start_s":521.76,"end_s":527.28,"text":"is that there's a clear delineation between the NVIDIA-engineered parts of this, with the black","speaker":null,"is_sponsor":0},{"start_s":527.28,"end_s":533.28,"text":"PCB, which are completely separate from everything else, and the Lenovo-engineered parts of this.","speaker":null,"is_sponsor":0},{"start_s":533.28,"end_s":538.96,"text":"So Lenovo is acting like more of a system integrator around this compute block here.","speaker":null,"is_sponsor":0},{"start_s":538.96,"end_s":545.52,"text":"Like, you can even see the silk screening on the PCB is distinctly NVIDIA, and Lenovo is just doing","speaker":null,"is_sponsor":0},{"start_s":545.52,"end_s":553.04,"text":"their DC-to-DC power. So it's just power in here and then PCIe in here in the form of these four","speaker":null,"is_sponsor":0},{"start_s":553.04,"end_s":560.24,"text":"MCIO connectors right here. This is essentially like plugging a GPU into your Legion gaming PC.","speaker":null,"is_sponsor":0},{"start_s":561.28,"end_s":569.12,"text":"The GPU house. With extra steps. Yeah. Before we poke around in one of the 192-core CPU nodes that","speaker":null,"is_sponsor":0},{"start_s":569.12,"end_s":576.8,"text":"they've got, let's take a look at one of the racks that these boys slide into. They're still using a","speaker":null,"is_sponsor":0},{"start_s":576.8,"end_s":583.28,"text":"very similar rear-door chilled liquid rack like we saw with their air-cooled nodes when we did a tour","speaker":null,"is_sponsor":0},{"start_s":584.24,"end_s":593.04,"text":"over eight years ago. Anywho, the point is that chilled, 16-and-a-half-degree water comes from","speaker":null,"is_sponsor":0},{"start_s":593.04,"end_s":599.84,"text":"the evaporative cooling towers outside. Then hot air from the power supplies and any of the","speaker":null,"is_sponsor":0},{"start_s":599.84,"end_s":607.04,"text":"networking equipment that's in the rack runs through here, and wow, is that ever hot. Then it","speaker":null,"is_sponsor":0},{"start_s":607.04,"end_s":614.32,"text":"spits out nice, comfortable room-temperature air on the other side. Each of these racks is fed by","speaker":null,"is_sponsor":0},{"start_s":614.4,"end_s":623.44,"text":"dual three-phase 60-amp feeds for a total of about 70,000 watts per rack. Now, if SFU had the power and","speaker":null,"is_sponsor":0},{"start_s":623.44,"end_s":630.96,"text":"cooling in this 1960s bunker, they could juice these up to 180,000 watts per rack, but they don't.","speaker":null,"is_sponsor":0},{"start_s":630.96,"end_s":638.8,"text":"Hence the empty rack space. Since we have this open... Oh wow. That is a big difference between the","speaker":null,"is_sponsor":0},{"start_s":638.8,"end_s":645.04,"text":"cold side and the hot side going into the back of these backplanes for the servers. I don't have","speaker":null,"is_sponsor":0},{"start_s":645.04,"end_s":650.8,"text":"to ask which one's the supply. That's the cold side. Which, since we're on the subject, this is a","speaker":null,"is_sponsor":0},{"start_s":650.8,"end_s":656.72,"text":"perfect time to look at the cooling distribution system. This is the Liebert XDU from Vertiv. It","speaker":null,"is_sponsor":0},{"start_s":656.72,"end_s":664.64,"text":"can do 600,000 watts of cooling capacity per one of these cooling distribution units, or CDUs. Water","speaker":null,"is_sponsor":0},{"start_s":664.64,"end_s":671.68,"text":"comes in the supply side here.
This thick boy. Ha, she's chilly. That's coming from the cooling","speaker":null,"is_sponsor":0},{"start_s":671.68,"end_s":678.96,"text":"towers outside. Then that runs all the way down to the bottom here to the heat exchanger in the front.","speaker":null,"is_sponsor":0},{"start_s":678.96,"end_s":684.16,"text":"This liquid-to-liquid heat exchanger does exactly what it says on the tin, taking that cold water","speaker":null,"is_sponsor":0},{"start_s":684.16,"end_s":690.56,"text":"from the primary loop that goes outside and using it to chill the warm water that is coming directly","speaker":null,"is_sponsor":0},{"start_s":690.56,"end_s":697.76,"text":"off of the blocks that are going to our nodes. This unit uses dual redundant pumps, and if we go","speaker":null,"is_sponsor":0},{"start_s":697.76,"end_s":704.4,"text":"around to the back, it uses these manifolds and valves to control flow to up to six different racks.","speaker":null,"is_sponsor":0},{"start_s":704.4,"end_s":709.68,"text":"And it's very easy to tell which is the cold side that's being chilled, and which is the hot side here.","speaker":null,"is_sponsor":0},{"start_s":710.72,"end_s":718.16,"text":"Wow. I want one. Vaughn, can I have one? Probably the coolest part is this little touchscreen display","speaker":null,"is_sponsor":0},{"start_s":718.16,"end_s":722.64,"text":"on the front that much more succinctly illustrates what I just said. Here's your primary loop,","speaker":null,"is_sponsor":0},{"start_s":722.64,"end_s":727.36,"text":"here's your secondary loop, here's all your flow rates, all your temperatures, and here's an alarm","speaker":null,"is_sponsor":0},{"start_s":727.36,"end_s":734.72,"text":"that they assure me is totally fine. This data is super important because if they accidentally send","speaker":null,"is_sponsor":0},{"start_s":734.72,"end_s":739.76,"text":"water that is too cool into the servers behind me, then they could end up with condensation,","speaker":null,"is_sponsor":0},{"start_s":739.76,"end_s":745.2,"text":"which hopefully I don't have to explain why that's super, super bad. Everything's hooked up using","speaker":null,"is_sponsor":0},{"start_s":745.2,"end_s":750.48,"text":"Aquatherm tubing from Germany. The admins here spoke with some other facilities that used stainless","speaker":null,"is_sponsor":0},{"start_s":750.48,"end_s":755.76,"text":"steel and one of them got rust in their cooling system. It was a big, big mess. They've been really,","speaker":null,"is_sponsor":0},{"start_s":755.76,"end_s":760.72,"text":"really happy with their Aquatherm. Now let's go check out the CPU nodes. Contrary to what NVIDIA","speaker":null,"is_sponsor":0},{"start_s":760.72,"end_s":766.24,"text":"would like everyone to believe, not everything runs best on a GPU even today, and that's where these","speaker":null,"is_sponsor":0},{"start_s":766.24,"end_s":776.96,"text":"come in. Each of these 1U chassis contains two nodes, and each node contains 192 Zen 5","speaker":null,"is_sponsor":0},{"start_s":776.96,"end_s":786.88,"text":"EPYC Turin cores with 768 gigs of memory. So that's a total of nearly 400 cores in each of these","speaker":null,"is_sponsor":0},{"start_s":786.88,"end_s":796.72,"text":"1Us. Holy freaking...
For networking, they actually don't go as heavy on these, using 200 gig connections","speaker":null,"is_sponsor":0},{"start_s":796.72,"end_s":803.2,"text":"and NDR to dynamically share that 200 gigabit link between the two nodes depending on their needs.","speaker":null,"is_sponsor":0},{"start_s":803.76,"end_s":808.72,"text":"This approach does have the drawback of meaning that if the primary node goes down,","speaker":null,"is_sponsor":0},{"start_s":808.72,"end_s":813.44,"text":"we lose network connection to the secondary one, but I have to assume that the cost savings","speaker":null,"is_sponsor":0},{"start_s":813.52,"end_s":819.52,"text":"outweigh the disadvantages in this case. In terms of loop layout, this one is much","speaker":null,"is_sponsor":0},{"start_s":819.52,"end_s":826.4,"text":"simpler, coming into both sides and then out of both sides, but just like the GPU nodes,","speaker":null,"is_sponsor":0},{"start_s":826.4,"end_s":831.84,"text":"the goal here is to get a water tube up against pretty much anything in the server that generates","speaker":null,"is_sponsor":0},{"start_s":831.84,"end_s":838.4,"text":"heat, because there are no fans whatsoever. One cool thing we missed on the GPU node was we never","speaker":null,"is_sponsor":0},{"start_s":838.4,"end_s":843.28,"text":"got a look under the little cooling plates that the SSDs and network cards sit on. So here's what","speaker":null,"is_sponsor":0},{"start_s":843.28,"end_s":849.84,"text":"it looks like. It pretty much looks like a heat pipe, except instead of being full of what is usually","speaker":null,"is_sponsor":0},{"start_s":849.84,"end_s":855.2,"text":"a vapor and sometimes a liquid that circulates just within itself, it's just full of water or","speaker":null,"is_sponsor":0},{"start_s":855.2,"end_s":860.32,"text":"other coolant that will circulate into an external system. Now let's take a look at the racks that","speaker":null,"is_sponsor":0},{"start_s":860.32,"end_s":868.0,"text":"these live in. Each of these racks contains 72 of the nodes that I just showed you guys, top to","speaker":null,"is_sponsor":0},{"start_s":868.08,"end_s":879.52,"text":"freaking bottom, with roughly 13,824 cores. Each pool of racks is an island with a non-blocking 800","speaker":null,"is_sponsor":0},{"start_s":879.52,"end_s":886.72,"text":"gig connection between islands, so 41,000 cores can represent a single job with no blocking.","speaker":null,"is_sponsor":0},{"start_s":886.72,"end_s":891.6,"text":"They have some other specialized nodes, like the storage ones, including the ones on the other","speaker":null,"is_sponsor":0},{"start_s":891.6,"end_s":896.56,"text":"side of the aisle that hold data for our local particle collider, TRIUMF. We've got a whole","speaker":null,"is_sponsor":0},{"start_s":896.56,"end_s":902.0,"text":"video about that, along with some eight terabyte RAM nodes, which are, I think, pretty self-explanatory.","speaker":null,"is_sponsor":0},{"start_s":902.0,"end_s":906.88,"text":"They're for jobs that would overflow on a regular node, as long as you don't mind them having a few","speaker":null,"is_sponsor":0},{"start_s":906.88,"end_s":914.8,"text":"bugs. And finally, a single AMD MI300X node, too. I don't know, what, keeping NVIDIA on their toes? Or,","speaker":null,"is_sponsor":0},{"start_s":916.24,"end_s":921.68,"text":"we could do it. We could buy more than one of these. You better not charge too much, especially","speaker":null,"is_sponsor":0},{"start_s":921.68,"end_s":928.16,"text":"when you factor in modern security needs.
There are six zones of security to get to some of the cages","speaker":null,"is_sponsor":0},{"start_s":928.16,"end_s":932.72,"text":"that actually have biometric locks on them, where not only do you need to know the PIN code,","speaker":null,"is_sponsor":0},{"start_s":932.72,"end_s":938.24,"text":"but you have to put your hand under it and it will check if that hand is attached to a living person.","speaker":null,"is_sponsor":0},{"start_s":939.12,"end_s":943.28,"text":"There's cameras everywhere in the data center with full visibility in all directions,","speaker":null,"is_sponsor":0},{"start_s":943.28,"end_s":947.6,"text":"and our tour guide today actually said that there's someone monitoring them so often","speaker":null,"is_sponsor":0},{"start_s":947.6,"end_s":952.48,"text":"that it's become a bit of a game where they'll send non-flattering pictures of him moving around in","speaker":null,"is_sponsor":0},{"start_s":952.48,"end_s":959.04,"text":"the data center just to make sure he knows they're watching. Cooling everything are these evaporative","speaker":null,"is_sponsor":0},{"start_s":959.04,"end_s":963.92,"text":"cooling towers behind me. The three that are closest to the building, they were there the last","speaker":null,"is_sponsor":0},{"start_s":963.92,"end_s":968.4,"text":"time we were here, but I couldn't show them to you for reasons that involve red tape and approvals.","speaker":null,"is_sponsor":0},{"start_s":968.4,"end_s":973.68,"text":"So here they are. I still can't get any closer to them for reasons that involve red tape and","speaker":null,"is_sponsor":0},{"start_s":973.68,"end_s":979.12,"text":"approvals, but hey, we can check out the acoustic damping that's on these ones on the other side.","speaker":null,"is_sponsor":0},{"start_s":979.12,"end_s":984.08,"text":"That's impressive. Here I am right at the intake next to these sound baffles.","speaker":null,"is_sponsor":0},{"start_s":984.08,"end_s":990.72,"text":"And for context, here's the untreated ones. The total cooling capacity is about 4.7 megawatts,","speaker":null,"is_sponsor":0},{"start_s":990.72,"end_s":996.8,"text":"which is way more than what's needed for the machines inside. But just like power and","speaker":null,"is_sponsor":0},{"start_s":996.8,"end_s":1001.6,"text":"storage, you want to have some extra for resiliency in the event of an equipment failure.","speaker":null,"is_sponsor":0},{"start_s":1001.6,"end_s":1004.72,"text":"Hey, what's one of these worth? Maybe I'll pick one up for the office.","speaker":null,"is_sponsor":0},{"start_s":1004.72,"end_s":1011.6,"text":"$1.2 million each. Oh, never mind. Frankly, I'd rather have one of these anyway.","speaker":null,"is_sponsor":0},{"start_s":1012.16,"end_s":1018.88,"text":"To augment the original pumps for Cedar, which could do 800 gallons per minute of cooling,","speaker":null,"is_sponsor":0},{"start_s":1018.88,"end_s":1026.8,"text":"they added these two new ones that do 1500 gallons per minute. They also now have two","speaker":null,"is_sponsor":0},{"start_s":1026.8,"end_s":1032.08,"text":"mechanical chillers, which can be useful during the times of year that we get outside temperatures","speaker":null,"is_sponsor":0},{"start_s":1032.08,"end_s":1037.84,"text":"above 33 Celsius. So for maybe six hours a day, they'll switch over to mechanical chilling to","speaker":null,"is_sponsor":0},{"start_s":1037.84,"end_s":1041.84,"text":"help out their evaporative cooling tower.
Probably the coolest thing about this gear, though,","speaker":null,"is_sponsor":0},{"start_s":1041.84,"end_s":1047.2,"text":"is how smart it is. They've got telemetry capture for things like temperature and flow rates,","speaker":null,"is_sponsor":0},{"start_s":1047.2,"end_s":1051.92,"text":"and it all feeds into a third party called Kaizen that helps with logging and determining if","speaker":null,"is_sponsor":0},{"start_s":1051.92,"end_s":1057.44,"text":"something's gone wrong with the system. Fun fact, by the way, the two-foot-thick concrete floor","speaker":null,"is_sponsor":0},{"start_s":1057.44,"end_s":1063.12,"text":"that I'm standing on is so burdened by all of this heavy equipment and coolant that it actually","speaker":null,"is_sponsor":0},{"start_s":1063.12,"end_s":1071.68,"text":"deflects half an inch in the center. Is that okay? Am I going to break it? I mean, the place used to","speaker":null,"is_sponsor":0},{"start_s":1071.68,"end_s":1076.72,"text":"be the power distribution center for the southern half of our province, and it's built like a bunker.","speaker":null,"is_sponsor":0},{"start_s":1077.52,"end_s":1083.6,"text":"But that's not enough. Who paid for it all? Fir had a total budget of about $82 million, which,","speaker":null,"is_sponsor":0},{"start_s":1084.16,"end_s":1090.48,"text":"oh, I assume, is Canadian rubles, I deduce. So a little under 60 million US dollars,","speaker":null,"is_sponsor":0},{"start_s":1090.48,"end_s":1094.08,"text":"and that came from a combination of the Digital Research Alliance of Canada,","speaker":null,"is_sponsor":0},{"start_s":1094.08,"end_s":1099.44,"text":"BCKDF, and vendor-in-kind contributions, which I just learned are a vendor giving","speaker":null,"is_sponsor":0},{"start_s":1099.44,"end_s":1103.68,"text":"significant discounts. How do I get signed up for that program? Is that only for educational","speaker":null,"is_sponsor":0},{"start_s":1103.68,"end_s":1109.04,"text":"institutions? Anyway, they did ask us to shout out a couple of companies who helped them out:","speaker":null,"is_sponsor":0},{"start_s":1109.04,"end_s":1115.04,"text":"Lenovo, DDN, and Vertiv on the cooling side. And they didn't ask us to shout these guys out,","speaker":null,"is_sponsor":0},{"start_s":1115.04,"end_s":1119.68,"text":"but we're going to do it anyway. A shout out for our sponsor. If you guys enjoyed this video,","speaker":null,"is_sponsor":0},{"start_s":1119.68,"end_s":1124.0,"text":"why not check out the tour we did of TRIUMF, the particle accelerator that is just down the road.","speaker":null,"is_sponsor":0},{"start_s":1125.28,"end_s":1131.2,"text":"Down the hill, down the really long road. Canada's only road. It's pretty long.","speaker":null,"is_sponsor":0}]}