{"video_id":"fp_1D6yeGXM3N","title":"This World Record took YEARS (and a Million dollars..) - PI World Record (SPONSORED)","channel":"Linus Tech Tips","show":"Linus Tech Tips","published_at":"2025-05-16T16:43:00.018Z","duration_s":1155,"segments":[{"start_s":0.0,"end_s":6.2,"text":"When I was a young tinkerer, one of my favorite tools was super pi, a benchmark and stability test","speaker":null,"is_sponsor":0},{"start_s":6.2,"end_s":10.44,"text":"that calculates pi. No, not this kind of pi.","speaker":null,"is_sponsor":0},{"start_s":10.44,"end_s":15.44,"text":"But rather, the mathematical constant that describes the ratio between a circle's circumference","speaker":null,"is_sponsor":0},{"start_s":15.44,"end_s":19.84,"text":"and its diameter. As far as we know, pi is an irrational number,","speaker":null,"is_sponsor":0},{"start_s":19.84,"end_s":22.88,"text":"meaning that there is an infinite number of digits","speaker":null,"is_sponsor":0},{"start_s":22.88,"end_s":26.96,"text":"after the decimal place. And with my overclocked athelon,","speaker":null,"is_sponsor":0},{"start_s":26.96,"end_s":30.32,"text":"I could calculate 32 million of those digits","speaker":null,"is_sponsor":0},{"start_s":30.32,"end_s":33.32,"text":"in a half an hour or so. Girls dug it.","speaker":null,"is_sponsor":0},{"start_s":33.32,"end_s":38.0,"text":"But I dreamed of more. I wanted to calculate more digits of pi","speaker":null,"is_sponsor":0},{"start_s":38.0,"end_s":41.4,"text":"than any man had ever calculated before. And why shouldn't I?","speaker":null,"is_sponsor":0},{"start_s":41.4,"end_s":46.72,"text":"Online is motherfucking tech tips. So the current record, set by Jordan Reines last year,","speaker":null,"is_sponsor":0},{"start_s":46.72,"end_s":49.76,"text":"is 202 trillion digits. Trillion?","speaker":null,"is_sponsor":0},{"start_s":49.76,"end_s":53.36,"text":"Yeah, you say. That's a lot of athelons. It's more than that.","speaker":null,"is_sponsor":0},{"start_s":53.36,"end_s":56.52,"text":"See, at a certain point, it's not even just the computation","speaker":null,"is_sponsor":0},{"start_s":56.56,"end_s":60.0,"text":"that becomes the problem. But rather, it's the storage.","speaker":null,"is_sponsor":0},{"start_s":60.0,"end_s":63.76,"text":"Fortunately for us, both have gotten a little beefier.","speaker":null,"is_sponsor":0},{"start_s":63.76,"end_s":68.32,"text":"And thanks to our friends over at Kyoksia, who sponsored this entire project,","speaker":null,"is_sponsor":0},{"start_s":68.32,"end_s":72.84,"text":"providing over two petabytes of Gen 4 NVMe storage,","speaker":null,"is_sponsor":0},{"start_s":72.84,"end_s":80.64,"text":"we were able to smash that record, calculating nearly 100 trillion more digits.","speaker":null,"is_sponsor":0},{"start_s":81.12,"end_s":84.48,"text":"The process of getting this was an absolute cluster.","speaker":null,"is_sponsor":0},{"start_s":84.48,"end_s":89.6,"text":"But hey, that's content, baby. And here it is, our verified Guinness World Record","speaker":null,"is_sponsor":0},{"start_s":89.6,"end_s":94.2,"text":"for a calculating pi to an astonishing 300 trillion digits.","speaker":null,"is_sponsor":0},{"start_s":94.2,"end_s":97.84,"text":"Holy shit. 
Let's talk about how we got here.","speaker":null,"is_sponsor":0},{"start_s":97.84,"end_s":102.84,"text":"Let's put the scale of 300 trillion digits into perspective.","speaker":null,"is_sponsor":0},{"start_s":110.48,"end_s":113.84,"text":"At size-four Arial font, which is still readable,","speaker":null,"is_sponsor":0},{"start_s":113.84,"end_s":118.52,"text":"but pushing the limits, you can fit around 25 and a half thousand digits","speaker":null,"is_sponsor":0},{"start_s":118.52,"end_s":123.8,"text":"on a normal sheet of paper. That means that my childhood dreams of 32 million digits","speaker":null,"is_sponsor":0},{"start_s":123.8,"end_s":128.24,"text":"could fit on around 1,250 sheets of paper.","speaker":null,"is_sponsor":0},{"start_s":128.24,"end_s":133.56,"text":"But if we wanted to print out the number of digits that we just calculated...","speaker":null,"is_sponsor":0},{"start_s":133.56,"end_s":139.48,"text":"We should totally do that. No. Why not? Because it would be literally billions of pages.","speaker":null,"is_sponsor":0},{"start_s":139.48,"end_s":143.4,"text":"It's only paper, Linus. What could it cost, $10?","speaker":null,"is_sponsor":0},{"start_s":144.04,"end_s":148.92,"text":"No. Let's take a look at the hardware.","speaker":null,"is_sponsor":0},{"start_s":148.92,"end_s":154.16,"text":"Yes, my friends. This is the super secret project that we teased","speaker":null,"is_sponsor":0},{"start_s":154.16,"end_s":158.36,"text":"last time we were working on thermal management for the million dollar PC.","speaker":null,"is_sponsor":0},{"start_s":158.36,"end_s":162.44,"text":"You've seen this six-server cluster here before,","speaker":null,"is_sponsor":0},{"start_s":162.44,"end_s":165.76,"text":"but while originally it had about a petabyte","speaker":null,"is_sponsor":0},{"start_s":165.76,"end_s":169.56,"text":"of stupid fast Kioxia Gen 4 NVMe storage,","speaker":null,"is_sponsor":0},{"start_s":169.56,"end_s":172.96,"text":"it has grown a little bit since then.","speaker":null,"is_sponsor":0},{"start_s":172.96,"end_s":177.04,"text":"The 72 15-terabyte drives that made up the original pool","speaker":null,"is_sponsor":0},{"start_s":177.04,"end_s":181.96,"text":"would have almost been enough space to reach my original goal of 200 trillion digits.","speaker":null,"is_sponsor":0},{"start_s":181.96,"end_s":185.64,"text":"That was double the standing record at the time, set by Emma at Google.","speaker":null,"is_sponsor":0},{"start_s":185.64,"end_s":190.4,"text":"But it wasn't anywhere near enough to beat the 202 trillion digit record","speaker":null,"is_sponsor":0},{"start_s":190.4,"end_s":196.2,"text":"that popped up a few months into the planning and testing for this project. So I had to call in like just a couple favors.","speaker":null,"is_sponsor":0},{"start_s":196.2,"end_s":201.28,"text":"Specifically from our friends over at Gigabyte, who sent this lovely 1U dual-socket EPYC chassis,","speaker":null,"is_sponsor":0},{"start_s":201.68,"end_s":205.28,"text":"an R183; from Kioxia, who lent us an older Tyan server","speaker":null,"is_sponsor":0},{"start_s":205.28,"end_s":210.52,"text":"from their test lab; and finally from AMD, who provided this Titanite reference platform","speaker":null,"is_sponsor":0},{"start_s":210.52,"end_s":214.52,"text":"for the launch of EPYC Genoa. That gave us a total of nine servers.","speaker":null,"is_sponsor":0},{"start_s":214.52,"end_s":218.2,"text":"Why so many servers? 
I mean, couldn't we just pack all the storage in one?","speaker":null,"is_sponsor":0},{"start_s":218.2,"end_s":222.12,"text":"I mean, yeah, but here's the thing. We already had the million dollar PC,","speaker":null,"is_sponsor":0},{"start_s":222.12,"end_s":226.4,"text":"and those nodes, they're already full. So if we wanted to expand the storage,","speaker":null,"is_sponsor":0},{"start_s":226.4,"end_s":230.68,"text":"we either needed to throw away that existing petabyte we already had,","speaker":null,"is_sponsor":0},{"start_s":230.68,"end_s":233.72,"text":"or we needed to expand the cluster with more machines.","speaker":null,"is_sponsor":0},{"start_s":233.72,"end_s":238.64,"text":"It's definitely not the most power or space-efficient way to get two petabytes of NVMe,","speaker":null,"is_sponsor":0},{"start_s":238.64,"end_s":241.72,"text":"but you work with what you got. That's why we ended up with","speaker":null,"is_sponsor":0},{"start_s":241.72,"end_s":246.44,"text":"kind of a mix of drives as well. See, every one of our servers","speaker":null,"is_sponsor":0},{"start_s":246.44,"end_s":249.64,"text":"needs to have the same amount of storage space.","speaker":null,"is_sponsor":0},{"start_s":249.64,"end_s":255.76,"text":"Otherwise, it's lowest common denominator, and you're gonna waste any of the extra capacity","speaker":null,"is_sponsor":0},{"start_s":255.76,"end_s":260.76,"text":"that's in the higher capacity nodes. But the issue is that some of our machines","speaker":null,"is_sponsor":0},{"start_s":260.76,"end_s":266.52,"text":"are expansion slot-challenged. So Kioxia had to send over a bucket","speaker":null,"is_sponsor":0},{"start_s":266.52,"end_s":273.6,"text":"of the 30-terabyte drives, allowing us to stuff a whopping 245 terabytes","speaker":null,"is_sponsor":0},{"start_s":273.76,"end_s":281.2,"text":"in each of these nine servers, totaling 2.2 petabytes of raw combined Gen4 storage.","speaker":null,"is_sponsor":0},{"start_s":281.4,"end_s":286.32,"text":"So that takes care of the capacity, but if my count is correct,","speaker":null,"is_sponsor":0},{"start_s":286.32,"end_s":290.72,"text":"there's another server here that we haven't mentioned yet, or at least it looks like a server,","speaker":null,"is_sponsor":0},{"start_s":290.72,"end_s":296.4,"text":"but I don't know, half the front is missing. So those are speed holes, brother.","speaker":null,"is_sponsor":0},{"start_s":296.4,"end_s":300.44,"text":"It's for performance. All right, let's get it out of here and take a look.","speaker":null,"is_sponsor":0},{"start_s":300.44,"end_s":304.44,"text":"No problem to get this removed? I just gotta battle the rack in here.","speaker":null,"is_sponsor":0},{"start_s":304.44,"end_s":306.92,"text":"Do you want to mark any of these? 
Nope, doesn't matter.","speaker":null,"is_sponsor":0},{"start_s":308.92,"end_s":313.32,"text":"This is the compute node, the machine that actually did the crunching.","speaker":null,"is_sponsor":0},{"start_s":313.32,"end_s":319.04,"text":"This Gigabyte R283-Z96 has been through some modifications, let's go.","speaker":null,"is_sponsor":0},{"start_s":319.04,"end_s":323.52,"text":"It started life as a 24-bay storage server, actually.","speaker":null,"is_sponsor":0},{"start_s":323.52,"end_s":326.96,"text":"So most of the 160-ish PCIe Gen5 lanes","speaker":null,"is_sponsor":0},{"start_s":326.96,"end_s":330.32,"text":"from its dual 96-core EPYC processors","speaker":null,"is_sponsor":0},{"start_s":330.32,"end_s":335.2,"text":"were allocated to storage up front, but our machine, it doesn't really need storage anymore.","speaker":null,"is_sponsor":0},{"start_s":335.2,"end_s":340.88,"text":"All that storage is in everything else. What it needs now, or needed anyways,","speaker":null,"is_sponsor":0},{"start_s":340.88,"end_s":345.96,"text":"was networking, a lot of it. Specifically, four of these NVIDIA Mellanox","speaker":null,"is_sponsor":0},{"start_s":345.96,"end_s":349.2,"text":"200 gigabit ConnectX-7 network cards.","speaker":null,"is_sponsor":0},{"start_s":349.2,"end_s":354.8,"text":"These things are wild. Oh yeah. They have two of those 200 gigabit ports per card,","speaker":null,"is_sponsor":0},{"start_s":354.8,"end_s":357.84,"text":"and thanks to the insane bandwidth","speaker":null,"is_sponsor":0},{"start_s":357.84,"end_s":362.24,"text":"of their 16x PCIe Gen5 slot, which is good for, I don't know,","speaker":null,"is_sponsor":0},{"start_s":362.24,"end_s":368.32,"text":"64 gigabytes a second both ways, these can saturate both of those ports at the same time.","speaker":null,"is_sponsor":0},{"start_s":368.32,"end_s":373.52,"text":"So if you put them together, that's... It's 1.6 terabits of throughput.","speaker":null,"is_sponsor":0},{"start_s":373.52,"end_s":377.12,"text":"That's a lot of terabits. Yeah, it's actually around 100 gigabytes per second","speaker":null,"is_sponsor":0},{"start_s":377.12,"end_s":380.44,"text":"to each of our 96-core CPUs, which yes,","speaker":null,"is_sponsor":0},{"start_s":380.44,"end_s":385.0,"text":"can actually handle that, thanks to the 24 128-gig sticks","speaker":null,"is_sponsor":0},{"start_s":385.0,"end_s":389.64,"text":"of DDR5 ECC that Micron sent over. That's three terabytes of RAM.","speaker":null,"is_sponsor":0},{"start_s":389.64,"end_s":393.16,"text":"The more the better. This is the more-est I could get. As for the storage nodes,","speaker":null,"is_sponsor":0},{"start_s":393.16,"end_s":396.48,"text":"those are using dual ConnectX-6 200 gig cards","speaker":null,"is_sponsor":0},{"start_s":396.48,"end_s":401.88,"text":"from the original setup. So each storage server has 400 gig,","speaker":null,"is_sponsor":0},{"start_s":401.88,"end_s":405.72,"text":"or about a dual-layer Blu-ray per second.","speaker":null,"is_sponsor":0},{"start_s":405.72,"end_s":409.0,"text":"But how does the whole thing work together? Well, we gotta put it back in the rack before","speaker":null,"is_sponsor":0},{"start_s":409.0,"end_s":413.32,"text":"I can show you that. Yeah, that'll probably help me. Okay, there we go.","speaker":null,"is_sponsor":0},{"start_s":413.32,"end_s":416.98,"text":"All right, it should be back up. 
With the magic of WekaFS,","speaker":null,"is_sponsor":0},{"start_s":416.98,"end_s":420.6,"text":"which is the same clustered file system that we've been using ever since we first set up","speaker":null,"is_sponsor":0},{"start_s":420.6,"end_s":425.08,"text":"the million dollar PC, we're able to use all that network speed we just talked about","speaker":null,"is_sponsor":0},{"start_s":425.08,"end_s":430.56,"text":"to run one combined file system off of all nine of our storage servers,","speaker":null,"is_sponsor":0},{"start_s":430.56,"end_s":434.72,"text":"completely transparently to any application, including Y Cruncher,","speaker":null,"is_sponsor":0},{"start_s":434.72,"end_s":439.7,"text":"the application we're using to calculate pi. When I say the same though, I don't mean the same, same.","speaker":null,"is_sponsor":0},{"start_s":439.7,"end_s":445.6,"text":"I mean, the same old array did work, but that version of Weka was years out of date","speaker":null,"is_sponsor":0},{"start_s":445.6,"end_s":449.12,"text":"and running on an operating system that is now completely end of life.","speaker":null,"is_sponsor":0},{"start_s":449.12,"end_s":454.36,"text":"Luckily for us, the Weka folks helped us nuke the old installs and install their custom image,","speaker":null,"is_sponsor":0},{"start_s":454.36,"end_s":459.08,"text":"which comes with everything pretty much ready to roll. Massive shout out to Josh and Bob, you guys rock.","speaker":null,"is_sponsor":0},{"start_s":459.08,"end_s":462.96,"text":"Thank you for helping us achieve this silly goal. And the rest of the folks at Weka","speaker":null,"is_sponsor":0},{"start_s":462.96,"end_s":467.48,"text":"for allowing us to use your software in a very, very unsupported, unconventional way.","speaker":null,"is_sponsor":0},{"start_s":467.48,"end_s":471.24,"text":"Oh, oh yeah. I basically had to trick Weka into thinking","speaker":null,"is_sponsor":0},{"start_s":471.24,"end_s":477.04,"text":"that each server is actually two servers. That way we could use the most space-efficient stripe width,","speaker":null,"is_sponsor":0},{"start_s":477.04,"end_s":481.44,"text":"which for Weka means each chunk of data gets split into 16 pieces","speaker":null,"is_sponsor":0},{"start_s":481.44,"end_s":486.76,"text":"with two pieces of parity data calculated. 16 plus two is 18, which is also what nine servers","speaker":null,"is_sponsor":0},{"start_s":486.76,"end_s":492.28,"text":"times two instances of Weka gets you. For reference, in a supported configuration,","speaker":null,"is_sponsor":0},{"start_s":492.28,"end_s":495.28,"text":"we would have needed 19 discrete servers","speaker":null,"is_sponsor":0},{"start_s":495.28,"end_s":501.0,"text":"in order to accomplish this stripe width, including two for parity and one as a hot spare,","speaker":null,"is_sponsor":0},{"start_s":501.0,"end_s":506.6,"text":"which is fantastic for an enterprise environment like where Weka is meant to be deployed,","speaker":null,"is_sponsor":0},{"start_s":506.6,"end_s":509.96,"text":"but very expensive for us to ever gather.","speaker":null,"is_sponsor":0},{"start_s":509.96,"end_s":512.92,"text":"How fast is it? We haven't even tuned it yet, by the way.","speaker":null,"is_sponsor":0},{"start_s":513.32,"end_s":516.88,"text":"Okay, well, we can tune it. First, we can tune it. 
The biggest hurdle on the storage side","speaker":null,"is_sponsor":0},{"start_s":516.88,"end_s":521.28,"text":"was finding a way to limit the amount of data that flowed between the two CPUs,","speaker":null,"is_sponsor":0},{"start_s":521.28,"end_s":526.32,"text":"which may be a bit counterintuitive, but as soon as, say, the left CPU wants to send data","speaker":null,"is_sponsor":0},{"start_s":526.32,"end_s":529.36,"text":"via the network cards that are connected to the right CPU,","speaker":null,"is_sponsor":0},{"start_s":529.36,"end_s":534.44,"text":"that's a ton of latency. That's a lot of hops. And on top of that, there's a limited amount of bandwidth.","speaker":null,"is_sponsor":0},{"start_s":534.44,"end_s":538.32,"text":"And since this is such a memory-intensive calculation, that's why we need so much RAM,","speaker":null,"is_sponsor":0},{"start_s":538.32,"end_s":542.64,"text":"and that's why we need all this storage, we don't wanna waste memory bandwidth.","speaker":null,"is_sponsor":0},{"start_s":542.68,"end_s":547.6,"text":"So we set up two Weka client containers, which is just their application that runs on a computer","speaker":null,"is_sponsor":0},{"start_s":547.6,"end_s":551.76,"text":"and allows you to access the storage. Each of those containers got 12 cores assigned to it,","speaker":null,"is_sponsor":0},{"start_s":551.76,"end_s":554.76,"text":"one per chiplet on our giant CPUs.","speaker":null,"is_sponsor":0},{"start_s":554.76,"end_s":559.24,"text":"So we can maximize the turbo speed? No, actually the reason for that is the cache.","speaker":null,"is_sponsor":0},{"start_s":559.24,"end_s":564.12,"text":"So those are 3D V-Cache CPUs. That gives us a certain amount of cache per chiplet,","speaker":null,"is_sponsor":0},{"start_s":564.12,"end_s":568.08,"text":"and we didn't want the buffers of Y Cruncher, which is like the amount of space it uses","speaker":null,"is_sponsor":0},{"start_s":568.08,"end_s":571.64,"text":"to hold stuff in flight, to spill out of that cache.","speaker":null,"is_sponsor":0},{"start_s":571.64,"end_s":576.44,"text":"Because as soon as you do, now it's in memory, more memory copies, more wasted bandwidth.","speaker":null,"is_sponsor":0},{"start_s":576.44,"end_s":582.4,"text":"And I tested a lot, which we'll get into in a bit. But first, why don't we look at how fast it goes?","speaker":null,"is_sponsor":0},{"start_s":582.4,"end_s":587.52,"text":"Final setup, underscore final, underscore for real. These are just scripts to like make the Weka containers.","speaker":null,"is_sponsor":0},{"start_s":587.52,"end_s":591.2,"text":"Look at how the cores, those ones are at 100% usage, those individual ones,","speaker":null,"is_sponsor":0},{"start_s":591.2,"end_s":597.04,"text":"those are all Weka IO cores basically. It's a lot of compute that needs to be reserved.","speaker":null,"is_sponsor":0},{"start_s":597.04,"end_s":600.4,"text":"But when you're talking like 100-plus gigabytes a second,","speaker":null,"is_sponsor":0},{"start_s":600.44,"end_s":603.48,"text":"which theoretically we are. But you haven't actually shown me that yet.","speaker":null,"is_sponsor":0},{"start_s":603.48,"end_s":607.8,"text":"Here's our little script. The interesting thing about those cores being used","speaker":null,"is_sponsor":0},{"start_s":607.8,"end_s":611.2,"text":"is that while you can just run an app and hope that it ignores them,","speaker":null,"is_sponsor":0},{"start_s":611.2,"end_s":615.52,"text":"the Linux scheduler is not always the best at that. 
So there's this command called taskset,","speaker":null,"is_sponsor":0},{"start_s":615.52,"end_s":619.28,"text":"which allows you to like map whatever command or application you're running","speaker":null,"is_sponsor":0},{"start_s":619.28,"end_s":623.0,"text":"to only run on specific cores. Core one is a Weka core, we're skipping that one.","speaker":null,"is_sponsor":0},{"start_s":623.0,"end_s":626.64,"text":"Core nine, we're skipping that one. And then this is running two separate tasks,","speaker":null,"is_sponsor":0},{"start_s":626.64,"end_s":630.24,"text":"one for each of our mounts. And you can see it only has the CPU cores","speaker":null,"is_sponsor":0},{"start_s":630.24,"end_s":634.28,"text":"from CPU one or CPU two, depending on the mount folder we're using.","speaker":null,"is_sponsor":0},{"start_s":634.28,"end_s":637.64,"text":"Let me run over to Weka. Cute little dashboard.","speaker":null,"is_sponsor":0},{"start_s":637.64,"end_s":640.8,"text":"Woo! It's not a hundred, but it is pretty nice.","speaker":null,"is_sponsor":0},{"start_s":640.8,"end_s":644.16,"text":"That's writing. This is a write? That's right. That's just setting up.","speaker":null,"is_sponsor":0},{"start_s":644.16,"end_s":646.64,"text":"You said you were doing read. It is a read test, but it's setting up the files.","speaker":null,"is_sponsor":0},{"start_s":648.24,"end_s":653.96,"text":"I was telling Jake as we were working on the review for this script, I was like, man, I've gotten kind of numb to these numbers.","speaker":null,"is_sponsor":0},{"start_s":653.96,"end_s":657.72,"text":"You know, after all the iterations of Whonnock and all that. You know, a hundred gigabytes a second,","speaker":null,"is_sponsor":0},{"start_s":658.44,"end_s":661.48,"text":"this is over the network. It never gets old.","speaker":null,"is_sponsor":0},{"start_s":661.48,"end_s":666.32,"text":"Actually, you know, you're numb to the numbers until the numbers you're looking at are like 200 gigabytes a second or something.","speaker":null,"is_sponsor":0},{"start_s":666.32,"end_s":670.32,"text":"But like, no, but dude, like the first time we cracked a hundred gigabytes a second.","speaker":null,"is_sponsor":0},{"start_s":670.32,"end_s":675.04,"text":"It was all installed locally. And that was no file system, all local.","speaker":null,"is_sponsor":0},{"start_s":675.04,"end_s":679.2,"text":"This is over a network. With a file system. With a functioning file system.","speaker":null,"is_sponsor":0},{"start_s":679.2,"end_s":682.44,"text":"Real-ass actual copying data. Yeah.","speaker":null,"is_sponsor":0},{"start_s":682.44,"end_s":687.28,"text":"That's crazy. It is crazy. And look at the read. The latency is two milliseconds.","speaker":null,"is_sponsor":0},{"start_s":687.28,"end_s":691.12,"text":"It's because I'm like oversaturating this. So down here, you see the front end usage.","speaker":null,"is_sponsor":0},{"start_s":691.12,"end_s":694.12,"text":"That's the cores on this machine. 
They're being utilized a hundred percent.","speaker":null,"is_sponsor":0},{"start_s":694.12,"end_s":699.12,"text":"I have the system set to have four NUMA nodes per CPU because that made Y Cruncher a little bit happier.","speaker":null,"is_sponsor":0},{"start_s":699.12,"end_s":705.28,"text":"If I turn that off and do one NUMA node per socket, I was able to get this up to like 150 gigabytes a second.","speaker":null,"is_sponsor":0},{"start_s":705.28,"end_s":710.4,"text":"At the time, I actually set the record for the fastest single-client usage.","speaker":null,"is_sponsor":0},{"start_s":710.4,"end_s":714.68,"text":"According to the Weka guys, they've since broken that with like GPUDirect Storage or whatever.","speaker":null,"is_sponsor":0},{"start_s":714.68,"end_s":718.36,"text":"But this isn't even with RDMA. This is just good code. Built for NVMe.","speaker":null,"is_sponsor":0},{"start_s":718.36,"end_s":722.16,"text":"It's also good SSDs. Oh brother. Yeah. Look at this.","speaker":null,"is_sponsor":0},{"start_s":722.16,"end_s":725.56,"text":"The average usage of the drives in the array right now is 23%.","speaker":null,"is_sponsor":0},{"start_s":725.56,"end_s":729.4,"text":"Wait, that's nothing. Shout out Kioxia. We're running a mix of their CD","speaker":null,"is_sponsor":0},{"start_s":729.4,"end_s":733.12,"text":"and CM series Gen 4 drives. These things are super fast","speaker":null,"is_sponsor":0},{"start_s":733.12,"end_s":738.76,"text":"with individual drive read speeds that are in excess of five gigabytes per second.","speaker":null,"is_sponsor":0},{"start_s":738.76,"end_s":745.12,"text":"And that's not even the fastest they have. You step up to their Gen 5 drives and you're talking like 12, 13, 14 gigabytes a second.","speaker":null,"is_sponsor":0},{"start_s":745.12,"end_s":749.52,"text":"They're available in self-encrypting SKUs. They have die failure recovery, power loss protection.","speaker":null,"is_sponsor":0},{"start_s":749.52,"end_s":752.64,"text":"They're perfect for your next server or data center deployment.","speaker":null,"is_sponsor":0},{"start_s":752.64,"end_s":758.12,"text":"Yeah. And this entire time running this application, I didn't have a single drive have a single issue.","speaker":null,"is_sponsor":0},{"start_s":758.12,"end_s":762.04,"text":"You got a spreadsheet for tuning Y Cruncher? Dude, dude.","speaker":null,"is_sponsor":0},{"start_s":762.04,"end_s":765.68,"text":"When he adjusts the glasses, you know it's getting real. Okay. Here's Y Cruncher.","speaker":null,"is_sponsor":0},{"start_s":765.68,"end_s":769.2,"text":"Let's just do a normal pi run. 32 million, 25.","speaker":null,"is_sponsor":0},{"start_s":769.2,"end_s":773.88,"text":"Let's go. So this would have taken about half an hour on my old Athlon.","speaker":null,"is_sponsor":0},{"start_s":773.88,"end_s":777.6,"text":"It took 0.2 seconds to compute. Really? Yeah.","speaker":null,"is_sponsor":0},{"start_s":777.6,"end_s":781.08,"text":"What? For the uninitiated, Y Cruncher is the software we use to do this run.","speaker":null,"is_sponsor":0},{"start_s":781.08,"end_s":785.76,"text":"It was developed by a guy named Alexander Yee. Super nice guy, helped us do some messing about.","speaker":null,"is_sponsor":0},{"start_s":785.76,"end_s":790.0,"text":"Also, didn't help me that much. Honestly, I asked a lot of questions he didn't answer,","speaker":null,"is_sponsor":0},{"start_s":790.0,"end_s":793.88,"text":"but I figured it out anyways, I guess. 
To be clear, the storage we've talked about,","speaker":null,"is_sponsor":0},{"start_s":793.88,"end_s":798.52,"text":"oh man, we need a lot of storage, it's because Y Cruncher uses the storage like RAM,","speaker":null,"is_sponsor":0},{"start_s":798.52,"end_s":803.44,"text":"because the output of digits, like that 300 trillion, is only about 120 terabytes compressed.","speaker":null,"is_sponsor":0},{"start_s":803.44,"end_s":807.36,"text":"Wow, it's not that much. Could we make that available to people? Oh, God.","speaker":null,"is_sponsor":0},{"start_s":807.36,"end_s":811.0,"text":"I don't wanna think about that. We'll try. Maybe we'll do a torrent or something. Oh, God.","speaker":null,"is_sponsor":0},{"start_s":811.0,"end_s":814.36,"text":"But when you're setting up for a run, it actually tells you how much storage you need","speaker":null,"is_sponsor":0},{"start_s":814.36,"end_s":817.88,"text":"for this swap space, which is just basically like RAM, plus.","speaker":null,"is_sponsor":0},{"start_s":817.88,"end_s":822.16,"text":"That's slower. That's what it's using it for. For us, it was like, yeah, we're probably gonna use","speaker":null,"is_sponsor":0},{"start_s":822.16,"end_s":825.6,"text":"like 1.5 petabytes of space at peak.","speaker":null,"is_sponsor":0},{"start_s":825.6,"end_s":828.64,"text":"It's pretty crazy. Okay. But what did you tune, Jake?","speaker":null,"is_sponsor":0},{"start_s":828.64,"end_s":832.6,"text":"You wanna see the tuning? Oh boy. So this is like some of the tests I did.","speaker":null,"is_sponsor":0},{"start_s":832.6,"end_s":836.6,"text":"So Y Cruncher was built for direct-attached storage.","speaker":null,"is_sponsor":0},{"start_s":836.6,"end_s":841.72,"text":"And in fact, it doesn't even want you to use like a RAID controller or software RAID.","speaker":null,"is_sponsor":0},{"start_s":841.72,"end_s":845.28,"text":"It does its own internal RAID. And then on top of that,","speaker":null,"is_sponsor":0},{"start_s":845.28,"end_s":850.56,"text":"it also has things you can tune like, what multi-threading algorithm do you use?","speaker":null,"is_sponsor":0},{"start_s":850.56,"end_s":854.36,"text":"And like how many threads? And what size are your IO buffers?","speaker":null,"is_sponsor":0},{"start_s":854.36,"end_s":859.46,"text":"How much memory? How much memory, and how many bytes can we read per seek,","speaker":null,"is_sponsor":0},{"start_s":859.46,"end_s":862.88,"text":"ideally? Because if you're using hard drives or SSDs, it's different.","speaker":null,"is_sponsor":0},{"start_s":862.88,"end_s":867.16,"text":"Got it. It was built in an older time and the code base is huge.","speaker":null,"is_sponsor":0},{"start_s":867.16,"end_s":870.24,"text":"And Alex just does it in his spare time as far as I'm aware.","speaker":null,"is_sponsor":0},{"start_s":870.24,"end_s":874.04,"text":"So no shade, super cool project. But at some point in the future,","speaker":null,"is_sponsor":0},{"start_s":874.04,"end_s":877.76,"text":"technology will have gotten good enough that we can just rely on the operating system to do this.","speaker":null,"is_sponsor":0},{"start_s":877.76,"end_s":881.4,"text":"Like that Weka speed test we just did. Let's hope for that. One day.","speaker":null,"is_sponsor":0},{"start_s":881.4,"end_s":885.0,"text":"Anyway, with everything dialed in, on August 1st, 2024...","speaker":null,"is_sponsor":0},{"start_s":885.0,"end_s":889.76,"text":"Yes. It was a while ago. Yes. 
Jake finally hit enter on his command prompt","speaker":null,"is_sponsor":0},{"start_s":889.76,"end_s":893.2,"text":"and began our glorious journey to nerd glory.","speaker":null,"is_sponsor":0},{"start_s":893.2,"end_s":897.08,"text":"Yeah, for 12 days. And then it stopped thanks to a multi-day power outage","speaker":null,"is_sponsor":0},{"start_s":897.08,"end_s":900.68,"text":"while I was on vacation. And it was so early in the process that I said,","speaker":null,"is_sponsor":0},{"start_s":900.68,"end_s":906.12,"text":"f*** that s***. Let's just start it again. I want to get a clean run with no outages.","speaker":null,"is_sponsor":0},{"start_s":906.12,"end_s":909.48,"text":"But it was smooth sailing from then on.","speaker":null,"is_sponsor":0},{"start_s":909.48,"end_s":913.52,"text":"No, it wasn't. See, even with a cluster this chonk,","speaker":null,"is_sponsor":0},{"start_s":913.52,"end_s":916.58,"text":"calculations like this take a lot of time.","speaker":null,"is_sponsor":0},{"start_s":916.58,"end_s":921.32,"text":"The previous 202 trillion digit record took a hundred days just to compute.","speaker":null,"is_sponsor":0},{"start_s":921.32,"end_s":924.36,"text":"And whether it's bad luck or user error...","speaker":null,"is_sponsor":0},{"start_s":924.36,"end_s":927.72,"text":"I think there's a little bit of user error. Finding a space in our facilities","speaker":null,"is_sponsor":0},{"start_s":927.72,"end_s":930.96,"text":"where a machine like that can operate completely uninterrupted proved to be a challenge.","speaker":null,"is_sponsor":0},{"start_s":930.96,"end_s":933.96,"text":"I have no idea what I just unplugged.","speaker":null,"is_sponsor":0},{"start_s":933.96,"end_s":937.88,"text":"How's your edit going? I'm holding your server. What's the challenge?","speaker":null,"is_sponsor":0},{"start_s":937.88,"end_s":943.2,"text":"At first things were pretty okay in the lab server room here. We had our air conditioning working to keep things cool.","speaker":null,"is_sponsor":0},{"start_s":943.2,"end_s":946.2,"text":"We had our battery backup to keep the digits flowing","speaker":null,"is_sponsor":0},{"start_s":946.2,"end_s":950.36,"text":"during a short outage or a brownout. It's just that over the course of this run,","speaker":null,"is_sponsor":0},{"start_s":950.36,"end_s":954.16,"text":"we had multiple other power outages and none of them were small.","speaker":null,"is_sponsor":0},{"start_s":954.16,"end_s":957.72,"text":"So each time our calculation had to stop and restart.","speaker":null,"is_sponsor":0},{"start_s":957.72,"end_s":961.6,"text":"And the same goes for when the cooling failed multiple times.","speaker":null,"is_sponsor":0},{"start_s":961.6,"end_s":966.04,"text":"It's pretty mid now though. The AC is fixed, and that, with our sick water-cooled door","speaker":null,"is_sponsor":0},{"start_s":966.04,"end_s":970.48,"text":"that is definitely not going to leak, has the room at around 22, 23 degrees.","speaker":null,"is_sponsor":0},{"start_s":970.48,"end_s":974.44,"text":"And the cluster is still running. Fortunately, Y Cruncher makes checkpoints","speaker":null,"is_sponsor":0},{"start_s":974.44,"end_s":980.34,"text":"which allow resuming the calculation. 
But it does mean our record could have been done much faster.","speaker":null,"is_sponsor":0},{"start_s":980.34,"end_s":986.06,"text":"Like, based on the logs, somewhere in the neighborhood of like 30, 40, 50 days faster.","speaker":null,"is_sponsor":0},{"start_s":986.06,"end_s":989.1,"text":"The 300 trillionth digit of pi is five.","speaker":null,"is_sponsor":0},{"start_s":989.1,"end_s":993.94,"text":"Really? Ha ha ha ha ha. It's done, baby.","speaker":null,"is_sponsor":0},{"start_s":993.94,"end_s":998.58,"text":"Wow. It only took way longer than it should have.","speaker":null,"is_sponsor":0},{"start_s":998.58,"end_s":1003.62,"text":"190 days. Speaking of the logs, Jake's got them here right now.","speaker":null,"is_sponsor":0},{"start_s":1003.62,"end_s":1008.42,"text":"100 gigabytes a second read, and then you're writing at like 30, 40 gigabytes a second.","speaker":null,"is_sponsor":0},{"start_s":1008.42,"end_s":1012.66,"text":"So that's as it's, what? Pulling in data from the swap space,","speaker":null,"is_sponsor":0},{"start_s":1012.66,"end_s":1016.7,"text":"which is our NVMe drives, and bringing it into the three terabytes of RAM","speaker":null,"is_sponsor":0},{"start_s":1016.7,"end_s":1020.06,"text":"that are in the system. So the crunch, crunch, crunch, crunch, crunch, crunch, crunch","speaker":null,"is_sponsor":0},{"start_s":1020.06,"end_s":1023.82,"text":"and huck it back over there. Write some data over there, yeah. What I don't see in the logs here","speaker":null,"is_sponsor":0},{"start_s":1023.82,"end_s":1027.22,"text":"is how much power this consumed. How much did this cost?","speaker":null,"is_sponsor":0},{"start_s":1027.22,"end_s":1032.34,"text":"I actually haven't done the math on that. I think it roughly draws around 8,000 watts,","speaker":null,"is_sponsor":0},{"start_s":1032.38,"end_s":1035.1,"text":"which means 24 hours a day for a year.","speaker":null,"is_sponsor":0},{"start_s":1036.3,"end_s":1039.1,"text":"Yeah, it was like 10 grand. That's Canadian.","speaker":null,"is_sponsor":0},{"start_s":1040.14,"end_s":1043.9,"text":"You know, that's... are you kidding me right now? No. Like just CPUs.","speaker":null,"is_sponsor":0},{"start_s":1043.9,"end_s":1049.94,"text":"We don't even have GPUs in this thing. It's like 1,500 watts in SSDs alone.","speaker":null,"is_sponsor":0},{"start_s":1049.94,"end_s":1054.7,"text":"Okay, but hey, that means our record should be safe for a while then, right?","speaker":null,"is_sponsor":0},{"start_s":1054.7,"end_s":1057.82,"text":"Well, it's possible, maybe even probable,","speaker":null,"is_sponsor":0},{"start_s":1057.82,"end_s":1061.58,"text":"that someone is already working on a run that would beat this record,","speaker":null,"is_sponsor":0},{"start_s":1062.02,"end_s":1065.02,"text":"and they could probably even do it on a single machine. They totally could.","speaker":null,"is_sponsor":0},{"start_s":1065.02,"end_s":1069.26,"text":"But that's how it is with computing, and they can never take that piece of paper away.","speaker":null,"is_sponsor":0},{"start_s":1069.26,"end_s":1072.74,"text":"I can, I'm taking this one home. And don't forget about the other pieces of paper.","speaker":null,"is_sponsor":0},{"start_s":1072.74,"end_s":1076.14,"text":"Okay, real talk though. 
In school, we're taught that two digits of pi","speaker":null,"is_sponsor":0},{"start_s":1076.14,"end_s":1081.02,"text":"are enough to approximate most calculations, but obviously, depending on what you're doing,","speaker":null,"is_sponsor":0},{"start_s":1081.02,"end_s":1086.02,"text":"you could need a few more. Is there, in your mind, any practical use","speaker":null,"is_sponsor":0},{"start_s":1086.02,"end_s":1090.02,"text":"for 300 trillion digits? No, I mean other than for this.","speaker":null,"is_sponsor":0},{"start_s":1090.02,"end_s":1094.26,"text":"But it was fun. It's about the journey, not the destination, Linus.","speaker":null,"is_sponsor":0},{"start_s":1094.26,"end_s":1099.18,"text":"It's about doing something cool with the help of Kioxia, who builds high-quality, high-performance storage","speaker":null,"is_sponsor":0},{"start_s":1099.18,"end_s":1104.58,"text":"for the data center, and who we'll have linked down below. It's about Weka and their crazy software.","speaker":null,"is_sponsor":0},{"start_s":1104.58,"end_s":1107.74,"text":"It's about Y Cruncher. It's about because we fucking could.","speaker":null,"is_sponsor":0},{"start_s":1107.74,"end_s":1110.9,"text":"Because we fucking can. Just like we could also shout out","speaker":null,"is_sponsor":0},{"start_s":1110.9,"end_s":1115.1,"text":"some of the other folks who helped us. Yeah, Josh and Bob again. Thank you so much, from Weka.","speaker":null,"is_sponsor":0},{"start_s":1115.1,"end_s":1119.18,"text":"Gigabyte, for sending us that server, and I haven't made content about it in like four years.","speaker":null,"is_sponsor":0},{"start_s":1119.18,"end_s":1123.5,"text":"Just, just thank you. AMD, AMD sent the CPUs for the compute node","speaker":null,"is_sponsor":0},{"start_s":1123.5,"end_s":1128.18,"text":"like three years ago. Finally, thank you. I swore I was gonna make this video","speaker":null,"is_sponsor":0},{"start_s":1128.18,"end_s":1131.94,"text":"and it happened. It just took longer than I thought. Thank you, James, the writing manager,","speaker":null,"is_sponsor":0},{"start_s":1131.94,"end_s":1136.58,"text":"for being patient with this project. And hey, if you guys wanna check out","speaker":null,"is_sponsor":0},{"start_s":1136.58,"end_s":1141.06,"text":"more Linus and Jake shenanigans, how about the high availability cheapo computers?","speaker":null,"is_sponsor":0},{"start_s":1141.06,"end_s":1144.42,"text":"That was fun. Cheapo computers? Yeah, I remember. Oh, that was cool.","speaker":null,"is_sponsor":0},{"start_s":1144.42,"end_s":1147.5,"text":"Yeah, that was super cool. I don't know, we didn't actually do the cheapo one. We did it, we just did the demo.","speaker":null,"is_sponsor":0},{"start_s":1147.5,"end_s":1150.02,"text":"Just for the intro. I know, but it was cool. Yeah, yeah, that was cool.","speaker":null,"is_sponsor":0},{"start_s":1150.7,"end_s":1154.38,"text":"Yeah, this was fun. I don't think I ever wanna do this again.","speaker":null,"is_sponsor":0},{"start_s":1154.38,"end_s":1155.22,"text":"And cut.","speaker":null,"is_sponsor":0}]}
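Editor's note: below is a minimal Python sketch of the arithmetic quoted in the transcript, for anyone who wants to sanity-check the numbers. The digits-per-page packing, the nine 245 TB storage nodes, and Weka's 16-data-plus-2-parity stripe are all figures stated in the video; nothing here is independently measured.

# Back-of-envelope checks of the figures quoted in the video.
DIGITS_OLD_RUN = 32_000_000            # Linus's childhood SuperPi runs
DIGITS_RECORD = 300_000_000_000_000    # this record attempt
DIGITS_PER_PAGE = 25_500               # ~size-4 Arial on one sheet of paper (video's estimate)

print(f"32M-digit run: {DIGITS_OLD_RUN / DIGITS_PER_PAGE:,.0f} pages")   # ~1,255 pages
print(f"Record run:    {DIGITS_RECORD / DIGITS_PER_PAGE:,.0f} pages")    # ~11.8 billion pages

# Storage: nine servers at 245 TB each, striped by Weka as 16 data + 2 parity chunks.
SERVERS, TB_PER_SERVER = 9, 245
DATA_CHUNKS, PARITY_CHUNKS = 16, 2

raw_tb = SERVERS * TB_PER_SERVER
usable_tb = raw_tb * DATA_CHUNKS / (DATA_CHUNKS + PARITY_CHUNKS)
print(f"Raw capacity:    {raw_tb / 1000:.1f} PB")    # ~2.2 PB, as stated
print(f"Usable capacity: {usable_tb / 1000:.2f} PB") # ~1.96 PB, vs ~1.5 PB peak swap need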
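Editor's note: the transcript describes pinning each storage task to the cores of one socket with taskset, while skipping the cores reserved for the Weka client containers, one task per mount. The sketch below illustrates that idea only, under assumptions: the core numbering, the reserved-core set, the mount paths, and the use of fio as the benchmark command are hypothetical placeholders, not LTT's actual script.

#!/usr/bin/env python3
# Illustrative sketch: launch one read benchmark per socket, pinned with
# `taskset -c` to that socket's cores minus the hypothetical Weka-reserved cores.
import subprocess

# Hypothetical layout: cores 0-95 on socket 0, 96-191 on socket 1,
# with one core per chiplet (12 per socket) reserved for Weka I/O.
WEKA_CORES_PER_SOCKET = {1, 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89}

def cpu_list(first: int, last: int) -> str:
    """Comma-separated list for `taskset -c`, excluding the reserved cores."""
    cores = [c for c in range(first, last + 1)
             if (c % 96) not in WEKA_CORES_PER_SOCKET]
    return ",".join(str(c) for c in cores)

jobs = []
for socket, (first, last, mount) in enumerate(
        [(0, 95, "/mnt/weka0"), (96, 191, "/mnt/weka1")]):
    cmd = [
        "taskset", "-c", cpu_list(first, last),  # pin to this socket's free cores
        "fio",
        f"--name=socket{socket}",
        f"--directory={mount}",
        "--rw=read", "--bs=1M", "--size=10G",
        "--numjobs=8", "--group_reporting",
    ]
    jobs.append(subprocess.Popen(cmd))  # one task per mount, run in parallel

for job in jobs:
    job.wait()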