WEBVTT

00:00:00.120 --> 00:00:08.120
I bet most of you forgot about the million dollar PC here and honestly I

00:00:05.680 --> 00:00:12.679
don't blame you it's kind of been a while since we touched the thing but I

00:00:09.960 --> 00:00:18.600
swear it's not our fault right before we were going to do the big demo it broke

00:00:15.639 --> 00:00:24.199
then it broke again and again it's just been the worst kind of problem to

00:00:20.680 --> 00:00:28.199
troubleshoot sometimes our full petabyte

00:00:24.199 --> 00:00:30.800
of NVMe SSDs gets detected and other

00:00:28.199 --> 00:00:35.160
times some of them don't but I wouldn't make a video just to tell

00:00:33.320 --> 00:00:40.000
you it's still broken would I the solution ended up being shockingly

00:00:37.480 --> 00:00:45.520
simple and something that I bet you've actually seen before but that's in the

00:00:41.760 --> 00:00:48.559
past this is the now and it is finally

00:00:45.520 --> 00:00:52.000
time when this video is over we are

00:00:48.559 --> 00:00:55.520
going to have the largest and fastest

00:00:52.000 --> 00:00:57.160
storage server on YouTube at least until

00:00:55.520 --> 00:01:00.960
we have to find all the boxes pack it up and send it back to them did I tell you

00:00:58.559 --> 00:01:04.960
they threw away the boxes sorry what oh my God they're not behind me how are

00:01:03.320 --> 00:01:10.880
we going to deal with that how are we going to segue to our sponsor G.Skill

00:01:08.080 --> 00:01:15.080
G.Skill's Trident Z5 Neo DDR5 memory is built for AMD Ryzen 7000 series

00:01:13.360 --> 00:01:19.200
processors and one-click memory overclocking learn more at the link down

00:01:25.920 --> 00:01:32.920
below the culprit was this guy but not

00:01:29.640 --> 00:01:35.880
this guy entirely four of the drives

00:01:32.920 --> 00:01:40.560
specifically out of the 12 had this erratic Behavior where we would fire up

00:01:38.040 --> 00:01:44.600
the whole system and they wouldn't be there then we'd replug the cables and

00:01:42.600 --> 00:01:48.479
they'd come back and we could Benchmark it and test it and validate and then and

00:01:46.880 --> 00:01:52.960
then they would be gone we tried everything from moving those drives to

00:01:50.520 --> 00:01:57.240
different Bays to different servers even unplugging and replugging the cables and

00:01:54.920 --> 00:02:01.880
it just kept freaking happening and in the end it was one of

00:01:59.640 --> 00:02:06.360
the most basic troubleshooting steps in the book have you tried unplugging it

00:02:04.479 --> 00:02:11.879
and plugging it back in but not the drive not the backplane or the cables

00:02:09.599 --> 00:02:18.000
the CPU cuz what's hard to wrap your brain around is that these are NVMe or

00:02:14.920 --> 00:02:19.680
PCI Express drives and through the

00:02:18.000 --> 00:02:25.560
connector on the back through the backplane through the motherboard to the CPU

00:02:22.400 --> 00:02:27.800
socket they actually connect directly

00:02:25.560 --> 00:02:35.000
all the way back to your processor which means that a loosely seated CPU

00:02:32.120 --> 00:02:39.560
could actually cause a broken link in that chain and here's the thing after

00:02:38.160 --> 00:02:43.519
the servers had been running for a few minutes that's just enough thermal

00:02:41.640 --> 00:02:49.040
expansion for that CPU pin that's just not quite touching bam now it's touching

00:02:46.360 --> 00:02:52.000
boom intermittent problem yeah of course with the system already on it's not

00:02:50.440 --> 00:02:56.640
going to pick those drives up again so when we were rebooting it after reseating

00:02:54.159 --> 00:03:01.560
the cables they'd show up again it makes perfect sense now but oh my God trying

00:02:59.440 --> 00:03:07.159
to figure this out you got to think outside of the box and also deep

00:03:04.080 --> 00:03:08.799
inside the Box you got it you got it

00:03:07.159 --> 00:03:12.640
everybody's got it oh this is not screwed on Jake what is not uh this top

00:03:11.319 --> 00:03:17.959
cover oh whatever where are we working on this we in there oh sure you have a

00:03:15.280 --> 00:03:21.760
screwdriver right I have a highly erect server that's for sure yes I have my

00:03:20.000 --> 00:03:26.040
high-quality ratcheting magnetic screwdriver lttstore.com now if we were

00:03:24.360 --> 00:03:30.879
smart folks we might have looked at what drives they were and traced it back to

00:03:28.159 --> 00:03:35.720
what CPU it was yeah that's CP but you know what I was trying to say is we're

00:03:33.120 --> 00:03:38.760
just going to reseat both the CPUs damn it sure I don't have a screwdriver yet I

00:03:37.439 --> 00:03:44.159
went I put a Phillips bit so you're just going to have to do this all yourself no that's fine crap I don't have a Torx

00:03:41.560 --> 00:03:48.319
set you know what sometimes hex works oh my God don't do that what don't do that

00:03:46.720 --> 00:03:51.599
that's awful you're going to strip this and then we're going to strip it you're

00:03:49.760 --> 00:03:56.360
going to make this a very long video I'm not going to strip it you could go grab

00:03:54.280 --> 00:04:00.680
another one you can make yourself useful I could the funny thing is oh God oh my

00:03:58.480 --> 00:04:04.879
God I'm kidding oh my God his face why did you even take

00:04:02.200 --> 00:04:08.920
it out just I'm just no don't clean the thermal

00:04:05.760 --> 00:04:11.720
paste oh my god really yeah reseat it baby

00:04:08.920 --> 00:04:19.000
no we should repaste it no all right thermal paste is expensive in this economy right I

00:04:14.799 --> 00:04:21.079
mean left correct left correct yes no uh

00:04:19.000 --> 00:04:26.600
no I was never much good at math why are these so finicky damn

00:04:23.520 --> 00:04:29.280
it beautiful watch the server not boot

00:04:26.600 --> 00:04:33.479
now I know right that is a very real possibility

00:04:30.240 --> 00:04:36.720
with AMD's Threadripper and EPYC CPUs

00:04:33.479 --> 00:04:38.600
because they're so large it is pretty

00:04:36.720 --> 00:04:42.440
easy to accidentally install them a little bit cockeyed which can cause

00:04:40.639 --> 00:04:48.919
these sorts of issues whether it's PCI Express devices having intermittent

00:04:44.160 --> 00:04:51.520
problems or RAM this isn't even on yet

00:04:48.919 --> 00:04:55.600
this is like the boot sequence this is the IPMI for those of you who

00:04:53.919 --> 00:05:00.479
haven't seen the previous parts of this series or who understandably forget them

00:04:58.400 --> 00:05:07.680
at this point what you're looking at here is one petabyte of flash storage

00:05:04.720 --> 00:05:13.479
server which is a non-trivial thing to build because aside from just having

00:05:10.240 --> 00:05:15.800
enough bays to put that many SSDs in if

00:05:13.479 --> 00:05:23.440
you want to get anywhere near the full performance of these Kioxia drives you

00:05:18.960 --> 00:05:28.720
need a ton of computer so inside each of

00:05:23.440 --> 00:05:30.160
these six 1U servers here is twelve 15

00:05:28.720 --> 00:05:36.639
terabyte drives so that's a total of

00:05:32.440 --> 00:05:40.199
72 drives each of these CD6-R enterprise

00:05:36.639 --> 00:05:43.520
drives is capable of a whopping 5.5 gigabytes

00:05:40.199 --> 00:05:45.280
a second reads 4 gigabytes a second writes all
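
NOTE
A rough sanity check of the figures quoted in this passage, not something shown in the video.
The variable names are illustrative, and the totals are raw per-drive sums that ignore
file system, parity, and network overhead, so real-world numbers would be lower.
```python
# Figures as quoted in the video; all names here are illustrative.
SERVERS = 6
DRIVES_PER_SERVER = 12
DRIVE_TB = 15        # nominal capacity per drive, as stated
READ_GB_S = 5.5      # quoted per-drive read speed
WRITE_GB_S = 4.0     # quoted per-drive write speed
drives = SERVERS * DRIVES_PER_SERVER
raw_capacity_tb = drives * DRIVE_TB      # just over one petabyte
raw_read_gb_s = drives * READ_GB_S       # theoretical ceiling, not a measured result
raw_write_gb_s = drives * WRITE_GB_S     # theoretical ceiling, not a measured result
print(drives, raw_capacity_tb, raw_read_gb_s, raw_write_gb_s)
```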

00:05:43.520 --> 00:05:50.880
of these work together using a file system called WekaFS that's designed

00:05:47.800 --> 00:05:53.919
specifically for NVMe drives to achieve

00:05:50.880 --> 00:05:56.240
unbelievable performance of course in

00:05:53.919 --> 00:06:00.360
order to measure this performance you actually have to put some kind of load

00:05:58.880 --> 00:06:07.919
on the system that's where this guy comes in it has

00:06:02.880 --> 00:06:11.680
two 64-core EPYC CPUs eight of NVIDIA's

00:06:07.919 --> 00:06:15.240
A100 GPUs those are critical to generate

00:06:11.680 --> 00:06:17.960
the load and it has a whopping

00:06:15.240 --> 00:06:25.400
eight 200 gigabit per second network connections and of course those run

00:06:20.720 --> 00:06:28.840
through this 32-port 200 gigabit network

00:06:25.400 --> 00:06:31.720
switch to each of our servers down below

00:06:28.840 --> 00:06:36.120
that's a lot of high-speed connectivity but with great performance comes great

00:06:33.800 --> 00:06:43.160
power consumption and to run this thing full bore we needed to plug extension

00:06:38.720 --> 00:06:47.199
cords into five separate 15 amp 120 volt

00:06:43.160 --> 00:06:49.120
breakers and a separate 208 volt 30 amp

00:06:47.199 --> 00:06:53.240
breaker and did I mention they're overclocked I mean I feel like I can

00:06:51.520 --> 00:06:57.800
tell from all the heat coming off of it it is flipping warm back here it's very

00:06:55.720 --> 00:07:02.520
uncomfortable to stand here another thing we ran into last time just to get

00:07:00.000 --> 00:07:05.879
the array started these servers were so out of sync just from Shipping like

00:07:04.199 --> 00:07:09.680
their time the time on the servers was so out of sync the real time clock you

00:07:07.280 --> 00:07:13.840
know could be 5 10 seconds off whatever that the array would not start and the

00:07:11.680 --> 00:07:18.080
easiest way to get them to sync up again is NTP Network Time Protocol and of

00:07:16.440 --> 00:07:22.560
course without networking this wasn't set up to actually connect to the

00:07:20.080 --> 00:07:27.520
internet all of these have static IPs that are assigned so in our router I had

00:07:24.520 --> 00:07:29.840
to create uh a VLAN that has the same

00:07:27.520 --> 00:07:33.879
subnet information and routed from the server room all the way over here to

00:07:31.560 --> 00:07:38.599
here and plug it into the switch had to go in the switch and tell it oh 10 gig

00:07:35.800 --> 00:07:42.120
is fine it doesn't have to be 200 gig and then they turned on no problem but I

00:07:40.280 --> 00:07:45.800
want to access the IPMI so we can see the power consumption sure so we've got

00:07:43.879 --> 00:07:48.520
to plug this bad boy into all the IPMI ports really quick oh okay so we need

00:07:47.280 --> 00:07:57.360
some patch cables patch cables are right in the front all right let's do it oh that's wild yeah the management port for

00:07:53.159 --> 00:07:59.919
this one is actually on the front oh man

00:07:57.360 --> 00:08:06.039
it's so much nicer in here I want to see all 72 drives is what I want to see you

00:08:02.240 --> 00:08:07.800
might be asking for a lot no no bare

00:08:06.039 --> 00:08:12.280
minimum when those four drives are missing the array just rebuilds oh God

00:08:10.520 --> 00:08:18.120
just give it a second oh sick it's working okay the cool thing though when those

00:08:16.000 --> 00:08:22.960
four drives are missing the array like rebuilt itself like it was nothing takes

00:08:20.080 --> 00:08:28.520
like 5 minutes cuz they're so fast fast yeah but we want the capacity want the

00:08:25.360 --> 00:08:30.840
whole point a petabyte of flash okay I I

00:08:28.520 --> 00:08:37.039
hate to burst your bubble for some reason they only configured it with a

00:08:32.680 --> 00:08:39.560
500 terabyte like drive only 500

00:08:37.039 --> 00:08:43.560
terabytes I think we can expand it later but for the purposes of the first demo I

00:08:42.080 --> 00:08:51.080
think let's just leave it I don't want to break it I'm SSHing into the node

00:08:48.279 --> 00:08:54.880
oh look at that how many is that 1 2 3 4 5 6 7 8 good that's the correct number

00:08:53.519 --> 00:08:58.720
that's a lot of GPU it's a lot of freaking GPUs is there is there

00:08:56.800 --> 00:09:06.040
nvidia-smi each of these is like oh look at that 400 watts 3200 watts of just
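
NOTE
A small worked check of the power figures mentioned in the video, assuming the quoted
400 watts per GPU is the per-card limit and using the breaker ratings quoted earlier
(five 15 amp 120 volt circuits plus one 30 amp 208 volt circuit). Names are illustrative,
and nominal breaker ratings are not the same as safe continuous load.
```python
GPUS = 8
WATTS_PER_GPU = 400                      # per-GPU figure as read out in the video
gpu_total_w = GPUS * WATTS_PER_GPU       # the 3200 watts quoted
# Nominal capacity of the circuits described earlier in the video.
circuits = [(15, 120)] * 5 + [(30, 208)] # (amps, volts) pairs
nominal_capacity_w = sum(amps * volts for amps, volts in circuits)
gpu_share = gpu_total_w / nominal_capacity_w
print(gpu_total_w, nominal_capacity_w, round(gpu_share, 2))
```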

00:09:03.839 --> 00:09:09.640
GPU we're not connecting to these things over anything like SMB cuz that would be

00:09:08.079 --> 00:09:13.079
way too slow we're like directly connecting with the Weka interface and

00:09:11.800 --> 00:09:17.200
we're going to be using GPU direct storage which is super cool we'll talk a little bit more about that later yeah

00:09:15.399 --> 00:09:20.839
this visualization is super cool the green things I'm looking at I guess are

00:09:18.920 --> 00:09:25.120
drives and then the gray things are cores or no the gray things are drives

00:09:23.519 --> 00:09:30.240
the green things are core that makes way more sense actually and then blue is our

00:09:27.240 --> 00:09:32.440
network interfaces that is a pretty

00:09:30.240 --> 00:09:36.440
visual it oh yeah you see this look it's they're pinning cores to specific Drive

00:09:34.680 --> 00:09:40.880
slots it looks like there's some special sauce going on here I think there is

00:09:38.079 --> 00:09:44.800
some manual ass shiz that went on in the configuration of the system seriously

00:09:42.680 --> 00:09:52.079
though a lot of tuning because like I said before making a one petabyte flash

00:09:47.880 --> 00:09:54.760
server easy making one that performs

00:09:52.079 --> 00:09:58.600
near actual we're still not going to be hitting the peak you'll remember when we

00:09:56.519 --> 00:10:03.680
did the honey badger server we got like 100 gigs a second that was directly

00:10:01.399 --> 00:10:10.399
writing to individual drives there's no file system no no RAID parity networking

00:10:07.600 --> 00:10:15.399
nothing was going on yeah this is a usable file system across six servers

00:10:14.279 --> 00:10:20.720
all right you ready to see some big numbers yes this one's not as exciting as you'd expect you have to you run it

00:10:19.200 --> 00:10:23.880
and then it shows you it like gives you a file with the numbers that's not that

00:10:22.480 --> 00:10:28.279
fast that's not very fast just give it a sec okay ooh it's faster you said this

00:10:26.839 --> 00:10:32.360
was going to be cool it's trying different sizes this is probably like 4K

00:10:31.160 --> 00:10:37.839
right now the script is starting with GDS or GPU direct storage it sounds very

00:10:35.880 --> 00:10:41.120
complicated and realistically in practice setting it up was probably very

00:10:39.480 --> 00:10:45.440
complicated for somebody but the main difference is rather than taking the

00:10:43.079 --> 00:10:50.079
data from those NVMe drives putting them into the CPU's memory and then into the

00:10:48.000 --> 00:10:54.720
GPU's memory we're skipping that middle step it goes right from NVMe to the

00:10:52.440 --> 00:10:57.880
GPU's memory oh there we go 15 gigs a second was CPU first then no it's doing

00:10:56.600 --> 00:11:06.440
GDS but like I said it's doing different block sizes you ready to see some numbers okay so this is GDS says GPU

00:11:03.920 --> 00:11:13.279
direct storage only read tests

00:11:08.800 --> 00:11:18.079
106 gibibytes per second so that's like

00:11:13.279 --> 00:11:20.440
114 gigabytes per second over two with a
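
NOTE
A minimal sketch of the unit conversion behind the two figures read out here, assuming
the benchmark reports binary gibibytes (2**30 bytes) and the spoken figure is decimal
gigabytes (10**9 bytes). A result of about 106 GiB/s is the value consistent with the
quoted 114 gigabytes per second. The helper name is hypothetical.
```python
GIB = 2**30   # bytes in one gibibyte
GB = 10**9    # bytes in one gigabyte
def gib_to_gb(gib_per_s: float) -> float:
    """Convert a throughput in GiB/s to GB/s."""
    return gib_per_s * GIB / GB
print(round(gib_to_gb(106), 1))  # 113.8
```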

00:11:18.079 --> 00:11:26.519
file system Blu-rays per second like like full quality Big Boy the whole

00:11:23.000 --> 00:11:29.040
thing per second holy and again we've

00:11:26.519 --> 00:11:32.720
seen numbers like this before but not with a file system loaded on it no we

00:11:31.279 --> 00:11:36.279
could take oh you know W you could actually use this you want to like SMB

00:11:34.720 --> 00:11:39.519
for a second here I mean well we don't have a client that's anywhere near fast

00:11:38.120 --> 00:11:45.800
enough which I guess is a perfect opportunity for us to talk about what this would be for want to show us that

00:11:43.120 --> 00:11:50.399
visualization demo Humanity's desire to look beyond the stars is nothing new

00:11:48.519 --> 00:11:54.839
it's not and one of the major stepping stones in space travel is finding a

00:11:52.680 --> 00:11:58.399
way to transport people in all seriousness though sending people to

00:11:56.279 --> 00:12:04.240
another planet is an incredibly complex challenge but one that modern Computing

00:12:01.360 --> 00:12:08.639
can help us with and one of the current proposed plans from NASA involves

00:12:05.880 --> 00:12:12.320
sending a six-person manned craft all the way to Mars after the 10-month

00:12:10.600 --> 00:12:17.839
journey the crew would transfer into a landing pod that is 16 m in diameter so

00:12:15.320 --> 00:12:22.199
roughly the size of a two-story house due to the sheer size of the Lander as

00:12:19.880 --> 00:12:26.000
well as the thin atmosphere on Mars it's not possible to use a parachute to slow

00:12:24.240 --> 00:12:31.680
down like they have on past Landings like the perseverance Rover so instead

00:12:28.720 --> 00:12:37.440
NASA wants to use retropulsion otherwise known as creating thrust in the opposite

00:12:34.440 --> 00:12:41.279
direction to slow the Lander down from

00:12:37.440 --> 00:12:44.000
12,000 miles an hour to you know the ground

00:12:41.279 --> 00:12:48.000
in less than 7 minutes simulating this Landing is a very important part of the

00:12:46.199 --> 00:12:54.839
research and development for the project and with the power of the Summit supercomputer's 27,000 NVIDIA GPUs over

00:12:53.240 --> 00:13:02.120
the course of a week they were able to build a 100 plus terabyte model of the

00:12:58.560 --> 00:13:04.320
landing that is over 1 billion Points

00:13:02.120 --> 00:13:08.600
each with seven numerical values density vorticity pressure that sort of thing

00:13:06.399 --> 00:13:14.519
and I guess we have NASA security clearance now because we have that exact

00:13:12.000 --> 00:13:19.240
model loaded up on the million dollar server and here it is and Jake's got it

00:13:16.800 --> 00:13:23.720
running right behind me now in the past Engineers would actually take this data

00:13:21.000 --> 00:13:28.240
model and render it out frame by frame into a video still super cool and very

00:13:26.199 --> 00:13:31.600
useful but if you want to tweak a parameter and see how it affects the

00:13:29.639 --> 00:13:37.519
model guess what you're waiting hours for a new render with new technology

00:13:34.399 --> 00:13:39.600
like super fast NVMe storage uh NVMe

00:13:37.519 --> 00:13:45.639
speed file systems like WekaFS and the power of GPU direct well now we can have

00:13:43.240 --> 00:13:51.440
the graphics cards render the data directly from the storage bypassing the

00:13:47.560 --> 00:13:55.040
CPU allowing us to do this in basically

00:13:51.440 --> 00:13:57.399
real time real time manipulation that is

00:13:55.040 --> 00:14:02.560
I mean it's a it's a bit cinematic but we're getting we're getting like 5 FP

00:13:59.639 --> 00:14:09.560
5 FPS that's I mean a lot better than hold on hold on hold on 5 FPS each of

00:14:05.120 --> 00:14:11.680
these frames is over 14 gigabytes of data yeah

00:14:09.560 --> 00:14:19.079
that's before it becomes a frame the GPUs have to process that so each second

00:14:15.079 --> 00:14:22.279
we are streaming between 70 and 90 gigabytes

00:14:19.079 --> 00:14:25.639
from that storage directly to those GPUs

00:14:22.279 --> 00:14:29.440
so that's one to two Blu-rays each
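
NOTE
A quick worked version of the streaming arithmetic described here. The frame size and
frame rate are the figures quoted in the video. The 50 gigabyte disc size is an
assumption (a dual-layer Blu-ray) and the names are illustrative. Because each frame is
"over" 14 gigabytes, the first result is a lower bound.
```python
FPS = 5
FRAME_GB = 14        # quoted as "over 14 gigabytes" per frame
BLURAY_GB = 50       # assumption: dual-layer Blu-ray capacity
min_stream_gb_s = FPS * FRAME_GB             # lower bound on data streamed per second
discs_low = min_stream_gb_s / BLURAY_GB      # discs per second at 70 GB/s
discs_high = 90 / BLURAY_GB                  # discs per second at the quoted 90 GB/s
print(min_stream_gb_s, discs_low, discs_high)
```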

00:14:25.639 --> 00:14:31.519
second can you like move it yeah can

00:14:29.440 --> 00:14:36.360
here let's yeah move it play it play it trust us this is really neat I wish

00:14:34.000 --> 00:14:39.120
there was a way for me to like translate how much is happening in the background

00:14:38.040 --> 00:14:44.600
I guess we could look at how much power it's drawing this is what you guys are

00:14:41.399 --> 00:14:47.120
going to want to see how many amps are

00:14:44.600 --> 00:14:53.519
going through our PDU here and that's just for this top server yeah why does

00:14:49.399 --> 00:14:56.040
it need 17 amps of 200 4,000 watts right

00:14:53.519 --> 00:14:59.399
now what the hell oh hey this is interesting look so one of our GPUs you

00:14:58.000 --> 00:15:03.440
can see it's refreshing every every 2 seconds yeah they're kind of drawing a

00:15:01.240 --> 00:15:07.519
little more power now but still not anywhere near 400 W yeah that's

00:15:05.600 --> 00:15:13.720
interesting I guess this is just more storage dependent than it is actual GPU

00:15:11.320 --> 00:15:18.320
rendering dependent which makes sense they're pretty OP GPUs sort of

00:15:16.279 --> 00:15:23.399
why they sent this demo it's about the storage like moving it around a bit oh

00:15:21.000 --> 00:15:29.600
yeah they got a little spike here you can see the utilization on the GPUs is

00:15:25.920 --> 00:15:32.040
like 50 60% probably a lot of this

00:15:29.600 --> 00:15:36.040
is more the memory usage and less the actual core itself you can imagine the

00:15:34.440 --> 00:15:40.240
type of workflow Improvement you would have from being able to just mess around

00:15:38.240 --> 00:15:43.600
with this in real time rather than like rendering it waiting 3 days for it to

00:15:42.240 --> 00:15:48.120
finish rendering cuz you're at a university and you got a small budget

00:15:45.560 --> 00:15:53.120
works just like this segue to our sponsor Vessi do you hate wet socks as

00:15:51.160 --> 00:15:57.399
much as I do Vessi footwear makes lightweight breathable and most

00:15:54.639 --> 00:16:02.000
importantly water-resistant shoes so no more squelchy socks their Dyma-tex

00:15:59.959 --> 00:16:06.000
material not only keeps your feet dry but keeps them warm in the winter and

00:16:03.920 --> 00:16:09.959
cool in the summer how does that work the stretchy design shows that Comfort

00:16:08.000 --> 00:16:13.959
is at the forefront at times making you forget you're wearing shoes Vessi makes

00:16:12.040 --> 00:16:19.040
cruelty-free products right down to the glue their shoes are 100% vegan whether

00:16:16.839 --> 00:16:23.160
it's a rainy city or a rocky trail the herringbone tread design is there to

00:16:20.880 --> 00:16:27.040
help stop you from slipping around your feet deserve a little treat so go ahead

00:16:25.440 --> 00:16:31.839
and click the link below and use promo code Linus Tech Tips to save $25

00:16:29.279 --> 00:16:35.639
on your first pair today if you guys enjoyed this video maybe check out the

00:16:33.399 --> 00:16:39.160
previous parts and uh I don't know maybe subscribe to Floatplane maybe we'll

00:16:37.079 --> 00:16:42.279
shoot an exclusive of packing them up no I don't think we'll do that I I like I

00:16:40.839 --> 00:16:46.759
got to wonder if there's anything else we could do I thought about video

00:16:43.880 --> 00:16:53.800
editing I would love to use these for the Machine Vision uh cat squirter

00:16:50.199 --> 00:16:53.800
turret project
