1
00:00:00,120 --> 00:00:08,120
I bet most of you forgot about the million-dollar PC here and honestly I

2
00:00:05,680 --> 00:00:12,679
don't blame you it's kind of been a while since we touched the thing but I

3
00:00:09,960 --> 00:00:18,600
swear it's not our fault right before we were going to do the big demo it broke

4
00:00:15,639 --> 00:00:24,199
then it broke again and again it's just been the worst kind of problem to

5
00:00:20,680 --> 00:00:28,199
troubleshoot sometimes our full petabyte

6
00:00:24,199 --> 00:00:30,800
of NVMe SSDs gets detected and other

7
00:00:28,199 --> 00:00:35,160
times some of them don't but I wouldn't make a video just to tell

8
00:00:33,320 --> 00:00:40,000
you it's still broken would I the solution ended up being shockingly

9
00:00:37,480 --> 00:00:45,520
simple and something that I bet you've actually seen before but that's in the

10
00:00:41,760 --> 00:00:48,559
past this is the now and it is finally

11
00:00:45,520 --> 00:00:52,000
time when this video is over we are

12
00:00:48,559 --> 00:00:55,520
going to have the largest and fastest

13
00:00:52,000 --> 00:00:57,160
storage server on YouTube at least until

14
00:00:55,520 --> 00:01:00,960
we have to find all the boxes pack it up and send it back to them did I tell you

15
00:00:58,559 --> 00:01:04,960
they threw away the boxes sorry what oh my God they're not behind me how are

16
00:01:03,320 --> 00:01:10,880
we going to deal with that how are we going to segue to our sponsor G.Skill

17
00:01:08,080 --> 00:01:15,080
G.Skill's Trident Z5 Neo DDR5 memory is built for AMD Ryzen 7000 series

18
00:01:13,360 --> 00:01:19,200
processors and one-click memory overclocking learn more at the link down

19
00:01:25,920 --> 00:01:32,920
below the culprit was this guy but not

20
00:01:29,640 --> 00:01:35,880
this guy entirely four of the drives

21
00:01:32,920 --> 00:01:40,560
specifically out of the 12 had this erratic behavior where we would fire up

22
00:01:38,040 --> 00:01:44,600
the whole system and they wouldn't be there then we'd replug the cables and

23
00:01:42,600 --> 00:01:48,479
they'd come back and we could benchmark it and test it and validate it and then

24
00:01:46,880 --> 00:01:52,960
then they would be gone we tried everything from moving those drives to

25
00:01:50,520 --> 00:01:57,240
different Bays to different servers even unplugging and replugging the cables and

26
00:01:54,920 --> 00:02:01,880
it just kept freaking happening and in the end it was one of

27
00:01:59,640 --> 00:02:06,360
the most basic troubleshooting steps in the book have you tried unplugging it

28
00:02:04,479 --> 00:02:11,879
and plugging it back in but not the drive not the backplane or the cables

29
00:02:09,599 --> 00:02:18,000
the CPU cuz what it's hard to wrap your brain around is that these are NVMe or

30
00:02:14,920 --> 00:02:19,680
PCI Express drives and through the

31
00:02:18,000 --> 00:02:25,560
connector on the back through the backplane through the motherboard to the CPU

32
00:02:22,400 --> 00:02:27,800
socket they actually connect directly

33
00:02:25,560 --> 00:02:35,000
all the way back to your processor which means that a loosely seated CPU

34
00:02:32,120 --> 00:02:39,560
could actually cause a broken link in that chain and here's the thing after

35
00:02:38,160 --> 00:02:43,519
the servers had been running for a few minutes that's just enough thermal

36
00:02:41,640 --> 00:02:49,040
expansion for that CPU pin that's just not quite touching bam now it's touching

37
00:02:46,360 --> 00:02:52,000
boom intermittent problem yeah of course with the system already on it's not

38
00:02:50,440 --> 00:02:56,640
going to pick those drives up again so when we were rebooting it after reseating

39
00:02:54,159 --> 00:03:01,560
the cables they'd show up again it makes perfect sense now but oh my God trying

40
00:02:59,440 --> 00:03:07,159
to figure this out you got to think outside of the box and also deep

41
00:03:04,080 --> 00:03:08,799
inside the box you got it you got it

42
00:03:07,159 --> 00:03:12,640
everybody's got it oh this is not screwed on Jake what is not uh this top

43
00:03:11,319 --> 00:03:17,959
cover oh whatever where are we working on this we in there oh sure you have a

44
00:03:15,280 --> 00:03:21,760
screwdriver right I have a highly erect server that's for sure yes I have my

45
00:03:20,000 --> 00:03:26,040
high-quality ratcheting magnetic screwdriver LTTStore.com now if we were

46
00:03:24,360 --> 00:03:30,879
smart folks we might have looked at what drives they were and traced it back to

47
00:03:28,159 --> 00:03:35,720
what CPU it was yeah that's CP but you know what I was trying to say is we're

48
00:03:33,120 --> 00:03:38,760
just going to reseat both the CPUs damn it sure I don't have a screwdriver yet I

49
00:03:37,439 --> 00:03:44,159
went I put a Phillips bit so you're just going to have to do this all yourself no that's fine crap I don't have a Torx

50
00:03:41,560 --> 00:03:48,319
set you know what sometimes hex works oh my God don't do that what don't do that

51
00:03:46,720 --> 00:03:51,599
that's awful you're going to strip this and then we're going to strip it you're

52
00:03:49,760 --> 00:03:56,360
going to make this a very long video I'm not going to strip it you could go grab

53
00:03:54,280 --> 00:04:00,680
another one you can make yourself useful I could the funny thing is oh God oh my

54
00:03:58,480 --> 00:04:04,879
God I'm kidding oh my God his face why did you even take

55
00:04:02,200 --> 00:04:08,920
it out just I'm just no don't clean the thermal

56
00:04:05,760 --> 00:04:11,720
paste oh my god really yeah reseat it baby

57
00:04:08,920 --> 00:04:19,000
no we should repaste it no all right thermal paste is expensive in this economy right I

58
00:04:14,799 --> 00:04:21,079
mean left correct left correct yes no uh

59
00:04:19,000 --> 00:04:26,600
no I was never much good at math why are these so finicky damn

60
00:04:23,520 --> 00:04:29,280
it beautiful watch the server not boot

61
00:04:26,600 --> 00:04:33,479
now I know right that is a very real possibility

62
00:04:30,240 --> 00:04:36,720
with AMD's Threadripper and EPYC CPUs

63
00:04:33,479 --> 00:04:38,600
because they're so large it is pretty

64
00:04:36,720 --> 00:04:42,440
easy to accidentally install them a little bit cockeyed which can cause

65
00:04:40,639 --> 00:04:48,919
these sorts of issues whether it's PCI Express devices having intermittent

66
00:04:44,160 --> 00:04:51,520
problems or RAM this isn't even on yet

67
00:04:48,919 --> 00:04:55,600
this is like the boot sequence this is the IPMI for those of you who

68
00:04:53,919 --> 00:05:00,479
haven't seen the previous parts of this series or who understandably forgot them

69
00:04:58,400 --> 00:05:07,680
at this point what you're looking at here is a one-petabyte flash storage

70
00:05:04,720 --> 00:05:13,479
server which is a non-trivial thing to build because aside from just having

71
00:05:10,240 --> 00:05:15,800
enough bays to put that many SSDs in if

72
00:05:13,479 --> 00:05:23,440
you want to get anywhere near the full performance of these Kioxia drives you

73
00:05:18,960 --> 00:05:28,720
need a ton of compute so inside each of

74
00:05:23,440 --> 00:05:30,160
these six 1U servers here are twelve 15

75
00:05:28,720 --> 00:05:36,639
TB drives so that's a total of

76
00:05:32,440 --> 00:05:40,199
72 drives each of these CD6-R enterprise

77
00:05:36,639 --> 00:05:43,520
drives is capable of a whopping 5.5 GB

78
00:05:40,199 --> 00:05:45,280
a second reads 4 GB a second writes all

79
00:05:43,520 --> 00:05:50,880
of these work together using a file system called WekaFS that's designed

80
00:05:47,800 --> 00:05:53,919
specifically for NVMe drives to achieve

81
00:05:50,880 --> 00:05:56,240
unbelievable performance of course in

82
00:05:53,919 --> 00:06:00,360
order to measure this performance you actually have to put some kind of load

83
00:05:58,880 --> 00:06:07,919
on the system that's where this guy comes in it has

84
00:06:02,880 --> 00:06:11,680
two 64-core EPYC CPUs eight of NVIDIA's

85
00:06:07,919 --> 00:06:15,240
A100 GPUs those are critical to generate

86
00:06:11,680 --> 00:06:17,960
the load and it has a whopping

87
00:06:15,240 --> 00:06:25,400
eight 200 gigabit per second network connections and of course those run

88
00:06:20,720 --> 00:06:28,840
through this 32-port 200 gigabit network

89
00:06:25,400 --> 00:06:31,720
switch to each of our servers down below

90
00:06:28,840 --> 00:06:36,120
that's a lot of high-speed connectivity but with great performance comes great

91
00:06:33,800 --> 00:06:43,160
power consumption and to run this thing full bore we needed to plug extension

92
00:06:38,720 --> 00:06:47,199
cords into five separate 15 amp 120 volt

93
00:06:43,160 --> 00:06:49,120
breakers and a separate 208 V 30 amp

94
00:06:47,199 --> 00:06:53,240
breaker and did I mention they're overclocked I mean I feel like I can

95
00:06:51,520 --> 00:06:57,800
tell from all the heat coming off of it it is flipping warm back here it's very

96
00:06:55,720 --> 00:07:02,520
uncomfortable to stand here another thing we ran into last time just to get

97
00:07:00,000 --> 00:07:05,879
the array started these servers were so out of sync just from Shipping like

98
00:07:04,199 --> 00:07:09,680
their time the time on the servers was so out of sync the real time clock you

99
00:07:07,280 --> 00:07:13,840
know could be 5 10 seconds off whatever that the array would not start and the

100
00:07:11,680 --> 00:07:18,080
easiest way to get them to sync up again is NTP network time protocol and of

101
00:07:16,440 --> 00:07:22,560
course without networking this wasn't set up to actually connect to the

102
00:07:20,080 --> 00:07:27,520
internet all of these have static IPs that are assigned so in our router I had

103
00:07:24,520 --> 00:07:29,840
to create uh a VLAN that has the same

104
00:07:27,520 --> 00:07:33,879
subnet information and routed from the server room all the way over here to

105
00:07:31,560 --> 00:07:38,599
here and plug it into the switch had to go in the switch and tell it oh 10 gig

106
00:07:35,800 --> 00:07:42,120
is fine it doesn't have to be 200 gig and then they turned on no problem but I

107
00:07:40,280 --> 00:07:45,800
want to access the IPMI so we can see the power consumption sure so we've got

108
00:07:43,879 --> 00:07:48,520
to plug this bad boy into all the IPMI ports really quick oh okay so we need

109
00:07:47,280 --> 00:07:57,360
some patch cables patch cables are right in the front all right let's do it oh that's wild yeah the management port for

110
00:07:53,159 --> 00:07:59,919
this one is actually on the front oh man

111
00:07:57,360 --> 00:08:06,039
it's so much nicer in here I want to see all 72 drives is what I want to see you

112
00:08:02,240 --> 00:08:07,800
might be asking for a lot no no bare

113
00:08:06,039 --> 00:08:12,280
minimum when those four drives are missing the array just rebuilds oh God

114
00:08:10,520 --> 00:08:18,120
just give it a second oh sick it's working okay the cool thing though when those

115
00:08:16,000 --> 00:08:22,960
four drives are missing the array like rebuilt itself like it was nothing takes

116
00:08:20,080 --> 00:08:28,520
like 5 minutes cuz they're so fast yeah but we want the capacity that's the

117
00:08:25,360 --> 00:08:30,840
whole point a petabyte of flash okay I

118
00:08:28,520 --> 00:08:37,039
hate to burst your bubble for some reason they only configured it with a

119
00:08:32,680 --> 00:08:39,560
500 TB like drive only 500

120
00:08:37,039 --> 00:08:43,560
terabytes I think we can expand it later but for the purposes of the first demo I

121
00:08:42,080 --> 00:08:51,080
think let's just leave it I don't want to break it I'm SSHing into the node

122
00:08:48,279 --> 00:08:54,880
oh look at that how many is that 1 2 3 4 5 6 7 8 good that's the correct number

123
00:08:53,519 --> 00:08:58,720
that's a lot of GPU it's a lot of freaking GPUs is there is there

124
00:08:56,800 --> 00:09:06,040
nvidia-smi each of these is like oh look at that 400 watts 3200 watts of just

125
00:09:03,839 --> 00:09:09,640
GPU we're not connecting to these things over anything like SMB cuz that would be

126
00:09:08,079 --> 00:09:13,079
way too slow we're like directly connecting with the Weka interface and

127
00:09:11,800 --> 00:09:17,200
we're going to be using GPUDirect Storage which is super cool we'll talk a little bit more about that later yeah

128
00:09:15,399 --> 00:09:20,839
this visualization is super cool the green things I'm looking at I guess are

129
00:09:18,920 --> 00:09:25,120
drives and then the gray things are cores or no the gray things are drives

130
00:09:23,519 --> 00:09:30,240
the green things are cores that makes way more sense actually and then blue is our

131
00:09:27,240 --> 00:09:32,440
network interfaces that is a pretty

132
00:09:30,240 --> 00:09:36,440
visual it oh yeah you see this look it's they're pinning cores to specific Drive

133
00:09:34,680 --> 00:09:40,880
slots it looks like there's some special sauce going on here I think there is

134
00:09:38,079 --> 00:09:44,800
some manual ass shiz that went on in the configuration of the system seriously

135
00:09:42,680 --> 00:09:52,079
though a lot of tuning because like I said before making a one petabyte flash

136
00:09:47,880 --> 00:09:54,760
server easy making one that performs

137
00:09:52,079 --> 00:09:58,600
near actual we're still not going to be hitting the peak you'll remember when we

138
00:09:56,519 --> 00:10:03,680
did the Honey Badger server we got like 100 gigs a second that was directly

139
00:10:01,399 --> 00:10:10,399
writing to individual drives there's no file system no RAID parity no networking

140
00:10:07,600 --> 00:10:15,399
nothing was going on yeah this is a usable file system across six servers

141
00:10:14,279 --> 00:10:20,720
all right you ready to see some big numbers yes this one's not as exciting as you'd expect you have to you run it

142
00:10:19,200 --> 00:10:23,880
and then it shows you it like gives you a file with the numbers that's not that

143
00:10:22,480 --> 00:10:28,279
fast that's not very fast just give it a sec okay ooh it's faster you said this

144
00:10:26,839 --> 00:10:32,360
was going to be cool it's trying different sizes this is probably like 4K

145
00:10:31,160 --> 00:10:37,839
right now the script is starting with GDS or GPUDirect Storage it sounds very

146
00:10:35,880 --> 00:10:41,120
complicated and realistically in practice setting it up was probably very

147
00:10:39,480 --> 00:10:45,440
complicated for somebody but the main difference is rather than taking the

148
00:10:43,079 --> 00:10:50,079
data from those NVMe drives putting it into the CPU's memory and then into the

149
00:10:48,000 --> 00:10:54,720
GPU's memory we're skipping that middle step it goes right from NVMe to the

150
00:10:52,440 --> 00:10:57,880
GPU's memory oh there we go 15 gigs a second was it CPU first then no it's doing

151
00:10:56,600 --> 00:11:06,440
GDS but like I said it's doing different block sizes you ready to see some numbers okay so this is GDS so

152
00:11:03,920 --> 00:11:13,279
GPUDirect Storage only read tests

153
00:11:08,800 --> 00:11:18,079
106 gibibytes per second so that's like

154
00:11:13,279 --> 00:11:20,440
114 gigabytes per second over two with a

155
00:11:18,079 --> 00:11:26,519
file system Blu-rays per second like like full quality Big Boy the whole

156
00:11:23,000 --> 00:11:29,040
thing per second holy and again we've

157
00:11:26,519 --> 00:11:32,720
seen numbers like this before but not with a file system loaded on it no we

158
00:11:31,279 --> 00:11:36,279
could take oh you know W you could actually use this you want to like SMB

159
00:11:34,720 --> 00:11:39,519
for a second here I mean well we don't have a client that's anywhere near fast

160
00:11:38,120 --> 00:11:45,800
enough which I guess is a perfect opportunity for us to talk about what this would be for want to show us that

161
00:11:43,120 --> 00:11:50,399
visualization demo Humanity's desire to look beyond the stars is nothing new

162
00:11:48,519 --> 00:11:54,839
it's not and one of the major stepping stones in space travel is finding a

163
00:11:52,680 --> 00:11:58,399
way to transport people in all seriousness though sending people to

164
00:11:56,279 --> 00:12:04,240
another planet is an incredibly complex challenge but one that modern Computing

165
00:12:01,360 --> 00:12:08,639
can help us with and one of the current proposed plans from NASA involves

166
00:12:05,880 --> 00:12:12,320
sending a six-person manned craft all the way to Mars after the 10-month

167
00:12:10,600 --> 00:12:17,839
journey the crew would transfer into a landing pod that is 16 m in diameter so

168
00:12:15,320 --> 00:12:22,199
roughly the size of a two-story house due to the sheer size of the Lander as

169
00:12:19,880 --> 00:12:26,000
well as the thin atmosphere on Mars it's not possible to use a parachute to slow

170
00:12:24,240 --> 00:12:31,680
down like they have on past landings like the Perseverance rover so instead

171
00:12:28,720 --> 00:12:37,440
NASA wants to use retropropulsion otherwise known as creating thrust in the opposite

172
00:12:34,440 --> 00:12:41,279
direction to slow the Lander down from

173
00:12:37,440 --> 00:12:44,000
12,000 miles an hour to you know the ground

174
00:12:41,279 --> 00:12:48,000
in less than 7 minutes simulating this Landing is a very important part of the

175
00:12:46,199 --> 00:12:54,839
research and development for the project and with the power of the Summit supercomputer's 27,000 NVIDIA GPUs over

176
00:12:53,240 --> 00:13:02,120
the course of a week they were able to build a 100 plus terabyte model of the

177
00:12:58,560 --> 00:13:04,320
landing that is over 1 billion Points

178
00:13:02,120 --> 00:13:08,600
each with seven numerical values density vorticity pressure that sort of thing

179
00:13:06,399 --> 00:13:14,519
and I guess we have NASA security clearance now because we have that exact

180
00:13:12,000 --> 00:13:19,240
model loaded up on the million-dollar server and here it is and Jake's got it

181
00:13:16,800 --> 00:13:23,720
running right behind me now in the past Engineers would actually take this data

182
00:13:21,000 --> 00:13:28,240
model and render it out frame by frame into a video still super cool and very

183
00:13:26,199 --> 00:13:31,600
useful but if you want to tweak a parameter and see how it affects the

184
00:13:29,639 --> 00:13:37,519
model guess what you're waiting hours for a new render with new technology

185
00:13:34,399 --> 00:13:39,600
like super fast NVMe storage uh NVMe

186
00:13:37,519 --> 00:13:45,639
speed file systems like WekaFS and the power of GPUDirect well now we can have

187
00:13:43,240 --> 00:13:51,440
the graphics cards render the data directly from the storage bypassing the

188
00:13:47,560 --> 00:13:55,040
CPU allowing us to do this in basically

189
00:13:51,440 --> 00:13:57,399
real time real time manipulation that is

190
00:13:55,040 --> 00:14:02,560
I mean it's a it's a bit cinematic but we're getting we're getting like 5 FPS

191
00:13:59,639 --> 00:14:09,560
5 FPS that's I mean a lot better than hold on hold on hold on 5 FPS each of

192
00:14:05,120 --> 00:14:11,680
these frames is over 14 GB of data yeah

193
00:14:09,560 --> 00:14:19,079
that's before it becomes a frame the GPUs have to process that so each second

194
00:14:15,079 --> 00:14:22,279
we are streaming between 70 and 90 GB

195
00:14:19,079 --> 00:14:25,639
from that storage directly to those GPUs

196
00:14:22,279 --> 00:14:29,440
so that's one to two Blu-rays each

197
00:14:25,639 --> 00:14:31,519
second can you like move it yeah can

198
00:14:29,440 --> 00:14:36,360
here let's yeah move it play it play it trust us this is really neat I wish

199
00:14:34,000 --> 00:14:39,120
there was a way for me to like translate how much is happening in the background

200
00:14:38,040 --> 00:14:44,600
I guess we could look at how much power it's drawing this is what you guys are

201
00:14:41,399 --> 00:14:47,120
going to want to see how many amps are

202
00:14:44,600 --> 00:14:53,519
going through our pdu here and that's just for this top server yeah why does

203
00:14:49,399 --> 00:14:56,040
it need 17 amps of 200 4,000 watts right

204
00:14:53,519 --> 00:14:59,399
now what the hell oh hey this is interesting look so one of our gpus you

205
00:14:58,000 --> 00:15:03,440
can see it's refreshing every every 2 seconds yeah they're kind of drawing a

206
00:15:01,240 --> 00:15:07,519
little more power now but still not anywhere near 400 W yeah that's

207
00:15:05,600 --> 00:15:13,720
interesting I guess this is just more storage dependent than it is actual GPU

208
00:15:11,320 --> 00:15:18,320
rendering dependent which makes sense they're pretty OP GPUs sort of

209
00:15:16,279 --> 00:15:23,399
why they sent this demo it's about the storage like moving it around a bit oh

210
00:15:21,000 --> 00:15:29,600
yeah they got a little spike here you can see the utilization on the GPUs is

211
00:15:25,920 --> 00:15:32,040
like 50 60% probably a lot of this

212
00:15:29,600 --> 00:15:36,040
is more the memory usage and less the actual core itself you can imagine the

213
00:15:34,440 --> 00:15:40,240
type of workflow Improvement you would have from being able to just mess around

214
00:15:38,240 --> 00:15:43,600
with this in real time rather than like rendering it waiting 3 days for it to

215
00:15:42,240 --> 00:15:48,120
finish rendering cuz you're at a university and you got a small budget

216
00:15:45,560 --> 00:15:53,120
works just like this segue to our sponsor Vessi do you hate wet socks as

217
00:15:51,160 --> 00:15:57,399
much as I do Vessi Footwear makes lightweight breathable and most

218
00:15:54,639 --> 00:16:02,000
importantly water-resistant shoes so no more squelchy socks their Dyma-tex

219
00:15:59,959 --> 00:16:06,000
material not only keeps your feet dry but keeps them warm in the winter and

220
00:16:03,920 --> 00:16:09,959
cool in the summer how does that work the stretchy design shows that Comfort

221
00:16:08,000 --> 00:16:13,959
is at the forefront at times making you forget you're wearing shoes Vessi makes

222
00:16:12,040 --> 00:16:19,040
cruelty-free products right down to the glue their shoes are 100% vegan whether

223
00:16:16,839 --> 00:16:23,160
it's a rainy city or a rocky trail the herringbone tread design is there to

224
00:16:20,880 --> 00:16:27,040
help stop you from slipping around your feet deserve a little treat so go ahead

225
00:16:25,440 --> 00:16:31,839
and click the link below and use promo code LINUSTECHTIPS to save $25

226
00:16:29,279 --> 00:16:35,639
on your first pair today if you guys enjoyed this video maybe check out the

227
00:16:33,399 --> 00:16:39,160
previous parts and uh I don't know maybe subscribe to Floatplane maybe we'll

228
00:16:37,079 --> 00:16:42,279
shoot an exclusive of packing them up no I don't think we'll do that I I like I

229
00:16:40,839 --> 00:16:46,759
got to wonder if there's anything else we could do I thought about video

230
00:16:43,880 --> 00:16:53,800
editing I would love to use these for the machine vision uh cat squirter

231
00:16:50,199 --> 00:16:53,800
turret project
