1
00:00:00,080 --> 00:00:07,279
back when we installed a petabyte worth of hard drives in our server closet we

2
00:00:04,400 --> 00:00:12,160
were sure that with that much storage we'd be good for a long time and in

3
00:00:10,000 --> 00:00:14,960
fairness i guess two years worth of red footage

4
00:00:13,200 --> 00:00:20,560
is pretty good but it finally happened we are

5
00:00:18,080 --> 00:00:25,519
critically low on space i've got less than five percent available on our main

6
00:00:22,800 --> 00:00:31,279
editing server but with only 20 terabytes available on the vault there

7
00:00:27,680 --> 00:00:33,920
is nowhere to dump it to fortunately

8
00:00:31,279 --> 00:00:38,160
i've got a band-aid solution the very best kind of solution

9
00:00:35,600 --> 00:00:42,719
seagate actually sent over these 15

10
00:00:39,360 --> 00:00:46,879
12 terabyte ironwolf pro drives for a

11
00:00:42,719 --> 00:00:49,760
totally unrelated project that um well

12
00:00:46,879 --> 00:00:53,520
didn't actually go very well so instead of using them for that we're

13
00:00:51,520 --> 00:00:57,840
going to use them to add more capacity to the vault

14
00:00:55,680 --> 00:01:02,320
speaking of the vault i keep all my segues in a vault ridge wallet is the

15
00:01:00,320 --> 00:01:06,479
sleek way to keep wallet bulge down with its compact frame and rfid blocking

16
00:01:04,239 --> 00:01:11,560
inner plates use the offer code ltt september to save 10 percent

17
00:01:08,240 --> 00:01:19,600
and get free worldwide shipping

18
00:01:19,600 --> 00:01:26,159
all right so while i wait for anthony to come down i'm gonna run through how this

19
00:01:24,240 --> 00:01:31,920
whole thing is going to work with some really crude diagrams here our petabyte

20
00:01:28,960 --> 00:01:37,119
cluster uses a file system it's open source and it's called glusterfs that's

21
00:01:34,720 --> 00:01:43,280
what allows these two independent servers here to present to the rest of

22
00:01:39,360 --> 00:01:45,759
the network as a single large share now

23
00:01:43,280 --> 00:01:49,119
we could increase our capacity by adding more server boxes

24
00:01:48,000 --> 00:01:53,520
but that's a project for another day

25
00:01:51,280 --> 00:01:58,960
literally that's a project for another day we've got a video coming with a full

26
00:01:56,000 --> 00:02:03,280
petabyte of storage in a single server rather than two so make sure you're

27
00:02:00,719 --> 00:02:09,119
subscribed so you don't miss that today we're going to take the empty 15 bays in

28
00:02:06,640 --> 00:02:15,599
server delta 2 down here and we're going to expand our storage with another

29
00:02:12,200 --> 00:02:18,640
180 terabytes of raw capacity we don't

30
00:02:15,599 --> 00:02:20,480
get to use all of this capacity though

31
00:02:18,640 --> 00:02:24,400
our glusterfs implementation is geared towards raw capacity

32
00:02:22,400 --> 00:02:30,239
rather than redundancy so the only fail-safe built into our local network

33
00:02:26,480 --> 00:02:31,360
here is the raidz2 vdevs in each of our

34
00:02:30,239 --> 00:02:37,280
machines what that means is that out of our 15

35
00:02:33,599 --> 00:02:40,400
drives only 13 of them count towards our

36
00:02:37,280 --> 00:02:43,360
capacity with the rest these two taken

37
00:02:40,400 --> 00:02:48,319
up by parity data that protects us from data loss in the event of a physical

38
00:02:45,360 --> 00:02:51,519
drive failure or a cable failure speaking of which
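
The capacity arithmetic described above can be sketched as follows. This is a rough illustration using only the figures mentioned in the video (15 drives, 12 terabytes each, two drives' worth of parity). The function name is hypothetical, and the result ignores filesystem overhead and the difference between decimal and binary units.

```python
# Sketch: raw vs. usable capacity of a single raidz2 vdev.
# Figures come from the video; real usable space will be lower
# because of metadata, padding, and TB-vs-TiB reporting.

DRIVES = 15         # empty bays being filled in delta 2
DRIVE_TB = 12       # capacity of each drive in terabytes
PARITY_DRIVES = 2   # raidz2 spends two drives' worth of space on parity


def raidz2_capacity(drives, drive_tb, parity=PARITY_DRIVES):
    """Return (raw_tb, usable_tb) for one raidz2 vdev, before overhead."""
    raw = drives * drive_tb
    usable = (drives - parity) * drive_tb
    return raw, usable


raw_tb, usable_tb = raidz2_capacity(DRIVES, DRIVE_TB)
print(f"raw: {raw_tb} TB, usable before overhead: {usable_tb} TB")
```

With these inputs, the 180 terabytes of raw capacity works out to 156 terabytes before overhead, since only 13 of the 15 drives hold data.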

39
00:02:49,680 --> 00:02:56,560
how about we replace that uh bad cable in delta one oh that's gonna be quite a

40
00:02:53,840 --> 00:03:01,840
project uh guys uh the vault is going offline for

41
00:02:59,440 --> 00:03:06,319
probably about two hours tm just vault so whonnock is up and these

42
00:03:04,159 --> 00:03:10,480
servers are so heavy so we're gonna have to empty all the drives out of them onto

43
00:03:09,040 --> 00:03:14,560
one of these carts then take the server out put it on the

44
00:03:12,560 --> 00:03:18,159
cart and wheel it over to the island where we can work on it i'm still super

45
00:03:16,319 --> 00:03:22,080
proud of this cabinet door holding open innovation we drilled a hole in the side

46
00:03:20,159 --> 00:03:26,000
of the keyboard tray and now we can get the servers out super easily you know

47
00:03:24,159 --> 00:03:29,920
you can just take the door off right no like the door has a filter

48
00:03:28,800 --> 00:03:33,920
like that's the purpose of the door for us anyway

49
00:03:32,319 --> 00:03:37,680
yes i know i could take the door off i mean with the sides off the back's off

50
00:03:36,159 --> 00:03:41,360
clearly we figured that out thank you youtube comments

51
00:03:39,280 --> 00:03:46,640
it's quick to label a drive it's slow to label 60 drives and this actually has to

52
00:03:44,000 --> 00:03:48,959
be legible that's important why don't you actually work your way from the

53
00:03:47,840 --> 00:03:53,360
front and i'll work my way from the back

54
00:03:51,200 --> 00:03:58,080
okay and we'll do this like uh you know lady and the tramp eating the spaghetti

55
00:03:55,120 --> 00:04:01,599
style uh that's a strange mental image but okay speaking of

56
00:04:00,080 --> 00:04:05,680
terrible segues sure is cold in this server room good

57
00:04:03,200 --> 00:04:08,879
thing i'm wearing my ltt swacket it's a sweater it's a jacket lttstore.com

58
00:04:07,599 --> 00:04:14,879
don't worry about it i don't think most people feel bad for me struggling to

59
00:04:12,560 --> 00:04:19,440
stack all my hard drives that are very time consuming to stack

60
00:04:17,040 --> 00:04:23,919
why can't i stack all these hard drives if you could grab this end though and

61
00:04:21,759 --> 00:04:27,759
help me up onto the cart that would be swell

62
00:04:25,840 --> 00:04:32,240
oh it's hooked on my it's hooked on my fly okay i got it that's fine

63
00:04:30,560 --> 00:04:37,520
one thing we want to do as we're carting this over to the kitchen there is be

64
00:04:35,280 --> 00:04:40,320
really really gentle with the way that we're moving this

65
00:04:38,800 --> 00:04:45,360
this is about what

66
00:04:42,120 --> 00:04:47,440
350 terabytes of our company's valuable

67
00:04:45,360 --> 00:04:51,040
data on here right now and i remember patrick from serve the home telling me

68
00:04:49,040 --> 00:04:55,120
that yahoo had an incident where they moved their data center like across the

69
00:04:52,720 --> 00:04:59,440
parking lot rolling hard drives on carts not unlike this one and all the

70
00:04:57,120 --> 00:05:03,680
vibration killed like a significant portion of their drives so we could lose

71
00:05:01,120 --> 00:05:07,199
up to eight of them depending on oh seven because one of them is already

72
00:05:05,280 --> 00:05:10,080
degraded but but we don't want to do that so i guess this is the part of the

73
00:05:08,639 --> 00:05:15,280
video where we explain what happens when a cable fails in a storinator now most

74
00:05:13,520 --> 00:05:18,800
bulk storage servers with like lots and lots of hard drives use what's called a

75
00:05:16,800 --> 00:05:25,440
backplane so they'll take fewer connections off of your

76
00:05:21,440 --> 00:05:27,360
sata or your sas adapter or your raid

77
00:05:25,440 --> 00:05:31,120
card or whatever the case may be and then they will take those and they will

78
00:05:29,199 --> 00:05:36,000
split that bandwidth across multiple drives 45 drives takes a bit of a

79
00:05:33,360 --> 00:05:40,479
different approach so they wire every single drive up individually across less

80
00:05:39,039 --> 00:05:44,880
of a back plane and more of an underplane the advantage is you get the

81
00:05:42,560 --> 00:05:49,360
full bandwidth another advantage is that in the event that a connection fails

82
00:05:47,680 --> 00:05:55,039
you're not replacing an entire costly back plane but the disadvantage is that if a cable

83
00:05:53,759 --> 00:05:59,919
fails you are digging this entire apparatus

84
00:05:58,080 --> 00:06:05,520
out to replace one flaky cable so there were two drives

85
00:06:03,120 --> 00:06:09,280
that were dead uh one of them it was fine when we replaced it it was it came

86
00:06:07,280 --> 00:06:12,479
back up everything rebuilt and everything was rosy but then there was

87
00:06:10,960 --> 00:06:17,680
the other one we replaced it a couple times we actually replaced the controller card itself and tried

88
00:06:16,080 --> 00:06:20,960
different ports on different controllers that we knew worked

89
00:06:19,440 --> 00:06:25,120
but it still had the same issue and what's weird is sometimes it would kind

90
00:06:23,039 --> 00:06:28,319
of work and we could start rebuilding the data on it but like what kind of

91
00:06:26,639 --> 00:06:32,639
data speeds were we getting it would start out at like you know 300

92
00:06:30,639 --> 00:06:36,400
400 megabytes per second which is kind of low but fine but then it would go

93
00:06:34,720 --> 00:06:42,080
down to like 10. yeah and the eta was like a year

94
00:06:39,280 --> 00:06:42,080
yeah like

95
00:06:42,560 --> 00:06:49,840
come on by the way evidence that my dust filter works just

96
00:06:47,199 --> 00:06:53,360
great yeah it looks brand new there is one little bit here that i

97
00:06:51,199 --> 00:06:57,840
noticed but that's it okay so let's not hate on my filtered front cabinet door

98
00:06:55,840 --> 00:07:03,919
there okay okay so this guy needs to come out now this

99
00:07:01,599 --> 00:07:09,360
is a little tricky does that entire plane need to come out i hope not yeah

100
00:07:06,960 --> 00:07:12,639
cause like i'm looking at this and this needs to go up under there in order to

101
00:07:10,880 --> 00:07:18,240
get screwed in there and in order to do that this like straight sucks

102
00:07:16,479 --> 00:07:21,039
this is exactly why i haven't had time to do it until now

103
00:07:22,080 --> 00:07:27,360
how long did we tell the editors this was going to be down for two hours

104
00:07:26,400 --> 00:07:31,520
um one thing i did notice though is that

105
00:07:29,520 --> 00:07:35,919
with how tightly integrated it is into the bottom of the case i

106
00:07:34,080 --> 00:07:38,800
like i don't know how i can't really get it very well

107
00:07:37,360 --> 00:07:43,199
okay they're shooting techlinked now so we're going to have to do asmr server

108
00:07:40,400 --> 00:07:48,240
upgrade so i pulled these out and i can see where the cable goes so

109
00:07:45,840 --> 00:07:54,080
yes i will in fact have to pull out this and this i don't see anything obviously

110
00:07:51,199 --> 00:07:54,080
defective about it

111
00:07:54,879 --> 00:08:02,160
that's an angry episode of techlinked someone must have removed some headphone

112
00:07:58,400 --> 00:08:02,160
jacks i hope it wasn't samsung

113
00:08:03,759 --> 00:08:11,440
okay so this is it we begin the funeral procession again and this way

114
00:08:08,720 --> 00:08:15,039
it's an open casket oh yeah should we close this server

115
00:08:13,919 --> 00:08:17,919
maybe like now is as good a time as any to do

116
00:08:16,960 --> 00:08:23,199
that uh if we're gonna do full yolo and assume

117
00:08:21,039 --> 00:08:26,479
that we have everything right then yes otherwise no

118
00:08:24,400 --> 00:08:30,319
otherwise no let's go now let's put some hard drives in

119
00:08:28,319 --> 00:08:34,959
yeah having to keep track of everything sucks should have just got a jellyfish

120
00:08:33,200 --> 00:08:40,000
so that drive with the rubbed off label that i wasn't 100 sure about i got to

121
00:08:37,279 --> 00:08:44,640
the end and the only thing left is 131 which is clearly not that unless i have

122
00:08:41,839 --> 00:08:48,000
a wicked case of dyslexia and i put 131 over here so i think it's time for a

123
00:08:46,320 --> 00:08:51,600
sanity check here we go ladies and gentlemen

124
00:08:49,839 --> 00:08:55,600
okay uh anthony you got the drives do you want me to turn it on first should

125
00:08:53,760 --> 00:08:59,839
we do it yeah it's not hot swap if it's not on all

126
00:08:58,240 --> 00:09:05,040
right we'll hot swap it we'll hot swap it i'm turning it on

127
00:09:02,720 --> 00:09:09,839
okay here we go uh anthony you want to take the wheel here sure all right

128
00:09:11,200 --> 00:09:16,800
let's see zpool status okay so guys you

129
00:09:14,720 --> 00:09:24,080
can actually see here what was going on with one of our raidz2s so each set of

130
00:09:20,080 --> 00:09:28,399
15 drives is a raidz2 vdev so this

131
00:09:24,080 --> 00:09:30,399
raidz2 raidz2-0 is online you can see

132
00:09:28,399 --> 00:09:35,440
the whole zpool is degraded though that's because raidz2-1 here drive

133
00:09:33,279 --> 00:09:38,640
117 the one that we just replaced the cable for

134
00:09:36,560 --> 00:09:42,800
is unavailable and then these are all the previous attempts at rebuilding it

135
00:09:40,640 --> 00:09:48,160
with different drives now we're going to try again but with a new cable so we

136
00:09:45,600 --> 00:09:52,720
fixed 117 but now we've got four drives offline oh balls that's like way down

137
00:09:51,200 --> 00:09:56,560
there you want to let me know if anything changed five six seven and

138
00:09:55,040 --> 00:10:01,839
eight disappeared they're unavailable so that's the wrong one then yep

139
00:09:58,640 --> 00:10:03,760
damn it okay so we're back

140
00:10:01,839 --> 00:10:09,200
and all the drives are here but they're not showing up with their um

141
00:10:06,880 --> 00:10:13,200
like their 45 drive storinator friendly ids here and also

142
00:10:11,200 --> 00:10:17,040
five of them are resilvering that seems bad how do you resilver five drives

143
00:10:15,279 --> 00:10:21,760
these are resilvering because they got cut without being offlined first uh in

144
00:10:19,440 --> 00:10:26,079
the meantime should we add the other 15 drives to delta 2

145
00:10:23,519 --> 00:10:29,519
what could go wrong what could go wrong so do we need to make a brick

146
00:10:29,920 --> 00:10:35,519
because that's gluster's crap no we don't need to do that with gluster it's

147
00:10:33,200 --> 00:10:37,920
all in under /zpool pretty sure okay

148
00:10:36,560 --> 00:10:42,959
i don't see anything here that looks like just making a vdev

149
00:10:40,000 --> 00:10:47,760
yeah me neither oh okay so we just do zpool create zpool

150
00:10:45,360 --> 00:10:53,360
raidz2 and then the paths to the disks but if our disks don't show up

151
00:10:50,640 --> 00:10:58,160
okay hold on so let's do this and it is online
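
The command shape spoken above (`zpool create`, a pool name, `raidz2`, then the disk paths) can be sketched as below. This only builds and prints the command line; it does not run anything. The pool name `zpool` is taken from the dialogue, while the device paths and the helper name are hypothetical placeholders, not the paths used on the actual server.

```python
import shlex

# Hypothetical device paths: the real server uses its own disk aliases,
# which are not shown in the transcript.
DISKS = [f"/dev/disk/by-id/example-disk-{n:02d}" for n in range(1, 16)]


def build_raidz2_create(pool, disks):
    """Return the argv list for: zpool create <pool> raidz2 <disk>..."""
    if len(disks) < 3:
        # raidz2 needs at least two parity drives plus one data drive
        raise ValueError("raidz2 needs at least 3 disks")
    return ["zpool", "create", pool, "raidz2", *disks]


cmd = build_raidz2_create("zpool", DISKS)
# Print for review only; nothing is executed here.
print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the sketch safe to run on any machine, since creating a pool is destructive to the listed disks.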

152
00:10:55,360 --> 00:11:01,519
so we might need to restart oh okay so maybe they're not hot

153
00:10:59,600 --> 00:11:05,600
swappable it should be but maybe not

154
00:11:04,079 --> 00:11:10,959
i don't know if they've configured the server to not find them yeah their special driver

155
00:11:08,800 --> 00:11:15,519
might actually not do that okay well let's see

156
00:11:12,959 --> 00:11:20,320
in the meantime we can check in on delta one and see if it's resilvering a

157
00:11:17,360 --> 00:11:24,000
little faster now it is not so now what we've replaced literally

158
00:11:22,480 --> 00:11:28,560
everything the drive the cable the controller i

159
00:11:26,959 --> 00:11:32,959
mean do we want to just pop the drive out and pop it in one more

160
00:11:30,320 --> 00:11:35,200
time and see what happens we can try it okay

161
00:11:36,000 --> 00:11:43,200
brand new drive okay so delta 2 is rebooted now and

162
00:11:41,120 --> 00:11:48,959
we've got 1.1 okay so i guess

163
00:11:45,760 --> 00:11:50,399
throw them all in yeah and see if we get

164
00:11:48,959 --> 00:11:56,320
them all and if so we'll create the raid z2 then at least

165
00:11:53,200 --> 00:11:58,800
it may be degraded but it's bigger

166
00:11:56,320 --> 00:12:02,640
so all 15 drives for the expansion are in delta 2 now and i switched back over

167
00:12:01,200 --> 00:12:06,880
to delta one and i have good news

168
00:12:04,640 --> 00:12:10,000
our resilvering is going at 1.91 gigabytes a second

169
00:12:08,560 --> 00:12:14,880
which is pretty sweet that means it should only take a few days
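
As a rough sanity check on the "few days" estimate, the sketch below converts a scan rate into a time estimate. It assumes the resilver has to walk roughly the 350 terabytes mentioned earlier for this server, and it uses decimal units throughout. Both are assumptions for illustration, not figures reported by `zpool status`.

```python
# Sketch: estimated time to scan a given amount of data at a steady rate.
# DATA_TB is an assumption based on the "about 350 terabytes" mentioned
# earlier in the video; real resilvers rarely hold a constant speed.

DATA_TB = 350
SECONDS_PER_DAY = 86_400


def eta_days(data_tb, rate_mb_per_s):
    """Days needed to process data_tb terabytes at rate_mb_per_s MB/s."""
    seconds = (data_tb * 1e12) / (rate_mb_per_s * 1e6)
    return seconds / SECONDS_PER_DAY


# Rates quoted at various points in the video, in megabytes per second.
for rate in (10, 375, 900, 1910):
    print(f"{rate:>5} MB/s -> about {eta_days(DATA_TB, rate):.1f} days")
```

Under these assumptions, 1.91 gigabytes a second comes out to a little over two days, while the earlier 10 megabytes a second comes out to more than a year, which lines up with both estimates given in the video.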

170
00:12:11,920 --> 00:12:17,519
so 117 it's not there is it it's there okay but

171
00:12:16,639 --> 00:12:22,800
but when i try to replace it new device has a different optimal sector size what the

172
00:12:21,120 --> 00:12:28,240
crap so i need to figure out what that is well hold on hold on that's not like a

173
00:12:26,000 --> 00:12:31,279
4k sector drive is it

174
00:12:29,600 --> 00:12:35,200
it might be because that would probably i don't i don't know this for a fact

175
00:12:33,680 --> 00:12:40,880
but i suspect mixing 4k sector drives with

176
00:12:37,760 --> 00:12:43,279
512 byte sector drives in some kind of an

177
00:12:40,880 --> 00:12:46,480
array is probably super terrible so do you want to offline it or it's offline it's

178
00:12:44,800 --> 00:12:50,639
already offline yeah okay did we accidentally buy the wrong drives

179
00:12:48,000 --> 00:12:55,440
anthony oh my god that couldn't be like the problem could

180
00:12:52,880 --> 00:12:58,560
it they're not are these advanced format yeah yeah

181
00:12:57,360 --> 00:13:03,600
wait you have got to be kidding me

182
00:13:01,519 --> 00:13:07,920
is that why these replace operations haven't been working possibly you know

183
00:13:06,240 --> 00:13:11,680
what we can check the original video unboxing the petabyte project it's a

184
00:13:09,760 --> 00:13:15,360
good thing our entire life exists on youtube at least mine does

185
00:13:13,519 --> 00:13:19,440
okay where's some b-roll of a hard drive here oh no they are advanced format so

186
00:13:17,760 --> 00:13:22,800
they're all there but they're not there

187
00:13:26,480 --> 00:13:34,880
there we go new raidz2 raidz2-3

188
00:13:31,360 --> 00:13:37,040
includes 15 drives all online 117 is now

189
00:13:34,880 --> 00:13:41,440
back online resilvering and it's doing it at almost 900 megabytes per second

190
00:13:39,839 --> 00:13:45,760
so we're good so

191
00:13:43,519 --> 00:13:49,760
uh i did notice curiously

192
00:13:47,200 --> 00:13:52,560
that the vault is still not any bigger right

193
00:13:50,959 --> 00:13:55,839
i believe the glusterfs service either needs to be restarted or i might

194
00:13:54,320 --> 00:13:59,680
need to adjust the config real quick okay uh so let me just quickly look at

195
00:13:58,880 --> 00:14:05,519
this in delta wait is it one of these uh disc.conf is it this dot config here oh

196
00:14:03,760 --> 00:14:11,440
yeah there it is hey i helped

197
00:14:07,600 --> 00:14:11,440
did i help on a linux thing yes or no

198
00:14:12,480 --> 00:14:18,000
i did usually i'm not nearly as helpful when

199
00:14:15,600 --> 00:14:23,360
it comes to server stuff okay so final update guys it was just a matter of

200
00:14:20,160 --> 00:14:26,160
getting volume four's brick integrated

201
00:14:23,360 --> 00:14:30,720
into gluster so you can see everything's here volume one two three one two three

202
00:14:29,199 --> 00:14:35,279
four four sure whatever doesn't matter no big deal

203
00:14:33,360 --> 00:14:38,399
so now we're gonna fire over to our other server this is wanic server it

204
00:14:36,800 --> 00:14:42,240
runs windows whatever want to fight about it

205
00:14:40,240 --> 00:14:45,839
we've got our z drive which is actually even tighter for storage than it was

206
00:14:44,160 --> 00:14:52,399
before we have less than a terabyte of space left but that's no problem the

207
00:14:48,800 --> 00:14:54,959
vault 163 terabytes ready to freaking go

208
00:14:52,399 --> 00:15:01,120
and as for delta one it is resilvering at 375 megabytes a second i can push

209
00:14:58,160 --> 00:15:05,600
this back in very ever so slowly and gently and we're done bud by the way if

210
00:15:03,839 --> 00:15:10,240
you guys liked this video we actually did a video rebuilding our water cooled

211
00:15:07,920 --> 00:15:13,279
render server down here uh you can check that out up there

212
00:15:11,680 --> 00:15:17,760
and if you're not into that you can check out our sponsor for today's video

213
00:15:15,519 --> 00:15:22,240
ifixit the ifixit essential electronics toolkit is compact so it can go anywhere

214
00:15:19,760 --> 00:15:25,360
you can and help you fix almost anything it includes their most popular precision

215
00:15:23,839 --> 00:15:29,279
bits and they're held in place with high density foam so you can throw it around

216
00:15:27,519 --> 00:15:34,160
without any of the bits falling out and of course it comes with ifixit's

217
00:15:31,199 --> 00:15:38,320
lifetime warranty it's just $24.99 at ifixit.com forward slash linus so go

218
00:15:36,800 --> 00:15:43,199
check it out today that's it for this video i will see you guys next time when

219
00:15:40,959 --> 00:15:45,680
we're installing the single box petabyte project
