100TB for $2,850??? - Are Archive Drives Useless?
Linus Tech Tips
·2017-05-06
0:00
Seagate archive drives. These things are
0:03
freaking cheap for how much capacity you
0:06
get. I can actually link my drive cost calculator spreadsheet that I used to
0:12
make this chart under the video, by the way. But when I started looking into
0:16
picking up some of these drives for our long-term storage NAS, I heard the
0:22
performance totally sucked. So, I asked Seagate to send a few of them over, and
0:27
I went on a mission to figure out if
0:30
there's a way to mask their performance penalty while still getting the cost
0:35
benefit to build the cheapest 100 TB
0:39
storage box possible. Well, my original
0:42
concept ended up totally not working. That's a new one, right? But I learned a
0:47
bunch of interesting stuff in the process, and here it is.
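The cost math behind that goal can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet from the video: the function names are made up here, the $2,850 and 100 TB figures come from the video title, the 8 TB drive size is the one used in the tests later on, and the drive count ignores any extra parity drives.

```python
import math


def drives_needed(target_tb: float, drive_tb: float) -> int:
    """Smallest number of drives whose raw capacity reaches the target."""
    return math.ceil(target_tb / drive_tb)


def cost_per_tb(total_cost: float, capacity_tb: float) -> float:
    """Dollars per terabyte of raw capacity."""
    return total_cost / capacity_tb


# Figures taken from the video: 100 TB target, 8 TB drives, $2,850 total.
count = drives_needed(100, 8)
print(count)                    # 13 drives (104 TB raw, parity not counted)
print(cost_per_tb(2850, 100))   # 28.5 dollars per TB
```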
1:00
Cooler Master's MasterCase Maker 5 features their FreeForm modular system,
1:05
allowing you to customize, adjust, and upgrade. Make it yours at the link in
1:09
the video description. Now, before I can explain why archive
1:14
drives are so cheap and at the same time
1:18
why their performance is less than ideal
1:21
for certain applications, we need a little bit of background. Without
1:25
getting into too much grimy detail, data is stored on hard drives by arranging
1:30
the polarity of the tiny magnets that
1:34
cover the hard disc-shaped thing inside, called a platter, according to the
1:39
instructions given by your operating system. A magnetized bit is interpreted
1:44
as a one and a non-magnetized bit is interpreted as a zero. So you lay down a
1:49
few billion ones and zeros in the right order, read them back, and boom, next
1:54
thing you know, you're playing Crysis 3. Okay, then. So, traditionally, these
1:59
little magnets were arranged laying flat
2:02
in concentric circles on the platter.
2:05
This is called longitudinal magnetic recording. It's easier. But eventually,
2:11
hard drive manufacturers ran out of room
2:14
and couldn't increase capacity anymore without making their platters so big
2:19
that the latency penalty of moving the read and write heads around would be too
2:24
high. Not to mention that I'm pretty sure that no one wants a 10 terabyte disc
2:29
in their laptop if it has to be the size of a vinyl freaking record. So the first
2:33
solution then was perpendicular magnetic recording: standing those magnets up
2:39
instead of laying them down. This required more complex read and write
2:44
heads, the, uh, the record-needle-type arm that moves around and makes that ticking
2:49
noise whenever your drive is working hard, but has gotten us all the way to
2:54
10 terabytes so far with maybe a little
2:58
bit more headroom left before the magnets again just can't get any
3:02
smaller, which is where shingled magnetic recording comes in. Now the
3:08
read component of the head, remember the record needle thing, is narrower than
3:13
the write component. So by layering the magnetic tracks half on top of each
3:19
other, like the shingles on a roof, much more data can be stored without moving
3:24
to more exotic materials to make the magnets smaller or even drastically
3:29
redesigning the heads. Unfortunately, this means that while you can read at
3:35
pretty much full speed (the 8 TB archive drives that we used for our test
3:40
are rated at 190 megabytes per second reads, way more than enough for the
3:45
gigabit networks that most home and small office users are running), write
3:49
speeds can be devastatingly slow, especially when they're random. You see,
3:55
the write head is so wide that it would
3:58
actually overwrite both the intended track and the next one over on the
4:04
drive. So, it has to read the data that it's going to accidentally overwrite,
4:09
store that somewhere else, either in a solid state cache or in a reserved part
4:14
of the disk platter somewhere else, organize it, and then finally
4:19
sequentially write back both the data it's supposed to be writing in the first
4:22
place and that data it had to shuffle.
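The sequence just described can be sketched as a toy model in Python. Everything here is an assumption made for illustration: real drive firmware is far more involved, and the idea of a fixed "band" of overlapping tracks, along with the names `write_track` and `smr_update`, is hypothetical. The one rule taken from the explanation above is that writing a track also clobbers the next one over.

```python
from typing import Optional


def write_track(band: list[Optional[str]], index: int, data: str) -> None:
    """Write one track; the wide write head also destroys the next track."""
    band[index] = data
    if index + 1 < len(band):
        band[index + 1] = None  # collateral damage from the overlap


def smr_update(band: list[Optional[str]], index: int, data: str) -> int:
    """Update one track in place and return how many track writes it took."""
    # Read: save everything downstream that is about to be overwritten.
    saved = band[index + 1:]
    # Modify + write: lay down the new data, then the saved tracks, in order.
    write_track(band, index, data)
    writes = 1
    for offset, old in enumerate(saved, start=index + 1):
        write_track(band, offset, old)
        writes += 1
    return writes


band = ["a", "b", "c", "d"]
print(smr_update(band, 1, "B"))  # 3 track writes to change a single track
print(band)                      # ['a', 'B', 'c', 'd']
```

In this model a one-track change costs as many writes as there are tracks from that point to the end of the band, which is why small random writes hurt so much more than long sequential ones.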
4:26
This is called a read-modify-write and it can be slow as all hell. So, let's
4:32
talk then about my idea. I wanted to use
4:35
the reasonable read speeds, the low cost, and the 24/7 operation ratings of
4:41
archive drives in one of my Lime Tech unRAID
4:45
systems. I wanted to combine that with
4:48
the reliability and all-around high performance of enterprise capacity
4:53
drives to get the best of both worlds. So the way unRAID works is that your
4:58
data is actually written directly to the individual discs in the array which is
5:03
great because in the event of a catastrophic failure let's say you lose
5:07
two drives simultaneously at least anything written to the rest of the
5:10
drives is still there and an additional
5:14
drive or two drives acts as a parity
5:17
disc that lets data from one or two failed discs,
5:20
depending on how many parity discs you have, be rebuilt in the event of
5:25
a less catastrophic failure. The problem
5:28
is that while archive drives seem to be okay as standalone individual discs, the
5:34
worst use case I could find for them was in parity-protected RAID arrays with
5:39
their poor random performance being pointed to as an unnecessary risk during
5:45
a rebuild operation. So the data rebuilding process actually puts more
5:50
strain than normal on the rest of the drives. And so the data across all the
5:55
discs is in jeopardy until the corrupted or failed drive's data has been rebuilt.
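For the single-parity case, a rebuild can be sketched with plain XOR in Python. This is a minimal illustration under assumptions: the disc contents are made-up two-byte blocks, `xor_blocks` is a hypothetical helper, and dual parity (which needs a second, different calculation) is not covered.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data discs, each holding its own independent data (made-up bytes).
data_discs = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]

# The parity disc stores the XOR of every data disc.
parity = xor_blocks(data_discs)

# Disc 1 dies: XOR the survivors with parity to get its contents back.
survivors = [data_discs[0], data_discs[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_discs[1])  # True
```

Note that the rebuild has to read every surviving disc from start to finish, which is the extra strain mentioned above.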
6:01
So now we're 70% of the way through the video and we finally come to my idea. I
6:06
figured by using archive drives in the array and an enterprise drive for parity
6:13
and to replace any failed archive drives, I could mask both the poor
6:18
random write performance and the slow
6:21
rebuild times of the archive drives. And
6:25
as you'll see from these performance numbers, it didn't work out that way at
6:29
all. So, uh, my heterogeneous drive mixture configuration had worse
6:34
performance than both all enterprise
6:37
capacity drives which I expected and
6:41
worse than a pure archive drive setup
6:44
which I suspect is due to the mismatched disc spindle speed. So, that's kind of a
6:49
drag, I guess. But there's some good news here for me anyway. And that is
6:53
that in an unRAID environment, I can either settle for 50 megabyte per second
6:58
write speeds, about half of what a gigabit network can handle, in the
7:02
default configuration where it spins up only the disc to which it's writing
7:05
directly and the parity disc, to reduce power consumption and disc wear, at the
7:09
cost of performing read-modify-write operations all the time. Or if I use
7:13
their turbo write mode that spins all the discs during access, allowing for much
7:18
faster reconstruct writes, I can still,
7:21
even with the cheapest drives I could find that are rated for 24/7 operation,
7:26
get my 100 megabytes per second since I'm not striping data the way that I
7:31
would in a more traditional RAID, which to be clear, archive drives still are
7:36
not recommended for. So, thanks for watching, guys. If this video sucked,
7:40
you know what to do. But if it was awesome, get subscribed, hit that like button, or maybe even check out the link
7:44
to where to buy the stuff that we featured at Amazon. In the video
7:47
description, I have my full hard drive, like, NAS capacity and price calculator
7:52
Excel sheet down there, which you can, you're more than welcome to try out.
7:56
Also linked in the description is our merch store, which has cool shirts like this one, and our community forum, which
8:01
you should totally join. Now that you're done doing all that stuff, you're probably wondering what to watch next.
8:05
So, click that little button in the top right corner to check out our video from
8:09
last year, which inspired a lot of this storage server stuff that I've been
8:13
doing, where we lost pretty much all of our data
8:17
temporarily. Or did I ruin the suspense? I don't know.