100TB for $2,850??? - Are Archive Drives Useless?

Linus Tech Tips · 2017-05-06 · 1,355 words · ~6 min read

Transcript

0:00 Seagate archive drives. These things are
0:03 freaking cheap for how much capacity you
0:06 get. I can actually link my drive cost calculator spreadsheet that I used to
0:12 make this chart under the video, by the way. But when I started looking into
0:16 picking up some of these drives for our long-term storage NAS, I heard the
0:22 performance totally sucked. So, I asked Seagate to send a few of them over, and
0:27 I went on a mission to figure out if
0:30 there's a way to mask their performance penalty while still getting the cost
0:35 benefit to build the cheapest 100 terabyte
0:39 storage box possible. Well, my original
0:42 concept ended up totally not working. That's a new one, right? But I learned a
0:47 bunch of interesting stuff in the process, and here it is.
1:00 Cooler Master's MasterCase Maker 5 features their FreeForm Modular System,
1:05 allowing you to customize, adjust, and upgrade. Make it yours at the link in
1:09 the video description. Now, before I can explain why archive
1:14 drives are so cheap and at the same time
1:18 why their performance is less than ideal
1:21 for certain applications, we need a little bit of background. Without
1:25 getting into too much grimy detail, data is stored on hard drives by arranging
1:30 the polarity of the tiny magnets that
1:34 cover the hard disc-shaped thing inside, called a platter, according to the
1:39 instructions given by your operating system. A magnetized bit is interpreted
1:44 as a one and a non-magnetized bit is interpreted as a zero. So you lay down a
1:49 few billion ones and zeros in the right order, read them back, and boom, next
1:54 thing you know, you're playing Crysis 3. Okay, then. So, traditionally, these
1:59 little magnets were arranged laying flat
2:02 in concentric circles on the platter.
2:05 This is called longitudinal magnetic recording. It's easier. But eventually,
2:11 hard drive manufacturers ran out of room
2:14 and couldn't increase capacity anymore without making their platters so big
2:19 that the latency penalty of moving the read and write heads around would be too
2:24 high. Not to mention that I'm pretty sure that no one wants a 10 terabyte disc
2:29 in their laptop if it has to be the size of a vinyl freaking record. So the first
2:33 solution then was perpendicular magnetic recording: standing those magnets up
2:39 instead of laying them down. This required more complex read and write
2:44 heads, the, uh, the record-needle-type arm that moves around and makes that ticking
2:49 noise whenever your drive is working hard, but has gotten us all the way to
2:54 10 terabytes so far with maybe a little
2:58 bit more headroom left before the magnets again just can't get any
3:02 smaller, which is where shingled magnetic recording comes in. Now the
3:08 read component of the head, remember the record needle thing, is narrower than
3:13 the write component. So by layering the magnetic tracks half on top of each
3:19 other, like the shingles on a roof, much more data can be stored without moving
3:24 to more exotic materials to make the magnets smaller or even drastically
3:29 redesigning the heads. Unfortunately, this means that while you can read at
3:35 pretty much full speed (the 8 terabyte archive drives that we used for our test
3:40 are rated at 190 megabytes per second reads, way more than enough for the
3:45 gigabit networks that most home and small office users are running), write
3:49 speeds can be devastatingly slow, especially when they're random. You see,
3:55 the write head is so wide that it would
3:58 actually overwrite both the intended track and the next one over on the
4:04 drive. So, it has to read the data that it's going to accidentally overwrite,
4:09 store that somewhere else, either in a solid state cache or in a reserved part
4:14 of the disk platter somewhere else, organize it, and then finally
4:19 sequentially write back both the data it's supposed to be writing in the first
4:22 place and that data it had to shuffle.
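The shuffle just described can be sketched in a few lines of Python. This is only a toy model, not how Seagate's firmware actually works: the idea of a fixed "band" of overlapping tracks and the names `raw_write` and `rewrite_track` are assumptions made for illustration. It shows why changing one track can cost several track writes.

```python
def raw_write(band, index, data):
    """Write one track. The wide write head also clobbers the next track over."""
    band[index] = data
    if index + 1 < len(band):
        band[index + 1] = None  # neighbouring shingled track is destroyed


def rewrite_track(band, index, new_data):
    """Safely change one track; returns how many track writes it took."""
    # 1. Read and stash every later track in the band before touching anything
    #    (a real drive would park this in a cache or a reserved platter area).
    saved = list(band[index + 1:])
    # 2. Write the track we actually wanted to change.
    raw_write(band, index, new_data)
    writes = 1
    # 3. Sequentially write the stashed tracks back, each one repairing
    #    the damage done by the write before it.
    for offset, data in enumerate(saved, start=index + 1):
        raw_write(band, offset, data)
        writes += 1
    return writes


# Hypothetical four-track band: changing track 1 forces tracks 2 and 3
# to be rewritten as well.
band = ["a", "b", "c", "d"]
writes_needed = rewrite_track(band, 1, "B")

# Writing without the shuffle loses the neighbouring track.
naive_band = ["a", "b", "c"]
raw_write(naive_band, 0, "A")
```

In this model a write near the start of a band is the expensive case, since everything after it has to be rewritten too, which is one way to picture why random writes hurt far more than sequential ones.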
4:26 This is called a read-modify-write, and it can be slow as all hell. So, let's
4:32 talk then about my idea. I wanted to use
4:35 the reasonable read speeds, the low cost, and the 24/7 operation ratings of
4:41 archive drives in one of my Lime Tech unRAID
4:45 systems. I wanted to combine that with
4:48 the reliability and all-around high performance of enterprise capacity
4:53 drives to get the best of both worlds. So the way unRAID works is that your
4:58 data is actually written directly to the individual discs in the array, which is
5:03 great because in the event of a catastrophic failure, let's say you lose
5:07 two drives simultaneously, at least anything written to the rest of the
5:10 drives is still there. And an additional
5:14 drive or two drives acts as a parity
5:17 disc that lets data from a single, or two,
5:20 depending how many parity discs you have, failed discs be rebuilt in the event of
5:25 a less catastrophic failure. The problem
5:28 is that while archive drives seem to be okay as standalone individual discs, the
5:34 worst use case I could find for them was in parity-protected RAID arrays, with
5:39 their poor random performance being pointed to as an unnecessary risk during
5:45 a rebuild operation. So the data rebuilding process actually puts more
5:50 strain than normal on the rest of the drives. And so the data across all the
5:55 discs is in jeopardy until the corrupted or failed drive's data has been rebuilt.
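To see why a rebuild leans on every other drive, here is a minimal sketch of single-disc parity using XOR. The video doesn't spell out unRAID's exact parity math, so treat XOR as an assumption (it is the usual choice for a single parity disc); the helper names `xor_blocks` and `rebuild_missing` and the tiny two-byte "discs" are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity_block):
    """Recover the one missing data block from the survivors plus parity."""
    # Every surviving disc AND the parity disc must be read in full.
    return xor_blocks(surviving_blocks + [parity_block])


# Hypothetical array: three data discs, each holding two bytes.
data_discs = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data_discs)

# Pretend disc 1 died, then rebuild it from the other two plus parity.
survivors = [data_discs[0], data_discs[2]]
rebuilt = rebuild_missing(survivors, parity)
```

Because the rebuilt block depends on every surviving disc being read end to end, one slow or flaky drive drags out the whole window in which the array has no protection.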
6:01 So now we're 70% of the way through the video and we finally come to my idea. I
6:06 figured by using archive drives in the array and an enterprise drive for parity
6:13 and to replace any failed archive drives, I could mask both the poor
6:18 random write performance and the slow
6:21 rebuild times of the archive drives. And
6:25 as you'll see from these performance numbers, it didn't work out that way at
6:29 all. So, uh, my heterogeneous drive mixture configuration had worse
6:34 performance than both all enterprise
6:37 capacity drives, which I expected, and
6:41 worse than a pure archive drive setup,
6:44 which I suspect is due to the mismatched disc spindle speed. So, that's kind of a
6:49 drag, I guess. But there's some good news here for me anyway. And that is
6:53 that in an unRAID environment, I can either settle for 50 megabyte per second
6:58 write speeds, about half of what a gigabit network can handle, in the
7:02 default configuration where it spins up only the disc to which it's writing
7:05 directly and the parity disc, to reduce power consumption and disc wear at the
7:09 cost of performing read-modify-write operations all the time, or, if I use
7:13 their turbo write mode that spins all the discs during access, allowing for much
7:18 faster reconstruct writes, I can still,
7:21 even with the cheapest drives I could find that are rated for 24/7 operation,
7:26 get my 100 megabytes per second since I'm not striping data the way that I
7:31 would in a more traditional RAID, which to be clear, archive drives still are
7:36 not recommended for. So, thanks for watching, guys. If this video sucked,
7:40 you know what to do. But if it was awesome, get subscribed, hit that like button, or maybe even check out the link
7:44 to where to buy the stuff that we featured at Amazon. In the video
7:47 description, I have my full hard drive, like, NAS capacity and price calculator
7:52 Excel sheet down there, which you can, you're more than welcome to try out.
7:56 Also linked in the description is our merch store, which has cool shirts like this one, and our community forum, which
8:01 you should totally join. Now that you're done doing all that stuff, you're probably wondering what to watch next.
8:05 So, click that little button in the top right corner to check out our video from
8:09 last year, which inspired a lot of this storage server stuff that I've been
8:13 doing, where we lost pretty much all of our data
8:17 temporarily. Or did I ruin the suspense? I don't know.