{"video_id":"xWjOh0Ph8uM","title":"This Server Deployment was HORRIBLE","channel":"Linus Tech Tips","show":"Linus Tech Tips","published_at":"2020-05-05T14:53:29Z","duration_s":1061,"segments":[{"start_s":0.16,"end_s":7.2,"text":"when i signed off at the end of the video about our amazing fast new","speaker":null,"is_sponsor":0},{"start_s":4.4,"end_s":11.759,"text":"all SSD storage server i thought it was as simple as okay let's","speaker":null,"is_sponsor":0},{"start_s":9.76,"end_s":14.88,"text":"load the final os on this thing chuck it in the server room we're ready to start","speaker":null,"is_sponsor":0},{"start_s":13.04,"end_s":20.24,"text":"editing off of it but it wasn't so our story begins with","speaker":null,"is_sponsor":0},{"start_s":18.08,"end_s":24.48,"text":"some short video clips that i sent over to wendell from Level1Techs","speaker":null,"is_sponsor":0},{"start_s":21.76,"end_s":32.88,"text":"complaining hey about Windows storage spaces on our new 24 drive NVMe server","speaker":null,"is_sponsor":0},{"start_s":29.84,"end_s":35.12,"text":"machine here because what was happening","speaker":null,"is_sponsor":0},{"start_s":32.88,"end_s":40.239,"text":"was while i was copying files to what should be one of the fastest storage","speaker":null,"is_sponsor":0},{"start_s":37.52,"end_s":44.96,"text":"servers on the freaking planet i was getting great performance sometimes and","speaker":null,"is_sponsor":0},{"start_s":42.879,"end_s":50.32,"text":"then rock bottom performance others we're talking like","speaker":null,"is_sponsor":0},{"start_s":46.64,"end_s":52.48,"text":"10 20 30 megabytes a second so wendell","speaker":null,"is_sponsor":0},{"start_s":50.32,"end_s":58.079,"text":"dug into the system logs and discovered that there was some kind of a problem at","speaker":null,"is_sponsor":0},{"start_s":54.719,"end_s":61.68,"text":"the driver or pci express level where it","speaker":null,"is_sponsor":0},{"start_s":58.079,"end_s":64.159,"text":"was actually resetting individual drives","speaker":null,"is_sponsor":0},{"start_s":61.68,"end_s":69.68,"text":"like they were effectively timing out for seconds at a time while the data was","speaker":null,"is_sponsor":0},{"start_s":67.439,"end_s":74.0,"text":"in flight and then the poor array would be sitting there trying to figure out","speaker":null,"is_sponsor":0},{"start_s":71.28,"end_s":78.72,"text":"what to do while a drive is effectively MIA then the drive reset would finish","speaker":null,"is_sponsor":0},{"start_s":77.04,"end_s":82.4,"text":"which is essentially like if you were pulling a drive out for like two seconds","speaker":null,"is_sponsor":0},{"start_s":80.32,"end_s":87.04,"text":"and then popping it back in and then the transfer would roll along at multiple","speaker":null,"is_sponsor":0},{"start_s":84.64,"end_s":92.159,"text":"hundreds of megabytes a second or we even saw at times numbers as high as 20","speaker":null,"is_sponsor":0},{"start_s":90.08,"end_s":95.2,"text":"plus gigabytes a second in CrystalDiskMark","speaker":null,"is_sponsor":0},{"start_s":93.36,"end_s":100.4,"text":"then it would hitch again rinse and repeat obviously i can't deploy it like","speaker":null,"is_sponsor":0},{"start_s":97.759,"end_s":104.96,"text":"that so i thought it was my knowledge of Windows storage spaces or lack thereof","speaker":null,"is_sponsor":0},{"start_s":102.88,"end_s":110.799,"text":"and that i had configured it wrong but then the mystery deepened so this 
dropping","speaker":null,"is_sponsor":0},{"start_s":108.64,"end_s":116.159,"text":"out behavior actually happened with a simple Windows software raid with just","speaker":null,"is_sponsor":0},{"start_s":113.84,"end_s":120.719,"text":"four devices in it i mean that's a relatively pedestrian","speaker":null,"is_sponsor":0},{"start_s":118.079,"end_s":123.92,"text":"16 gigabytes a second by the way guys our sponsor for this","speaker":null,"is_sponsor":0},{"start_s":122.24,"end_s":128.479,"text":"video Pulseway with Pulseway you can remotely monitor manage and control all","speaker":null,"is_sponsor":0},{"start_s":125.84,"end_s":132.959,"text":"your Windows mac and Linux machines from one app create your free account today","speaker":null,"is_sponsor":0},{"start_s":130.56,"end_s":136.64,"text":"at the link below so we tried all the usual things we tried updating the","speaker":null,"is_sponsor":0},{"start_s":135.04,"end_s":141.12,"text":"drivers it was using the microsoft drivers we put the latest Intel drivers","speaker":null,"is_sponsor":0},{"start_s":138.879,"end_s":144.64,"text":"for these NVMe devices onto the system that didn't work we tried tweaking the","speaker":null,"is_sponsor":0},{"start_s":142.8,"end_s":149.44,"text":"power management to prevent the pci express lanes from switching to lower","speaker":null,"is_sponsor":0},{"start_s":147.599,"end_s":153.84,"text":"speeds when we were accessing all the drives and that could be a desirable","speaker":null,"is_sponsor":0},{"start_s":152.0,"end_s":157.76,"text":"behavior because there's so many drives in here that you're going to run into","speaker":null,"is_sponsor":0},{"start_s":155.519,"end_s":162.319,"text":"other system bottlenecks before you could possibly hope to use all the","speaker":null,"is_sponsor":0},{"start_s":159.2,"end_s":164.879,"text":"bandwidth of even a pci gen 3 link so","speaker":null,"is_sponsor":0},{"start_s":162.319,"end_s":169.28,"text":"gen 2 could be a pretty good bet but when it's happening automatically this","speaker":null,"is_sponsor":0},{"start_s":166.879,"end_s":173.84,"text":"speed switching takes time and that could be part of","speaker":null,"is_sponsor":0},{"start_s":171.599,"end_s":177.12,"text":"what's causing the problems but neither of those things or both of them were","speaker":null,"is_sponsor":0},{"start_s":175.36,"end_s":182.159,"text":"able to solve the problem and we only got a small improvement in the behavior","speaker":null,"is_sponsor":0},{"start_s":179.04,"end_s":185.12,"text":"so wendell suggested gee why don't we go","speaker":null,"is_sponsor":0},{"start_s":182.159,"end_s":189.76,"text":"over to Linux as he tends to do but then get this we got the same dropouts on","speaker":null,"is_sponsor":0},{"start_s":188.48,"end_s":194.48,"text":"Linux that seemed to suggest a hardware issue","speaker":null,"is_sponsor":0},{"start_s":192.48,"end_s":198.08,"text":"of some sort so guys this is why i ultimately made this","speaker":null,"is_sponsor":0},{"start_s":196.159,"end_s":201.84,"text":"video about it because this is pretty dry technical stuff for a lot of people","speaker":null,"is_sponsor":0},{"start_s":199.92,"end_s":208.48,"text":"but i thought it was fascinating NVMe is already so fast that a lot of","speaker":null,"is_sponsor":0},{"start_s":205.76,"end_s":212.0,"text":"stuff particularly software is not engineered for it which is turning out","speaker":null,"is_sponsor":0},{"start_s":210.159,"end_s":216.56,"text":"to be a bit of an industry-wide problem and when you take 24 of 
these drives","speaker":null,"is_sponsor":0},{"start_s":214.56,"end_s":222.56,"text":"that are capable of multiple gigabytes a second on paper that is now 24 times the","speaker":null,"is_sponsor":0},{"start_s":220.239,"end_s":227.04,"text":"problem think about it this way even with eight channels of memory which is","speaker":null,"is_sponsor":0},{"start_s":224.64,"end_s":233.12,"text":"pretty impressive the theoretical maximum memory bandwidth of our system","speaker":null,"is_sponsor":0},{"start_s":229.599,"end_s":234.959,"text":"here is around 200 gigabytes a second","speaker":null,"is_sponsor":0},{"start_s":233.12,"end_s":240.319,"text":"and real world you're looking at more like 100 to 150 gigabytes a second","speaker":null,"is_sponsor":0},{"start_s":238.56,"end_s":245.599,"text":"now let's talk about this storage array here this","speaker":null,"is_sponsor":0},{"start_s":242.0,"end_s":248.72,"text":"is capable on paper of about a hundred","speaker":null,"is_sponsor":0},{"start_s":245.599,"end_s":252.239,"text":"gigabytes a second in reads so we would","speaker":null,"is_sponsor":0},{"start_s":248.72,"end_s":253.68,"text":"need assuming perfect efficiency which","speaker":null,"is_sponsor":0},{"start_s":252.239,"end_s":259.519,"text":"obviously never happens in the real world nearly half of our memory bandwidth just","speaker":null,"is_sponsor":0},{"start_s":258.4,"end_s":264.639,"text":"to handle shifting data around when we're reading","speaker":null,"is_sponsor":0},{"start_s":262.0,"end_s":268.88,"text":"or writing to our storage array that's ridiculous and even the Linux","speaker":null,"is_sponsor":0},{"start_s":267.52,"end_s":273.6,"text":"kernel is going to be on the struggle bus when you're talking about that much","speaker":null,"is_sponsor":0},{"start_s":271.04,"end_s":277.36,"text":"data as wendell so succinctly put it because","speaker":null,"is_sponsor":0},{"start_s":274.4,"end_s":282.72,"text":"here's the way it's supposed to work the operating system kernel asks for","speaker":null,"is_sponsor":0},{"start_s":280.08,"end_s":287.12,"text":"some chunk of data let's say a loot of your wife to enjoy on your lunch break","speaker":null,"is_sponsor":0},{"start_s":284.4,"end_s":291.44,"text":"all right the disk says yep no problem but NAND flash is pretty slow so i'm","speaker":null,"is_sponsor":0},{"start_s":289.36,"end_s":295.28,"text":"going to need a sec to load that into my buffer i'll let you know when it's ready","speaker":null,"is_sponsor":0},{"start_s":293.52,"end_s":300.32,"text":"the disk gets everything ready loaded into the buffer and then it sends what's","speaker":null,"is_sponsor":0},{"start_s":297.28,"end_s":302.479,"text":"called an interrupt to the CPU to say","speaker":null,"is_sponsor":0},{"start_s":300.32,"end_s":306.08,"text":"hey all right it's chill you can swing by and grab that data now","speaker":null,"is_sponsor":0},{"start_s":304.8,"end_s":312.4,"text":"but here's the problem we're running into if the CPU core that the interrupt was","speaker":null,"is_sponsor":0},{"start_s":309.84,"end_s":317.44,"text":"intended for is too busy doing something else or it gets put to sleep or it gets","speaker":null,"is_sponsor":0},{"start_s":315.36,"end_s":322.4,"text":"reassigned to some other task in the middle of this process which can be","speaker":null,"is_sponsor":0},{"start_s":319.6,"end_s":328.16,"text":"quite common on multi-core cpus that interrupt never arrives your 
processor","speaker":null,"is_sponsor":0},{"start_s":325.759,"end_s":335.039,"text":"never goes and gets the data and the whole train comes to a screeching halt","speaker":null,"is_sponsor":0},{"start_s":331.52,"end_s":338.0,"text":"and that is why we had no issues last","speaker":null,"is_sponsor":0},{"start_s":335.039,"end_s":343.199,"text":"video slamming the individual drives with data but then as soon as we put a","speaker":null,"is_sponsor":0},{"start_s":341.759,"end_s":347.52,"text":"file system you know as soon as we started running a","speaker":null,"is_sponsor":0},{"start_s":344.88,"end_s":350.88,"text":"zfs raid and our CPU was doing parity calculations while we were reading and","speaker":null,"is_sponsor":0},{"start_s":349.039,"end_s":357.28,"text":"writing to the array making the CPU actually do any work we were getting","speaker":null,"is_sponsor":0},{"start_s":353.199,"end_s":360.72,"text":"crippling errors all over the place","speaker":null,"is_sponsor":0},{"start_s":357.28,"end_s":363.12,"text":"so AWS just rolled out NVMe and there","speaker":null,"is_sponsor":0},{"start_s":360.72,"end_s":367.36,"text":"are a ton of threads about issues under heavy loads suggesting that this appears","speaker":null,"is_sponsor":0},{"start_s":365.28,"end_s":372.24,"text":"to be an industry-wide problem and the dumbest part of this is that i don't","speaker":null,"is_sponsor":0},{"start_s":369.759,"end_s":377.199,"text":"actually even need my server to be this fast i'm only hitting it with a 40","speaker":null,"is_sponsor":0},{"start_s":375.039,"end_s":382.08,"text":"gigabit connection here that's only four gigabytes a second maximum so wendell","speaker":null,"is_sponsor":0},{"start_s":379.68,"end_s":386.96,"text":"actually even thought of turning down the pci express links to gen 2 and just","speaker":null,"is_sponsor":0},{"start_s":384.8,"end_s":391.039,"text":"leaving them there Gigabyte meanwhile the makers of this server were like sorry","speaker":null,"is_sponsor":0},{"start_s":388.96,"end_s":394.16,"text":"wait you want a speed limiter on this thing but then wendell ended up finding","speaker":null,"is_sponsor":0},{"start_s":392.72,"end_s":399.039,"text":"a software way to do it but then it turned out there was a kernel bug something something something ultimately","speaker":null,"is_sponsor":0},{"start_s":397.039,"end_s":403.199,"text":"it didn't pan out and it didn't work anyway that's okay because Linux already","speaker":null,"is_sponsor":0},{"start_s":401.919,"end_s":409.84,"text":"has kind of a solution to this now very very high","speaker":null,"is_sponsor":0},{"start_s":407.919,"end_s":414.96,"text":"speed devices like RAM based caching devices operate in a","speaker":null,"is_sponsor":0},{"start_s":412.479,"end_s":420.08,"text":"completely different mode called polling where the kernel essentially assumes","speaker":null,"is_sponsor":0},{"start_s":417.28,"end_s":424.479,"text":"that the device is so fast that the data is going to be ready right away and it","speaker":null,"is_sponsor":0},{"start_s":422.319,"end_s":427.599,"text":"would add a lot of overhead to do this on slower drives because there'd be a","speaker":null,"is_sponsor":0},{"start_s":425.84,"end_s":433.199,"text":"lot of pointless hey are you done yet hey are you done yet so a single NVMe","speaker":null,"is_sponsor":0},{"start_s":430.08,"end_s":434.96,"text":"doesn't need to be polled but 24","speaker":null,"is_sponsor":0},{"start_s":433.199,"end_s":438.08,"text":"oh there's an argument to be made for 
operating in that mode","speaker":null,"is_sponsor":0},{"start_s":436.479,"end_s":442.8,"text":"so here's the mitigation that wendell implemented when possible the kernel is","speaker":null,"is_sponsor":0},{"start_s":440.8,"end_s":446.88,"text":"going to wait for the interrupt because that's the most efficient thing but if","speaker":null,"is_sponsor":0},{"start_s":444.72,"end_s":450.96,"text":"it waits for too long the queuing algorithm will just have the CPU poll","speaker":null,"is_sponsor":0},{"start_s":448.72,"end_s":456.319,"text":"the drive rapidly and say hey do you have that do you have that okay","speaker":null,"is_sponsor":0},{"start_s":452.88,"end_s":456.319,"text":"great i'm going to take that now","speaker":null,"is_sponsor":0},{"start_s":457.36,"end_s":462.24,"text":"all that tweaking and learning means that our final config ended up being","speaker":null,"is_sponsor":0},{"start_s":460.639,"end_s":468.319,"text":"quite different from the initial intention so we're using the latest version of proxmox a Linux distro that's","speaker":null,"is_sponsor":0},{"start_s":466.16,"end_s":473.84,"text":"designed for virtualization with zfs support out of the box and while we had","speaker":null,"is_sponsor":0},{"start_s":470.639,"end_s":477.44,"text":"actually initially intended to use zfs","speaker":null,"is_sponsor":0},{"start_s":473.84,"end_s":480.639,"text":"we were hitting 100% utilization on a 24","speaker":null,"is_sponsor":0},{"start_s":477.44,"end_s":483.919,"text":"core 48 thread CPU and doing","speaker":null,"is_sponsor":0},{"start_s":480.639,"end_s":485.199,"text":"best case scenario assuming that the bug","speaker":null,"is_sponsor":0},{"start_s":483.919,"end_s":489.759,"text":"didn't surface 10 gigs a second reads 4 gigs a second","speaker":null,"is_sponsor":0},{"start_s":487.599,"end_s":493.12,"text":"writes which would have actually been fine remember we've only got a 40","speaker":null,"is_sponsor":0},{"start_s":491.28,"end_s":497.36,"text":"gigabit network connection except that the access latency was not really","speaker":null,"is_sponsor":0},{"start_s":495.199,"end_s":503.759,"text":"suitable for a multi-user video editing environment it was over 150 microseconds","speaker":null,"is_sponsor":0},{"start_s":501.36,"end_s":509.12,"text":"and the craziest part of that is that we actually hit those numbers even with","speaker":null,"is_sponsor":0},{"start_s":506.319,"end_s":514.399,"text":"some pretty esoteric tweaks like disabling arc compression i mean most","speaker":null,"is_sponsor":0},{"start_s":511.759,"end_s":519.279,"text":"seasoned zfs users would freak out about doing that but the problem is that arc","speaker":null,"is_sponsor":0},{"start_s":517.2,"end_s":523.76,"text":"compression makes three copies of the data in memory while you are writing and","speaker":null,"is_sponsor":0},{"start_s":522.24,"end_s":527.839,"text":"remember how much left over memory bandwidth we have","speaker":null,"is_sponsor":0},{"start_s":525.36,"end_s":532.56,"text":"so yeah tripling the load there ain't gonna fly so new plan","speaker":null,"is_sponsor":0},{"start_s":530.56,"end_s":537.279,"text":"Linux multi-disk ain't perfect it's Linux's own built-in","speaker":null,"is_sponsor":0},{"start_s":535.04,"end_s":542.08,"text":"software raid and the main disadvantage is that in the event of an unexpected","speaker":null,"is_sponsor":0},{"start_s":539.12,"end_s":546.399,"text":"shutdown it'll be really slow for the 30 minutes or so that it takes to 
resync","speaker":null,"is_sponsor":0},{"start_s":544.24,"end_s":551.68,"text":"four terabytes of data but that should be fine i mean that's what","speaker":null,"is_sponsor":0},{"start_s":548.399,"end_s":553.44,"text":"the seventeen thousand dollar battery","speaker":null,"is_sponsor":0},{"start_s":551.68,"end_s":559.92,"text":"backup in this room is supposed to be for so we settled on four striped","speaker":null,"is_sponsor":0},{"start_s":557.36,"end_s":564.8,"text":"software raid fives and the next experiment was to play","speaker":null,"is_sponsor":0},{"start_s":562.24,"end_s":569.44,"text":"around with the chunk size so that's how the blocks of data are broken up on the","speaker":null,"is_sponsor":0},{"start_s":567.12,"end_s":574.56,"text":"raid as well as the block size which is on the file system level so the default","speaker":null,"is_sponsor":0},{"start_s":571.6,"end_s":578.32,"text":"raid chunk size is 512k and the file system is 64k.","speaker":null,"is_sponsor":0},{"start_s":576.64,"end_s":583.68,"text":"but when we were running benchmarks based on an editor usage pattern we","speaker":null,"is_sponsor":0},{"start_s":580.64,"end_s":585.92,"text":"actually found that the 512k chunks were","speaker":null,"is_sponsor":0},{"start_s":583.68,"end_s":590.399,"text":"a little bit higher latency than we'd like to see which is really really","speaker":null,"is_sponsor":0},{"start_s":588.8,"end_s":594.48,"text":"important when you're you know scrubbing through files on a timeline so we","speaker":null,"is_sponsor":0},{"start_s":592.0,"end_s":599.519,"text":"actually ended up using 128k for both which happens to line up with","speaker":null,"is_sponsor":0},{"start_s":596.56,"end_s":602.16,"text":"the buffer size on these devices perfect","speaker":null,"is_sponsor":0},{"start_s":600.56,"end_s":606.72,"text":"now the conventional wisdom for accessing very large files over a","speaker":null,"is_sponsor":0},{"start_s":604.24,"end_s":612.0,"text":"network share would actually be to use a very large chunk size like even one","speaker":null,"is_sponsor":0},{"start_s":609.68,"end_s":617.519,"text":"megabyte but while that would be great for ingesting like big batches of new","speaker":null,"is_sponsor":0},{"start_s":615.04,"end_s":621.76,"text":"footage when we're skipping around rather than reading them sequentially","speaker":null,"is_sponsor":0},{"start_s":619.36,"end_s":627.04,"text":"with many users doing that all the same time it actually makes sense that this","speaker":null,"is_sponsor":0},{"start_s":624.8,"end_s":631.76,"text":"would work well and experimentally so far it seems pretty good i just","speaker":null,"is_sponsor":0},{"start_s":630.16,"end_s":635.519,"text":"realized i forgot a drive upstairs i'm gonna go grab that","speaker":null,"is_sponsor":0},{"start_s":633.76,"end_s":640.399,"text":"with multi-disk we ended up with a maximum throughput of around 16","speaker":null,"is_sponsor":0},{"start_s":637.519,"end_s":643.839,"text":"gigabytes per second reads and eight gigabytes per second writes which","speaker":null,"is_sponsor":0},{"start_s":642.16,"end_s":648.079,"text":"obviously is way less than the maximum this hardware","speaker":null,"is_sponsor":0},{"start_s":646.32,"end_s":651.36,"text":"can theoretically do but there's a lot of overhead to contend","speaker":null,"is_sponsor":0},{"start_s":649.839,"end_s":656.399,"text":"with and besides that doesn't mean that there's no benefit to having all of 
this","speaker":null,"is_sponsor":0},{"start_s":654.48,"end_s":660.399,"text":"performance in reserve so the latency advantage is something","speaker":null,"is_sponsor":0},{"start_s":658.64,"end_s":664.24,"text":"that we've already talked about we've actually seen that high latency storage","speaker":null,"is_sponsor":0},{"start_s":662.56,"end_s":669.839,"text":"can cause instability in the video editing software that's accessing it but","speaker":null,"is_sponsor":0},{"start_s":666.48,"end_s":673.519,"text":"another benefit counter intuitively is","speaker":null,"is_sponsor":0},{"start_s":669.839,"end_s":676.48,"text":"that because the storage is so fast","speaker":null,"is_sponsor":0},{"start_s":673.519,"end_s":682.24,"text":"no one chiplet on our CPU can keep up with it a disadvantage of a chiplet","speaker":null,"is_sponsor":0},{"start_s":679.279,"end_s":688.959,"text":"design is that it's got huge horsepower but it's hard to harness all of it for","speaker":null,"is_sponsor":0},{"start_s":685.2,"end_s":691.44,"text":"one single task like a file copy from","speaker":null,"is_sponsor":0},{"start_s":688.959,"end_s":696.72,"text":"one user over the network with that said it's great as a multi-user experience","speaker":null,"is_sponsor":0},{"start_s":694.32,"end_s":701.36,"text":"because each discrete user like let's say a camera operator who's dumping RED","speaker":null,"is_sponsor":0},{"start_s":699.04,"end_s":706.0,"text":"footage and a video editor who also has to work at the same time","speaker":null,"is_sponsor":0},{"start_s":703.04,"end_s":711.2,"text":"end up having their access spread over multiple chiplets that are individually","speaker":null,"is_sponsor":0},{"start_s":709.12,"end_s":717.68,"text":"kind of limited so we had remember guys 150 gigabytes","speaker":null,"is_sponsor":0},{"start_s":715.2,"end_s":722.16,"text":"per second of memory bandwidth one chiplet can't get at all of it so when","speaker":null,"is_sponsor":0},{"start_s":720.16,"end_s":727.76,"text":"we have one user copying a file over the network that user can only get to one or","speaker":null,"is_sponsor":0},{"start_s":725.279,"end_s":731.839,"text":"two cores so there's no way that user can monopolize all the resources on the","speaker":null,"is_sponsor":0},{"start_s":730.079,"end_s":736.399,"text":"system because of the way the whole thing is architected all of this in","speaker":null,"is_sponsor":0},{"start_s":734.24,"end_s":739.68,"text":"theory so far we haven't actually thrown our editors at it so let's go see if","speaker":null,"is_sponsor":0},{"start_s":737.839,"end_s":742.8,"text":"it's booted up and get them to try it","speaker":null,"is_sponsor":0},{"start_s":741.04,"end_s":747.04,"text":"taran was about to eat but now he has something very important to do what um","speaker":null,"is_sponsor":0},{"start_s":745.519,"end_s":751.2,"text":"you laughed dennis but you need to help too","speaker":null,"is_sponsor":0},{"start_s":748.8,"end_s":757.76,"text":"uh okay we're off to a good is this new yes this is new Whonnock","speaker":null,"is_sponsor":0},{"start_s":753.68,"end_s":759.839,"text":"hi alex hi how's it going","speaker":null,"is_sponsor":0},{"start_s":757.76,"end_s":764.24,"text":"hi alex i have a new server to log you into i wouldn't even do that at this","speaker":null,"is_sponsor":0},{"start_s":761.68,"end_s":770.16,"text":"point nope what would you do no no this is not real i'm acting out come on","speaker":null,"is_sponsor":0},{"start_s":768.16,"end_s":773.92,"text":"and i'm not acknowledging 
it i add you too wait are we supposed to work off of","speaker":null,"is_sponsor":0},{"start_s":771.839,"end_s":777.12,"text":"this or just just just i just want to know if it works","speaker":null,"is_sponsor":0},{"start_s":775.2,"end_s":781.68,"text":"so you're supposed to work but like not important work i'm gonna mirror old","speaker":null,"is_sponsor":0},{"start_s":779.44,"end_s":784.88,"text":"Whonnock over to this one one more time okay so anything you do here will be","speaker":null,"is_sponsor":0},{"start_s":783.2,"end_s":788.8,"text":"overwritten so we're not supposed to use it look do you want me to do this","speaker":null,"is_sponsor":0},{"start_s":786.959,"end_s":792.0,"text":"so do them but then we're going to just wipe it out okay when are you going to","speaker":null,"is_sponsor":0},{"start_s":790.0,"end_s":795.36,"text":"swipe it what part of test is not clear just open up a project","speaker":null,"is_sponsor":0},{"start_s":793.519,"end_s":798.72,"text":"how's it going oh seems fine","speaker":null,"is_sponsor":0},{"start_s":797.12,"end_s":802.639,"text":"it's you know let's see if we can pump it up to full","speaker":null,"is_sponsor":0},{"start_s":800.48,"end_s":806.48,"text":"res well that's less of a network bottleneck thing and more of a you know","speaker":null,"is_sponsor":0},{"start_s":804.48,"end_s":810.32,"text":"the rest of the system but okay it's playing it though","speaker":null,"is_sponsor":0},{"start_s":808.0,"end_s":814.959,"text":"which is kind of surprising what","speaker":null,"is_sponsor":0},{"start_s":811.36,"end_s":816.959,"text":"well Linus uh you having wanting to do","speaker":null,"is_sponsor":0},{"start_s":814.959,"end_s":820.56,"text":"increasingly ambitious projects i appreciate that we now have more space","speaker":null,"is_sponsor":0},{"start_s":819.04,"end_s":823.36,"text":"for them us running out of space has been a large","speaker":null,"is_sponsor":0},{"start_s":822.24,"end_s":828.079,"text":"large large problem good work it's not broken","speaker":null,"is_sponsor":0},{"start_s":826.0,"end_s":833.519,"text":"so this does this feel any different than it was before it might be a little","speaker":null,"is_sponsor":0},{"start_s":830.48,"end_s":835.6,"text":"snippier snappier better you don't have","speaker":null,"is_sponsor":0},{"start_s":833.519,"end_s":839.68,"text":"to lie to me but i don't know i mean it i don't really see much difference this","speaker":null,"is_sponsor":0},{"start_s":837.76,"end_s":845.839,"text":"is at 1/8 res though what if you crank it a bit um okay thank you","speaker":null,"is_sponsor":0},{"start_s":843.36,"end_s":849.199,"text":"but is it better why are you asking me i'm asking you that's the whole point of","speaker":null,"is_sponsor":0},{"start_s":847.76,"end_s":852.72,"text":"this exercise you can't do anything participating oh","speaker":null,"is_sponsor":0},{"start_s":851.04,"end_s":858.0,"text":"okay fine from what i can tell it's actually","speaker":null,"is_sponsor":0},{"start_s":855.36,"end_s":861.76,"text":"a lot snappier than what i remember the editors say it's good enough and","speaker":null,"is_sponsor":0},{"start_s":860.16,"end_s":864.8,"text":"we're not getting any data corruption and the performance is","speaker":null,"is_sponsor":0},{"start_s":863.92,"end_s":871.199,"text":"fine but every one of these line items is an","speaker":null,"is_sponsor":0},{"start_s":867.199,"end_s":872.639,"text":"NVMe device timing out and we 
actually","speaker":null,"is_sponsor":0},{"start_s":871.199,"end_s":876.72,"text":"did some troubleshooting that i haven't talked about yet so one of the first","speaker":null,"is_sponsor":0},{"start_s":875.12,"end_s":881.92,"text":"things that we did was we swapped out the 24 core CPU that i originally","speaker":null,"is_sponsor":0},{"start_s":878.8,"end_s":884.48,"text":"configured the server with for a 64 core","speaker":null,"is_sponsor":0},{"start_s":881.92,"end_s":889.519,"text":"one because we found that with the 24 core the CPU during heavy reads and","speaker":null,"is_sponsor":0},{"start_s":887.04,"end_s":895.12,"text":"writes was getting hit with 50 or more buffer flushing tasks that were each","speaker":null,"is_sponsor":0},{"start_s":891.839,"end_s":898.0,"text":"pulling 20% usage of a single core just","speaker":null,"is_sponsor":0},{"start_s":895.12,"end_s":901.76,"text":"choking the poor thing and 64 cores did help significantly","speaker":null,"is_sponsor":0},{"start_s":899.839,"end_s":906.56,"text":"but i also didn't want to allocate a four or five thousand dollar CPU to the","speaker":null,"is_sponsor":0},{"start_s":903.839,"end_s":912.32,"text":"server so we dialed back to 32 and that ended up being a big improvement as well","speaker":null,"is_sponsor":0},{"start_s":909.36,"end_s":917.519,"text":"so bottom line the 32 core so adding just another eight cores and then","speaker":null,"is_sponsor":0},{"start_s":914.399,"end_s":920.0,"text":"tweaking the timing between going from","speaker":null,"is_sponsor":0},{"start_s":917.519,"end_s":924.72,"text":"interrupt-based to polling-based access to the drives gave us good enough","speaker":null,"is_sponsor":0},{"start_s":922.639,"end_s":928.639,"text":"performance that we've seen three gigabytes a second when we're hitting it","speaker":null,"is_sponsor":0},{"start_s":926.639,"end_s":932.16,"text":"with three different clients at a time in the real world without any","speaker":null,"is_sponsor":0},{"start_s":930.8,"end_s":936.8,"text":"significant jumps in access latency or dips in","speaker":null,"is_sponsor":0},{"start_s":934.399,"end_s":940.48,"text":"transfer speeds so we're rolling with it but there's something to be said for","speaker":null,"is_sponsor":0},{"start_s":938.48,"end_s":946.399,"text":"like a dual socket approach to this with more spare pci express lanes and even","speaker":null,"is_sponsor":0},{"start_s":943.12,"end_s":948.399,"text":"more CPU cores or oh i don't know AMD","speaker":null,"is_sponsor":0},{"start_s":946.399,"end_s":952.639,"text":"working with their oems to make sure that you know when you actually hit","speaker":null,"is_sponsor":0},{"start_s":950.16,"end_s":957.04,"text":"their pci express lanes it doesn't cause a bunch of traffic jams elsewhere in the","speaker":null,"is_sponsor":0},{"start_s":954.72,"end_s":960.72,"text":"CPU a massive shout out to wendell from Level1Techs by the way that guy's","speaker":null,"is_sponsor":0},{"start_s":958.959,"end_s":964.56,"text":"anything but level one i would strongly recommend going and subscribing to him","speaker":null,"is_sponsor":0},{"start_s":962.639,"end_s":968.56,"text":"if you love this kind of deep dive server stuff linode provides virtual","speaker":null,"is_sponsor":0},{"start_s":966.8,"end_s":973.279,"text":"servers that make it easy and affordable to host your own app site service or","speaker":null,"is_sponsor":0},{"start_s":971.04,"end_s":976.399,"text":"whatever in the cloud other entry-level hosting works when you start up 
but","speaker":null,"is_sponsor":0},{"start_s":974.959,"end_s":980.959,"text":"you'll eventually want to get something powerful customizable and easy to use","speaker":null,"is_sponsor":0},{"start_s":979.04,"end_s":984.8,"text":"for cloud computing they've got a diy option if you want a full custom setup","speaker":null,"is_sponsor":0},{"start_s":982.639,"end_s":990.0,"text":"or you can easily set up your own server with their one-click apps you can deploy","speaker":null,"is_sponsor":0},{"start_s":986.959,"end_s":992.0,"text":"minecraft cs go servers wordpress and","speaker":null,"is_sponsor":0},{"start_s":990.0,"end_s":996.399,"text":"much more and you can even spin up your own vpn and have plenty of space to host","speaker":null,"is_sponsor":0},{"start_s":994.079,"end_s":1000.24,"text":"a website app or game server they've got affordable pricing with no hidden fees","speaker":null,"is_sponsor":0},{"start_s":998.32,"end_s":1006.48,"text":"that try to sneak onto your monthly bill and they've got 100% human 24/7/365","speaker":null,"is_sponsor":0},{"start_s":1004.0,"end_s":1009.92,"text":"customer service via phone or support tickets get twenty dollars in free","speaker":null,"is_sponsor":0},{"start_s":1008.32,"end_s":1014.0,"text":"credit on your new account with code minus 20 or by clicking the link in the","speaker":null,"is_sponsor":0},{"start_s":1011.92,"end_s":1017.519,"text":"video description so thanks for watching guys if you're looking for another","speaker":null,"is_sponsor":0},{"start_s":1015.36,"end_s":1021.36,"text":"server video to check out maybe uh have a look at our petabyte project update","speaker":null,"is_sponsor":0},{"start_s":1019.92,"end_s":1025.439,"text":"and actually we're going to have another petabyte project coming soon so make","speaker":null,"is_sponsor":0},{"start_s":1023.12,"end_s":1031.199,"text":"sure you're subscribed so you don't miss it and remember how much memory","speaker":null,"is_sponsor":0},{"start_s":1028.16,"end_s":1032.0,"text":"no i need to scroll down okay no problem","speaker":null,"is_sponsor":0},{"start_s":1031.199,"end_s":1036.959,"text":"but give me a second um","speaker":null,"is_sponsor":0},{"start_s":1035.039,"end_s":1041.52,"text":"[ __ ] off and remember how much leftover memory","speaker":null,"is_sponsor":0},{"start_s":1038.72,"end_s":1044.799,"text":"bandwidth we have so yeah [ __ ] off","speaker":null,"is_sponsor":0},{"start_s":1043.28,"end_s":1049.28,"text":"why then your CPU goes [ __ ] off why isn't","speaker":null,"is_sponsor":0},{"start_s":1047.439,"end_s":1055.84,"text":"this working we're using the latest version of proxmox a Linux distribute","speaker":null,"is_sponsor":0},{"start_s":1052.64,"end_s":1058.08,"text":"[ __ ] off i need this to work what the","speaker":null,"is_sponsor":0},{"start_s":1055.84,"end_s":1061.08,"text":"[ __ ] okay","speaker":null,"is_sponsor":0}],"full_text":"when i signed off at the end of the video about our amazing fast new all SSD storage server i thought it was as simple as okay let's load the final os on this thing chuck it in the server room we're ready to start editing off of it but it wasn't so our story begins with some short video clips that i sent over to wendell from Level1Techs complaining hey about Windows storage spaces on our new 24 drive NVMe server machine here because what was happening was while i was copying files to what should be one of the fastest storage servers on the freaking planet i was getting great performance sometimes and then rock bottom performance others 
we're talking like 10 20 30 megabytes a second so wendell dug into the system logs and discovered that there was some kind of a problem at the driver or pci express level where it was actually resetting individual drives like they were effectively timing out for seconds at a time while the data was in flight and then the poor array would be sitting there trying to figure out what to do while a drive is effectively MIA then the drive reset would finish which is essentially like if you were pulling a drive out for like two seconds and then popping it back in and then the transfer would roll along at multiple hundreds of megabytes a second or we even saw at times numbers as high as 20 plus gigabytes a second in CrystalDiskMark then it would hitch again rinse and repeat obviously i can't deploy it like that so i thought it was my knowledge of Windows storage spaces or lack thereof and that i had configured it wrong but then the mystery deepened so this dropping out behavior actually happened with a simple Windows software raid with just four devices in it i mean that's a relatively pedestrian 16 gigabytes a second by the way guys our sponsor for this video Pulseway with Pulseway you can remotely monitor manage and control all your Windows mac and Linux machines from one app create your free account today at the link below so we tried all the usual things we tried updating the drivers it was using the microsoft drivers we put the latest Intel drivers for these NVMe devices onto the system that didn't work we tried tweaking the power management to prevent the pci express lanes from switching to lower speeds when we were accessing all the drives and that could be a desirable behavior because there's so many drives in here that you're going to run into other system bottlenecks before you could possibly hope to use all the bandwidth of even a pci gen 3 link so gen 2 could be a pretty good bet but when it's happening automatically this speed switching takes time and that could be part of what's causing the problems but neither of those things or both of them were able to solve the problem and we only got a small improvement in the behavior so wendell suggested gee why don't we go over to Linux as he tends to do but then get this we got the same dropouts on Linux that seemed to suggest a hardware issue of some sort so guys this is why i ultimately made this video about it because this is pretty dry technical stuff for a lot of people but i thought it was fascinating NVMe is already so fast that a lot of stuff particularly software is not engineered for it which is turning out to be a bit of an industry-wide problem and when you take 24 of these drives that are capable of multiple gigabytes a second on paper that is now 24 times the problem think about it this way even with eight channels of memory which is pretty impressive the theoretical maximum memory bandwidth of our system here is around 200 gigabytes a second and real world you're looking at more like 100 to 150 gigabytes a second now let's talk about this storage array here this is capable on paper of about a hundred gigabytes a second in reads so we would need assuming perfect efficiency which obviously never happens in the real world nearly half of our memory bandwidth just to handle shifting data around when we're reading or writing to our storage array that's ridiculous and even the Linux kernel is going to be on the struggle bus when you're talking about that much data as wendell so succinctly put it because here's the way it's 
supposed to work the operating system kernel asks for some chunk of data let's say a loot of your wife to enjoy on your lunch break all right the disk says yep no problem but NAND flash is pretty slow so i'm going to need a sec to load that into my buffer i'll let you know when it's ready the disk gets everything ready loaded into the buffer and then it sends what's called an interrupt to the CPU to say hey all right it's chill you can swing by and grab that data now but here's the problem we're running into if the CPU core that the interrupt was intended for is too busy doing something else or it gets put to sleep or it gets reassigned to some other task in the middle of this process which can be quite common on multi-core cpus that interrupt never arrives your processor never goes and gets the data and the whole train comes to a screeching halt and that is why we had no issues last video slamming the individual drives with data but then as soon as we put a file system you know as soon as we started running a zfs raid and our CPU was doing parity calculations while we were reading and writing to the array making the CPU actually do any work we were getting crippling errors all over the place so AWS just rolled out NVMe and there are a ton of threads about issues under heavy loads suggesting that this appears to be an industry-wide problem and the dumbest part of this is that i don't actually even need my server to be this fast i'm only hitting it with a 40 gigabit connection here that's only four gigabytes a second maximum so wendell actually even thought of turning down the pci express links to gen 2 and just leaving them there Gigabyte meanwhile the makers of this server were like sorry wait you want a speed limiter on this thing but then wendell ended up finding a software way to do it but then it turned out there was a kernel bug something something something ultimately it didn't pan out and it didn't work anyway that's okay because Linux already has kind of a solution to this now very very high speed devices like RAM based caching devices operate in a completely different mode called polling where the kernel essentially assumes that the device is so fast that the data is going to be ready right away and it would add a lot of overhead to do this on slower drives because there'd be a lot of pointless hey are you done yet hey are you done yet so a single NVMe doesn't need to be polled but 24 oh there's an argument to be made for operating in that mode so here's the mitigation that wendell implemented when possible the kernel is going to wait for the interrupt because that's the most efficient thing but if it waits for too long the queuing algorithm will just have the CPU poll the drive rapidly and say hey do you have that do you have that okay great i'm going to take that now all that tweaking and learning means that our final config ended up being quite different from the initial intention so we're using the latest version of proxmox a Linux distro that's designed for virtualization with zfs support out of the box and while we had actually initially intended to use zfs we were hitting 100% utilization on a 24 core 48 thread CPU and doing best case scenario assuming that the bug didn't surface 10 gigs a second reads 4 gigs a second writes which would have actually been fine remember we've only got a 40 gigabit network connection except that the access latency was not really suitable for a multi-user video editing environment it was over 150 microseconds and the craziest part of that is 
that we actually hit those numbers even with some pretty esoteric tweaks like disabling arc compression i mean most seasoned zfs users would freak out about doing that but the problem is that arc compression makes three copies of the data in memory while you are writing and remember how much left over memory bandwidth we have so yeah tripling the load there ain't gonna fly so new plan Linux multi-disk ain't perfect it's Linux's own built-in software raid and the main disadvantage is that in the event of an unexpected shutdown it'll be really slow for the 30 minutes or so that it takes to resync four terabytes of data but that should be fine i mean that's what the seventeen thousand dollar battery backup in this room is supposed to be for so we settled on four striped software raid fives and the next experiment was to play around with the chunk size so that's how the blocks of data are broken up on the raid as well as the block size which is on the file system level so the default raid chunk size is 512k and the file system is 64k. but when we were running benchmarks based on an editor usage pattern we actually found that the 512k chunks were a little bit higher latency than we'd like to see which is really really important when you're you know scrubbing through files on a timeline so we actually ended up using 128k for both which happens to line up with the buffer size on these devices perfect now the conventional wisdom for accessing very large files over a network share would actually be to use a very large chunk size like even one megabyte but while that would be great for ingesting like big batches of new footage when we're skipping around rather than reading them sequentially with many users doing that all the same time it actually makes sense that this would work well and experimentally so far it seems pretty good i just realized i forgot a drive upstairs i'm gonna go grab that with multi-disk we ended up with a maximum throughput of around 16 gigabytes per second reads and eight gigabytes per second writes which obviously is way less than the maximum this hardware can theoretically do but there's a lot of overhead to contend with and besides that doesn't mean that there's no benefit to having all of this performance in reserve so the latency advantage is something that we've already talked about we've actually seen that high latency storage can cause instability in the video editing software that's accessing it but another benefit counter intuitively is that because the storage is so fast no one chiplet on our CPU can keep up with it a disadvantage of a chiplet design is that it's got huge horsepower but it's hard to harness all of it for one single task like a file copy from one user over the network with that said it's great as a multi-user experience because each discrete user like let's say a camera operator who's dumping RED footage and a video editor who also has to work at the same time end up having their access spread over multiple chiplets that are individually kind of limited so we had remember guys 150 gigabytes per second of memory bandwidth one chiplet can't get at all of it so when we have one user copying a file over the network that user can only get to one or two cores so there's no way that user can monopolize all the resources on the system because of the way the whole thing is architected all of this in theory so far we haven't actually thrown our editors at it so let's go see if it's booted up and get them to try it taran was about to eat but now he has 
something very important to do what um you laughed dennis but you need to help too uh okay we're off to a good is this new yes this is new Whonnock hi alex hi how's it going hi alex i have a new server to log you into i wouldn't even do that at this point nope what would you do no no this is not real i'm acting out come on and i'm not acknowledging it i add you too wait are we supposed to work off of this or just just just i just want to know if it works so you're supposed to work but like not important work i'm gonna mirror old Whonnock over to this one one more time okay so anything you do here will be overwritten so we're not supposed to use it look do you want me to do this so do them but then we're going to just wipe it out okay when are you going to swipe it what part of test is not clear just open up a project how's it going oh seems fine it's you know let's see if we can pump it up to full res well that's less of a network bottleneck thing and more of a you know the rest of the system but okay it's playing it though which is kind of surprising what well Linus uh you having wanting to do increasingly ambitious projects i appreciate that we now have more space for them us running out of space has been a large large large problem good work it's not broken so this does this feel any different than it was before it might be a little snippier snappier better you don't have to lie to me but i don't know i mean it i don't really see much difference this is at 1/8 res though what if you crank it a bit um okay thank you but is it better why are you asking me i'm asking you that's the whole point of this exercise you can't do anything participating oh okay fine from what i can tell it's actually a lot snappier than what i remember the editors say it's good enough and we're not getting any data corruption and the performance is fine but every one of these line items is an NVMe device timing out and we actually did some troubleshooting that i haven't talked about yet so one of the first things that we did was we swapped out the 24 core CPU that i originally configured the server with for a 64 core one because we found that with the 24 core the CPU during heavy reads and writes was getting hit with 50 or more buffer flushing tasks that were each pulling 20% usage of a single core just choking the poor thing and 64 cores did help significantly but i also didn't want to allocate a four or five thousand dollar CPU to the server so we dialed back to 32 and that ended up being a big improvement as well so bottom line the 32 core so adding just another eight cores and then tweaking the timing between going from interrupt-based to polling-based access to the drives gave us good enough performance that we've seen three gigabytes a second when we're hitting it with three different clients at a time in the real world without any significant jumps in access latency or dips in transfer speeds so we're rolling with it but there's something to be said for like a dual socket approach to this with more spare pci express lanes and even more CPU cores or oh i don't know AMD working with their oems to make sure that you know when you actually hit their pci express lanes it doesn't cause a bunch of traffic jams elsewhere in the CPU a massive shout out to wendell from Level1Techs by the way that guy's anything but level one i would strongly recommend going and subscribing to him if you love this kind of deep dive server stuff linode provides virtual servers that make it easy and affordable to host your own app site 
service or whatever in the cloud other entry-level hosting works when you start up but you'll eventually want to get something powerful customizable and easy to use for cloud computing they've got a diy option if you want a full custom setup or you can easily set up your own server with their one-click apps you can deploy minecraft cs go servers wordpress and much more and you can even spin up your own vpn and have plenty of space to host a website app or game server they've got affordable pricing with no hidden fees that try to sneak onto your monthly bill and they've got 100% human 24/7/365 customer service via phone or support tickets get twenty dollars in free credit on your new account with code minus 20 or by clicking the link in the video description so thanks for watching guys if you're looking for another server video to check out maybe uh have a look at our petabyte project update and actually we're going to have another petabyte project coming soon so make sure you're subscribed so you don't miss it and remember how much memory no i need to scroll down okay no problem but give me a second um [ __ ] off and remember how much leftover memory bandwidth we have so yeah [ __ ] off why then your CPU goes [ __ ] off why isn't this working we're using the latest version of proxmox a Linux distribute [ __ ] off i need this to work what the [ __ ] okay"}