Unboxing Canada's BIGGEST Supercomputer!

Linus Tech Tips · 2018-05-06 · 2,120 words · ~10 min read
Floatplane YouTube

Transcript

0:00 behind me right now is the biggest
0:03 supercomputer in the country it'll be serving researchers across Canada
0:09 studying the human genome in bioinformatics particle physics
0:13 materials research even humanities research
0:16 it's called Cedar it cost the federal government through the Canadian
0:20 foundation for Innovation over $16
0:23 million and we get to be the first to
0:26 unbox this Beast
0:40 Savage jerky is created without the use of nitrates or preservatives use offer
0:46 code LTT to save 10% at the link in the
0:49 video description so Cedar is a big data
0:54 machine it takes up a quarter of the 5,000
0:59 square-foot data center it occupies meaning actually that there's room for it to
1:03 grow but right now it has
1:07 27,000 Intel Xeon processing cores
1:11 190 terabytes of RAM 64 petabytes of storage
1:18 584 GPUs and a total power draw of
1:24 560,000 watts though with that said its
1:29 efficiency is a shocking
1:32 1.07 on the PUE scale where one would be
1:37 perfect and a typical data center would be 1 and 1/2 to two we'll get into how
1:42 they did that a little bit later though so our tour starts right here behind me
1:47 are what they call the high availability racks so everything back there has dual
1:53 power supplies for redundancy with a battery backup for that and a diesel
1:58 generator backing that up everything back
2:02 here is Mission critical things like networking login servers and management
2:07 servers are all here and this is also
2:11 where you'll find the bulk of Cedar's
2:14 storage let's get in for a closer look at Cedar's connection to the outside
2:19 world this networking Appliance from
2:22 Huawei has a street price of around a
2:27 million dollars and right here this is
2:32 where it gets really bananas these guys are Cedar's
2:37 dual 100-gigabit connections through
2:41 Vancouver and then as if that wasn't enough these orange ones here are dual
2:48 40-gigabit connections through nearby Surrey
2:52 just in case somebody puts a backhoe through one of these other fiber lines
2:56 and they would have otherwise lost their internet connectivity I mean that's
3:00 their Backup backup but Ethernet is not really the
3:05 way you want to connect high performance computing nodes this right here is
3:12 the true networking heart of Cedar these
3:17 are 48-port Omni-Path switches and
3:21 they're configured in what's called an island topology so the island is in
3:26 almost all cases 32 compute nodes each
3:32 of those compute nodes is connected to 32 ports on one of these switches in its
3:38 rack then the remaining 16 ports come
3:42 back to here that means that every
3:46 island gets a dedicated line to each of
3:51 the core switches giving you failover
3:54 and massive bandwidth each one of these fiber links
4:00 right here is capable of 100 gigabits per
4:04 second so even though between islands we
4:07 are let's say bottlenecked by our 16
4:11 connections so that's only half the total theoretical speed within an island
4:15 we're still talking over 100 gigabytes per
4:19 second so it's not really an issue okay now let's move on to SFU and
4:26 Compute Canada's version of Petabyte Project spoiler alert theirs is better
4:32 in every conceivable way so in the five cabinets behind me we've got Cedar's 50
4:39 petabyte IBM tape library system they
4:43 have a 40-gigabit link to the rest of the
4:46 supercomputer and each of the 5,000 10
4:50 terabyte magnetic tapes inside can be grabbed out of storage moved with like a
4:57 robotic ARM into a reader and the data
5:00 can then be accessed when needed and this is done
5:03 automatically cool right okay yeah but due to the slowness
5:08 of that swapping process this is still
5:12 what we would consider to be cold or
5:15 archival storage next up here is general purpose
5:20 storage land where any data that's being
5:24 used for any current research project
5:27 would be housed so here they're
5:30 using off-the-shelf 5U racks each of
5:35 which contains let's see if we can crack one open here a total of two kind of
5:40 trays here and 84 8-terabyte uh let's have a look here
5:48 Enterprise capacity SAS drives from
5:53 Seagate but there's actually more to this system than meets the eye every
5:58 four of these storage nodes requires
6:02 two nodes of what they're calling object
6:05 storage servers these act as a high-speed cache with their SAS 10,000
6:11 RPM drives as well as kind of like a
6:15 traffic cop for everything behind
6:18 it so every single read or write to
6:21 these hard drives actually goes through these nodes so right now General Storage
6:28 Land is 10 petabytes but in the near to mid future it will be expanding to 20
6:35 now that DIY approach to storage is
6:38 great for scaling up at a low cost but when it comes to Performance they went
6:43 for this DataDirect Networks storage appliance because it has got the real
6:50 goods now in the rack next to this brain
6:54 you'll find a mere 4 petabytes of actual
6:58 storage due to its higher cost but thanks to its proprietary Hardware
7:03 custom software and solid state burst buffers this thing can handle up to 40
7:10 gigabytes per second of sustained throughput
7:13 making it perfect for data intensive
7:16 applications that rely on humongous data sets now let's get into
7:23 compute there are about half a dozen
7:26 different types of compute nodes all
7:30 connected to the same high-speed Omni-Path network backbone that are
7:34 optimized for different types of
7:38 research we'll begin with the base
7:41 compute node there are a whopping
7:45 576 of these each of these is a computer
7:49 so there's actually four in a single 2U shell each of which contains two
7:55 Xeon E5-2683 16-core processors 128 gigs
7:59 of RAM and about a terabyte of RAID 0 SSD storage for scratch so each rack
8:06 here contains two islands so that's a
8:10 total of 64 compute nodes giving us a
8:14 whopping 2,048 compute units per rack so these
8:20 nodes are the basic Workhorse of Cedar
8:23 handling everything from Monte Carlo simulations for material science to simulating
8:29 dynamic processes in nature with a high degree of randomness like snowfall
8:34 or rainfall they would also be used in
8:38 any highly parallelized workload because
8:42 if you need you know 10,000 CPU cores for one job there
8:47 aren't enough cores in any other class of server to handle that kind of
8:52 load moving right on up we've got the big memory nodes there are 48 of these
8:59 and half of them are just like the basic nodes except with 512 gigs of RAM while
9:06 the other half of them these puppies
9:09 have 1 and a half terabytes of system
9:15 memory these ones take up twice as much
9:18 rack space though each of these 1Us
9:21 is a single dual socket system because
9:24 you know what there just wasn't enough goldarn room for all 24 64-gig memory
9:32 modules that are required for that much
9:35 RAM first world problem
9:39 yes these guys are really special these
9:43 are the aptly named 3-terabyte nodes there
9:47 are only a handful of them but these are quad socket machines with Xeon
9:53 4809 v4s four of them but wait a tick
9:58 those are only eight-core processors these don't even have more processing
10:04 cores than those little tiny ones that take up half a U what's the deal here
10:10 well it turns out that some
10:13 bioinformatics workloads like genome sequencing don't actually scale very
10:19 well with more processors they just need
10:22 massive amounts of memory to hold the data sets that they need to work on so
10:28 while the team here probably isn't super stoked on using up four Us just so they
10:34 can stuff more memory into the system until Optane reaches a higher
10:39 level of maturity this is the only choice they have now finally we're
10:44 getting to my favorite nodes the most expensive nodes these are the GPU nodes
10:53 and while they're actually quite similar
10:56 to the base nodes with respect to their CPU and RAM configurations what's got
11:02 the researchers in the fields of molecular dynamics AI and machine
11:07 learning all amped up about these are the quad NVIDIA Tesla P100 graphics
11:13 cards that they have crammed into each one
11:16 I mean seriously with 1,500 watts of power being consumed by
11:22 each one of these it is an engineering marvel that they've crammed
11:27 enough power and cooling to make this whole thing work so
11:32 actually now that you think about it how exactly did they do that so the keen-eyed
11:38 among you might have already caught a couple of hints earlier in this video
11:42 but the secret lies in the rear doors on
11:46 the server racks look how thick this is
11:49 yes my friends this entire door is a
11:54 gigantic heat exchanger so their servers
11:58 don't actually have have water blocks that would be more expensive what
12:03 they're doing is they've just got the fronts of the racks all sealed up so
12:06 there's no Backdraft pressure and they've got normal air cooled servers
12:12 that pass the air from the front where they just draw in room temperature air
12:16 in here and it comes out hot like 30-plus degrees and push it through this heat
12:21 exchanger where it is actually cool to my skin that's how efficient these are
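[Editor's sketch] The efficiency shown here ties back to the 1.07 PUE figure quoted at 1:32. PUE is total facility power divided by IT equipment power, so the overhead spent on cooling and power delivery can be estimated from it. The sketch below assumes the 560,000 watts quoted at 1:24 is the IT load, which the video does not actually state, and the function names are made up for illustration.

```python
def facility_power_watts(it_power_watts: float, pue: float) -> float:
    """Total facility power implied by a PUE value (PUE = facility / IT)."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_power_watts * pue


def overhead_watts(it_power_watts: float, pue: float) -> float:
    """Power spent on everything that is not IT equipment (cooling, losses)."""
    return facility_power_watts(it_power_watts, pue) - it_power_watts


# Assumption: the quoted 560,000 W is treated as the IT load.
IT_POWER_W = 560_000

for label, pue in [
    ("Cedar (quoted)", 1.07),
    ("typical data center, low end", 1.5),
    ("typical data center, high end", 2.0),
]:
    kilowatts = overhead_watts(IT_POWER_W, pue) / 1000
    print(f"{label}: PUE {pue} -> {kilowatts:.1f} kW of overhead")
```

Under that assumption the overhead works out to roughly 39 kW, compared with 280 to 560 kW for the "typical" range mentioned in the video.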
12:28 and that cooling system is massively expandable too you can
12:32 actually see above me I am standing
12:35 where we've got a blue and green cooling pipe connected to a whole bunch of quick
12:42 release fittings ready to add more racks
12:46 right here but to see what they actually do with the heat we're actually going to
12:50 have to go upstairs where we'll find the final and
12:57 perhaps the uh coolest stop in our tour
13:01 here this is the mechanical room where
13:05 the pumps and these freaking pipes take
13:10 all the water from downstairs and dump
13:13 it into three cooling towers outside the
13:17 building now right now the weather is
13:20 favorable to cooling the ambient temperature is quite low so it's just
13:25 operating as gigantic radiators but get
13:28 this when the conditions become less favorable in the summer they kick things
13:34 into high gear with an automated system
13:37 that sprays water onto the fins of the
13:41 radiators in the cooling towers and if you watched our bong cooling video which
13:47 you can check out right here you'll be familiar with this concept already but
13:51 this is called evaporative cooling and by these means even in ambient
13:55 temperatures up to 30° C
13:59 they can achieve the 17° coolant levels
14:04 that they need to without employing the
14:07 massive Chiller unit that they have over
14:10 on the other side of the
14:17 room Squarespace is the way to build a
14:21 website whether it's for your small business or for your you know uh local
14:27 freaking book club it does doesn't matter if you want a web presence
14:31 affordably and quickly Squarespace gets
14:35 it done for you it starts at just 12 bucks a month and you get a free domain
14:39 if you buy Squarespace for the year you just pick one of their templates and
14:44 boom you upload some pictures you fill in some text it's all cloud-based and
14:50 your website will simple as that look
14:53 great on any device every website comes
14:57 with a free online store and their cover pages feature allows you to set up a
15:02 beautiful one-page online presence in just minutes so start a trial with no
15:08 credit card required and start building your website today then when you decide
15:14 to sign up for Squarespace don't forget head over to squarespace.com/LTT
15:19 and use offer code LTT to get 10%
15:23 off your first purchase so a massive thank you to SFU
15:27 and Compute Canada for allowing us to uh run amok in their data center thanks
15:33 to you guys for watching if you dislike this video you know what to do but if
15:36 you liked it hit that like button get subscribed maybe consider checking out
15:39 where to buy the stuff we featured at the link in the video description also
15:44 down there you'll find a link to our merch store which has cool shirts like this one and our community Forum which
15:48 you should totally join
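
[Editor's sketch] Several figures quoted during the tour can be cross-checked with simple arithmetic: the island topology (3:17 to 4:19), the tape library (4:32 to 4:50), the cores per rack (7:49 to 8:14), and the 1.5-terabyte memory nodes (9:09 to 9:32). The sketch below is only a sanity check on the numbers as spoken. The class and function names are hypothetical, and it assumes the node-facing switch links run at the same 100 gigabits per second as the uplinks, which the video does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IslandSwitch:
    """One 48-port switch serving an island, as described in the tour."""
    total_ports: int = 48
    node_ports: int = 32
    link_gbit_per_s: int = 100  # assumption: same speed on node and uplink ports

    @property
    def uplink_ports(self) -> int:
        # Ports left over after the compute nodes are connected.
        return self.total_ports - self.node_ports

    @property
    def oversubscription(self) -> float:
        # Ratio of node-facing bandwidth to uplink bandwidth.
        return self.node_ports / self.uplink_ports

    @property
    def uplink_gbyte_per_s(self) -> float:
        # Aggregate uplink bandwidth, converted from gigabits to gigabytes.
        return self.uplink_ports * self.link_gbit_per_s / 8


def cores_per_rack(islands: int = 2, nodes_per_island: int = 32,
                   sockets: int = 2, cores_per_socket: int = 16) -> int:
    return islands * nodes_per_island * sockets * cores_per_socket


def tape_library_petabytes(tapes: int = 5_000, terabytes_per_tape: int = 10) -> float:
    return tapes * terabytes_per_tape / 1_000


def big_memory_gigabytes(modules: int = 24, gigabytes_per_module: int = 64) -> int:
    return modules * gigabytes_per_module


switch = IslandSwitch()
print("uplink ports per island:", switch.uplink_ports)
print("oversubscription between islands:", switch.oversubscription)
print("aggregate uplink GB/s:", switch.uplink_gbyte_per_s)
print("cores per rack:", cores_per_rack())
print("tape library PB:", tape_library_petabytes())
print("big memory node GB:", big_memory_gigabytes())
```

With the quoted numbers, 16 uplinks against 32 node ports gives the "half the total theoretical speed" mentioned at 4:11, and 16 links at 100 gigabits per second add up to 200 gigabytes per second, consistent with "over 100". The 24 modules of 64 gigabytes come to 1,536 gigabytes, which is 1.5 terabytes in binary units.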