Unboxing Canada's BIGGEST Supercomputer!
Linus Tech Tips
·2018-05-06
0:00
behind me right now is the biggest
0:03
supercomputer in the country it'll be serving researchers across Canada
0:09
studying the human genome in bioinformatics particle physics
0:13
materials research even humanities research
0:16
it's called Cedar it cost the federal government through the Canadian
0:20
foundation for Innovation over $16
0:23
million and we get to be the first to
0:26
unbox this Beast
0:40
Savage jerky is created without the use of nitrates or preservatives use offer
0:46
code LTT to save 10% at the link in the
0:49
video description so Cedar is a big data
0:54
machine it takes up a quarter of the 5,000
0:59
square foot data center it occupies meaning actually that there's room for it to
1:03
grow but right now it has
1:07
27,000 Intel Xeon processing cores
1:11
190 terabytes of RAM 64 petabytes of storage
1:18
584 gpus and a total power draw of
1:24
560,000 watts though with that said its
1:29
efficiency is a shocking
1:32
1.07 on the PUE scale where one would be
1:37
perfect and a typical data center would be 1 and 1/2 to two we'll get into how
1:42
they did that a little bit later though so our tour starts right here behind me
1:47
are what they call the high availability racks so everything back there has dual
1:53
power supplies for redundancy with a battery backup for that and a diesel
1:58
generator to back up that everything back
2:02
here is Mission critical things like networking login servers and management
2:07
servers are all here and this is also
2:11
where you'll find the bulk of Cedar's
2:14
storage let's get in for a closer look at Cedar's connection to the outside
2:19
world this networking Appliance from
2:22
Huawei has a street price of around a
2:27
million dollars and right here this is
2:32
where it gets really bananas these guys are Cedar's
2:37
dual 100 gigabit connections through
2:41
Vancouver and then as if that wasn't enough these orange ones here are dual
2:48
40 gigabit connections through nearby Surrey
2:52
just in case somebody puts a backhoe through one of these other fiber lines
2:56
and they would have otherwise lost their internet connectivity I mean that's
3:00
their Backup backup but Ethernet is not really the
3:05
way you want to connect high performance computing nodes this right here is
3:12
the true networking heart of Cedar these
3:17
are 48-port Omni-Path switches and
3:21
they're configured in what's called an island topology so the island is in
3:26
almost all cases 32 compute nodes each
3:32
of those compute nodes is connected to one of 32 ports on the switch in its
3:38
rack then the remaining 16 ports come
3:42
back to here that means that every
3:46
island gets a dedicated line to each of
3:51
the core switches giving you failover
3:54
and massive bandwidth each one of these fiber links
4:00
right here is capable of 100 gigabit per
4:04
second so even though between islands we
4:07
are let's say bottlenecked by our 16
4:11
connections so that's only half the total theoretical speed within an island
4:15
we're still talking over 100 gigabytes per
4:19
second so it's not really an issue okay now let's move on to SFU and
4:26
Compute Canada's version of petabyte project spoiler alert theirs is better
4:32
in every conceivable way so in the five cabinets behind me we've got Cedar's 50
4:39
petabyte IBM tape Library System they
4:43
have a 40 gigabit link to the rest of the
4:46
supercomputer and each of the 5,000 10
4:50
terabyte magnetic tapes inside can be grabbed out of storage moved with like a
4:57
robotic ARM into a reader and the data
5:00
can then be accessed when needed and this is done
5:03
automatically cool right okay yeah but due to the slowness
5:08
of that swapping process this is still
5:12
what we would consider to be cold or
5:15
archival storage next up here is general purpose
5:20
storage land where any data that's being
5:24
used for any current research project
5:27
would be housed so here here they're
5:30
using off-the-shelf 5U racks each of
5:35
which contains let's see if we can crack One open here a total of two kind of
5:40
trays here and 84 8 terabyte uh let's have a look here
5:48
Enterprise capacity SAS drives from
5:53
Seagate but there's actually more to this system than meets the eye every
5:58
four of these storage nodes requires
6:02
two nodes of what they're calling object
6:05
storage servers these act as a high-speed cache with their SAS 10,000
6:11
RPM drives as well as kind of like a
6:15
traffic cop for everything behind
6:18
it so every single read or write to
6:21
these hard drives actually goes through these nodes so right now General Storage
6:28
Land is 10 petabytes but in the near to mid future it will be expanding to 20
6:35
now that DIY approach to storage is
6:38
great for scaling up at a low cost but when it comes to Performance they went
6:43
for this DataDirect Networks storage appliance because it has got the real
6:50
Goods now in the rack next to this brain
6:54
you'll find a mere 4 petabytes of actual
6:58
storage due to its higher cost but thanks to its proprietary Hardware
7:03
custom software and solid state burst buffers this thing can handle up to 40
7:10
gigabytes per second of sustained throughput
7:13
making it perfect for data intensive
7:16
applications that rely on humongous data sets now let's get into
7:23
compute there are about half a dozen
7:26
different types of compute nodes all
7:30
connected to the same high-speed Omni-Path network backbone that are
7:34
optimized for different types of
7:38
research we'll begin with the base
7:41
compute node there are a whopping
7:45
576 of these each of these is a computer
7:49
so there's actually four in a single 2U shell each of which contains two
7:55
Xeon E5-2683 16 core processors 128 gigs
7:59
of RAM and about a terabyte of raid zero SSD storage for Scratch so each rack
8:06
here contains two islands so that's a
8:10
total of 64 compute nodes giving us a
8:14
whopping 2,048 compute units per rack so these
8:20
nodes are the basic Workhorse of Cedar
8:23
handling everything from Monte Carlo simulations for materials science to simulating
8:29
dynamic processes in nature with a high degree of randomness like snowfall
8:34
or rainfall they would also be used in
8:38
any highly parallelized workload because
8:42
if you need you know 10,000 CPU cores for one job there
8:47
aren't enough cores in any other class of server to handle that kind of
8:52
load moving right on up we've got the big memory nodes there are 48 of these
8:59
and half of them are just like the basic nodes except with 512 gigs of RAM while
9:06
the other half of them these puppies
9:09
have one and a half terabytes of system
9:15
memory these ones take up twice as much
9:18
rack space though each of these 1Us
9:21
is a single dual socket system because
9:24
you know what there just wasn't enough gosh darn room for all 24 64 gig memory
9:32
modules that are required for that much
9:35
RAM first world problem
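The rack and memory figures quoted above can be cross-checked with a little arithmetic. A minimal Python sketch, assuming each "compute unit" is one CPU core and treating 1 TB as 1,024 GB; the variable names are illustrative and not from the video:

```python
# Figures quoted in the tour for the base compute nodes
NODES_PER_ISLAND = 32
ISLANDS_PER_RACK = 2
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 16

nodes_per_rack = NODES_PER_ISLAND * ISLANDS_PER_RACK
# Assumption: a "compute unit" in the video means one CPU core
cores_per_rack = nodes_per_rack * SOCKETS_PER_NODE * CORES_PER_SOCKET

# Figures quoted for the largest of the big memory nodes
DIMMS_PER_NODE = 24
GB_PER_DIMM = 64

big_mem_gb = DIMMS_PER_NODE * GB_PER_DIMM
big_mem_tb = big_mem_gb / 1024  # assumption: 1 TB counted as 1,024 GB

print(f"nodes per rack: {nodes_per_rack}")
print(f"cores per rack: {cores_per_rack}")
print(f"big memory node: {big_mem_gb} GB = {big_mem_tb} TB")
```

Under those assumptions the numbers line up with the video: 64 nodes and 2,048 cores per rack, and 24 modules of 64 gigs giving one and a half terabytes.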
9:39
yes these guys are really special these
9:43
are the aptly named 3 terabyte nodes there
9:47
are only a handful of them but these are quad socket machines with Xeon
9:53
4809 v4s four of them but wait a tick
9:58
those are only eight core processors these don't even have more processing
10:04
cores than those little tiny ones that take up half a U what's the deal here
10:10
well it turns out that some
10:13
bioinformatics workloads like genome sequencing don't actually scale very
10:19
well with more processors they just need
10:22
massive amounts of memory to hold the data sets that they need to work on so
10:28
while the team here probably isn't super stoked on using up four U just so they
10:34
can stuff more memory into the system until Optane reaches a higher
10:39
level of maturity this is the only choice they have now finally we're
10:44
getting to my favorite nodes the most expensive nodes these are the GPU nodes
10:53
and while they're actually quite similar
10:56
to the base nodes with respect to their CPU and RAM configurations what's got
11:02
the researchers in the fields of molecular dynamics AI and machine
11:07
learning all amped up about these are the quad NVIDIA Tesla P100 graphics
11:13
cards that they have crammed into each one
11:16
I mean seriously with 1,500 watts of power being consumed by
11:22
each one of these it is an engineering marvel that they've crammed
11:27
enough power and cooling to make this whole thing work so
11:32
actually now that you think about it how exactly did they do that so the keen-eyed
11:38
among you might have already caught a couple of hints earlier in this video
11:42
but the secret lies in the rear doors on
11:46
the server racks look how thick this is
11:49
yes my friends this entire door is a
11:54
gigantic heat exchanger so their servers
11:58
don't actually have water blocks that would be more expensive what
12:03
they're doing is they've just got the fronts of the racks all sealed up so
12:06
there's no Backdraft pressure and they've got normal air cooled servers
12:12
that pass the air from the front where they just draw in room temperature air
12:16
in here and it comes out hot like 30 plus degrees and push it through this heat
12:21
exchanger where it is actually cool to my skin that's how efficient these are
12:28
and that cooling system is massively expandable too you can
12:32
actually see above me I am standing
12:35
where we've got a blue and green cooling pipe connected to a whole bunch of quick
12:42
release fittings ready to add more racks
12:46
right here but to see what they actually do with the heat we're actually going to
12:50
have to go upstairs where we'll find the final and
12:57
perhaps the uh coolest stop in our tour
13:01
here this is the mechanical room where
13:05
the pumps and these freaking pipes take
13:10
all the water from downstairs and dump
13:13
it into three cooling towers outside the
13:17
building now right now the weather is
13:20
favorable to cooling the ambient temperature is quite low so it's just
13:25
operating as gigantic radiators but get
13:28
this when the conditions become less favorable in the summer they kick things
13:34
into high gear with an automated system
13:37
that sprays water onto the fins of the
13:41
radiators in the cooling towers and if you watched our bong cooling video which
13:47
you can check out right here you'll be familiar with this concept already but
13:51
this is called evaporative cooling and by these means even in ambient
13:55
temperatures up to 30° C
13:59
they can achieve the 17° coolant levels
14:04
that they need to without employing the
14:07
massive Chiller unit that they have over
14:10
on the other side of the
14:17
room Squarespace is the way to build a
14:21
website whether it's for your small business or for your you know uh local
14:27
freaking book club it does doesn't matter if you want a web presence
14:31
affordably and quickly Squarespace gets
14:35
it done for you it starts at just 12 bucks a month and you get a free domain
14:39
if you buy Squarespace for the year you just pick one of their templates and
14:44
boom you upload some pictures you fill in some text it's all cloud-based and
14:50
your website will simple as that look
14:53
great on any device every website comes
14:57
with a free online store and their cover pages feature allows you to set up a
15:02
beautiful one-page online presence in just minutes so start a trial with no
15:08
credit card required and start building your website today then when you decide
15:14
to sign up for Squarespace don't forget head over to squarespace.com/LTT
15:19
and use offer code LTT to get 10%
15:23
off your first purchase so a massive thank you to SFU
15:27
and Compute Canada for allowing us to uh run amok in their data center thanks
15:33
to you guys for watching if you dislike this video you know what to do but if
15:36
you liked it hit that like button get subscribed maybe consider checking out
15:39
where to buy the stuff we featured at the link in the video description also
15:44
down there you'll find a link to our merch store which has cool shirts like this one and our community Forum which
15:48
you should totally join
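As a footnote to the networking and tape segments (around 3:17 and 4:32), the quoted figures can be sanity-checked with simple arithmetic. A minimal Python sketch using only numbers stated in the video, assuming decimal units (8 bits per byte, 1,000 TB per PB); the variable names are illustrative:

```python
# Island topology: 48-port switch, 32 ports to compute nodes, the rest uplinks
PORTS_PER_SWITCH = 48
NODE_PORTS = 32
LINK_GBIT_PER_S = 100

uplink_ports = PORTS_PER_SWITCH - NODE_PORTS
uplink_gbit_per_s = uplink_ports * LINK_GBIT_PER_S
uplink_gbyte_per_s = uplink_gbit_per_s / 8

# Ratio of node-facing bandwidth to uplink bandwidth for one island
oversubscription = NODE_PORTS / uplink_ports

# Tape library: 5,000 tapes at 10 TB each
TAPES = 5_000
TB_PER_TAPE = 10
tape_capacity_pb = TAPES * TB_PER_TAPE / 1_000

print(f"uplinks per island: {uplink_ports}")
print(f"uplink bandwidth: {uplink_gbit_per_s} Gbit/s = {uplink_gbyte_per_s} GB/s")
print(f"oversubscription: {oversubscription}:1")
print(f"tape capacity: {tape_capacity_pb} PB")
```

Sixteen uplinks at 100 gigabit each come to 200 gigabytes per second per island, consistent with "over 100 gigabytes per second" and with uplinks carrying half the in-island bandwidth, and the tapes add up to the quoted 50 petabytes.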