WEBVTT

00:00:00.240 --> 00:00:01.560
As many of you probably know,

00:00:01.560 --> 00:00:04.680
the last few years have been a heck of a ride

00:00:04.680 --> 00:00:06.120
for data enthusiasts,

00:00:06.120 --> 00:00:08.920
with high speed storage becoming increasingly accessible.

00:00:08.920 --> 00:00:12.940
I mean, something like Samsung's 960 Pro SSD.

00:00:12.940 --> 00:00:14.880
It's tiny,

00:00:14.880 --> 00:00:17.700
it costs about a third of what I paid per gigabyte

00:00:17.700 --> 00:00:19.820
for my first large boot SSD,

00:00:19.820 --> 00:00:23.660
and it's over an order of magnitude faster

00:00:23.660 --> 00:00:26.220
when it comes to real world performance.

00:00:26.220 --> 00:00:28.540
But what some of you might not know

00:00:28.540 --> 00:00:31.760
is that networking has been keeping pace.

00:00:31.760 --> 00:00:35.320
So these bad boys right here

00:00:35.320 --> 00:00:38.380
are Mellanox ConnectX-4 cards.

00:00:38.380 --> 00:00:41.160
And even though they're already two generations old,

00:00:41.160 --> 00:00:44.680
they can reach transfer speeds of 100 gigabit,

00:00:44.680 --> 00:00:47.720
or about 12 gigabytes per second.
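
NOTE A quick sanity check of the conversion above, as a minimal sketch. The helper name gbit_to_gbyte is made up for illustration, and it assumes decimal units with 8 bits per byte and no protocol overhead.
```python
def gbit_to_gbyte(gigabits_per_second: float) -> float:
    """Convert a line rate in gigabits/s to gigabytes/s (8 bits per byte)."""
    return gigabits_per_second / 8
# 100 gigabit comes out slightly above the "about 12" figure quoted in the
# video, because the spoken number is rounded down.
print(gbit_to_gbyte(100))
```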

00:00:47.720 --> 00:00:51.140
That is fast enough to download Fortnite's install files,

00:00:51.140 --> 00:00:52.040
yes, we went there,

00:00:52.040 --> 00:00:55.570
in literally one second.

00:00:55.570 --> 00:00:57.490
Holy .

00:00:57.490 --> 00:00:59.050
So let's check them out.

00:00:59.090 --> 00:01:08.750
Before we get to the new crazy stuff,

00:01:08.750 --> 00:01:09.850
let's do a quick refresher

00:01:09.850 --> 00:01:11.770
on traditional networking hardware.

00:01:11.770 --> 00:01:14.170
So I've brought along a couple of examples here.

00:01:14.170 --> 00:01:16.230
These two cards run at gigabit

00:01:16.230 --> 00:01:19.050
and 10 gigabit speeds respectively.

00:01:19.050 --> 00:01:21.870
So this guy right here is about 10 times faster

00:01:21.870 --> 00:01:22.890
than this one.

00:01:22.890 --> 00:01:25.730
But other than that, they've got a lot in common.

00:01:25.730 --> 00:01:28.570
So they've both got Intel controllers on board,

00:01:28.570 --> 00:01:31.970
they both plug into a PCIe 2.0 slot,

00:01:31.970 --> 00:01:33.950
and they're both Ethernet.

00:01:33.950 --> 00:01:34.790
So,

00:01:34.790 --> 00:01:36.770
thanks to their use of intercompatible

00:01:36.770 --> 00:01:41.270
communication standards and the ubiquitous RJ45 connector,

00:01:41.270 --> 00:01:43.670
they can talk to each other directly

00:01:43.670 --> 00:01:46.530
or through a network switch like this one,

00:01:46.530 --> 00:01:51.390
albeit only at the speed of the slowest link in the chain,

00:01:51.390 --> 00:01:53.310
be it on this one gig card

00:01:53.310 --> 00:01:56.110
or through this one gig network switch.

00:01:56.110 --> 00:02:00.270
And honestly, either of these, especially this one,

00:02:00.270 --> 00:02:02.550
should be more than enough for the average person

00:02:02.550 --> 00:02:04.290
for quite some time.

00:02:04.290 --> 00:02:07.430
But this isn't average person land,

00:02:07.430 --> 00:02:09.590
which sounds like the world's most boring amusement park.

00:02:09.590 --> 00:02:13.010
So we've decided to go totally overkill

00:02:13.010 --> 00:02:14.610
and take it to the next step,

00:02:14.610 --> 00:02:17.130
to greater than 10 gigabit speeds,

00:02:17.130 --> 00:02:21.560
which brings us then back to our ConnectX-4s.

00:02:21.560 --> 00:02:24.220
So the first difference, the ports.

00:02:24.220 --> 00:02:26.640
This beefy looking thing right here

00:02:26.640 --> 00:02:30.420
is what's called a QSFP+ connector.

00:02:30.420 --> 00:02:33.460
And as you probably figured out on your own,

00:02:33.460 --> 00:02:34.360
you can't just

00:02:34.360 --> 00:02:39.360
plug a standard network cable into this port.

00:02:39.720 --> 00:02:42.460
And even if you could, well, I guess,

00:02:42.460 --> 00:02:45.140
that brings us to the next difference.

00:02:45.140 --> 00:02:47.400
The fact that out of the box,

00:02:47.400 --> 00:02:52.080
these cards are designed to run not on Ethernet networks,

00:02:52.080 --> 00:02:54.440
but on InfiniBand networks.

00:02:54.440 --> 00:02:58.360
So even if you could plug it into your network switch,

00:02:58.360 --> 00:03:00.380
it wouldn't be able to communicate with it

00:03:00.380 --> 00:03:02.340
without some configuration.

00:03:02.340 --> 00:03:04.340
And then finally, this one's actually

00:03:04.360 --> 00:03:05.520
pretty interesting.

00:03:05.520 --> 00:03:08.760
These cards, yes, my friends,

00:03:08.760 --> 00:03:13.760
these network cards use a full fat PCI Express Gen 3

00:03:16.120 --> 00:03:19.140
x16 connection.

00:03:19.140 --> 00:03:22.200
That is the same as your graphics card.

00:03:22.200 --> 00:03:24.900
And they actually need it,

00:03:24.900 --> 00:03:27.140
probably more so than your GPU,

00:03:27.140 --> 00:03:29.600
if all you're doing is gaming.
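
NOTE Why the card needs all 16 lanes, as a rough sketch. It assumes the standard PCIe Gen 3 figures (8 GT/s per lane, 128b/130b encoding) and ignores packet overhead, so real throughput is a bit lower. The function name is made up for illustration.
```python
GT_PER_LANE = 8.0      # PCIe Gen 3 signalling rate, gigatransfers/s per lane
ENCODING = 128 / 130   # 128b/130b line encoding used by Gen 3
def pcie3_gbytes_per_second(lanes: int) -> float:
    """Approximate usable PCIe Gen 3 bandwidth in gigabytes/s."""
    return GT_PER_LANE * ENCODING * lanes / 8
x16 = pcie3_gbytes_per_second(16)
x8 = pcie3_gbytes_per_second(8)
link = 100 / 8  # a 100 gigabit link expressed in gigabytes/s
# An x8 slot would bottleneck the link; x16 leaves some headroom.
print(f"x8: {x8:.2f} GB/s, link: {link:.2f} GB/s, x16: {x16:.2f} GB/s")
```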

00:03:29.600 --> 00:03:31.320
So then with that in mind,

00:03:31.320 --> 00:03:34.320
we will not be using just your average gaming machine

00:03:34.320 --> 00:03:35.260
for our testing.

00:03:35.260 --> 00:03:37.380
So on one side of our link,

00:03:37.380 --> 00:03:39.740
we've got Intel's flagship 18 core processor

00:03:39.740 --> 00:03:43.840
with a Rampage VI Extreme motherboard and 128 gigs of RAM.

00:03:43.840 --> 00:03:44.680
Mm, yeah.

00:03:44.680 --> 00:03:45.960
And then in the other corner,

00:03:45.960 --> 00:03:47.280
we had to slum it a little,

00:03:47.280 --> 00:03:52.200
with a 16 core 7960X and an ASUS X299 Deluxe,

00:03:52.200 --> 00:03:54.020
but then with the same amount of RAM.

00:03:54.020 --> 00:03:55.580
And of course it's RGB across the board.

00:03:55.580 --> 00:03:59.020
Now, the reason that we're using the X299 platform

00:03:59.020 --> 00:04:02.020
with Core i9 processors is that we need to make sure

00:04:02.020 --> 00:04:04.280
that we have enough PCI Express lanes coming

00:04:04.280 --> 00:04:06.080
directly off the CPU.

00:04:06.080 --> 00:04:11.000
So 44 should give us enough for 16 lanes for networking,

00:04:11.000 --> 00:04:15.620
16 lanes for our quad NVMe storage devices,

00:04:15.620 --> 00:04:19.720
and then, you know, some leftovers for the graphics card.
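
NOTE The lane budget just described, as a small sketch. The 44-lane figure and the two 16-lane allocations come from the video; the variable names are made up for illustration.
```python
CPU_LANES = 44  # CPU PCI Express lanes quoted in the video
allocations = {"network card": 16, "quad NVMe storage": 16}
# Whatever is left after networking and storage goes to the graphics card.
leftover = CPU_LANES - sum(allocations.values())
print(f"Lanes left for the graphics card: {leftover}")
```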

00:04:19.720 --> 00:04:22.420
So then, for our NVMe storage,

00:04:22.420 --> 00:04:27.320
we scraped together four Samsung 960 Pros for our first one,

00:04:27.320 --> 00:04:31.180
and then four Corsair MP500s for the other,

00:04:31.180 --> 00:04:32.600
with both of them running

00:04:32.600 --> 00:04:33.800
with quad

00:04:34.280 --> 00:04:37.960
SSDs in RAID 0.

00:04:37.960 --> 00:04:40.400
Today's video is about how to go fast,

00:04:40.400 --> 00:04:42.300
not about how to put on your seatbelt.

00:04:44.700 --> 00:04:46.220
This is like hilarious to me.

00:04:46.220 --> 00:04:48.240
I never thought I would see the day

00:04:48.240 --> 00:04:53.240
when the GPU in a gaming rig is the lowest priority,

00:04:53.600 --> 00:04:56.300
like tier PCI Express device

00:04:56.300 --> 00:04:58.140
sitting at the bottom of the board.

00:04:58.140 --> 00:05:00.040
Anyway, for our OS,

00:05:00.040 --> 00:05:02.620
we had to go with Windows Server 2016,

00:05:02.620 --> 00:05:04.940
because as much as we wanted to try out Windows 10

00:05:05.140 --> 00:05:06.560
Pro for Workstations,

00:05:06.560 --> 00:05:09.500
which is supposed to support RDMA,

00:05:09.500 --> 00:05:12.800
the tech that allows for these super high speed transfers,

00:05:12.800 --> 00:05:17.020
it just didn't seem to be working for us for some reason.

00:05:17.020 --> 00:05:19.760
All right, now, at this point, before we go further,

00:05:19.760 --> 00:05:22.320
wanna give a big shout out to the guys over at Mellanox

00:05:22.320 --> 00:05:24.760
for hooking us up with these 100 gigabit cards,

00:05:24.760 --> 00:05:28.440
as well as a pair of 100 gigabit capable,

00:05:28.440 --> 00:05:31.100
passive copper direct attach cables.

00:05:31.100 --> 00:05:32.720
Wow, that just hit the bench.

00:05:32.720 --> 00:05:33.560
Oh!

00:05:33.560 --> 00:05:35.300
So if you wanna try this out at home,

00:05:35.300 --> 00:05:37.320
these cards are actually available on eBay

00:05:37.320 --> 00:05:39.120
for like two, three hundred bucks a pop,

00:05:39.120 --> 00:05:41.080
and then you'll pay about $60

00:05:41.080 --> 00:05:43.000
for a three meter cable like this one.

00:05:43.000 --> 00:05:44.100
It is worth mentioning though,

00:05:44.100 --> 00:05:45.660
that if you're planning on running anything

00:05:45.660 --> 00:05:47.200
further than five meters,

00:05:47.200 --> 00:05:49.980
you have to use an active fiber cable,

00:05:49.980 --> 00:05:53.920
which could cost upwards of $2,500 new.

00:05:53.920 --> 00:05:55.610
So...

00:05:55.610 --> 00:05:56.450
Damn!

00:05:56.450 --> 00:05:57.610
All right, moving on.

00:05:57.610 --> 00:05:59.870
Configuration then is our last step.

00:05:59.870 --> 00:06:03.750
So while Jake does that behind me here, thank you.

00:06:03.750 --> 00:06:04.590
Jake.

00:06:04.590 --> 00:06:05.630
Let's talk about some of the technology

00:06:05.630 --> 00:06:07.810
behind this networking magic.

00:06:07.810 --> 00:06:10.010
So these cards are designed for use

00:06:10.010 --> 00:06:13.010
with two different network fabrics,

00:06:13.010 --> 00:06:14.990
InfiniBand and Ethernet.

00:06:14.990 --> 00:06:17.070
And what makes InfiniBand special

00:06:17.070 --> 00:06:20.250
is that compared to even the sub millisecond latency

00:06:20.250 --> 00:06:22.210
of a typical Ethernet network,

00:06:22.210 --> 00:06:26.230
InfiniBand networks can have less than 25% as much,

00:06:26.230 --> 00:06:28.010
making them suitable for use cases

00:06:28.010 --> 00:06:30.590
like over network storage access

00:06:30.590 --> 00:06:32.910
and combining the processing power

00:06:32.910 --> 00:06:34.530
of multiple servers,

00:06:34.530 --> 00:06:37.770
just like in a data center or supercomputer.

00:06:37.770 --> 00:06:39.550
And this is cool.

00:06:39.550 --> 00:06:42.070
When you configure InfiniBand correctly,

00:06:42.070 --> 00:06:45.070
it also forms what's called a lossless network,

00:06:45.070 --> 00:06:49.250
meaning that packet loss should basically never happen.

00:06:49.250 --> 00:06:50.570
For compatibility though,

00:06:50.570 --> 00:06:52.890
we're going to be using them in Ethernet mode

00:06:52.890 --> 00:06:55.310
alongside a technology called RDMA

00:06:55.310 --> 00:06:58.070
or remote direct memory access.

00:06:58.070 --> 00:06:59.670
When you put these together,

00:06:59.670 --> 00:07:02.750
the setup is called RoCE, or RDMA over

00:07:02.750 --> 00:07:04.070
Converged Ethernet.

00:07:04.070 --> 00:07:05.970
Now regular Ethernet implementations

00:07:05.970 --> 00:07:07.370
require a lot of hoop jumping

00:07:07.370 --> 00:07:08.850
in order to transfer data

00:07:08.850 --> 00:07:11.650
as any information sent must be first moved

00:07:11.650 --> 00:07:13.990
through the transport protocol's driver,

00:07:13.990 --> 00:07:15.210
then through sockets

00:07:15.210 --> 00:07:17.210
before it can reach the application's memory,

00:07:17.210 --> 00:07:20.430
eating up CPU cycles and increasing latency in the process.

00:07:20.430 --> 00:07:22.670
However, with RDMA,

00:07:22.670 --> 00:07:25.810
the network adapters are able to access data

00:07:25.810 --> 00:07:28.510
directly from application memory,

00:07:28.510 --> 00:07:31.310
offloading much of the processing from your CPU

00:07:31.310 --> 00:07:32.370
onto the actual

00:07:32.730 --> 00:07:40.210
processor that sits right on your network adapter. So these are known as zero-copy transfers, and they allow for

00:07:40.830 --> 00:07:48.170
incredibly fast transfers that are no longer limited by CPU processing power. Pretty dang snazzy. So are we ready to go?

00:07:48.690 --> 00:07:53.870
We should be. Each card has an IP and a 50 gigabyte RAM disk.

00:07:54.030 --> 00:07:59.950
So we should be able to do some pretty quick Windows transfers. So are we going directly from RAM disk to RAM disk right now?

00:07:59.950 --> 00:08:04.210
Uh, so I think what we'll do... we've got to make sure it's working first, right?

00:08:04.850 --> 00:08:08.890
That would be good. I mean, that's supposed to be your job, but... Fingers crossed.

00:08:09.510 --> 00:08:14.030
You never, the second you try to do a networking demo, like

00:08:14.590 --> 00:08:20.500
We got our 40 gigabyte text file. What? 36 or 37 gigabytes. Why?

00:08:21.660 --> 00:08:24.340
Why even? It's just a big file, okay?

00:08:25.220 --> 00:08:29.840
Oh, that's not bad. That's not 10 gig, or 100 gig.

00:08:30.340 --> 00:08:36.180
So we've got just over two gigabytes per second. So we're reading from that system's

00:08:37.360 --> 00:08:42.180
NVMe array. Okay. And then we're dumping to the RAM disk on this system. So let's try going the other way.

00:08:42.180 --> 00:08:47.940
So now we can go RAM disk to RAM disk. Is that right? No, that's still the same thing, but just the other way around. Whoa!

00:08:50.080 --> 00:08:54.540
Just shy of four gigabytes per second.

00:08:54.540 --> 00:09:00.140
I mean, let's put what just happened there in context. That is a 40, well,

00:09:00.140 --> 00:09:02.720
just shy of 40 gigabyte file.

00:09:02.900 --> 00:09:07.080
So like, okay, DOOM. DOOM on PC is like 60 gigs.

00:09:07.080 --> 00:09:13.700
And it takes, you know, probably for your internet connection at home, like what, an hour to download and install? That just happened in real time.

00:09:14.600 --> 00:09:22.220
Okay, what's next? RAM disk to RAM disk? I gotta check. I think one of our RAM disks isn't working. I might have broken something.

00:09:23.100 --> 00:09:29.480
Okay, so check this out. We've got the RAM disks working, but for now, this is just another quick benchmark. So when we aren't limited

00:09:30.300 --> 00:09:34.240
by the overhead of Windows File Explorer and Windows File Transfers,

00:09:34.240 --> 00:09:38.740
if we're just using a straight disk performance benchmark,

00:09:38.740 --> 00:09:40.300
I wanna show you the kinds of numbers we're looking at.

00:09:40.300 --> 00:09:44.740
So anyone who's familiar with ATTO is gonna already know

00:09:44.740 --> 00:09:46.600
that this is freaking nuts.

00:09:46.600 --> 00:09:51.380
At two kilobyte sizes, we are already seeing speeds in excess

00:09:51.380 --> 00:09:55.280
of 100 megabytes per second, because we are actually reading

00:09:55.280 --> 00:10:00.120
and writing off of the four Corsair SSD array that's on the other machine.

00:10:00.300 --> 00:10:05.780
So we're gonna have to figure out how many megabytes per second we're gonna have to write off of the other machine.

00:10:05.780 --> 00:10:07.780
This is stupid.

00:10:07.780 --> 00:10:08.780
It's pretty cool, actually.

00:10:08.780 --> 00:10:12.780
It's really cool. And it's getting stupider as time goes on.

00:10:12.780 --> 00:10:16.780
We are already hitting 600 megabytes a second writes.

00:10:16.780 --> 00:10:17.780
Oh, that's weak, man.

00:10:17.780 --> 00:10:19.780
At 16 kilobytes, though.

00:10:19.780 --> 00:10:20.840
Just wait.

00:10:20.840 --> 00:10:25.840
That's the key. And it just keeps getting crazier.

00:10:25.840 --> 00:10:29.000
We're at one gigabyte per second already.

00:10:29.000 --> 00:10:30.000
Three!

00:10:30.000 --> 00:10:31.000
Doubled!

00:10:31.000 --> 00:10:32.000
Oh, wow.

00:10:32.000 --> 00:10:33.000
Oh, man.

00:10:33.000 --> 00:10:35.000
We just cracked four gigs a second.

00:10:35.000 --> 00:10:38.700
We just cracked five gigs a second!

00:10:38.700 --> 00:10:41.700
And this is on a remote machine.

00:10:41.700 --> 00:10:43.700
This is not a local array.

00:10:43.700 --> 00:10:45.700
This is over the network.

00:10:45.700 --> 00:10:47.700
This is a slow test.

00:10:47.700 --> 00:10:48.700
Yeah.

00:10:48.700 --> 00:10:49.700
It's a really fast, slow test.

00:10:49.700 --> 00:10:50.700
Yeah, it takes a while.

00:10:50.700 --> 00:10:51.700
A really slow, fast test.

00:10:51.700 --> 00:10:53.700
It's in the billions of bytes.

00:10:53.700 --> 00:10:54.700
Billions of bytes.

00:10:54.700 --> 00:10:57.700
It's, like, not even readable anymore at that point.

00:10:57.700 --> 00:10:59.700
No, it's just like, what is going on?

00:10:59.700 --> 00:11:00.700
How do I maths?

00:11:00.700 --> 00:11:01.700
Oh, meanwhile.

00:11:01.700 --> 00:11:02.700
Are we there?

00:11:02.700 --> 00:11:03.700
Just cracked it.

00:11:03.700 --> 00:11:05.700
10 gigabytes a second.

00:11:05.700 --> 00:11:07.500
Okay.

00:11:07.500 --> 00:11:08.500
So, what's our next test?

00:11:08.500 --> 00:11:09.500
Uh, to RAM disk, I guess?

00:11:09.500 --> 00:11:10.500
Sure.

00:11:10.500 --> 00:11:11.500
Let's do it.

00:11:11.500 --> 00:11:20.790
So, what's interesting here is that we actually hit our peak speed earlier, but then we level

00:11:20.790 --> 00:11:22.790
off to the RAM disk.

00:11:22.790 --> 00:11:23.790
Yeah, but our writes are, like, way better.

00:11:23.790 --> 00:11:24.790
Writes are better.

00:11:24.790 --> 00:11:29.790
And if we look at CPU usage,

00:11:29.790 --> 00:11:32.790
it will show in here, processor time.

00:11:32.790 --> 00:11:34.790
It was almost zero the whole time.

00:11:34.790 --> 00:11:35.890
Crazy.

00:11:35.890 --> 00:11:39.690
So, like, if you're using traditional Ethernet without RDMA, that would be, like, pretty

00:11:39.690 --> 00:11:40.690
high pegged up there.

00:11:40.690 --> 00:11:43.890
So, why don't we do as fast a Windows transfer as we can, then?

00:11:43.890 --> 00:11:45.890
We'll go straight to the RAM disk.

00:11:45.890 --> 00:11:48.890
Sure doesn't take long to move 40 gigabyte files around like this, eh?

00:11:48.890 --> 00:11:50.890
Even on, like, a relatively slow...

00:11:50.890 --> 00:11:51.890
Relatively slow.

00:11:51.890 --> 00:11:53.890
Oh, 2 gigabytes a second.

00:11:53.890 --> 00:11:55.890
How will we ever manage?

00:11:55.890 --> 00:11:57.780
Painful.

00:11:57.780 --> 00:11:58.780
Yeah.

00:11:58.780 --> 00:11:59.780
So, about the same.

00:11:59.780 --> 00:12:00.780
Yeah.

00:12:00.780 --> 00:12:01.780
I think it's a Windows thing.

00:12:01.780 --> 00:12:06.780
So, pretty much what's going on here is that we have reached...

00:12:06.780 --> 00:12:09.780
Because we've seen faster transfer speeds in ATTO.

00:12:09.780 --> 00:12:15.780
So, we have reached pretty much the limit of what Windows can handle for the time being.

00:12:15.780 --> 00:12:22.780
And I don't think it's gonna be in Microsoft's list of high priorities anytime soon to figure

00:12:22.780 --> 00:12:28.780
out how people can use Windows Explorer to copy files at faster than about 4 gigabytes

00:12:28.780 --> 00:12:29.780
a second.

00:12:29.780 --> 00:12:34.780
But that's okay because this was a lot of fun and hopefully you guys enjoyed it.

00:12:34.780 --> 00:12:39.480
If you wanna see what this kind of tech gets used for in the real world, because this is

00:12:39.480 --> 00:12:45.780
just not what it's really for, check out our kind of unboxing of SFU's Cedar Data Center.

00:12:45.780 --> 00:12:50.900
We'll have that linked in the description because that's this kind of technology on

00:12:50.900 --> 00:12:52.780
a whole other level.

00:12:52.780 --> 00:12:58.780
And with that said, I mean, even for someone like us, actually, it could be useful.

00:12:58.780 --> 00:13:04.780
Maybe we could try to rig up like a crazy way to improve the responsiveness of scrubbing

00:13:04.780 --> 00:13:06.780
in Adobe Premiere for the editors or something like this.

00:13:06.780 --> 00:13:07.780
100 gigabit for all the editors?

00:13:07.780 --> 00:13:08.780
Yeah, 100 gig...

00:13:08.780 --> 00:13:10.780
Well, no, but we could have like a 100 gigabit trunk.

00:13:10.780 --> 00:13:11.780
Yeah.

00:13:11.780 --> 00:13:12.780
And then they could all go 10 gig.

00:13:12.780 --> 00:13:13.780
What is it right now?

00:13:13.780 --> 00:13:15.780
It's bonded 10 gigs.

00:13:15.780 --> 00:13:16.780
Oh, like...

00:13:16.780 --> 00:13:17.780
So, you know what?

00:13:17.780 --> 00:13:18.780
Tell you what.

00:13:18.780 --> 00:13:22.780
Let us know in the comments if you'd like to see a video on what we can figure out to

00:13:22.780 --> 00:13:25.780
do with these in an actual deployment.

00:13:25.780 --> 00:13:28.330
So, thanks again for watching, guys.

00:13:28.330 --> 00:13:29.330
If you disliked this video, hit that button.

00:13:29.330 --> 00:13:30.330
But if you liked it, hit like, get subscribed, or maybe consider checking out where to buy

00:13:30.330 --> 00:13:31.330
the stuff we featured at the link below.

00:13:31.330 --> 00:13:32.330
Probably eBay links, I guess.

00:13:32.330 --> 00:13:33.330
Yeah.

00:13:33.330 --> 00:13:34.330
While you're down there, you can check out our merch store, which has cool shirts.

00:13:34.330 --> 00:13:35.330
Like, I don't know how long this one's gonna last, but it's gonna be up there as of the

00:13:35.330 --> 00:13:36.330
time of shooting this pretty soon.

00:13:36.330 --> 00:13:37.330
I think it's funny.

00:13:37.330 --> 00:13:38.330
It's so important.

00:13:38.330 --> 00:13:39.330
And also, we have a link to our community forum, which you should totally join.

00:13:39.330 --> 00:13:40.330
That shirt.

00:13:40.330 --> 00:13:41.330
I know, right?
