WEBVTT

00:00:00.280 --> 00:00:02.080
When NVIDIA released SLI,

00:00:02.080 --> 00:00:04.780
it enabled gamers to enjoy next-generation levels

00:00:04.780 --> 00:00:06.760
of quality today.

00:00:06.760 --> 00:00:07.640
Assuming of course,

00:00:07.640 --> 00:00:10.000
that they could afford literally twice

00:00:10.000 --> 00:00:11.720
as many graphics cards.

00:00:11.720 --> 00:00:15.000
But as monitor resolutions have grown,

00:00:15.000 --> 00:00:18.080
the scalable part of the Scalable Link Interface,

00:00:18.080 --> 00:00:20.280
which has been with us for over a decade,

00:00:20.280 --> 00:00:23.600
hasn't been able to keep up in spite of fancy,

00:00:23.600 --> 00:00:26.580
high-bandwidth bridges like this one.

00:00:26.580 --> 00:00:29.900
Meanwhile, over on the professional side of things,

00:00:29.900 --> 00:00:30.980
NVIDIA has been pushing

00:00:30.980 --> 00:00:35.980
a newer inter-GPU communication protocol called NVLink.

00:00:38.350 --> 00:00:41.650
This essentially turns SLI up to 11.

00:00:41.650 --> 00:00:46.940
But why would you, the general consumer, care about that?

00:00:46.980 --> 00:00:51.560
Well, because NVLink is coming to consumers

00:00:51.560 --> 00:00:54.340
with the GeForce RTX series.

00:00:54.340 --> 00:00:58.600
So it is time then to ask the big question.

00:00:58.600 --> 00:01:00.960
Does it make gaming

00:01:00.960 --> 00:01:01.800
better?

00:01:01.800 --> 00:01:06.090
Whew, that's a lot of hardware.

00:01:06.090 --> 00:01:07.530
Speaking of big questions,

00:01:07.530 --> 00:01:09.050
have you tried GlassWire?

00:01:09.050 --> 00:01:12.150
Detect malware and block badly behaving apps

00:01:12.150 --> 00:01:14.010
on your PC or Android device.

00:01:14.010 --> 00:01:17.870
Use offer code Linus to get 25% off GlassWire 2.0

00:01:17.870 --> 00:01:28.240
at the link below.

00:01:28.240 --> 00:01:29.780
So one of the first things you'll notice

00:01:29.780 --> 00:01:31.940
about a card equipped with NVLink

00:01:31.940 --> 00:01:35.500
is just how big the connector fingers are

00:01:35.500 --> 00:01:37.240
compared to traditional SLI.

00:01:37.240 --> 00:01:40.340
They are more than three times as wide with way,

00:01:40.340 --> 00:01:41.900
way more pins.

00:01:41.900 --> 00:01:45.440
Like seriously, a single NVLink finger

00:01:45.440 --> 00:01:49.460
is wider than the entire SLI connector setup.

00:01:49.460 --> 00:01:52.900
It almost even looks like they're little

00:01:52.900 --> 00:01:55.580
PCI Express connectors, which,

00:01:55.580 --> 00:01:58.860
as we're about to see, isn't by accident.

00:01:58.860 --> 00:02:03.620
So the way that SLI works is actually a lot like,

00:02:03.620 --> 00:02:06.060
oh, here, I have a good prop for this.

00:02:06.060 --> 00:02:09.460
It's actually a lot like the older SCSI and IDE.

00:02:09.460 --> 00:02:14.280
One card functions as the master in the relationship

00:02:14.280 --> 00:02:16.580
and the other one as a slave,

00:02:16.580 --> 00:02:18.740
or in the case of multiple other cards,

00:02:18.740 --> 00:02:20.860
they would all then be slaves.

00:02:20.860 --> 00:02:24.420
So that means that because the master alone

00:02:24.420 --> 00:02:27.140
is directing the workload for those slave cards

00:02:27.140 --> 00:02:31.520
with at best two gigabytes per second of bandwidth

00:02:31.520 --> 00:02:34.300
using one of NVIDIA's high-bandwidth bridges,

00:02:34.300 --> 00:02:36.940
you've got enough for the render results

00:02:36.940 --> 00:02:38.820
to be returned to the master

00:02:38.820 --> 00:02:43.080
and honestly, not a whole lot more.

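NOTE
A rough worked example of the bandwidth point above, as a sketch under stated
assumptions: an uncompressed 4K frame at 4 bytes per pixel, and "two gigabytes
per second" read as 2e9 bytes per second. The function name is hypothetical,
and real bridge traffic is more involved than a single frame copy.
```python
def frame_transfer_ms(width, height, bytes_per_pixel, link_bytes_per_s):
    """Milliseconds needed to move one uncompressed frame across a link."""
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes / link_bytes_per_s * 1000.0
# Assumed figures: 3840x2160 frame, 4 bytes per pixel, 2e9 bytes/s bridge.
ms = frame_transfer_ms(3840, 2160, 4, 2e9)
print(f"{ms:.1f} ms per frame, about {1000.0 / ms:.0f} frames per second at most")
```
On those assumptions, returning finished frames alone would keep the bridge
close to busy at 4K, which fits the remark that there is not much headroom left.
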
00:02:43.080 --> 00:02:47.040
This is the reason why you can't simply add together

00:02:47.040 --> 00:02:49.540
the memory of your SLI graphics cards,

00:02:49.540 --> 00:02:51.520
taking two 11 gig cards and saying,

00:02:51.520 --> 00:02:54.620
well, I've got 22 gigs of RAM now.

00:02:54.620 --> 00:02:57.840
And the same is true for Team Red's CrossFire.

00:02:57.840 --> 00:03:02.780
By contrast, NVLink is bi-directional

00:03:02.780 --> 00:03:05.780
and it's configured as a mesh,

00:03:05.780 --> 00:03:08.900
which means that no one card is the master,

00:03:08.900 --> 00:03:11.420
and there are no slaves.

00:03:11.420 --> 00:03:14.980
Think of it more like if you were plugging computers

00:03:14.980 --> 00:03:18.240
into a router or a switch.

00:03:18.240 --> 00:03:21.540
So this, along with the extra pins

00:03:21.540 --> 00:03:23.820
and newer signaling protocol,

00:03:23.820 --> 00:03:27.380
gives these cards a lot more bandwidth,

00:03:27.380 --> 00:03:29.440
more than even PCI Express,

00:03:29.440 --> 00:03:34.440
at a total of 160 to 300 gigabytes per second.

00:03:35.620 --> 00:03:37.360
That kind of speed lets them pool resources,

00:03:39.060 --> 00:03:43.080
in a way that allows access to each card's memory

00:03:43.080 --> 00:03:46.740
and CUDA cores as though they were a single card.

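NOTE
A minimal sketch of the memory point, assuming identical cards and a simple
two-case model: mirrored memory (as with SLI, where capacity does not add up)
versus pooled memory (as described for NVLink on the professional cards).
The function name and the boolean flag are hypothetical, for illustration only.
```python
def usable_memory_gb(per_card_gb, cards, pooled):
    """Memory an application can address across identical cards."""
    # Mirrored: each card keeps its own copy of the data, so capacity does not add up.
    # Pooled: cards can reach each other's memory, so capacity is summed.
    return per_card_gb * cards if pooled else per_card_gb
print(usable_memory_gb(11, 2, pooled=False))  # two 11 GB cards, mirrored
print(usable_memory_gb(11, 2, pooled=True))   # the same pair if memory were pooled
```
This is only bookkeeping; it says nothing about how fast remote memory is to reach.
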
00:03:46.740 --> 00:03:49.220
And that's perfect for the scientific

00:03:49.220 --> 00:03:51.240
and high-end render stations

00:03:51.240 --> 00:03:55.480
that NVIDIA has traditionally targeted with NVLink.

00:03:55.480 --> 00:03:58.040
Now, you might be thinking to yourself,

00:03:58.040 --> 00:04:02.160
awesome, NVLink is coming to GeForce RTX cards.

00:04:02.160 --> 00:04:03.640
We're gonna get those benefits.

00:04:03.640 --> 00:04:06.190
I'm doubling my pre-order.

00:04:06.190 --> 00:04:08.490
Hold your horses there, Tom.

00:04:08.490 --> 00:04:09.170
Yeah, it's awesome.

00:04:09.170 --> 00:04:12.030
But the number of links provided on RTX

00:04:12.030 --> 00:04:13.450
is relatively minimal,

00:04:13.450 --> 00:04:18.450
and the RTX cards only support SLI over the NVLink bus.

00:04:19.610 --> 00:04:24.010
So there will be no fancy resource pooling going on here.

00:04:24.010 --> 00:04:28.750
So our plan today then is to take our Quadro GP100s

00:04:28.750 --> 00:04:31.150
and run them both in compute mode,

00:04:31.150 --> 00:04:33.450
which actually disables the graphics engine,

00:04:33.450 --> 00:04:34.810
like we couldn't plug a display

00:04:34.810 --> 00:04:36.730
into these things right now if we tried,

00:04:36.730 --> 00:04:38.470
and in what's called SLI mode,

00:04:38.470 --> 00:04:41.350
to look at their gaming performance.

00:04:41.350 --> 00:04:43.070
Yes, yes, I know.

00:04:43.070 --> 00:04:45.750
This card isn't intended for gaming,

00:04:45.750 --> 00:04:49.730
but if you look closely at the spec of it,

00:04:49.730 --> 00:04:53.610
it's got HBM2 memory, yes, and more of it,

00:04:53.610 --> 00:04:58.610
but it's otherwise actually very similar to the GTX 1080 Ti.

00:04:59.750 --> 00:05:04.050
So this is probably as close as we will ever get

00:05:04.050 --> 00:05:08.050
to an apples-to-apples comparison between SLI and NVLink.

00:05:08.470 --> 00:05:13.270
Since Pascal is likely to be the only generation of products

00:05:13.270 --> 00:05:16.530
where both of these technologies are present.

00:05:16.530 --> 00:05:18.010
First up, some pre-flight tweaks

00:05:18.010 --> 00:05:19.550
to get everything working though.

00:05:19.550 --> 00:05:22.370
We needed a Quadro SLI-certified motherboard,

00:05:22.370 --> 00:05:26.170
so our ASUS X299 Deluxe with a Core i9-7900X

00:05:26.170 --> 00:05:27.630
worked nicely for this.

00:05:27.630 --> 00:05:31.450
And to look at NVLink's non-gaming performance,

00:05:31.450 --> 00:05:33.310
we needed to configure both cards

00:05:33.310 --> 00:05:35.750
in Tesla Compute Cluster mode,

00:05:35.750 --> 00:05:38.390
which we can check by going ahead and running...

00:05:38.470 --> 00:05:41.370
this command in the Windows PowerShell.

00:05:41.370 --> 00:05:42.790
So you can see right here,

00:05:42.790 --> 00:05:46.230
links one to three, or zero to three, excuse me,

00:05:46.230 --> 00:05:48.090
or one to, whatever the point is,

00:05:48.090 --> 00:05:52.610
they're all running, and that's good.

00:05:52.610 --> 00:05:55.930
Unfortunately, many of our benchmarks

00:05:55.930 --> 00:05:59.650
actually didn't cooperate very well

00:05:59.650 --> 00:06:03.740
with this particular setup,

00:06:03.740 --> 00:06:07.580
though the latest experimental Blender build managed it,

00:06:07.580 --> 00:06:08.420
and, whew!

00:06:09.360 --> 00:06:14.110
The results pretty much speak for themselves.

00:06:14.110 --> 00:06:17.200
Three and a half minutes for Gooseberry?

00:06:17.200 --> 00:06:20.700
20 seconds for BMW?

00:06:20.700 --> 00:06:22.300
In spite of these tests

00:06:22.300 --> 00:06:24.960
not being particularly memory intensive,

00:06:24.960 --> 00:06:28.180
we are seeing a clear advantage here.

00:06:28.180 --> 00:06:31.660
As for LuxMark's lower OpenCL performance scaling,

00:06:31.660 --> 00:06:35.320
that suggests that CUDA is a necessary ingredient

00:06:35.320 --> 00:06:38.260
if we wanna take full advantage of NVLink.

00:06:38.260 --> 00:06:39.800
Big surprise, of course.

00:06:40.100 --> 00:06:41.480
That's not all there is to it, though.

00:06:41.480 --> 00:06:43.900
Remember how NVLink allows us to utilize

00:06:43.900 --> 00:06:45.960
all of the available memory on our cards

00:06:45.960 --> 00:06:48.470
as though they were one big card?

00:06:48.470 --> 00:06:50.310
Well, because of that,

00:06:50.310 --> 00:06:52.870
we can now work with much larger data sets

00:06:52.870 --> 00:06:55.930
than would have been possible on smaller configurations.

00:06:55.930 --> 00:06:58.610
And trust us, we tried on smaller configurations.

00:06:58.610 --> 00:07:02.610
You can see here, even our twinned GP100s

00:07:02.610 --> 00:07:05.130
couldn't handle this particular workload.

00:07:05.130 --> 00:07:09.870
So, it's time to bring out the big guns.

00:07:09.870 --> 00:07:14.410
Our GV100s with their new NVLink bridges

00:07:14.410 --> 00:07:19.410
will give us a total of 64 gigs of HBM2 memory.

00:07:19.710 --> 00:07:21.750
That's more than the system memory

00:07:21.750 --> 00:07:24.170
of even many workstations.

00:07:24.170 --> 00:07:25.810
And there it is.

00:07:25.810 --> 00:07:29.630
Our GV100s handle this just fine.

00:07:29.630 --> 00:07:33.910
So, that's super impressive and extremely useful

00:07:33.910 --> 00:07:37.630
for people with huge data sets.

00:07:37.630 --> 00:07:39.410
But the real thing we were after here was

00:07:39.410 --> 00:07:44.410
evaluating the SLI mode that is coming with the RTX series.

00:07:44.570 --> 00:07:48.170
So, and here we go.

00:07:48.170 --> 00:07:50.990
So, in a massive surprise to no one,

00:07:50.990 --> 00:07:55.990
the GV100s are the fastest solution on the block, for now.

00:07:56.410 --> 00:07:59.590
In SLI, even at 4K Ultra,

00:07:59.590 --> 00:08:04.030
the average frame rate never dipped below 60, which is huge.

00:08:04.030 --> 00:08:07.510
Nothing else can even come close to claiming that.

00:08:07.510 --> 00:08:09.250
What's more interesting, though,

00:08:09.250 --> 00:08:12.950
is when we look at the scaling figures side by side.

00:08:12.950 --> 00:08:15.910
So, our GP100s here,

00:08:15.910 --> 00:08:20.750
these guys seem to scale better than the GV100s

00:08:20.750 --> 00:08:22.290
in gaming and productivity,

00:08:22.290 --> 00:08:24.590
giving them the best scaling overall,

00:08:24.590 --> 00:08:28.210
which may suggest some kind of CPU bottleneck.

00:08:28.210 --> 00:08:30.510
As for the GTX 1080 Ti,

00:08:30.510 --> 00:08:33.530
well, there's huge gains to be made in gaming,

00:08:33.530 --> 00:08:35.850
but not as much in productivity.

00:08:35.850 --> 00:08:38.570
So, as you might expect with anything new,

00:08:39.250 --> 00:08:42.870
NVLink SLI doesn't scale the same way for everything,

00:08:42.870 --> 00:08:46.490
but it does look to be a pretty decent improvement

00:08:46.490 --> 00:08:48.210
over traditional SLI,

00:08:48.210 --> 00:08:51.750
about 10 to 23% better by our measure,

00:08:51.750 --> 00:08:53.930
with the potential to dramatically improve

00:08:53.930 --> 00:08:56.970
undesirable behavior like micro-stuttering as well,

00:08:56.970 --> 00:08:59.970
or even enable more than two-way SLI

00:08:59.970 --> 00:09:02.490
with decent scaling in the future.

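NOTE
A sketch of how scaling and "percent better" figures of this kind can be
computed from average frame rates. The formulas are the usual ratio-based ones
and are an assumption about the method, not taken from the video. The function
names are hypothetical and the frame rates below are invented for illustration.
```python
def scaling_percent(single_fps, dual_fps):
    """Extra performance from the second card, as a percentage of one card."""
    return round((dual_fps / single_fps - 1.0) * 100.0, 1)
def relative_gain_percent(old_dual_fps, new_dual_fps):
    """How much faster one dual-card setup is than another, in percent."""
    return round((new_dual_fps / old_dual_fps - 1.0) * 100.0, 1)
# Hypothetical frame rates, not measurements from the video.
print(scaling_percent(50.0, 80.0))        # second card adds about 60%
print(relative_gain_percent(80.0, 92.0))  # about 15% ahead of the other setup
```
Comparing setups by the second ratio is one plausible reading of a "10 to 23%
better" style of claim; the video does not spell out its exact method.
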
00:09:02.490 --> 00:09:05.970
That is, depending on how much NVIDIA decides to neuter it

00:09:05.970 --> 00:09:08.650
compared to its professional-grade cousin.

00:09:09.010 --> 00:09:10.070
You never know with those guys.

00:09:10.070 --> 00:09:11.990
I mean, one thing we discovered

00:09:11.990 --> 00:09:13.650
in the course of our testing for this video

00:09:13.650 --> 00:09:16.410
is that the new NVLink bridges here

00:09:16.410 --> 00:09:20.770
don't work with the old NVLink cards, even the pro ones.

00:09:20.770 --> 00:09:23.310
So, NVIDIA told us something about

00:09:23.310 --> 00:09:26.330
consumer NVLink bridges having fewer pins,

00:09:26.330 --> 00:09:29.050
or more importantly, a slightly different pin-out,

00:09:29.050 --> 00:09:30.590
but could they have made it work?

00:09:30.590 --> 00:09:31.430
I don't know.

00:09:31.430 --> 00:09:32.330
I don't know with those guys.

00:09:32.330 --> 00:09:36.150
Either way, NVLink has lots of potential

00:09:36.150 --> 00:09:38.310
and looks like a significant hardware upgrade

00:09:38.310 --> 00:09:39.550
that should only improve

00:09:39.550 --> 00:09:42.330
as the drivers themselves continue to improve.

00:09:42.330 --> 00:09:46.650
So maybe, just maybe, SLI isn't dead.

00:09:46.650 --> 00:09:48.450
Yet.

00:09:48.450 --> 00:09:50.250
Maybe.

00:09:50.250 --> 00:09:51.310
But you know what's not maybe?

00:09:51.310 --> 00:09:54.330
FreshBooks, being the small business accounting software

00:09:54.330 --> 00:09:56.510
custom-built for how you wanna work.

00:09:56.510 --> 00:09:58.350
If you're a freelancer or a small business owner,

00:09:58.350 --> 00:10:00.110
you need to check out FreshBooks.

00:10:00.110 --> 00:10:01.850
It's the simple way to be more productive,

00:10:01.850 --> 00:10:04.310
more organized, and to get paid faster.

00:10:04.310 --> 00:10:06.610
You can create and send professional-looking invoices

00:10:06.610 --> 00:10:08.070
in less than 30 seconds.

00:10:08.070 --> 00:10:09.410
You can set up online payments,

00:10:09.410 --> 00:10:10.570
with just a couple of clicks,

00:10:10.570 --> 00:10:12.470
to get paid up to four days faster,

00:10:12.470 --> 00:10:14.910
and you don't have to take my word for it.

00:10:14.910 --> 00:10:17.290
Go try FreshBooks for 30 days for free

00:10:17.290 --> 00:10:19.450
at freshbooks.com/techtips.

00:10:19.450 --> 00:10:20.770
Just enter Linus Tech Tips

00:10:20.770 --> 00:10:22.270
in the How You Heard About Us section,

00:10:22.270 --> 00:10:24.450
so they'll know who sent you.

00:10:24.450 --> 00:10:25.430
So thanks for watching, guys.

00:10:25.430 --> 00:10:27.230
If this video sucked, you know what to do.

00:10:27.230 --> 00:10:28.910
But if it was awesome, get subscribed,

00:10:28.910 --> 00:10:32.330
hit that like button, or check out the NVLink.

00:10:32.330 --> 00:10:33.990
Oh, Lordy, that's awful.

00:10:33.990 --> 00:10:35.910
Anthony, come on.

00:10:35.910 --> 00:10:37.250
To where to buy the stuff we featured

00:10:37.250 --> 00:10:38.330
in the video description.

00:10:38.330 --> 00:10:39.250
Also, NVLink.

00:10:39.410 --> 00:10:40.250
NVLinked.

00:10:40.250 --> 00:10:41.450
In the description is our merch store,

00:10:41.450 --> 00:10:43.870
which has cool shirts, and our community forum,

00:10:43.870 --> 00:10:45.370
which you should totally join.

00:10:48.540 --> 00:10:50.300
She made me do it twice.

00:10:50.300 --> 00:10:52.420
I mean, it's really, it's my fault.

00:10:52.420 --> 00:10:54.420
I'll read anything on the teleprompter.