WEBVTT

00:00:00.400 --> 00:00:07.359
If I were a shovel manufacturer during the gold rush, what sort of shovel would

00:00:04.720 --> 00:00:12.639
I build for myself? Well, that's exactly the question that I'm answering today as

00:00:09.440 --> 00:00:15.120
we tour the Nyx supercomputer at NVIDIA

00:00:12.639 --> 00:00:22.000
headquarters here in Santa Clara. Behind this door are 1,192

00:00:18.560 --> 00:00:24.960
B200 GPUs. Absolute monsters rated for

00:00:22.000 --> 00:00:30.240
up to 1,200 watts of power each. And they've been used for everything from AI

00:00:27.199 --> 00:00:33.040
research to DLSS upscaling for gamers to

00:00:30.240 --> 00:00:36.079
good old-fashioned MLPerf drag racing. You know, to put the other AI upstarts in

00:00:34.880 --> 00:00:40.719
their place. So, what are we waiting for? Let's gear up and get inside. Uh

00:00:38.559 --> 00:00:49.760
oh, shoot. What was the password again? Oh, that's right. S E G [music] U E for

00:00:45.840 --> 00:00:52.320
our sponsor, MSI. Their Crosshair A16 HX

00:00:49.760 --> 00:00:57.920
AI is an 18-inch high-end gaming laptop that comes with a 50 series card and a

00:00:54.719 --> 00:00:59.600
240 Hz display. Experience smooth gaming

00:00:57.920 --> 00:01:04.720
on the go with the link in the video description. Are you tired of boring

00:01:01.680 --> 00:01:06.799
screwdrivers? Boom. Four new colors of

00:01:04.720 --> 00:01:13.520
the brand new LTT transparent screwdriver. Plasma purple, cryo teal,

00:01:10.560 --> 00:01:16.640
molten orange, and carbon black. They're see-through. They're awesome, and

00:01:15.119 --> 00:01:21.040
they're probably going to sell out, because for a limited time only, if you

00:01:19.040 --> 00:01:25.600
buy through YouTube shopping at the link in the video description, you will get

00:01:22.960 --> 00:01:29.840
20% off your driver, but only until December 30th. Collect them all or maybe

00:01:28.000 --> 00:01:34.000
pair them with our brand new bit case and all-over print hoodie. This

00:01:31.920 --> 00:01:38.560
particular data center space is used for a combination of R&D and NVIDIA internal

00:01:37.119 --> 00:01:42.400
production, meaning that it gets reconfigured on a pretty regular basis

00:01:40.871 --> 00:01:47.600
[music] as NVIDIA works through validation of new chip or server

00:01:44.960 --> 00:01:51.759
designs. That's why the raised floor has these plumbing access hatches for

00:01:49.520 --> 00:01:55.920
high-density water-cooled deployments, even though at the moment it's set up

00:01:53.280 --> 00:02:00.320
for air cooling to accommodate the DGX B200 racks that currently line the

00:01:58.240 --> 00:02:04.880
floor. We're in the first of two rooms that are linked by this fiber optic

00:02:02.159 --> 00:02:09.599
cabling, each of which contains two cats or cold air containment units. We're

00:02:07.920 --> 00:02:13.360
going to go in one of them in a second, but before we do, I want to talk a

00:02:11.680 --> 00:02:18.319
little bit about some of the details that would be easy to miss in here, like

00:02:16.000 --> 00:02:22.239
this decibel meter here that probably illustrates why it's better for us to

00:02:20.000 --> 00:02:26.720
talk out here. The deployment is absolutely peppered [music] with

00:02:23.760 --> 00:02:31.440
sensors: obvious ones like temperature on both sides of the rack like this one

00:02:28.640 --> 00:02:38.400
right here and also less obvious ones like humidity and air pressure. Humidity

00:02:35.440 --> 00:02:43.360
management is very important. Too low and static electricity becomes a

00:02:40.160 --> 00:02:45.519
problem. Too high and your servers start

00:02:43.360 --> 00:02:52.239
to sweat and then you're going to start to sweat. Air pressure or maybe more

00:02:49.440 --> 00:02:57.599
accurately airflow comes down to cooling. The DGX B200 systems here are

00:02:56.000 --> 00:03:01.920
of course equipped [music] with their own cooling fans and the same goes for

00:02:59.920 --> 00:03:08.239
the accompanying advanced tablet networking equipment. But we are

00:03:04.560 --> 00:03:10.400
talking about up to 14,000

00:03:08.239 --> 00:03:14.800
watts of power consumption in each of these units.
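
NOTE
A rough sketch of the power math behind the figures quoted so far. The eight GPUs per DGX B200 system is an assumption that is not stated in the video, the names are illustrative, and networking and cooling power are left out.
```python
# Figures quoted in the video.
GPU_COUNT = 1_192          # B200 GPUs in the cluster
GPU_MAX_WATTS = 1_200      # rated maximum per GPU
SYSTEM_MAX_WATTS = 14_000  # "up to 14,000 watts" per DGX B200 unit
# Assumption (not stated in the video): 8 GPUs per DGX B200 system.
GPUS_PER_SYSTEM = 8
def power_budget(gpu_count=GPU_COUNT, gpus_per_system=GPUS_PER_SYSTEM):
    """Return (systems, gpu_kw, cluster_kw) at the rated maximums."""
    systems = gpu_count // gpus_per_system
    gpu_kw = gpu_count * GPU_MAX_WATTS / 1000
    cluster_kw = systems * SYSTEM_MAX_WATTS / 1000
    return systems, gpu_kw, cluster_kw
systems, gpu_kw, cluster_kw = power_budget()
print(f"{systems} systems, {gpu_kw:.1f} kW of GPUs, {cluster_kw:.1f} kW for the DGX units")
```
At the rated maximums this works out to 149 systems, about 1,430 kW for the GPUs alone, and about 2,086 kW for the DGX units.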

00:03:12.239 --> 00:03:20.000
Let's just say that these fans can use all the help that they can get. So,

00:03:17.040 --> 00:03:24.959
fresh air gets actively blown up through the floor, creating positive pressure

00:03:22.640 --> 00:03:32.239
inside this room where it gets pushed through our DGXs and out into the hot

00:03:28.239 --> 00:03:35.040
aisle where uh things definitely get a

00:03:32.239 --> 00:03:40.640
lot hotter. I mean, I was getting kind of chilly in there, but here I would be

00:03:38.000 --> 00:03:44.560
sweating in a matter of minutes. But in spite of the challenges, the systems are

00:03:42.640 --> 00:03:49.200
running perfectly. And NVIDIA pointed out that some of these nodes are running

00:03:46.319 --> 00:03:54.000
full tilt on undisclosed workloads as we speak. Now, because NVIDIA uses spaces

00:03:51.920 --> 00:03:58.319
like these for validation, it's important that they get all the little

00:03:55.760 --> 00:04:02.720
details right so customers can just take this blueprint, copy it, and paste it

00:04:00.640 --> 00:04:06.080
into their own facility and trust that it's going to run at scale. The

00:04:04.319 --> 00:04:10.480
networking racks, for instance, are strategically positioned to minimize the

00:04:08.400 --> 00:04:15.360
length of fiber optic cabling between the four cats that make up this cluster.

00:04:13.200 --> 00:04:19.519
Two in this room and two in the next one over. This is both to maintain signal

00:04:17.440 --> 00:04:24.720
integrity. They found any higher than 100 to 150 m can be problematic at these

00:04:22.400 --> 00:04:29.120
speeds, but it's also to maintain the integrity of the ceiling. See, it turns

00:04:26.800 --> 00:04:33.759
out that when you start bundling up hundreds of fiber optic cables, they can

00:04:31.759 --> 00:04:38.639
get pretty heavy. And the longer the runs, the more strain they put on the

00:04:35.759 --> 00:04:44.160
ceiling. It also helps save on cost. This I don't have much to say about

00:04:40.320 --> 00:04:45.680
other than Oh my god, isn't it

00:04:44.160 --> 00:04:49.919
beautiful? Okay, I lied. I do have some stuff to

00:04:47.440 --> 00:04:55.280
say. This cable management is not just for looks, but rather to maintain air

00:04:52.320 --> 00:05:00.240
flow. When you have this many cables, you actually do need to bundle them. And

00:04:57.759 --> 00:05:05.919
it also helps facilitate maintenance in the event of a broken or damaged cable.

00:05:03.120 --> 00:05:11.520
They run extra fiber in each bundle that can be terminated as needed, and it is a

00:05:08.639 --> 00:05:16.320
lot easier to find that when things are organized. [music] Careful thought goes

00:05:13.680 --> 00:05:19.919
into rack serviceability, too. While the networking racks use traditional

00:05:18.160 --> 00:05:25.440
vertical PDUs for their power distribution, the DGX racks use three

00:05:22.800 --> 00:05:28.880
top-mounted PDUs that help balance the three [music] phases of power coming in.

00:05:27.039 --> 00:05:32.479
On the subject of power, we skipped over these at the SFU data center, but uh

00:05:31.120 --> 00:05:36.720
have you ever wondered what a >> 415 volt 100 amp

00:05:34.240 --> 00:05:40.240
>> power plug looks like? Wonder no longer. Another key advantage of putting all the

00:05:38.479 --> 00:05:45.600
PDUs up top is it makes it a little easier to get the DGX units in and out.
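
NOTE
For a sense of scale on that 415 volt 100 amp plug, here is a minimal sketch of the usual balanced three-phase formula. It assumes the 415 volts is the line-to-line voltage and ignores derating and power factor, none of which the video states. The function name is illustrative.
```python
import math
def three_phase_kva(line_to_line_volts, amps):
    """Apparent power of a balanced three-phase feed in kVA: sqrt(3) * V * I."""
    return math.sqrt(3) * line_to_line_volts * amps / 1000
# Plug rating quoted in the video; treating 415 V as line-to-line is an assumption.
feed_kva = three_phase_kva(415, 100)
# Compare against the "up to 14,000 watts" quoted per DGX unit.
dgx_equivalents = feed_kva / 14.0
print(f"{feed_kva:.1f} kVA, roughly {dgx_equivalents:.1f}x one DGX unit at full load")
```
Under those assumptions a single feed like this is on the order of 72 kVA, or about five DGX units' worth at full load.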

00:05:42.960 --> 00:05:49.280
They uh turned this one off and agreed to pull it out of the deployment just to

00:05:47.120 --> 00:05:52.639
show us how it's done. But when I asked if we could tear it down, they were

00:05:50.880 --> 00:05:56.320
like, "Huh?" Not because they have anything to hide,

00:05:54.240 --> 00:06:00.400
but just cuz they thought this soon to be decommissioned and recycled

00:05:57.919 --> 00:06:07.120
engineering unit that we can really get our greasy meat mitts into would be a

00:06:02.800 --> 00:06:09.520
lot more [music] fun. Woo!

00:06:07.120 --> 00:06:18.639
If you ever wondered what the heat sink would look like on a 1200 W GPU, wonder

00:06:14.880 --> 00:06:20.720
no more. How many heat pipes is this?

00:06:18.639 --> 00:06:25.280
It's like a forest of heat pipes in there. Oh, they're flat. That makes so

00:06:23.600 --> 00:06:29.120
much sense cuz you'll get more air flow through the fins. And I mean, that's

00:06:26.880 --> 00:06:35.440
some pretty hard working air dealing with not one but two rows of these GPUs.

00:06:32.800 --> 00:06:40.960
Not to mention this middle heat sink here that seems to be for the NVLink

00:06:38.400 --> 00:06:48.319
switching equipment. I mean, I knew you'd need something beefy to allow these GPUs to

00:06:43.840 --> 00:06:50.880
pool the 192 gigs of HBM3 memory that

00:06:48.319 --> 00:06:55.199
they have, but still, I'd never seen the cooling for it. Another thing I've never

00:06:53.199 --> 00:06:59.840
gotten to see, or at least never gotten to film, is NVIDIA's proprietary SXM

00:06:57.919 --> 00:07:03.280
interface. I know because I was looking for a B-roll shot of one of these a

00:07:01.599 --> 00:07:07.680
little while ago, and I realized I've never actually held one up on camera.

00:07:05.120 --> 00:07:12.639
So, now I've done it. And they also gave us this to show what Okay, not the same

00:07:10.080 --> 00:07:16.880
generation, but a similar GPU might look like without the heat sink. Oh, we've

00:07:15.280 --> 00:07:20.240
also got a look at the SXM interface that goes back to the rest of the system

00:07:18.560 --> 00:07:27.280
and also carries power for these [music] GPUs. These are 1,200 W each, but these

00:07:24.000 --> 00:07:30.080
cards go as high as 1,400 W. I mean, how

00:07:27.280 --> 00:07:34.560
high would the cooler be at that point? JK, it would be water cooled. We were

00:07:32.800 --> 00:07:39.280
hoping to see some water cooled machines today, but I guess NVIDIA has to save

00:07:37.195 --> 00:07:44.319
[music] something for next time. I got to say though, if the new stuff is even

00:07:41.440 --> 00:07:48.240
just equally cool to the last gen H100 water cooled machines we saw at the SFU

00:07:46.319 --> 00:07:53.039
Fir supercomputer, I'm sure it's absolutely mind-blowing. All of which is

00:07:50.880 --> 00:07:56.879
really cool, but what do they need all of this for? Well, the folks who are

00:07:55.360 --> 00:08:00.639
responsible for getting us access to everything today are actually from the

00:07:58.400 --> 00:08:04.960
GeForce gaming team. So, one of their big uses for these internal compute

00:08:02.479 --> 00:08:10.479
resources is, of course, Deep Learning Super Sampling, or DLSS. In a

00:08:07.599 --> 00:08:15.919
nutshell, DLSS allows your GPU to render your game at a lower resolution, then

00:08:12.960 --> 00:08:19.759
use deep learning or AI to upscale each output frame to your monitor's

00:08:17.440 --> 00:08:23.440
resolution. The benefit of this is that you can run at a higher frame rate for

00:08:21.520 --> 00:08:27.759
improved animation smoothness. But the drawback is that you can't create

00:08:25.680 --> 00:08:31.840
something from nothing and upscaled images struggle to achieve the same

00:08:29.280 --> 00:08:36.240
fidelity as a native rendered image. With that said, DLSS has improved a lot

00:08:34.479 --> 00:08:40.000
over the years and I got a chance to sit down and chat with Edward Liu from that

00:08:38.159 --> 00:08:43.839
team who talked us through some of the processes that they use to develop new

00:08:41.680 --> 00:08:49.200
features and new fixes. The first thing he pointed out is that while this hot

00:08:46.560 --> 00:08:54.160
new AI model took a thousand GPUs two weeks to train or whatever, makes for a

00:08:51.519 --> 00:08:58.399
good headline, it overlooks most of the actual cost and time, which is in the

00:08:56.720 --> 00:09:02.880
test training runs that take place before the hero run. His team is

00:09:01.040 --> 00:09:06.320
constantly iterating on the data they're feeding into DLSS and [music] the

00:09:04.560 --> 00:09:11.600
weighting, and they're evaluating new innovations in the AI space. Sometimes

00:09:09.120 --> 00:09:15.680
it's more surgical, like, "Oh, hey, uh, we noticed Cyberpunk has an issue with

00:09:13.360 --> 00:09:18.880
cars having like three or four bumpers as you're driving around. How can we

00:09:17.519 --> 00:09:24.560
address this with the current model?" That kind of thing can be turned around sometimes in a matter of weeks or months

00:09:22.560 --> 00:09:29.040
using the resources that we just saw. Though, uh, he was quick to point out

00:09:26.640 --> 00:09:33.600
that Jensen, if you're watching, there's no limit to how many GPUs his team could

00:09:31.279 --> 00:09:37.360
use to speed up the process. I told him I'd say that because the faster they can

00:09:35.440 --> 00:09:43.360
test a new data set, the faster they can tune it and ship it. Other times, the

00:09:41.040 --> 00:09:46.959
changes are more transformative, pun intended, like the move from a

00:09:44.959 --> 00:09:51.360
convolutional neural network to the more accurate transformer model that runs

00:09:48.959 --> 00:09:55.600
best on NVIDIA's newest cards. This can require basically a complete tear down

00:09:53.360 --> 00:10:00.000
and do-over of the entire pipeline and can take a year or more. But I mean,

00:09:58.080 --> 00:10:03.360
hey, NVIDIA has bet pretty much their entire future on being able to stay at

00:10:01.760 --> 00:10:07.760
the forefront of their 21st century shovel technology. So, I guess that's

00:10:05.120 --> 00:10:11.040
the price you pay. But what does all this mean for traditional rendering?

00:10:09.360 --> 00:10:15.707
Well, the folks here believe very strongly that in time DLSS will not only

00:10:13.440 --> 00:10:19.440
be as good, it will be better than [music] traditional. In fact, Edward

00:10:17.600 --> 00:10:23.760
pointed to these slides from a talk that he gave a number of years ago, showing

00:10:21.360 --> 00:10:29.279
that what we think of as native rendering is already a pretty imperfect

00:10:26.640 --> 00:10:33.200
approximation. Here it is compared to a ground truth image, which is rendered at

00:10:31.519 --> 00:10:38.959
a much higher resolution then downsampled to 1080p. Native, it's

00:10:36.480 --> 00:10:44.320
clearly worse. And as you can see in this comparison, even back then with

00:10:40.959 --> 00:10:46.240
DLSS 2.0, there were situations where

00:10:44.320 --> 00:10:52.000
a model trained on these higher resolution or more accurately higher

00:10:49.040 --> 00:10:56.972
sampled ground truth images could result in the GPU reconstructing an output with

00:10:54.800 --> 00:11:01.440
DLSS that's closer to ground truth [music] than the native image was. I

00:10:59.440 --> 00:11:04.560
asked, by the way, how are these ground truth images created? [music] and he

00:11:02.959 --> 00:11:09.519
said that typically it's actually just on run-of-the-mill gaming hardware, but

00:11:06.720 --> 00:11:13.839
instead of running at 60 or 120 frames per second, they'll sometimes be running

00:11:11.440 --> 00:11:19.839
tens of thousands of pixel samples to the point where we're talking more like

00:11:15.839 --> 00:11:22.480
60 to 120 seconds per frame or even

00:11:19.839 --> 00:11:26.320
more. Man though, if DLSS could consistently reconstruct that ground

00:11:24.800 --> 00:11:30.399
truth image from a lower resolution input every time, I'm sure no one would

00:11:28.480 --> 00:11:34.640
ever turn it off. Unfortunately for Edward and his team though, no one

00:11:32.320 --> 00:11:38.720
notices when DLSS is working perfectly. It's when it

00:11:36.640 --> 00:11:44.560
trips over itself that we tend to notice, and it still does do so

00:11:42.160 --> 00:11:48.720
fairly regularly. However, with that said, from my own experience, it has

00:11:46.880 --> 00:11:53.040
continued to improve at a pretty solid clip since its debut, and the rest of

00:11:50.640 --> 00:11:57.839
the industry has pretty much accepted that AI accelerated image enhancement is

00:11:55.760 --> 00:12:03.839
the future, whether every gamer wants it or not. So sales of shovels will likely

00:12:01.279 --> 00:12:07.760
continue until gamer morale improves. Good luck everyone. I hope it's not a

00:12:05.760 --> 00:12:13.200
bubble. This is a nice office. I'd hate for something to happen to it. Just like

00:12:10.160 --> 00:12:15.120
this is a nice segue to our sponsor,

00:12:13.200 --> 00:12:19.040
Squarespace. If you're building a brand or a business, it's important to have a

00:12:17.120 --> 00:12:23.279
website. Designing your own site can feel like a bit of a daunting task, but

00:12:21.120 --> 00:12:27.040
it really doesn't have to. Squarespace makes it easy to get your message across

00:12:24.959 --> 00:12:31.760
to potential customers and subscribers in a clean and digestible way. Their

00:12:29.360 --> 00:12:35.920
design intelligence tool utilizes AI and works with your own creativity to create

00:12:33.600 --> 00:12:40.399
a theme and style for your site that matches your personality. You can also

00:12:38.320 --> 00:12:46.079
use Squarespace to directly invoice your customers with payment options like ACH

00:12:42.560 --> 00:12:48.000
direct debit, Apple Pay, Klarna, and more.

00:12:46.079 --> 00:12:53.200
And use their analytic tools to track sales, strategize, and continue to build

00:12:50.399 --> 00:12:56.959
your brand. There's millions of URLs available, and Squarespace's domain tool

00:12:55.279 --> 00:13:01.200
will help you search for the right address just for you. We've even used

00:12:59.279 --> 00:13:05.040
Squarespace for some of our own websites here. Start building your website today,

00:13:03.200 --> 00:13:10.160
and you'll receive 10% off your first purchase by visiting squarespace.com/LTT.

00:13:08.399 --> 00:13:15.519
If you guys enjoyed this video, maybe go check out the time we I don't know,

00:13:13.279 --> 00:13:20.079
something NVIDIA-related. Let's do a throwback video. How about

00:13:17.519 --> 00:13:26.480
when I checked out the launch of G-Sync? The production values were lower, but

00:13:23.040 --> 00:13:26.480
hey, it was fun.
