WEBVTT

00:00:00.080 --> 00:00:06.160
We have looked at a lot of baller GPUs over the years. Whether it's the six

00:00:04.319 --> 00:00:11.440
Titan V's we had for the six editors project, three GV100 Quadras for 12K

00:00:09.440 --> 00:00:17.199
ultrawide gaming, or even this unreleased mining GPU, the CMP 170HX.

00:00:15.519 --> 00:00:21.279
There are not a lot of cards out there that we have not been able to get our

00:00:18.720 --> 00:00:30.240
hands on in one way or another, except for one until now, the NVIDIA A100. This

00:00:26.240 --> 00:00:32.480
is their absolute top dog AI enterprise

00:00:30.240 --> 00:00:38.800
high performance compute, big data analytics monster, and they refused to

00:00:35.360 --> 00:00:44.280
send it to me. Well, I got one anyway,

00:00:38.800 --> 00:00:44.280
NVIDIA. So, deal with it.

00:00:50.559 --> 00:00:56.079
The first two questions on your mind are probably why we weren't able to get one

00:00:53.760 --> 00:00:59.680
of these and what ultimately changed that resulted in me holding one in my

00:00:57.840 --> 00:01:04.080
hands right now. The answer to the first one is that NVIDIA just plain doesn't

00:01:01.840 --> 00:01:08.720
seed these things to reviewers. And at a cost of about $10,000,

00:01:07.119 --> 00:01:14.640
it's not the sort of thing that I would just, you know, buy because I got that

00:01:11.760 --> 00:01:19.119
swagger. You know what I'm saying? As for how we got one, I can't tell you.

00:01:17.439 --> 00:01:24.080
And in fact, we even blacked out the serial number to prevent the fan who

00:01:21.520 --> 00:01:28.560
reached out offering to get us one from getting identified. This individual

00:01:26.720 --> 00:01:32.799
agreed to let us do anything we want with it. So, you can bet your butt we're

00:01:30.560 --> 00:01:37.439
going to be taking it apart. And all we had to offer in return was that we would

00:01:35.119 --> 00:01:41.280
test Ethereum mining on it, send a shroud that'll allow him to actually

00:01:39.040 --> 00:01:46.799
cool the thing, and reassemble it before we return it. So, let's compare it

00:01:42.880 --> 00:01:49.040
really quickly to the CMP 170HX, which

00:01:46.799 --> 00:01:54.799
is the most similar card that we have. It's this silver metal and it's not

00:01:51.200 --> 00:01:56.640
ribbed for my pleasure. Comfortable. And

00:01:54.799 --> 00:02:01.600
we actually have one other point of comparison. This isn't a perfect one.

00:01:58.560 --> 00:02:04.079
This is an RTX3090. And what would have

00:02:01.600 --> 00:02:08.879
been maybe more apt is the Quadro, or rather they dropped the Quadro branding,

00:02:05.600 --> 00:02:10.640
but the A6000. Unfortunately, that's

00:02:08.879 --> 00:02:14.160
another really expensive card that I don't have a legitimate reason to buy,

00:02:12.640 --> 00:02:18.160
and NVIDIA wouldn't send one of those for the comparison either. So, the specs

00:02:16.640 --> 00:02:25.760
on this are pretty similar. We're going to use it as a stand-in since we're not really looking at any workstation loads

00:02:21.200 --> 00:02:28.000
anyway. So, the A100, then this is a 40

00:02:25.760 --> 00:02:33.519
Gigabyte card. I'm going to let that sink in for a second. And the craziest

00:02:30.319 --> 00:02:35.519
part is that 40 gigs is not even enough

00:02:33.519 --> 00:02:39.840
for the kinds of workloads that these cards are used to crunch through. We're

00:02:37.519 --> 00:02:44.160
talking enormous data sets to the point where this 40 gig model is actually

00:02:41.840 --> 00:02:48.000
obsolete now replaced by an 80 gig model. And these NVLink bridge uh

00:02:46.720 --> 00:02:53.920
connectors on the top here. Let's go ahead and pull these off. There we go. These

00:02:51.440 --> 00:02:59.519
are used to link up multiples of these cards so they can all pool memory and

00:02:56.959 --> 00:03:05.440
work on even larger data sets. Now the die at the center of it is a 7nm TSMC

00:03:02.800 --> 00:03:07.920
manufactured GPU called the GA100. We're going to pop this shroud off.

00:03:06.480 --> 00:03:15.360
We're going to take a look at it. It has a base clock of just 765 MHz, but it'll

00:03:11.760 --> 00:03:18.720
boost up to 1410. That memory runs at a

00:03:15.360 --> 00:03:23.519
whopping 1.5 terabytes a second of

00:03:18.720 --> 00:03:26.800
bandwidth on a massive 5,120

00:03:23.519 --> 00:03:30.879
bit bus. It's got 6,912

00:03:26.800 --> 00:03:33.760
CUDA cores and a what is it? Uh 250 watt

00:03:30.879 --> 00:03:37.200
TDP. She's packing. Oh, you're just going

00:03:35.519 --> 00:03:41.680
right for I'm going right for it. Oh jeez. This is Linus Tech Tips. And

00:03:39.519 --> 00:03:45.599
basically every part of this is identical to the CMP card. It kind of

00:03:44.080 --> 00:03:49.040
looks that way. I mean, the color is obviously different. Yeah. But it looks

00:03:47.200 --> 00:03:54.080
like the clam shell is two pieces in the same manner. There's no display outputs.

00:03:51.599 --> 00:03:58.959
The fins look the same. Now, here's something. The CMP card specifically

00:03:56.720 --> 00:04:02.239
didn't even contain the hardware for video encode, if I recall correctly.

00:04:00.720 --> 00:04:09.439
Yeah. This doesn't have NVENC. Okay. So, it's not that it was fused off. It's that it's just plain not on the chip. It's

00:04:05.680 --> 00:04:11.920
not on GA100. Yeah. Okay. So, GA102,

00:04:09.439 --> 00:04:18.079
which is like 3090. Yes. Does have it. God. And A6000. Okay. You ready? Uh oh

00:04:15.840 --> 00:04:22.320
god. So, yeah, it's like exactly the same on

00:04:19.919 --> 00:04:28.720
the inside. Same jank power connector. Wow, that is super jank. Check this out,

00:04:25.280 --> 00:04:31.680
guys. It uses a single 8 pin EPS power

00:04:28.720 --> 00:04:36.639
connector, which you might think is a PCIe power connector. See here, look,

00:04:34.160 --> 00:04:41.759
I'll show you guys. This is an 8 pin like normal GPU connector. But watch

00:04:39.919 --> 00:04:45.759
it won't go in. But if we take the connector out of our CPU socket on the

00:04:43.840 --> 00:04:49.360
motherboard, there you go. Oh, well, the clips are

00:04:47.520 --> 00:04:52.720
interfering a little bit. I mean, what the what the heck is going on here,

00:04:51.040 --> 00:04:57.759
ladies and gentlemen? You need more power. Yeah, exactly. So, you can

00:04:54.880 --> 00:05:00.960
combine two PCIe connectors into that. Can't remember how to get it out of

00:04:58.960 --> 00:05:03.680
here. I see the fingerprint of the technician who assembled the card,

00:05:02.160 --> 00:05:06.880
though. Think we have to unclip this part first to just Oh, there's a little

00:05:05.759 --> 00:05:12.479
screw, right? Yeah, there's a little screw. Haha. Third type of screws.

00:05:10.080 --> 00:05:18.880
Yourself. You didn't see that one, nerd. You're a nerd. Your face is a nerd. Your

00:05:16.000 --> 00:05:22.960
butt's a nerd. Whoa. It's not coming off, Jake. What? You got to like tilt it

00:05:21.440 --> 00:05:31.960
out, buddy. Whoa, whoa, whoa. Don't pull the cooler off. See, it's like it's caught uh back here. I wouldn't re Hey.

00:05:28.080 --> 00:05:31.960
Oh, hey. How you doing? Jesus.

00:05:32.320 --> 00:05:37.280
stressful. Look, maybe if we break it, you'll

00:05:36.000 --> 00:05:40.560
actually have to buy one. I don't want to buy one. That's not the goal. What? I

00:05:39.280 --> 00:05:45.280
thought you put your hand up for a high five. I was like, what are you talking

00:05:42.639 --> 00:05:48.560
about? I don't want to buy one. Why not? Wo. What is going on here? You see that?

00:05:46.960 --> 00:05:53.280
It looks like there was a thermal pad there or something, but there isn't.

00:05:50.479 --> 00:05:57.280
It's like greasy. No, look at it closer. It's not greasy. It's like You see how

00:05:54.720 --> 00:06:01.039
this is like brushed almost or like like looks like somebody sand blasted it?

00:05:59.039 --> 00:06:05.120
That part's not. Oh, that's I don't remember that on this card. All right, so

00:06:03.120 --> 00:06:09.360
the spring loading mechanism is just from the bend of the back plate. That's

00:06:06.960 --> 00:06:12.720
kind of cool. So, I checked the CMP thing. Doesn't look like it. I wonder

00:06:11.199 --> 00:06:17.440
why they would have like a M. This doesn't look brushed at all. What did we

00:06:15.280 --> 00:06:20.880
last time we twisted? No, I don't think we did. Yeah, we did. I just looked. I

00:06:19.520 --> 00:06:24.160
just I'm pretty sure I just reamed on it. Oh my god. No, you were against

00:06:22.880 --> 00:06:30.000
reaming on it and then we were like, just twist a little. I'm reamer. Oh god.

00:06:27.440 --> 00:06:35.280
Ah, it has an IHS. It looks basically the same. Yeah, we're

00:06:32.479 --> 00:06:41.360
going to have to clean that off and see. There's not much alcohol. No, I like to

00:06:37.840 --> 00:06:42.960
go in dry first. So, yep, that's the

00:06:41.360 --> 00:06:47.280
same thing. All right. I mean, this isn't the first time NVIDIA has used the

00:06:45.120 --> 00:06:51.680
same silicon in two different products with two different capabilities. We see

00:06:49.759 --> 00:06:55.680
the same thing with their Quadro lineup versus their GeForce lineup where things

00:06:53.360 --> 00:06:58.960
will just be disabled through drivers or fusing off different functional units on

00:06:57.360 --> 00:07:02.639
the chip. What I want to know then is besides the lack of NVLink connectors

00:07:00.880 --> 00:07:06.560
on this one, well, they are in there. They're just not accessible and they

00:07:04.000 --> 00:07:11.199
probably don't work, right? What is the actual difference in function between

00:07:08.639 --> 00:07:17.759
them? Well, this one doesn't have full PCIe x16, right? Less memory. It's I

00:07:16.080 --> 00:07:22.080
think it has way fewer transistors, but it is still a GA100. Yeah. So, the

00:07:20.240 --> 00:07:25.520
transistors are there. Yeah, they're probably just not functional. Let me see

00:07:24.000 --> 00:07:29.840
what the chip number is on that one. Yeah, cuz weren't we not even able to

00:07:27.039 --> 00:07:33.599
find a proper NVIDIA.com reference to this one anyway? So, we're just relying

00:07:31.520 --> 00:07:38.479
on someone else's spec sheet. So, the transistor count could just be wrong.

00:07:35.280 --> 00:07:40.479
Okay, so this is So, the CMP card was a

00:07:38.479 --> 00:07:49.759
GA— Look at this guy. Yeah, what a weirdo. GA100-105F

00:07:44.160 --> 00:07:51.599
and this is a GA100-833. If it's a GA, I

00:07:49.759 --> 00:07:54.080
guess it could be a different GA. I don't know. Yeah, I mean it used to be

00:07:52.800 --> 00:07:58.800
back in the day you would assume that it's just using the same silicon as the GeForce cards because NVIDIA's data

00:07:57.520 --> 00:08:05.759
center business hadn't gotten that big yet. But nowadays they can totally justify an individual like new die

00:08:03.599 --> 00:08:09.199
design for a particular lineup of enterprise cards. And interestingly

00:08:07.120 --> 00:08:15.520
enough, the SXM version doesn't have an IHS. At least it seems that way. But the

00:08:12.479 --> 00:08:17.599
SXM version is also like 400 watts and

00:08:15.520 --> 00:08:21.599
this is like 250. Yeah. Totally different classes of capabilities. All

00:08:20.160 --> 00:08:24.960
right, let's put it back together then, shall we? I got you new goop. Goop me. I

00:08:23.680 --> 00:08:27.960
brought two goop. Going for the no look catch.

00:08:30.080 --> 00:08:36.479
Oh yeah, baby.

00:08:33.279 --> 00:08:38.719
X marks the spot, baby. My finest work.

00:08:36.479 --> 00:08:41.719
Maybe it'll perform better now. Probably not.

00:08:43.680 --> 00:08:49.360
We're backing it up.

00:08:46.800 --> 00:08:55.279
Cool story, bro. Thanks. Thanks, bro. Uh, where's our back plate? Did you take

00:08:52.160 --> 00:08:57.839
it? Oh, shoot. Yes, black. I thought it

00:08:55.279 --> 00:09:02.800
was gold. I was looking for gold. Aren't we all? I don't know about you,

00:08:59.600 --> 00:09:04.480
but I found my gold. What's What's that?

00:09:02.800 --> 00:09:08.640
Yvon, shut up. All right. All right. Let's get

00:09:07.120 --> 00:09:13.040
going here. Which one do you want to put on the bench first? What do you mean? We're not going to compare to that

00:09:11.120 --> 00:09:16.160
thing. Oh, it doesn't do doesn't do anything. Okay, so we don't need this

00:09:14.560 --> 00:09:22.000
thing. Here we go, boys. See you later. We can't put this in the first slot because we don't have a display output. But

00:09:18.959 --> 00:09:26.720
you like the bottom up? Your bottom?

00:09:22.000 --> 00:09:28.320
Sure. This Okay. This is how you flex it

00:09:26.720 --> 00:09:32.240
style. Now, you might have noticed at some point that the A100 doesn't have

00:09:30.000 --> 00:09:36.720
any sort of cooling fan. It's just one big fat long heat sink with a giant

00:09:34.959 --> 00:09:41.920
vapor chamber under it to spread the heat from that massive GPU. So, Jake

00:09:39.680 --> 00:09:45.600
actually designed uh what we call the Shroudinator. It allows us to take these

00:09:44.080 --> 00:09:49.040
two screws that are on the back of the card for securing it in a server chassis

00:09:47.440 --> 00:09:52.959
because that's how it's designed to be used. So, it's passive, but there's lots

00:09:51.120 --> 00:09:58.399
of air flow going through the chassis. And then lets us take those screw holes

00:09:55.680 --> 00:10:05.600
and mount a fan to the back of the card. It's frankly not amazing.

00:10:02.320 --> 00:10:08.000
What? No. That is aerodynamics at its

00:10:05.600 --> 00:10:13.440
peak. You should hire me to work on F1 cars. Okay. Yeah, not so much. Yeah, it

00:10:10.800 --> 00:10:17.120
it only blows probably more air out this end from the back pressure than it does

00:10:15.040 --> 00:10:22.320
out this end, but it's enough to cool it. I swear it is. Yeah. Uh let's go

00:10:19.920 --> 00:10:26.480
ahead and turn on the computer, shall we? Okay. So, a couple interesting

00:10:24.320 --> 00:10:30.160
points here. It wouldn't boot right off the bat. You have to enable above 4G

00:10:28.399 --> 00:10:35.040
decoding. And then I also had to go in and I think it's called like 4G MMIO or

00:10:33.519 --> 00:10:41.440
something like that. I had to set that to 42. Okay. The answer to the universe.
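
The "42" here is most likely an address-bit count, as it is on many server boards — which would make the MMIO window 2^42 bytes, or 4 TiB, comfortably enough space to map the A100's 40 GB BAR above the 4 GB boundary. A quick sanity check (the bit-count interpretation of that BIOS field is an assumption):

```python
# If the BIOS "MMIO" value is a count of address bits (an assumption about
# this particular firmware), 42 bits works out to a 4 TiB address window.
mmio_bits = 42
mmio_bytes = 2 ** mmio_bits
print(mmio_bytes // 1024 ** 4, "TiB")  # plenty of room for a 40 GB BAR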

00:10:39.120 --> 00:10:47.839
Yes. Thank you. And they are both here. A100 PCIe 40 freaking gigabytes.

00:10:45.680 --> 00:10:51.600
I installed the like game ready driver for the 3090 and then I installed the

00:10:50.160 --> 00:10:55.600
data center driver and I think it overrode it, but the game ready driver

00:10:53.839 --> 00:11:00.240
it still showed as like active and you could do stuff with the A100 and vice

00:10:57.680 --> 00:11:05.760
versa. So, it's probably fine. Now, interestingly, the A100 doesn't show up

00:11:03.040 --> 00:11:09.040
in task manager at all. Did the CMP? I can't remember. No, no, I don't think it

00:11:07.600 --> 00:11:13.279
did actually. Anyways, what do you want to do in Blender? Classroom. BMW. BMW is

00:11:11.440 --> 00:11:17.279
probably too short. Yeah, let's do classroom. I think BMW on a 3090 is like

00:11:16.079 --> 00:11:22.160
15 seconds or something like that. Anyway, let's do classroom. That's also

00:11:19.279 --> 00:11:26.880
like the spiciest 3090 that you can get. Yeah, pretty much. It's just so thick.

00:11:24.640 --> 00:11:31.040
Why would you ever use it? Yeah, because you want Is it even doing anything?

00:11:28.720 --> 00:11:35.360
Like, here's one reason. Cuz you can do classroom renders in a minute and 18

00:11:33.839 --> 00:11:39.760
seconds. That's why. Okay. Well, what about the A100? Oh, the You didn't plug

00:11:37.120 --> 00:11:44.079
the fan in. Okay. Oh, whoops. How hot is this? Probably warm. Fortunately, it

00:11:41.920 --> 00:11:50.480
hasn't been doing anything. Time to beat is a minute and 18 seconds. So, let's go

00:11:46.800 --> 00:11:52.959
ahead and see how it does. It feels like

00:11:50.480 --> 00:11:56.959
this is the intake. I mean, it's hot, so like Oh, yeah. But Oh, it's it's going.

00:11:55.120 --> 00:12:01.120
It's going, Jake. It's going. You did good. It works enough. This should be

00:11:59.519 --> 00:12:05.839
like This is It should be way faster. Way huger GPU, right? It's actually

00:12:03.519 --> 00:12:12.480
slower. How much? Not by much. It's like a few seconds, but it's slower. So, it's

00:12:09.279 --> 00:12:15.040
worse in CUDA. What about OptiX? So,

00:12:12.480 --> 00:12:20.560
the interesting thing is this card doesn't have ray tracing cores. The 3090

00:12:18.079 --> 00:12:24.800
does. So, you'd think that OptiX would only work on the 3090, right? Do you

00:12:22.639 --> 00:12:28.880
want me to just try the A100? Yeah, sure. Let's Yeah, it's still GPU

00:12:26.480 --> 00:12:33.760
compute. I mean, you got to give it to it in terms of efficiency. For real

00:12:31.519 --> 00:12:39.120
though, even running two renders to the 3090s one, the average power consumption

00:12:36.399 --> 00:12:44.720
here is still lower. Yeah. Well, and looking at while it's running, it's like

00:12:40.800 --> 00:12:47.279
150 watts. Yeah. Versus 350 or whatever

00:12:44.720 --> 00:12:52.800
it was on the 3090. Yeah. Ready to go again. Yep. Uh Okay. Oh my god, man.

00:12:51.360 --> 00:12:56.959
This thing is fast. What's the power cons?

00:12:55.279 --> 00:13:03.040
353. The fan is still like just I want one of

00:13:00.240 --> 00:13:05.760
these. This looks sick, dude. It's way faster. Yeah, there's no question. We

00:13:04.639 --> 00:13:10.959
don't even need to. It's going to be like 30 seconds. Yeah, not even close.

00:13:08.720 --> 00:13:14.399
So, do you want to know why? I would love to know why. You said it earlier.

00:13:13.040 --> 00:13:20.160
You just weren't really thinking about it. This has half the CUDA cores of a

00:13:16.959 --> 00:13:22.079
3090. It's like 7,000ish, I think. So,

00:13:20.160 --> 00:13:26.959
it's just full of like machine learning stuff. Yeah. So, it has basically half

00:13:25.040 --> 00:13:31.519
the CUDA cores. So, the fact that it is even close is kind of crazy in CUDA

00:13:28.959 --> 00:13:36.560
mode. But in OptiX, what I found out is OptiX will use the tensor cores for

00:13:34.079 --> 00:13:40.720
like AI denoising, but nothing else. You'll see in there. Um, so I I think

00:13:38.800 --> 00:13:46.800
it's falling back to CUDA for the other stuff. Got it. But the 3090 has ray

00:13:43.279 --> 00:13:49.040
tracing and tensor cores. So, right, it

00:13:46.800 --> 00:13:54.079
just demolishes. Uh, where's the thing where you can

00:13:51.360 --> 00:13:58.639
select apps and then tell it which GPU to use? Yeah, here we go. No. So, it

00:13:56.480 --> 00:14:02.959
will not allow you to select the A100 to run games even if we could pipe it

00:14:01.680 --> 00:14:08.959
through our onboard or through a different graphics card like we did with that direct mining card ages ago. No

00:14:07.120 --> 00:14:15.279
DirectX support whatsoever. Let's check it in GPU-Z. So, way fewer CUDA cores.

00:14:12.320 --> 00:14:19.040
You can see that we go from over 10,000 to

00:14:17.120 --> 00:14:24.000
a lot less than 10,000. The pixel fill rate is actually higher. I guess that's

00:14:20.880 --> 00:14:27.839
your HBM2 memory talking.

00:14:24.000 --> 00:14:30.880
1.5 gigabytes per second. What's a 3090?

00:14:27.839 --> 00:14:34.000
1.5 terabytes per second. It's like

00:14:30.880 --> 00:14:36.720
almost 60%. Yeah. 60%, almost.
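
For context on that "almost 60%": the 3090's GDDR6X is rated at roughly 936 GB/s — that's the card's published spec, not a number read off screen here — against about 1,555 GB/s for the A100 40GB's HBM2:

```python
# 3090 memory bandwidth as a share of the A100's. 936 GB/s is the
# RTX 3090's published spec; ~1,555 GB/s is the A100 40GB HBM2 rating.
a100_gb_s = 1555
rtx3090_gb_s = 936
share = round(rtx3090_gb_s / a100_gb_s * 100)
print(share)  # 60 -- the 3090 has roughly 60% of the A100's bandwidth
```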

00:14:34.000 --> 00:14:42.639
Holy bananas. But what about the supported tech? Yeah. So, we can do

00:14:39.440 --> 00:14:44.800
CUDA, OpenCL, PhysX.

00:14:42.639 --> 00:14:51.440
Sure, we should set it as the PhysX card. Dedicated PhysX card. All the rag

00:14:48.240 --> 00:14:55.120
dolls everywhere. And OpenGL, but not

00:14:51.440 --> 00:14:56.880
Direct-anything or Vulkan even. OpenGL.

00:14:55.120 --> 00:15:01.120
Now that's interesting. Go to the advanced tab. Yeah, cuz you can select

00:14:59.120 --> 00:15:06.160
like a specific DirectX version at the top under general. Like what about like

00:15:03.680 --> 00:15:11.279
DX12? What does it say? Device not found. It's the same as the mining card.

00:15:08.800 --> 00:15:15.600
It'll do OpenCL. So we can mine on it.

00:15:14.079 --> 00:15:19.839
All right. I mean, should we try that? Yeah, we could do mining or folding or

00:15:18.000 --> 00:15:24.399
Sure. I have a feeling it's going to kind of suck for that, too. Uh, there's

00:15:22.160 --> 00:15:30.160
no AI in mining. I don't think so. It's still a big GPU, dude. So, you can't

00:15:27.040 --> 00:15:32.000
Well, suck is relative, right? Like, for

00:15:30.160 --> 00:15:35.040
the price, you'd never buy. Oh, I think it might be better than the CMP card,

00:15:33.600 --> 00:15:38.720
though. Just a little bit. Shut up. I think so. So, the only thing you can

00:15:36.959 --> 00:15:43.120
adjust I think this is the same with the CMP card is the core clock and the power

00:15:41.519 --> 00:15:46.560
limit. You can't mess with the memory speed. And you can move the power limit

00:15:44.560 --> 00:15:52.480
only down, it looks like. Yeah. Top is the 3090, bottom is the A100. Wow, that

00:15:48.800 --> 00:15:54.079
is a crap ton faster than a 3090. It's

00:15:52.480 --> 00:16:00.320
pretty much the same as a CMP, but look at the efficiency. 714 kilohash per

00:15:58.160 --> 00:16:05.759
watt. Uh, and I bet you if we lower the power limit to like 80. Uh, it's a

00:16:03.360 --> 00:16:09.519
little bit lower speed. Maybe we can go I don't know. We probably don't have to

00:16:07.120 --> 00:16:13.040
tinker with this too much. I mean, it doesn't draw that much power to begin

00:16:10.720 --> 00:16:16.800
with, I guess. Yeah, I think it's pretty freaking efficient right out of the box.
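
The efficiency figures they quote pencil out: roughly 175 MH/s at the 250 W board power works out to 0.7 MH/s per watt, i.e. about 700 kH/s per watt, right in line with the ~714 kH-per-watt readout from the miner:

```python
# Ethereum hashrate per watt from the numbers quoted in the video.
hashrate_mh_s = 175   # ~175 MH/s
board_power_w = 250   # ~250 W TDP
kh_per_watt = hashrate_mh_s / board_power_w * 1000
print(kh_per_watt)  # 700.0 -- close to the ~714 kH/s-per-watt readout
```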

00:16:14.959 --> 00:16:22.560
I mean, the efficiency is better. It's a little bit better. But before it was

00:16:18.399 --> 00:16:25.680
doing 175 megahash roughly at 250 watts,

00:16:22.560 --> 00:16:28.800
so it's pretty damn good. 3090 you can

00:16:25.680 --> 00:16:30.560
probably do like 300 watts with 120

00:16:28.800 --> 00:16:35.199
megahash. That's— We're running the folding client now. I've had it running

00:16:32.399 --> 00:16:38.959
for a few minutes and it's kind of hard to say. The thing with folding is based

00:16:37.440 --> 00:16:43.839
on whatever project you're running, which is whatever job the server has

00:16:41.279 --> 00:16:47.839
sent you to process, your points per day will be higher or lower. So, it's

00:16:45.440 --> 00:16:52.000
possible that the A100 got a job that rewards less points than the 3090 did,

00:16:50.399 --> 00:16:56.480
right? It does look like it's a bit higher, but you can see our 3090, this is

00:16:54.480 --> 00:17:01.600
like a little like comparison app thing, um, is 31% lower than the average. So,

00:17:00.000 --> 00:17:08.640
it's probably just that this job doesn't give you that many points. Got it. The

00:17:04.000 --> 00:17:12.000
interesting part is the 3090 is drawing

00:17:08.640 --> 00:17:14.400
a lot. 400. Holy— The A100 is drawing

00:17:12.000 --> 00:17:18.720
240, man. That's efficient. And performance

00:17:16.559 --> 00:17:22.079
per watt. Maybe gamers don't care that much. Actually, we know for a fact

00:17:20.160 --> 00:17:27.360
gamers don't care that much. In the data center, that's everything because the

00:17:24.720 --> 00:17:32.160
cost of the card is trivial compared to the cost of power delivery and cooling

00:17:30.160 --> 00:17:35.760
on a data center scale. Especially when you have eight of these with a 400 watt

00:17:34.400 --> 00:17:43.840
power budget like you would get on the SXM cards in a single chassis times 50

00:17:39.520 --> 00:17:46.559
chassis. Like that's a lot of power.

00:17:43.840 --> 00:17:51.360
Let's try something machine learning. Unfortunately, for obvious reasons, most

00:17:49.440 --> 00:17:55.039
machine learning or deep learning, whatever you want to call it, benchmarks

00:17:53.280 --> 00:17:58.400
don't run on Windows. So instead, I've switched over to Ubuntu and we've set up

00:17:56.960 --> 00:18:02.400
the CUDA toolkit which is going to include our GPU drivers that we need to

00:18:00.000 --> 00:18:05.840
even run the thing as well as Docker and the NVIDIA Docker container which will

00:18:04.240 --> 00:18:09.440
allow us to run the benchmark. We're going to be running the ResNet 50

00:18:07.520 --> 00:18:14.000
benchmark which runs within TensorFlow 2. This is a really really common

00:18:11.360 --> 00:18:19.039
benchmark for big data clusters and stuff except our cluster it's just one

00:18:17.039 --> 00:18:23.520
GPU. In a separate window I've got NVIDIA SMI running. It's kind of like

00:18:21.280 --> 00:18:27.840
the Linux version of MSI Afterburner, but it's made by NVIDIA, so not quite.

00:18:26.240 --> 00:18:31.280
But what it's good for is at least telling us our power and the memory

00:18:29.360 --> 00:18:35.280
usage, which we should see spike a lot when we run this benchmark. I took the

00:18:33.200 --> 00:18:38.559
liberty of precreating a command to run the benchmark. So, we're going to be

00:18:36.320 --> 00:18:42.080
running with XLA on to hopefully bump the numbers a bit. We will do that for

00:18:40.160 --> 00:18:46.400
the A100 as well, so no worries there. It should be the same as well as using a

00:18:44.160 --> 00:18:49.679
What do you want? Look, he he left cuz he didn't have time for this and now

00:18:47.679 --> 00:18:53.919
he's back. This is a the world's most expensive lint roller. I don't even

00:18:51.919 --> 00:18:57.600
remember what I was saying. Damn it. Distractions aside, we're going to be

00:18:55.440 --> 00:19:01.440
running with XLA on. That'll probably give us a bit higher number than you

00:18:59.360 --> 00:19:04.559
would get normally. Um, but it is still accurate. And we're going to be running

00:19:02.640 --> 00:19:09.200
the same settings on the A100 as well. So, no concerns there. We'll also be

00:19:06.240 --> 00:19:13.440
using a batch size of 512 as well as FP16 rather than FP32. So, if you want

00:19:11.919 --> 00:19:20.240
to recreate these tests yourself, you totally can. Let's see what our 3090 can

00:19:16.000 --> 00:19:22.799
do. Look at that. 24 gigs of VRAM

00:19:20.240 --> 00:19:26.320
completely used. God, I don't I don't know if there's any

00:19:24.559 --> 00:19:31.200
application aside from like Premiere that will use all that VRAM. I'm sure

00:19:28.480 --> 00:19:36.720
Andy can attest to that. Okay. 1,400 images a second. That's

00:19:33.840 --> 00:19:42.160
pretty respectable. I think like a V100, which is the predecessor to the A100,

00:19:39.679 --> 00:19:46.799
does like less than a thousand. So, the fact that a 3090, which is a consumer

00:19:44.400 --> 00:19:55.360
gaming card, can pull off those kind of numbers is huge. Mind you, the wattage,

00:19:51.200 --> 00:19:56.720
412 watts, that's that's a lot of power.
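
Put differently, using the figures quoted in this video — roughly 1,400 images per second at 412 W for the 3090, and (jumping ahead a little) roughly 2,400 images per second at 250 W for the A100 — the throughput-per-watt gap is stark:

```python
# ResNet-50 throughput per watt, from the numbers read off in the video.
rtx3090_ips_per_w = 1400 / 412   # ~3.4 images/s per watt
a100_ips_per_w = 2400 / 250      # ~9.6 images/s per watt
print(round(rtx3090_ips_per_w, 1), round(a100_ips_per_w, 1))  # 3.4 9.6
```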

00:19:55.360 --> 00:20:02.480
It'll be interesting to see how much more efficient the A100 is when we try that after. The test is done now, and

00:20:00.640 --> 00:20:06.480
the average total images per second is 1,435.

00:20:04.720 --> 00:20:10.000
It's pretty good. I've gone ahead and added our A100, so we can run the

00:20:08.240 --> 00:20:14.080
benchmarks on that instead. And I'm expecting this is going to be

00:20:11.679 --> 00:20:19.280
substantially more performant. So, it's the same test. I'm just going to run the

00:20:16.000 --> 00:20:21.440
command here. Got to wait a few seconds.
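
For anyone recreating this, the command would look something like the sketch below. The container image tag, script path, and script flags are assumptions based on NVIDIA's NGC TensorFlow containers, not a readout of the exact command used here; `--gpus all` and the `TF_XLA_FLAGS` variable are standard Docker and TensorFlow mechanisms:

```shell
# Hypothetical invocation -- check your NGC container's docs for the real
# script path and flags. XLA is enabled via TensorFlow's documented env var.
docker run --gpus all --rm -it \
    -e TF_XLA_FLAGS=--tf_xla_auto_jit=2 \
    nvcr.io/nvidia/tensorflow:21.07-tf2-py3 \
    python /workspace/nvidia-examples/cnn/resnet.py \
        --batch_size=512 --precision=fp16
```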

00:20:19.280 --> 00:20:26.160
We got NVIDIA SMI up again. You can see that it's just running on the A100. The

00:20:24.480 --> 00:20:33.960
RAM on the 3090 is not getting filled. We're just using that as a display output. Yeah. All 40 gigabytes used.

00:20:30.559 --> 00:20:33.960
That's crazy.

00:20:34.320 --> 00:20:42.240
If we thought the 3090 was fast, look at that, Andy. That's like a full 1,000

00:20:39.600 --> 00:20:47.520
images more. We're getting like 2400 instead of 1,400. And the icing on the

00:20:44.559 --> 00:20:53.840
cake, if you look at NVIDIA SMI, we're using like 250 watts instead of 400

00:20:51.840 --> 00:20:57.919
while getting like almost double the performance. That is nuts. Probably the

00:20:56.880 --> 00:21:02.799
coolest thing about this whole experience though is seeing the Ampere

00:20:59.919 --> 00:21:05.840
architecture on a 7nm manufacturing process. Cuz you got to remember, while

00:21:04.320 --> 00:21:10.159
none of this is applicable to our daily business, what this card does do is

00:21:08.080 --> 00:21:14.480
excite me for the next generation of NVIDIA GPUs. Because even though the

00:21:12.240 --> 00:21:20.080
word on the street is that the upcoming Ada Lovelace architecture is not going

00:21:16.559 --> 00:21:22.480
to be that different from Ampere, consider

00:21:20.080 --> 00:21:28.240
this. NVIDIA's gaming lineup is built on Samsung's 8nm node, while the A100 is

00:21:25.520 --> 00:21:33.440
built on TSMC's 7 nanometer node. Now, we've talked a fair bit about how

00:21:30.320 --> 00:21:35.919
nanometers from one fab to another can't

00:21:33.440 --> 00:21:40.480
really be directly compared in that way. But what we can do is say that it is

00:21:38.240 --> 00:21:46.559
rumored that NVIDIA will be building the newer ADA love lace gaming GPUs on

00:21:43.280 --> 00:21:49.120
TSMC's 5nm node, which should

00:21:46.559 --> 00:21:52.559
perform even better than their 7nm node. And if the efficiency improvements

00:21:51.200 --> 00:21:59.280
are anything like what we're seeing here, we are expecting those cards to be

00:21:54.960 --> 00:22:01.360
absolute freaking monsters. So, good

00:21:59.280 --> 00:22:05.919
luck buying one. Hey, at least you can buy one of these.

00:22:03.360 --> 00:22:11.120
We've got new pillows. That's right. This is the what are we calling it? The

00:22:08.080 --> 00:22:13.360
couch. The couch ripper. It's an AMD

00:22:11.120 --> 00:22:17.039
themed version of our CPU pillow with alpaca and regular filling blend. You

00:22:15.440 --> 00:22:23.520
guys enjoyed this video? Maybe go check out our previous video looking in more

00:22:18.720 --> 00:22:25.039
depth at the CMP 170HX.

00:22:23.520 --> 00:22:29.600
I like the silver better. If we were smart, we'd be mining on this, but we're

00:22:27.440 --> 00:22:31.840
not that smart. Well, you know, mining is
