1
00:00:00,080 --> 00:00:06,160
We have looked at a lot of baller GPUs over the years. Whether it's the six

2
00:00:04,319 --> 00:00:11,440
Titan V's we had for the six editors project, three GV100 Quadros for 12K

3
00:00:09,440 --> 00:00:17,199
ultrawide gaming, or even this unreleased mining GPU, the CMP 170HX.

4
00:00:15,519 --> 00:00:21,279
There are not a lot of cards out there that we have not been able to get our

5
00:00:18,720 --> 00:00:30,240
hands on in one way or another, except for one until now, the NVIDIA A100. This

6
00:00:26,240 --> 00:00:32,480
is their absolute top dog AI enterprise

7
00:00:30,240 --> 00:00:38,800
high-performance compute, big data analytics monster, and they refused to

8
00:00:35,360 --> 00:00:44,280
send it to me. Well, I got one anyway,

9
00:00:38,800 --> 00:00:44,280
NVIDIA. So, deal with it.

10
00:00:50,559 --> 00:00:56,079
The first two questions on your mind are probably why we weren't able to get one

11
00:00:53,760 --> 00:00:59,680
of these and what ultimately changed that resulted in me holding one in my

12
00:00:57,840 --> 00:01:04,080
hands right now. The answer to the first one is that NVIDIA just plain doesn't

13
00:01:01,840 --> 00:01:08,720
seed these things to reviewers. And at a cost of about $10,000,

14
00:01:07,119 --> 00:01:14,640
it's not the sort of thing that I would just, you know, buy because I got that

15
00:01:11,760 --> 00:01:19,119
swagger. You know what I'm saying? As for how we got one, I can't tell you.

16
00:01:17,439 --> 00:01:24,080
And in fact, we even blacked out the serial number to prevent the fan who

17
00:01:21,520 --> 00:01:28,560
reached out offering to get us one from getting identified. This individual

18
00:01:26,720 --> 00:01:32,799
agreed to let us do anything we want with it. So, you can bet your butt we're

19
00:01:30,560 --> 00:01:37,439
going to be taking it apart. And all we had to offer in return was that we would

20
00:01:35,119 --> 00:01:41,280
test Ethereum mining on it, send a shroud that'll allow him to actually

21
00:01:39,040 --> 00:01:46,799
cool the thing, and reassemble it before we return it. So, let's compare it

22
00:01:42,880 --> 00:01:49,040
really quickly to the CMP 170HX, which

23
00:01:46,799 --> 00:01:54,799
is the most similar card that we have. It's this silver metal and it's not

24
00:01:51,200 --> 00:01:56,640
ribbed for my pleasure. Comfortable. And

25
00:01:54,799 --> 00:02:01,600
we actually have one other point of comparison. This isn't a perfect one.

26
00:01:58,560 --> 00:02:04,079
This is an RTX 3090. And what would have

27
00:02:01,600 --> 00:02:08,879
been maybe more apt is the Quadro, or rather they dropped the Quadro branding,

28
00:02:05,600 --> 00:02:10,640
but the A6000. Unfortunately, that's

29
00:02:08,879 --> 00:02:14,160
another really expensive card that I don't have a legitimate reason to buy,

30
00:02:12,640 --> 00:02:18,160
and NVIDIA wouldn't send one of those for the comparison either. So, the specs

31
00:02:16,640 --> 00:02:25,760
on this are pretty similar. We're going to use it as a stand-in since we're not really looking at any workstation loads

32
00:02:21,200 --> 00:02:28,000
anyway. So, the A100, then this is a 40

33
00:02:25,760 --> 00:02:33,519
Gigabyte card. I'm going to let that sink in for a second. And the craziest

34
00:02:30,319 --> 00:02:35,519
part is that 40 gigs is not even enough

35
00:02:33,519 --> 00:02:39,840
for the kinds of workloads that these cards are used to crunch through. We're

36
00:02:37,519 --> 00:02:44,160
talking enormous data sets to the point where this 40 gig model is actually

37
00:02:41,840 --> 00:02:48,000
obsolete, now replaced by an 80 gig model. And these NVLink bridge

38
00:02:46,720 --> 00:02:53,920
connectors on the top here. Let's go ahead and pull these off. There we go.

39
00:02:51,440 --> 00:02:59,519
These are used to link up multiples of these cards so they can all pool memory and

40
00:02:56,959 --> 00:03:05,440
work on even larger data sets. Now, the die at the center of it is a 7 nanometer TSMC

41
00:03:02,800 --> 00:03:07,920
manufactured GPU called the GA100. We're going to pop this shroud off.

42
00:03:06,480 --> 00:03:15,360
We're going to take a look at it. It has a base clock of just 765 MHz, but it'll

43
00:03:11,760 --> 00:03:18,720
boost up to 1410. That memory runs at a

44
00:03:15,360 --> 00:03:23,519
whopping 1.5 terabytes per second of

45
00:03:18,720 --> 00:03:26,800
bandwidth on a massive 5,120

46
00:03:23,519 --> 00:03:30,879
bit bus. It's got 6,912

47
00:03:26,800 --> 00:03:33,760
CUDA cores and a, what is it? Uh, 250 watt

48
00:03:30,879 --> 00:03:37,200
TDP. She's packing. Oh, you're just going

49
00:03:35,519 --> 00:03:41,680
right for I'm going right for it. Oh jeez. This is Linus Tech Tips. And

50
00:03:39,519 --> 00:03:45,599
basically every part of this is identical to the CMP card. It kind of

51
00:03:44,080 --> 00:03:49,040
looks that way. I mean, the color is obviously different. Yeah. But it looks

52
00:03:47,200 --> 00:03:54,080
like the clam shell is two pieces in the same manner. There's no display outputs.

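Quick sanity check on the specs read out a moment ago: the roughly 1.5 TB/s figure and the 5,120-bit bus hang together if you assume HBM2 running around 2.43 Gb/s effective per pin. That per-pin rate is our assumption for the A100 40GB, not something quoted in the video. A minimal sketch:

```python
# Recompute the A100 40GB's memory bandwidth from the quoted bus width.
# Assumption: HBM2 at ~2.43 Gb/s effective per pin (not stated in the
# video; taken as a plausible figure for this card).
bus_width_bits = 5120          # memory bus width quoted above
data_rate_gbps = 2.43          # effective per-pin data rate (assumed)

bandwidth_gb_s = bus_width_bits * data_rate_gbps / 8  # bits -> bytes
print(f"{bandwidth_gb_s:.0f} GB/s")  # ~1555 GB/s, i.e. the ~1.5 TB/s quoted
```

Swap in a different per-pin rate and the same arithmetic covers other HBM2 cards.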
53
00:03:51,599 --> 00:03:58,959
The fins look the same. Now, here's something. The CMP card specifically

54
00:03:56,720 --> 00:04:02,239
didn't even contain the hardware for video encode, if I recall correctly.

55
00:04:00,720 --> 00:04:09,439
Yeah. This doesn't have NVENC. Okay. So, it's not that it was fused off. It's that it's just plain not on the chip.

56
00:04:05,680 --> 00:04:11,920
Not on GA100. Yeah. Okay. So, GA102,

57
00:04:09,439 --> 00:04:18,079
which is like the 3090. Yes. Does have it. God. And the A6000. Okay. You ready? Uh oh

58
00:04:15,840 --> 00:04:22,320
god. So, yeah, it's like exactly the same on

59
00:04:19,919 --> 00:04:28,720
the inside. Same jank power connector. Wow, that is super jank. Check this out,

60
00:04:25,280 --> 00:04:31,680
guys. It uses a single 8 pin EPS power

61
00:04:28,720 --> 00:04:36,639
connector, which you might think is a PCIe power connector. See here, look,

62
00:04:34,160 --> 00:04:41,759
I'll show you guys. This is an 8 pin like normal GPU connector. But watch

63
00:04:39,919 --> 00:04:45,759
it cannot go in. But if we take the connector out of our CPU socket on the

64
00:04:43,840 --> 00:04:49,360
motherboard, there you go. Oh, well, the clips are

65
00:04:47,520 --> 00:04:52,720
interfering a little bit. I mean, what the what the heck is going on here,

66
00:04:51,040 --> 00:04:57,759
ladies and gentlemen? You need more power. Yeah, exactly. So, you can

67
00:04:54,880 --> 00:05:00,960
combine two PCIe connectors into that. Can't remember how to get it out of

68
00:04:58,960 --> 00:05:03,680
here. I see the fingerprint of the technician who assembled the card,

69
00:05:02,160 --> 00:05:06,880
though. Think we have to unclip this part first to just... Oh, there's a little

70
00:05:05,759 --> 00:05:12,479
screw, right? Yeah, there's a little screw. Haha. Third type of screws.

71
00:05:10,080 --> 00:05:18,880
Yourself. You didn't see that one, nerd. You're a nerd. Your face is a nerd. Your

72
00:05:16,000 --> 00:05:22,960
butt's a nerd. Whoa. It's not coming off, Jake. What? You got to like tilt it

73
00:05:21,440 --> 00:05:31,960
out, buddy. Whoa, whoa, whoa. Don't pull the cooler off. See, it's like it's caught uh back here. I wouldn't re... Hey.

74
00:05:28,080 --> 00:05:31,960
Oh, hey. How you doing? Jesus.

75
00:05:32,320 --> 00:05:37,280
Stressful. Look, maybe if we break it, you'll

76
00:05:36,000 --> 00:05:40,560
actually have to buy one. I don't want to buy one. That's not the goal. What? I

77
00:05:39,280 --> 00:05:45,280
thought you put your hand up for a high five. I was like, what are you talking

78
00:05:42,639 --> 00:05:48,560
about? I don't want to buy one. Why not? Whoa. What is going on here? You see that?

79
00:05:46,960 --> 00:05:53,280
It looks like there was a thermal pad there or something, but there isn't.

80
00:05:50,479 --> 00:05:57,280
It's like greasy. No, look at it closer. It's not greasy. It's like You see how

81
00:05:54,720 --> 00:06:01,039
this is like brushed almost or like like looks like somebody sand blasted it?

82
00:05:59,039 --> 00:06:05,120
That part's not. Oh, that's... I don't remember that on this card. All right, so

83
00:06:03,120 --> 00:06:09,360
the spring loading mechanism is just from the bend of the back plate. That's

84
00:06:06,960 --> 00:06:12,720
kind of cool. So, I checked the CMP thing. Doesn't look like it. I wonder

85
00:06:11,199 --> 00:06:17,440
why they would have like a... Hm. This doesn't look brushed at all. What did we

86
00:06:15,280 --> 00:06:20,880
last time we twisted? No, I don't think we did. Yeah, we did. I just looked. I

87
00:06:19,520 --> 00:06:24,160
just I'm pretty sure I just reamed on it. Oh my god. No, you were against

88
00:06:22,880 --> 00:06:30,000
reaming on it and then we were like, just twist a little. I'm reamer. Oh god.

89
00:06:27,440 --> 00:06:35,280
Ah, it has an IHS. It looks basically the same. Yeah, we're

90
00:06:32,479 --> 00:06:41,360
going to have to clean that off and see. There's not much alcohol. No, I like to

91
00:06:37,840 --> 00:06:42,960
go in dry first. So, yep, that's the

92
00:06:41,360 --> 00:06:47,280
same thing. All right. I mean, this isn't the first time NVIDIA has used the

93
00:06:45,120 --> 00:06:51,680
same silicon in two different products with two different capabilities. We see

94
00:06:49,759 --> 00:06:55,680
the same thing with their Quadro lineup versus their GeForce lineup where things

95
00:06:53,360 --> 00:06:58,960
will just be disabled through drivers or fusing off different functional units on

96
00:06:57,360 --> 00:07:02,639
the chip. What I want to know then is besides the lack of NVLink connectors

97
00:07:00,880 --> 00:07:06,560
on this one, well, they are in there. They're just not accessible and they

98
00:07:04,000 --> 00:07:11,199
probably don't work, right? What is the actual difference in function between

99
00:07:08,639 --> 00:07:17,759
them? Well, this one doesn't have full PCIe x16, right? Less memory. It's... I

100
00:07:16,080 --> 00:07:22,080
think it has way fewer transistors, but it is still a GA100. Yeah. So, the

101
00:07:20,240 --> 00:07:25,520
transistors are there. Yeah, they're probably just not functional. Let me see

102
00:07:24,000 --> 00:07:29,840
what the chip number is on that one. Yeah, cuz weren't we not even able to

103
00:07:27,039 --> 00:07:33,599
find a proper NVIDIA.com reference to this one anyway? So, we're just relying

104
00:07:31,520 --> 00:07:38,479
on someone else's spec sheet. So, the transistor count could just be wrong.

105
00:07:35,280 --> 00:07:40,479
Okay, so this is So, the CMP card was a

106
00:07:38,479 --> 00:07:49,759
GA... Look at this guy. Yeah, what a weirdo. GA100-105F

107
00:07:44,160 --> 00:07:51,599
and this is a GA100-833. If it's a GA, I

108
00:07:49,759 --> 00:07:54,080
guess it could be a different GA. I don't know. Yeah, I mean it used to be

109
00:07:52,800 --> 00:07:58,800
back in the day you would assume that it's just using the same silicon as the GeForce cards because NVIDIA's data

110
00:07:57,520 --> 00:08:05,759
center business hadn't gotten that big yet. But nowadays they can totally justify an individual like new die

111
00:08:03,599 --> 00:08:09,199
design for a particular lineup of enterprise cards. And interestingly

112
00:08:07,120 --> 00:08:15,520
enough, the SXM version doesn't have an IHS. At least it seems that way. But the

113
00:08:12,479 --> 00:08:17,599
SXM version is also like 400 watts and

114
00:08:15,520 --> 00:08:21,599
this is like 250. Yeah. Totally different classes of capabilities. All

115
00:08:20,160 --> 00:08:24,960
right, let's put it back together then, shall we? I got you new goop. Goop me. I

116
00:08:23,680 --> 00:08:27,960
brought two goop. Going for the no look catch.

117
00:08:30,080 --> 00:08:36,479
Oh yeah, baby.

118
00:08:33,279 --> 00:08:38,719
X marks the spot, baby. My finest work.

119
00:08:36,479 --> 00:08:41,719
Maybe it'll perform better now. Probably not.

120
00:08:43,680 --> 00:08:49,360
We're backing it up.

121
00:08:46,800 --> 00:08:55,279
Cool story, bro. Thanks. Thanks, bro. Uh, where's our back plate? Did you take

122
00:08:52,160 --> 00:08:57,839
it? Oh, shoot. Yes, black. I thought it

123
00:08:55,279 --> 00:09:02,800
was gold. I was looking for gold. Aren't we all? I don't know about you,

124
00:08:59,600 --> 00:09:04,480
but I found my gold. What's What's that?

125
00:09:02,800 --> 00:09:08,640
Yvonne, shut up. All right. All right. Let's get

126
00:09:07,120 --> 00:09:13,040
going here. Which one do you want to put on the bench first? What do you mean? We're not going to compare to that

127
00:09:11,120 --> 00:09:16,160
thing. Oh, it doesn't do... doesn't do anything. Okay, so we don't need this

128
00:09:14,560 --> 00:09:22,000
thing. Here we go, boys. See you later. We can't put this in the first slot because we don't have a display output. But

129
00:09:18,959 --> 00:09:26,720
you like the bottom up? Your bottom?

130
00:09:22,000 --> 00:09:28,320
Sure. This Okay. This is how you flex it

131
00:09:26,720 --> 00:09:32,240
style. Now, you might have noticed at some point that the A100 doesn't have

132
00:09:30,000 --> 00:09:36,720
any sort of cooling fan. It's just one big fat long heat sink with a giant

133
00:09:34,959 --> 00:09:41,920
vapor chamber under it to spread the heat from that massive GPU. So, Jake

134
00:09:39,680 --> 00:09:45,600
actually designed uh what we call the Shroudinator. It allows us to take these

135
00:09:44,080 --> 00:09:49,040
two screws that are on the back of the card for securing it in a server chassis

136
00:09:47,440 --> 00:09:52,959
because that's how it's designed to be used. So, it's passive, but there's lots

137
00:09:51,120 --> 00:09:58,399
of air flow going through the chassis. And then lets us take those screw holes

138
00:09:55,680 --> 00:10:05,600
and mount a fan to the back of the card. It's frankly not amazing.

139
00:10:02,320 --> 00:10:08,000
What? No. That is aerodynamics at its

140
00:10:05,600 --> 00:10:13,440
peak. You should hire me to work on F1 cars. Okay. Yeah, not so much. Yeah, it

141
00:10:10,800 --> 00:10:17,120
it only blows probably more air out this end from the back pressure than it does

142
00:10:15,040 --> 00:10:22,320
out this end, but it's enough to cool it. I swear it is. Yeah. Uh let's go

143
00:10:19,920 --> 00:10:26,480
ahead and turn on the computer, shall we? Okay. So, a couple interesting

144
00:10:24,320 --> 00:10:30,160
points here. It wouldn't boot right off the bat. You have to enable above 4G

145
00:10:28,399 --> 00:10:35,040
decoding. And then I also had to go in and I think it's called like 4G MMIO or

146
00:10:33,519 --> 00:10:41,440
something like that. I had to set that to 42. Okay. The answer to the universe.

147
00:10:39,120 --> 00:10:47,839
Yes. Thank you. And they are both here. A100 PCIe 40 freaking gigabytes.

148
00:10:45,680 --> 00:10:51,600
I installed the like game ready driver for the 3090 and then I installed the

149
00:10:50,160 --> 00:10:55,600
data center driver and I think it overrode it, but the game ready driver

150
00:10:53,839 --> 00:11:00,240
it still showed as like active and you could do stuff with the A100 and vice

151
00:10:57,680 --> 00:11:05,760
versa. So, it's probably fine. Now, interestingly, the A100 doesn't show up

152
00:11:03,040 --> 00:11:09,040
in task manager at all. Did the CMP? I can't remember. No, no, I don't think it

153
00:11:07,600 --> 00:11:13,279
did actually. Anyways, what do you want to do in Blender? Classroom. BMW. BMW is

154
00:11:11,440 --> 00:11:17,279
probably too short. Yeah, let's do classroom. I think BMW on a 3090 is like

155
00:11:16,079 --> 00:11:22,160
15 seconds or something like that. Anyway, let's do classroom. That's also

156
00:11:19,279 --> 00:11:26,880
like the spiciest 3090 that you can get. Yeah, pretty much. It's just so thick.

157
00:11:24,640 --> 00:11:31,040
Why would you ever use it? Yeah, because you want Is it even doing anything?

158
00:11:28,720 --> 00:11:35,360
Like, here's one reason. Cuz you can do classroom renders in a minute and 18

159
00:11:33,839 --> 00:11:39,760
seconds. That's why. Okay. Well, what about the A100? Oh, the You didn't plug

160
00:11:37,120 --> 00:11:44,079
the fan in. Okay. Oh, whoops. How hot is this? Probably warm. Fortunately, it

161
00:11:41,920 --> 00:11:50,480
hasn't been doing anything. Time to beat is a minute and 18 seconds. So, let's go

162
00:11:46,800 --> 00:11:52,959
ahead and see how it does. It feels like

163
00:11:50,480 --> 00:11:56,959
this is the intake. I mean, it's hot, so like Oh, yeah. But Oh, it's it's going.

164
00:11:55,120 --> 00:12:01,120
It's going, Jake. It's going. You did good. It works enough. This should be

165
00:11:59,519 --> 00:12:05,839
like... It should be way faster. Way huger GPU, right? It's actually

166
00:12:03,519 --> 00:12:12,480
slower. How much? Not by much. It's like a few seconds, but it's slower. So, it's

167
00:12:09,279 --> 00:12:15,040
worse in CUDA. What about OptiX? So,

168
00:12:12,480 --> 00:12:20,560
the interesting thing is this card doesn't have ray tracing cores. The 3090

169
00:12:18,079 --> 00:12:24,800
does. So, you'd think that OptiX would only work on the 3090, right? Do you

170
00:12:22,639 --> 00:12:28,880
want me to just try the A100? Yeah, sure. Let's Yeah, it's still GPU

171
00:12:26,480 --> 00:12:33,760
compute. I mean, you got to give it to it in terms of efficiency. For real

172
00:12:31,519 --> 00:12:39,120
though, even running two renders to the 3090s one, the average power consumption

173
00:12:36,399 --> 00:12:44,720
here is still lower. Yeah. Well, and looking at while it's running, it's like

174
00:12:40,800 --> 00:12:47,279
150 watts. Yeah. Versus 350 or whatever

175
00:12:44,720 --> 00:12:52,800
it was on the 3090. Yeah. Ready to go again. Yep. Uh Okay. Oh my god, man.

176
00:12:51,360 --> 00:12:56,959
This thing is fast. What's the power cons?

177
00:12:55,279 --> 00:13:03,040
353. The fan is still like just... I want one of

178
00:13:00,240 --> 00:13:05,760
these. This looks sick, dude. It's way faster. Yeah, there's no question. We

179
00:13:04,639 --> 00:13:10,959
don't even need to. It's going to be like 30 seconds. Yeah, not even close.

180
00:13:08,720 --> 00:13:14,399
So, do you want to know why? I would love to know why. You said it earlier.

181
00:13:13,040 --> 00:13:20,160
You just weren't really thinking about it. This has half the CUDA cores of a

182
00:13:16,959 --> 00:13:22,079
3090. It's like 7,000ish, I think. So,

183
00:13:20,160 --> 00:13:26,959
it's just full of like machine learning stuff. Yeah. So, it has basically half

184
00:13:25,040 --> 00:13:31,519
the CUDA cores. So, the fact that it is even close is kind of crazy in CUDA

185
00:13:28,959 --> 00:13:36,560
mode. But in OptiX, what I found out is OptiX will use the tensor cores for

186
00:13:34,079 --> 00:13:40,720
like AI denoising, but nothing else. You'll see in there. Um, so I I think

187
00:13:38,800 --> 00:13:46,800
it's falling back to CUDA for the other stuff. Got it. But the 3090 has ray

188
00:13:43,279 --> 00:13:49,040
tracing and tensor cores. So, right, it

189
00:13:46,800 --> 00:13:54,079
just demolishes. Uh, where's the thing where you can

190
00:13:51,360 --> 00:13:58,639
select apps and then tell it which GPU to use? Yeah, here we go. No. So, it

191
00:13:56,480 --> 00:14:02,959
will not allow you to select the A100 to run games even if we could pipe it

192
00:14:01,680 --> 00:14:08,959
through our onboard or through a different graphics card like we did with that direct mining card ages ago. No

193
00:14:07,120 --> 00:14:15,279
DirectX support whatsoever. Let's check it in GPU-Z. So, way fewer CUDA cores.

194
00:14:12,320 --> 00:14:19,040
You can see that we go from over 10,000 to

195
00:14:17,120 --> 00:14:24,000
a lot less than 10,000. The pixel fill rate is actually higher. I guess that's

196
00:14:20,880 --> 00:14:27,839
your HBM2 memory talking.

197
00:14:24,000 --> 00:14:30,880
1.5 gigabytes per second. What's a 3090?

198
00:14:27,839 --> 00:14:34,000
1.5 terabytes per second. It's like

199
00:14:30,880 --> 00:14:36,720
almost 60% more. Yeah. 60%, almost.

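The eyeball math here checks out. Taking the A100's quoted 1.5 TB/s (1,555 GB/s) against the RTX 3090's 936 GB/s, a figure we're assuming from its public spec sheet since it isn't read out in the video:

```python
# Compare memory bandwidth: A100 40GB vs RTX 3090.
# The 3090's 936 GB/s is assumed from its public spec sheet.
a100_gb_s = 1555
rtx3090_gb_s = 936

advantage = (a100_gb_s / rtx3090_gb_s - 1) * 100
print(f"A100 has {advantage:.0f}% more memory bandwidth")  # ~66%
```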
200
00:14:34,000 --> 00:14:42,639
Holy bananas. But what about the supported tech? Yeah. So, we can do

201
00:14:39,440 --> 00:14:44,800
CUDA, OpenCL, PhysX.

202
00:14:42,639 --> 00:14:51,440
Sure, we should set it as the PhysX card. Dedicated PhysX card. All the rag

203
00:14:48,240 --> 00:14:55,120
dolls everywhere. And OpenGL, but not

204
00:14:51,440 --> 00:14:56,880
Direct-anything or Vulkan even. OpenGL.

205
00:14:55,120 --> 00:15:01,120
Now that's interesting. Go to the advanced tab. Yeah, cuz you can select

206
00:14:59,120 --> 00:15:06,160
like a specific DirectX version at the top under general. Like what about like

207
00:15:03,680 --> 00:15:11,279
DX12? What does it say? Device not found. It's the same as the mining card.

208
00:15:08,800 --> 00:15:15,600
It'll do OpenCL. So we can mine on it.

209
00:15:14,079 --> 00:15:19,839
All right. I mean, should we try that? Yeah, we could do mining or folding or

210
00:15:18,000 --> 00:15:24,399
Sure. I have a feeling it's going to kind of suck for that, too. Uh, there's

211
00:15:22,160 --> 00:15:30,160
no AI in mining. I don't think so. It's still a big GPU, dude. So, you can't

212
00:15:27,040 --> 00:15:32,000
Well, suck is relative, right? Like, for

213
00:15:30,160 --> 00:15:35,040
the price, you'd never buy. Oh, I think it might be better than the CMP card,

214
00:15:33,600 --> 00:15:38,720
though. Just a little bit. Shut up. I think so. So, the only thing you can

215
00:15:36,959 --> 00:15:43,120
adjust I think this is the same with the CMP card is the core clock and the power

216
00:15:41,519 --> 00:15:46,560
limit. You can't mess with the memory speed. And you can move the power limit

217
00:15:44,560 --> 00:15:52,480
only down, it looks like. Yeah. Top is the 3090, bottom is the A100. Wow, that

218
00:15:48,800 --> 00:15:54,079
is a crap ton faster than a 3090. It's

219
00:15:52,480 --> 00:16:00,320
pretty much the same as a CMP, but look at the efficiency. 714 kilohash per

220
00:15:58,160 --> 00:16:05,759
watt. Uh, and I bet you if we lower the power limit to like 80. Uh, it's a

221
00:16:03,360 --> 00:16:09,519
little bit lower speed. Maybe we can go I don't know. We probably don't have to

222
00:16:07,120 --> 00:16:13,040
tinker with this too much. I mean, it doesn't draw that much power to begin

223
00:16:10,720 --> 00:16:16,800
with, I guess. Yeah, I think it's pretty freaking efficient right out of the box.

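For the record, the per-watt figures quoted in this section pencil out like this; note the 714 kH/W shown on screen works back to about 245 W rather than the rounded "250 watts" mentioned (a minimal sketch, with the 3090 numbers also taken from the rough values quoted here):

```python
# Rough mining-efficiency math from the numbers quoted in this section.
def kh_per_watt(hashrate_mh: float, power_w: float) -> float:
    """Convert a hashrate in MH/s and a power draw in watts to kH/W."""
    return hashrate_mh * 1000 / power_w

# ~175 MH/s at ~245 W gives the 714 kH/W figure shown on screen
# (the video rounds the power to "250 watts").
print(round(kh_per_watt(175, 245)))   # 714
# A tuned 3090 at ~120 MH/s and ~300 W for comparison.
print(round(kh_per_watt(120, 300)))   # 400
```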
224
00:16:14,959 --> 00:16:22,560
I mean, the efficiency is better. It's a little bit better. But before it was

225
00:16:18,399 --> 00:16:25,680
doing 175 megahash roughly at 250 watts,

226
00:16:22,560 --> 00:16:28,800
so it's pretty damn good. 3090 you can

227
00:16:25,680 --> 00:16:30,560
probably do like 300 watts with 120

228
00:16:28,800 --> 00:16:35,199
megahash. That's... We're running the folding client now. I've had it running

229
00:16:32,399 --> 00:16:38,959
for a few minutes and it's kind of hard to say. The thing with folding is based

230
00:16:37,440 --> 00:16:43,839
on whatever project you're running, which is whatever job the server has

231
00:16:41,279 --> 00:16:47,839
sent you to process, your points per day will be higher or lower. So, it's

232
00:16:45,440 --> 00:16:52,000
possible that the A100 got a job that rewards less points than the 3090 did,

233
00:16:50,399 --> 00:16:56,480
right? It does look like it's a bit higher, but you can see our 3090, this is

234
00:16:54,480 --> 00:17:01,600
like a little like comparison app thing, um, is 31% lower than the average. So,

235
00:17:00,000 --> 00:17:08,640
it's probably just that this job doesn't give you that many points. Got it. The

236
00:17:04,000 --> 00:17:12,000
interesting part is the 3090 is drawing

237
00:17:08,640 --> 00:17:14,400
a lot. 400. Holy... The A100 is drawing

238
00:17:12,000 --> 00:17:18,720
240, man. That's efficient. And performance

239
00:17:16,559 --> 00:17:22,079
per watt. Maybe gamers don't care that much. Actually, we know for a fact

240
00:17:20,160 --> 00:17:27,360
gamers don't care that much. In the data center, that's everything because the

241
00:17:24,720 --> 00:17:32,160
cost of the card is trivial compared to the cost of power delivery and cooling

242
00:17:30,160 --> 00:17:35,760
on a data center scale. Especially when you have eight of these with a 400 watt

243
00:17:34,400 --> 00:17:43,840
power budget like you would get on the SXM cards in a single chassis times 50

244
00:17:39,520 --> 00:17:46,559
chassis. Like that's a lot of power.

245
00:17:43,840 --> 00:17:51,360
Let's try something machine learning. Unfortunately, for obvious reasons, most

246
00:17:49,440 --> 00:17:55,039
machine learning or deep learning, whatever you want to call it, benchmarks

247
00:17:53,280 --> 00:17:58,400
don't run on Windows. So instead, I've switched over to Ubuntu and we've set up

248
00:17:56,960 --> 00:18:02,400
the CUDA toolkit which is going to include our GPU drivers that we need to

249
00:18:00,000 --> 00:18:05,840
even run the thing as well as Docker and the NVIDIA Docker container which will

250
00:18:04,240 --> 00:18:09,440
allow us to run the benchmark. We're going to be running the ResNet 50

251
00:18:07,520 --> 00:18:14,000
benchmark which runs within TensorFlow 2. This is a really really common

252
00:18:11,360 --> 00:18:19,039
benchmark for big data clusters and stuff except our cluster it's just one

253
00:18:17,039 --> 00:18:23,520
GPU. In a separate window I've got NVIDIA SMI running. It's kind of like

254
00:18:21,280 --> 00:18:27,840
the Linux version of MSI Afterburner, but it's made by NVIDIA, so not quite.

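nvidia-smi can also emit exactly those readings as CSV, which is handy for logging; here's a minimal parsing sketch (the query flags in the comment are standard nvidia-smi options, but the sample line is illustrative, not captured from this machine):

```python
# Parse power draw and memory use the way nvidia-smi reports them.
# Actual command (standard nvidia-smi query flags):
#   nvidia-smi --query-gpu=power.draw,memory.used --format=csv,noheader -l 1
def parse_smi_line(line: str) -> tuple[float, int]:
    """Parse one CSV line like '250.30 W, 40536 MiB' into (watts, MiB)."""
    power_field, mem_field = line.split(", ")
    watts = float(power_field.split()[0])
    mib = int(mem_field.split()[0])
    return watts, mib

# Illustrative sample line (not captured output):
print(parse_smi_line("250.30 W, 40536 MiB"))  # (250.3, 40536)
```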
255
00:18:26,240 --> 00:18:31,280
But what it's good for is at least telling us our power and the memory

256
00:18:29,360 --> 00:18:35,280
usage, which we should see spike a lot when we run this benchmark. I took the

257
00:18:33,200 --> 00:18:38,559
liberty of precreating a command to run the benchmark. So, we're going to be

258
00:18:36,320 --> 00:18:42,080
running with XLA on to hopefully bump the numbers a bit. We will do that for

259
00:18:40,160 --> 00:18:46,400
the A100 as well, so no worries there. It should be the same as well as using a

260
00:18:44,160 --> 00:18:49,679
What do you want? Look, he he left cuz he didn't have time for this and now

261
00:18:47,679 --> 00:18:53,919
he's back. This is the world's most expensive lint roller. I don't even

262
00:18:51,919 --> 00:18:57,600
remember what I was saying. Damn it. Distractions aside, we're going to be

263
00:18:55,440 --> 00:19:01,440
running with XLA on. That'll probably give us a bit higher number than you

264
00:18:59,360 --> 00:19:04,559
would normally. Um, but it is still accurate. And we're going to be running

265
00:19:02,640 --> 00:19:09,200
the same settings on the A100 as well. So, no concerns there. We'll also be

266
00:19:06,240 --> 00:19:13,440
using a batch size of 512 as well as FP16 rather than FP32. So, if you want

267
00:19:11,919 --> 00:19:20,240
to recreate these tests yourself, you totally can. Let's see what our 3090 can

268
00:19:16,000 --> 00:19:22,799
do. Look at that. 24 gigs of VRAM

269
00:19:20,240 --> 00:19:26,320
completely used. God, I don't I don't know if there's any

270
00:19:24,559 --> 00:19:31,200
application aside from like Premiere that will use all that VRAM. I'm sure

271
00:19:28,480 --> 00:19:36,720
Andy can attest to that. Okay. 1,400 images a second. That's

272
00:19:33,840 --> 00:19:42,160
pretty respectable. I think like a V100, which is the predecessor to the A100,

273
00:19:39,679 --> 00:19:46,799
does like less than a thousand. So, the fact that a 3090, which is a consumer

274
00:19:44,400 --> 00:19:55,360
gaming card, can pull off those kind of numbers is huge. Mind you, the wattage,

275
00:19:51,200 --> 00:19:56,720
412 watts, that's that's a lot of power.

276
00:19:55,360 --> 00:20:02,480
It'll be interesting to see how much more efficient the A100 is when we try that after. The test is done now, and

277
00:20:00,640 --> 00:20:06,480
the average total images per second is 1,435.

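For anyone recreating the run, the whole thing boils down to one Docker invocation along these lines. The container tag and the benchmark script path here are assumptions (they depend on which NGC image and which checkout of the tf_cnn_benchmarks repo you pull), but the flags map onto the settings described above:

```shell
# Sketch of the benchmark invocation described above. The image tag and
# the mounted benchmarks path are assumptions -- adjust for your setup.
docker run --gpus all --rm -it \
  -v "$PWD/benchmarks:/benchmarks" \
  nvcr.io/nvidia/tensorflow:21.07-tf2-py3 \
  python /benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
    --model=resnet50 --num_gpus=1 \
    --batch_size=512 --use_fp16=true --xla=true
```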
278
00:20:04,720 --> 00:20:10,000
It's pretty good. I've gone ahead and added our A100, so we can run the

279
00:20:08,240 --> 00:20:14,080
benchmarks on that instead. And I'm expecting this is going to be

280
00:20:11,679 --> 00:20:19,280
substantially more performant. So, it's the same test. I'm just going to run the

281
00:20:16,000 --> 00:20:21,440
command here. Got to wait a few seconds.

282
00:20:19,280 --> 00:20:26,160
We got NVIDIA SMI up again. You can see that it's just running on the A100. The

283
00:20:24,480 --> 00:20:33,960
RAM on the 3090 is not getting filled. We're just using that as a display output. Yeah. All 40 gigabytes used.

284
00:20:30,559 --> 00:20:33,960
That's crazy.

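Putting this A100 run's numbers next to the 3090's from a minute ago, the perf-per-watt gap is close to 3x. A quick sketch using the approximate figures quoted in this test (1,435 images/s at 412 W for the 3090, roughly 2,400 at 250 W for the A100):

```python
# Perf-per-watt from the approximate ResNet-50 numbers quoted here.
cards = {
    "RTX 3090": {"images_per_s": 1435, "power_w": 412},
    "A100":     {"images_per_s": 2400, "power_w": 250},
}
for name, c in cards.items():
    print(f"{name}: {c['images_per_s'] / c['power_w']:.1f} images/s per watt")
# The 3090 lands around 3.5, the A100 around 9.6 -- roughly a 2.8x gap.
```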
285
00:20:34,320 --> 00:20:42,240
If we thought the 3090 was fast, look at that, Andy. That's like a full 1,000

286
00:20:39,600 --> 00:20:47,520
images more. We're getting like 2,400 instead of 1,400. And the icing on the

287
00:20:44,559 --> 00:20:53,840
cake, if you look at NVIDIA SMI, we're using like 250 watts instead of 400

288
00:20:51,840 --> 00:20:57,919
while getting like almost double the performance. That is nuts. Probably the

289
00:20:56,880 --> 00:21:02,799
coolest thing about this whole experience though is seeing the Ampere

290
00:20:59,919 --> 00:21:05,840
architecture on a 7 nanometer manufacturing process. Cuz you got to remember, while

291
00:21:04,320 --> 00:21:10,159
none of this is applicable to our daily business, what this card does do is

292
00:21:08,080 --> 00:21:14,480
excite me for the next generation of NVIDIA GPUs. Because even though the

293
00:21:12,240 --> 00:21:20,080
word on the street is that the upcoming Ada Lovelace architecture is not going

294
00:21:16,559 --> 00:21:22,480
to be that different from Ampere, consider

295
00:21:20,080 --> 00:21:28,240
this. NVIDIA's gaming lineup is built on Samsung's 8 nanometer node, while the A100 is

296
00:21:25,520 --> 00:21:33,440
built on TSMC's 7 nanometer node. Now, we've talked a fair bit about how

297
00:21:30,320 --> 00:21:35,919
nanometers from one fab to another can't

298
00:21:33,440 --> 00:21:40,480
really be directly compared in that way. But what we can do is say that it is

299
00:21:38,240 --> 00:21:46,559
rumored that NVIDIA will be building the newer Ada Lovelace gaming GPUs on

300
00:21:43,280 --> 00:21:49,120
TSMC's 5 nanometer node, which should

301
00:21:46,559 --> 00:21:52,559
perform even better than their 7 nanometer node. And if the efficiency improvements

302
00:21:51,200 --> 00:21:59,280
are anything like what we're seeing here, we are expecting those cards to be

303
00:21:54,960 --> 00:22:01,360
absolute freaking monsters. So, good

304
00:21:59,280 --> 00:22:05,919
luck buying one. Hey, at least you can buy one of these.

305
00:22:03,360 --> 00:22:11,120
We've got new pillows. That's right. This is the what are we calling it? The

306
00:22:08,080 --> 00:22:13,360
couch. The Couch Ripper. It's an AMD

307
00:22:11,120 --> 00:22:17,039
themed version of our CPU pillow with alpaca and regular filling blend. You

308
00:22:15,440 --> 00:22:23,520
guys enjoyed this video? Maybe go check out our previous video looking in more

309
00:22:18,720 --> 00:22:25,039
depth at the CMP 170HX.

310
00:22:23,520 --> 00:22:29,600
I like the silver better. If we were smart, we'd be mining on this, but we're

311
00:22:27,440 --> 00:22:31,840
not that smart. Well, you know, mining is
