1
00:00:00,120 --> 00:00:05,480
it's pretty clear where NVIDIA's priorities lie these days we're here at

2
00:00:04,000 --> 00:00:10,559
the Computex booth of one of their partners Gigabyte and this is the entire

3
00:00:09,000 --> 00:00:15,240
gaming showcase that's because they like the

4
00:00:13,200 --> 00:00:22,359
rest of the industry understand that the future of computing lies in the data

5
00:00:18,800 --> 00:00:25,160
center that is where the Grace

6
00:00:22,359 --> 00:00:31,400
Superchip comes in under each of these gigantic heat spreaders are 72 of

7
00:00:28,400 --> 00:00:33,440
NVIDIA's Grace CPU cores connected

8
00:00:31,400 --> 00:00:41,360
together using what NVIDIA calls the NVLink chip-to-chip interconnect for a

9
00:00:36,360 --> 00:00:44,320
total of 144 cores except that's just one

10
00:00:41,360 --> 00:00:51,320
of the nodes this server from Gigabyte accepts not one not two but four of

11
00:00:48,120 --> 00:00:54,680
these modules in its four separate nodes

12
00:00:51,320 --> 00:00:59,600
that is an absolutely mind-bending

13
00:00:54,680 --> 00:01:02,399
576 cores in a 2U rack server but these

14
00:00:59,600 --> 00:01:07,040
are not the types of CPUs that you have in your gaming PC at home those

15
00:01:04,400 --> 00:01:13,080
processors from the likes of AMD and Intel are based on the x86 architecture

16
00:01:10,479 --> 00:01:19,040
so similar to what Apple did with their M-series M1 and M2 processors NVIDIA is

17
00:01:16,799 --> 00:01:25,280
making use of a different processor architecture called ARM and uh we

18
00:01:22,960 --> 00:01:28,720
actually did get permission to do this we're going to be taking a closer look

19
00:01:28,920 --> 00:01:37,479
here oh it doesn't look much like it but this

20
00:01:34,720 --> 00:01:42,159
is the same style of processor that you might find in your phone ARM processors

21
00:01:40,240 --> 00:01:45,920
have a lot of advantages first and foremost being that they're typically

22
00:01:43,759 --> 00:01:51,600
more power efficient thanks to their relatively lightweight instruction set

23
00:01:48,479 --> 00:01:54,560
so much so that NVIDIA claims these Grace

24
00:01:51,600 --> 00:02:01,039
CPUs have twice the performance per watt of the latest x86 chips but the

25
00:01:58,640 --> 00:02:06,159
disadvantage is that they also require software like your operating system and

26
00:02:04,039 --> 00:02:12,239
all the programs you need to run to be coded and compiled specifically for ARM

27
00:02:09,759 --> 00:02:16,599
now for the PC market because x86 has been the standard for so long it's

28
00:02:14,640 --> 00:02:21,440
difficult to justify switching over to ARM it would cost you so much in terms

29
00:02:18,599 --> 00:02:25,360
of backwards compatibility but in the data center the types of customers who

30
00:02:23,640 --> 00:02:30,200
are going to buy a processor like this are usually developing their own

31
00:02:27,080 --> 00:02:31,879
software anyway like let's say Google to

32
00:02:30,200 --> 00:02:37,360
run the algorithms that power Google search or YouTube recommendations for

33
00:02:34,680 --> 00:02:42,080
them switching over to ARM isn't as big a deal and in fact companies like Amazon

34
00:02:40,319 --> 00:02:49,680
who are developing their own ARM-based CPUs are already doing it and very

35
00:02:46,519 --> 00:02:52,400
effectively I mean hey if my next gaming

36
00:02:49,680 --> 00:02:57,519
CPU could be half the power draw and the same performance of my current one I'd

37
00:02:54,560 --> 00:03:01,840
be stoked but this is even better imagine if instead of one computer

38
00:02:59,400 --> 00:03:06,879
you're talking in thousands or tens of thousands the savings start to become so

39
00:03:04,680 --> 00:03:12,480
large that it's less a question of can we afford this migration and more a

40
00:03:08,959 --> 00:03:14,239
question of can we afford not to make it

41
00:03:12,480 --> 00:03:19,280
now I didn't ask permission for this part but nobody seems to be stopping me

42
00:03:16,519 --> 00:03:26,799
or even really paying attention to me so let's take apart a Grace

43
00:03:22,080 --> 00:03:33,680
Superchip on each Grace Superchip is up to

44
00:03:26,799 --> 00:03:36,640
480 GB of LPDDR5X ECC memory per CPU

45
00:03:33,680 --> 00:03:43,159
and what's really cool is that that can actually be accessed by either CPU over

46
00:03:40,080 --> 00:03:45,799
the NVLink interconnect that's how

47
00:03:43,159 --> 00:03:50,760
fast this new NVLink is the only downside to this approach since we're

48
00:03:48,040 --> 00:03:55,480
making comparisons to Apple is that just like with your M2 MacBook you better

49
00:03:53,480 --> 00:03:59,799
decide how much memory you want in your server right at the time you buy it

50
00:03:57,640 --> 00:04:05,159
unless you want to replace the entire compute engine while you perform a

51
00:04:02,200 --> 00:04:10,640
memory upgrade given that the rumored price of their H100 GPUs is

52
00:04:08,360 --> 00:04:14,879
$11,000 I don't even want to know what this thing costs but hopefully you get a

53
00:04:12,760 --> 00:04:20,120
bit of a discount when you buy it together with the Grace Superchip CPU

54
00:04:18,000 --> 00:04:24,360
let me show you this can't believe they're letting me take this off the

55
00:04:24,360 --> 00:04:31,320
wall okay success we have dropped nothing

56
00:04:29,400 --> 00:04:39,400
important so far today this is Grace Hopper on the one

57
00:04:36,440 --> 00:04:47,199
side we've got the same 72-core Grace ARM CPU that we just saw but on the

58
00:04:42,560 --> 00:04:50,880
other side the ooh shiny latest NVIDIA

59
00:04:47,199 --> 00:04:53,479
H100 Hopper GPU you can probably see

60
00:04:50,880 --> 00:04:59,240
where this is going just like with the dual-CPU Grace module these two are also

61
00:04:57,080 --> 00:05:06,520
NVLink chip-to-chip interconnected meaning that the CPU and GPU have a

62
00:05:02,360 --> 00:05:08,440
whopping 900 GB per second of

63
00:05:06,520 --> 00:05:16,039
theoretical bandwidth to talk to each other so first some perspective a GPU

64
00:05:11,840 --> 00:05:19,080
using a full 16-lane Gen 5 PCIe slot

65
00:05:16,039 --> 00:05:20,919
would only have about 64 GB a second of

66
00:05:19,080 --> 00:05:26,280
peak throughput that is 1/14th as much as this and that's far

67
00:05:24,160 --> 00:05:32,800
from the only mind-bending number that this thing is capable of while the CPU

68
00:05:28,880 --> 00:05:36,360
side uses the same up to 480 GB of LPDDR5X

69
00:05:32,800 --> 00:05:39,199
for the GPU side they need much

70
00:05:36,360 --> 00:05:44,720
faster HBM3 memory that runs at a whopping 4

71
00:05:41,160 --> 00:05:46,720
terabytes per second it's about four

72
00:05:44,720 --> 00:05:50,080
times faster that's why the memory needs to be right on the package right next to

73
00:05:49,319 --> 00:05:56,840
the GPU now all that is great and cool and

74
00:05:53,520 --> 00:05:58,960
all but HBM is very expensive and as you

75
00:05:56,840 --> 00:06:06,599
can see there's only so much space here so the H100 only gets 96 GB of memory

76
00:06:04,520 --> 00:06:11,680
okay yeah for gaming that certainly sounds like a lot but AI data sets can

77
00:06:09,440 --> 00:06:17,160
involve terabytes of data so it can get used up very quickly that's where the

78
00:06:14,080 --> 00:06:20,000
interconnect comes in it allows the GPU

79
00:06:17,160 --> 00:06:25,560
to access the CPU's memory in a very direct and transparent way giving the

80
00:06:22,599 --> 00:06:31,240
H100 Hopper GPU a functional memory capacity of nearly

81
00:06:27,280 --> 00:06:33,720
600 GB in practical terms according to

82
00:06:31,240 --> 00:06:40,479
NVIDIA that puts Grace Hopper anywhere from about 2 and 1/2 times to nearly

83
00:06:36,240 --> 00:06:44,319
four times as fast as an x86 CPU paired

84
00:06:40,479 --> 00:06:46,800
with their last-generation A100 GPU and

85
00:06:44,319 --> 00:06:51,599
where things get really wild is in the data center with an NVLink Switch

86
00:06:48,800 --> 00:06:59,240
system you could connect up to 256 GPUs together giving them access to

87
00:06:55,199 --> 00:07:01,280
up to 150 terabytes of high bandwidth memory

88
00:06:59,240 --> 00:07:05,840
I mean you guys remember that crazy Mars Lander demo that we showed off on the

89
00:07:03,000 --> 00:07:11,759
petabyte of flash array you could load that entire 1 billion point data set into

90
00:07:08,919 --> 00:07:17,360
memory in that configuration and still have 50 terabytes to spare now this module

91
00:07:15,759 --> 00:07:25,680
is a little bit more power hungry than the dual-CPU version 1,000 versus 500 watts

92
00:07:21,479 --> 00:07:28,319
per module but I mean that's for CPU GPU

93
00:07:25,680 --> 00:07:32,479
and RAM for both of them and with this kind of performance

94
00:07:30,440 --> 00:07:37,759
of course not everybody wants to move to an ARM hybrid CPU GPU architecture so

95
00:07:35,680 --> 00:07:45,319
NVIDIA is still going to be supporting their uh old-fashioned configurations be

96
00:07:41,160 --> 00:07:49,879
they H100 GPUs in a PCIe form factor or

97
00:07:45,319 --> 00:07:53,919
their HGX H100 with up to eight SXM5

98
00:07:49,879 --> 00:07:56,840
GPUs each of these draws a massive 700

99
00:07:53,919 --> 00:08:03,319
watts making an RTX 4090 look like a child's plaything and supports NVLink

100
00:08:00,080 --> 00:08:08,000
between these GPUs and NVSwitch to

101
00:08:03,319 --> 00:08:09,759
additional servers this is the G593-SD0

102
00:08:08,000 --> 00:08:17,120
and Gigabyte was very proud of the fact that they are the first NVIDIA certified

103
00:08:12,400 --> 00:08:19,720
HGX H100 8-GPU server in a 5U chassis man

104
00:08:17,120 --> 00:08:22,560
that is a lot of compute in a tiny space Jake's in my ear here telling me I

105
00:08:21,159 --> 00:08:25,759
should pull one of the power supplies but if you've noticed it getting darker

106
00:08:24,280 --> 00:08:29,159
it's because they're actually shutting down the pre-show and uh they're trying

107
00:08:27,840 --> 00:08:33,760
to get us out of here but there is one more thing that we wanted to talk about

108
00:08:30,879 --> 00:08:38,919
where'd it go dang it Jake no oh my God oh my God okay well this is uh no wait

109
00:08:37,519 --> 00:08:43,000
this isn't the one I wanted okay it's a ConnectX-7 this is an even faster

110
00:08:40,959 --> 00:08:48,320
network card so this is probably the first NVIDIA-developed Mellanox network

111
00:08:46,200 --> 00:08:52,880
card given that uh the acquisition was what about two years ago yeah ConnectX was

112
00:08:50,160 --> 00:08:59,959
already out yeah but NVIDIA didn't buy Mellanox just to make faster ConnectX

113
00:08:56,080 --> 00:09:02,920
cards no it was to make these

114
00:08:59,959 --> 00:09:07,760
this is a BlueField-3 so it has networking on it this is a 100 gigabit one

115
00:09:05,519 --> 00:09:13,480
but it's available in speeds up to 400 gigabit but what's really special about it

116
00:09:10,279 --> 00:09:16,600
is that it has up to 16 processing cores

117
00:09:13,480 --> 00:09:18,680
on it why you might ask well just like

118
00:09:16,600 --> 00:09:23,720
in the old days when we started offloading TCP/IP processing to our

119
00:09:21,360 --> 00:09:28,640
network cards rather than having our CPU handle them this is going to offload all

120
00:09:26,519 --> 00:09:33,279
kinds of interesting things like encryption of your network traffic or

121
00:09:30,880 --> 00:09:37,440
say for example managing your file system because when you're someone

122
00:09:35,160 --> 00:09:41,680
like an AWS and you want to squeeze as much revenue as possible out of every

123
00:09:39,839 --> 00:09:46,640
CPU in your data center you don't want it handling stupid BS that you could

124
00:09:44,040 --> 00:09:51,519
just offload to your network card so the idea here is to free up CPU resources

125
00:09:49,399 --> 00:09:55,920
that can be leased to customers by putting them onto the network card

126
00:09:53,560 --> 00:10:00,560
itself and this is especially true for software where the developer sells you a

127
00:09:57,800 --> 00:10:04,880
license per core that's why even though these are going

128
00:10:01,640 --> 00:10:08,480
to be wildly expensive a lot more than

129
00:10:04,880 --> 00:10:10,839
the 4060 Ti NVIDIA is going to sell shed

130
00:10:08,480 --> 00:10:15,800
loads of them just like I sold this segue to our sponsor Pulseway are you

131
00:10:14,079 --> 00:10:20,600
sick of feeling like a prisoner chained to a desk managing IT systems unleash

132
00:10:18,440 --> 00:10:24,360
your inner IT hero with Pulseway remote monitoring and management software

133
00:10:22,440 --> 00:10:27,680
Pulseway's platform gives you the power to manage your IT infrastructure from

134
00:10:26,040 --> 00:10:31,720
anywhere even from the comfort of your own couch and with real-time alerts and

135
00:10:29,800 --> 00:10:35,480
notifications you can be the first to know about potential issues before

136
00:10:33,480 --> 00:10:38,959
anyone else on your team it's accessible through whatever device is close to you

137
00:10:37,120 --> 00:10:43,279
thanks to their convenient apps allowing you to control your IT systems like a

138
00:10:40,880 --> 00:10:47,040
boss even if you're lounging in your pjs so say goodbye to the boring routine of

139
00:10:44,959 --> 00:10:50,880
IT management and hello to the fun of being an IT hero with Pulseway's advanced

140
00:10:48,959 --> 00:10:54,639
technology don't wait this is your chance to become a legend in the IT

141
00:10:52,399 --> 00:10:58,160
world just try Pulseway for free today and experience the power of simplified

142
00:10:56,399 --> 00:11:02,200
IT infrastructure management click the link below to get started if you guys

143
00:11:00,680 --> 00:11:06,880
enjoyed this video why don't you check out oh the petabyte of flash that was a good

144
00:11:05,320 --> 00:11:14,720
one well we're at the Gigabyte booth come on uh the g- one yeah g oh actually no

145
00:11:09,760 --> 00:11:18,160
new Whonnock new new new Whonnock 3 Whonnock 4 I

146
00:11:14,720 --> 00:11:18,160
mean damn it