WEBVTT

00:00:00.120 --> 00:00:05.480
it's pretty clear where NVIDIA's priorities lie these days we're here at

00:00:04.000 --> 00:00:10.559
the Computex booth of one of their partners Gigabyte and this is the entire

00:00:09.000 --> 00:00:15.240
gaming showcase that's because they like the

00:00:13.200 --> 00:00:22.359
rest of the industry understand that the future of computing lies in the data

00:00:18.800 --> 00:00:25.160
center that is where the Grace

00:00:22.359 --> 00:00:31.400
Superchip comes in under each of these gigantic heat spreaders are 72 of

00:00:28.400 --> 00:00:33.440
NVIDIA's Grace CPU cores connected

00:00:31.400 --> 00:00:41.360
together using what NVIDIA calls the NVLink chip-to-chip interconnect for a

00:00:36.360 --> 00:00:44.320
total of 144 cores except that's just one

00:00:41.360 --> 00:00:51.320
of the nodes this server from Gigabyte accepts not one not two but four of

00:00:48.120 --> 00:00:54.680
these modules in its four separate nodes

00:00:51.320 --> 00:00:59.600
that is an absolutely mind-bending

00:00:54.680 --> 00:01:02.399
576 cores in a 2U server rack but these

00:00:59.600 --> 00:01:07.040
are not the types of CPUs that you have in your gaming PC at home those

00:01:04.400 --> 00:01:13.080
processors from the likes of AMD and Intel are based on the x86 architecture

00:01:10.479 --> 00:01:19.040
so similar to what Apple did with their M series M1 and M2 processors NVIDIA is

00:01:16.799 --> 00:01:25.280
making use of a different processor architecture called ARM and uh we

00:01:22.960 --> 00:01:28.720
actually did get permission to do this we're going to be taking a closer look

00:01:28.920 --> 00:01:37.479
here oh it doesn't look much like it but this

00:01:34.720 --> 00:01:42.159
is the same style of processor that you might find in your phone ARM processors

00:01:40.240 --> 00:01:45.920
have a lot of advantages first and foremost being that they're typically

00:01:43.759 --> 00:01:51.600
more power efficient thanks to their relatively lightweight instruction set

00:01:48.479 --> 00:01:54.560
so much so that NVIDIA claims these Grace

00:01:51.600 --> 00:02:01.039
CPUs have twice the performance per watt of the latest x86 chips but the

00:01:58.640 --> 00:02:06.159
disadvantage is that they also require software like your operating system and

00:02:04.039 --> 00:02:12.239
all the programs you need to run to be coded and compiled specifically for ARM

00:02:09.759 --> 00:02:16.599
now for the PC market because x86 has been the standard for so long it's

00:02:14.640 --> 00:02:21.440
difficult to justify switching over to ARM it would cost you so much in terms

00:02:18.599 --> 00:02:25.360
of backwards compatibility but in the data center the types of customers who

00:02:23.640 --> 00:02:30.200
are going to buy a processor like this are usually developing their own

00:02:27.080 --> 00:02:31.879
software anyway like let's say Google to

00:02:30.200 --> 00:02:37.360
run the algorithms that power Google search or YouTube recommendations for

00:02:34.680 --> 00:02:42.080
them switching over to ARM isn't as big a deal and in fact companies like Amazon

00:02:40.319 --> 00:02:49.680
who are developing their own ARM-based CPUs are already doing it and very

00:02:46.519 --> 00:02:52.400
effectively I mean hey if my next gaming

00:02:49.680 --> 00:02:57.519
CPU could be half the power draw and the same performance of my current one I'd

00:02:54.560 --> 00:03:01.840
be stoked but this is even better imagine if instead of one computer

00:02:59.400 --> 00:03:06.879
you're talking in thousands or tens of thousands the savings start to become so

00:03:04.680 --> 00:03:12.480
large that it's less a question of can we afford this migration and more a

00:03:08.959 --> 00:03:14.239
question of can we afford not to make it

00:03:12.480 --> 00:03:19.280
now I didn't ask permission for this part but nobody seems to be stopping me

00:03:16.519 --> 00:03:26.799
or even really paying attention to me so let's take apart Grace

00:03:22.080 --> 00:03:33.680
Superchip on each Grace Superchip is up to

00:03:26.799 --> 00:03:36.640
480 GB of LPDDR5X ECC memory per CPU

00:03:33.680 --> 00:03:43.159
and what's really cool is that that can actually be accessed by either CPU over

00:03:40.080 --> 00:03:45.799
the NVLink interconnect that's how

00:03:43.159 --> 00:03:50.760
fast this new NVLink is the only downside to this approach since we're

00:03:48.040 --> 00:03:55.480
making comparisons to Apple is that just like with your M2 MacBook you better

00:03:53.480 --> 00:03:59.799
decide how much memory you want in your server right at the time you buy it

00:03:57.640 --> 00:04:05.159
unless you want to replace the entire compute engine while you perform a

00:04:02.200 --> 00:04:10.640
memory upgrade given that the rumored price of their H100 GPUs is

00:04:08.360 --> 00:04:14.879
$11,000 I don't even want to know what this thing costs but hopefully you get a

00:04:12.760 --> 00:04:20.120
bit of a discount when you buy it together with the Grace Superchip CPU

00:04:18.000 --> 00:04:24.360
let me show you this can't believe they're letting me take this off the

00:04:24.360 --> 00:04:31.320
wall okay success we have dropped nothing

00:04:29.400 --> 00:04:39.400
important so far today this is Grace Hopper on the one

00:04:36.440 --> 00:04:47.199
side we've got the same 72-core Grace ARM CPU that we just saw but on the

00:04:42.560 --> 00:04:50.880
other side the ooh shiny latest NVIDIA

00:04:47.199 --> 00:04:53.479
H100 Hopper GPU you can probably see

00:04:50.880 --> 00:04:59.240
where this is going just like with the dual-CPU Grace module these two are also

00:04:57.080 --> 00:05:06.520
NVLink chip-to-chip interconnected meaning that the CPU and GPU have a

00:05:02.360 --> 00:05:08.440
whopping 900 gigabytes per second of

00:05:06.520 --> 00:05:16.039
theoretical bandwidth to talk to each other so first some perspective a GPU

00:05:11.840 --> 00:05:19.080
using a full 16-lane Gen 5 PCIe slot

00:05:16.039 --> 00:05:20.919
would only have about 64 GB a second of

00:05:19.080 --> 00:05:26.280
peak throughput that is 1/14th as much as this and that's far

00:05:24.160 --> 00:05:32.800
from the only mind-bending number that this thing is capable of while the CPU

00:05:28.880 --> 00:05:36.360
side uses the same up to 480 GB of LPDDR5X

00:05:32.800 --> 00:05:39.199
for the GPU side they need much

00:05:36.360 --> 00:05:44.720
faster HBM3 memory that runs at a whopping 4

00:05:41.160 --> 00:05:46.720
terabytes per second it's about four

00:05:44.720 --> 00:05:50.080
times faster that's why the memory needs to be right on the package right next to

00:05:49.319 --> 00:05:56.840
the GPU now all that is great and cool and

00:05:53.520 --> 00:05:58.960
all but HBM is very expensive and as you

00:05:56.840 --> 00:06:06.599
can see there's only so much space here so the H100 only gets 96 GB of memory

00:06:04.520 --> 00:06:11.680
okay yeah for gaming that certainly sounds like a lot but AI data sets can

00:06:09.440 --> 00:06:17.160
involve terabytes of data so it can get used up very quickly that's where the

00:06:14.080 --> 00:06:20.000
interconnect comes in it allows the GPU

00:06:17.160 --> 00:06:25.560
to access the CPU's memory in a very direct and transparent way giving the

00:06:22.599 --> 00:06:31.240
H100 Hopper GPU a functional memory capacity of nearly

00:06:27.280 --> 00:06:33.720
600 GB in practical terms according to

00:06:31.240 --> 00:06:40.479
NVIDIA that puts Grace Hopper anywhere from about 2 and 1/2 times to nearly

00:06:36.240 --> 00:06:44.319
four times as fast as an x86 CPU paired

00:06:40.479 --> 00:06:46.800
with their last generation A100 GPU and

00:06:44.319 --> 00:06:51.599
where things get really wild is in the data center with an NVLink Switch

00:06:48.800 --> 00:06:59.240
system you could connect up to 256 GPUs together giving them access to

00:06:55.199 --> 00:07:01.280
up to 150 terabytes of high-bandwidth memory

00:06:59.240 --> 00:07:05.840
I mean you guys remember that crazy Mars Lander demo that we showed off on the

00:07:03.000 --> 00:07:11.759
petabyte of flash array you could load that entire 1 billion point data set into

00:07:08.919 --> 00:07:17.360
memory in that configuration and still have 50 terabytes to spare now this module

00:07:15.759 --> 00:07:25.680
is a little bit more power hungry than the dual-CPU version 1,000 versus 500 watts

00:07:21.479 --> 00:07:28.319
per module but I mean that's for CPU GPU

00:07:25.680 --> 00:07:32.479
and RAM for both of them and with this kind of performance

00:07:30.440 --> 00:07:37.759
of course not everybody wants to move to an ARM hybrid CPU GPU architecture so

00:07:35.680 --> 00:07:45.319
NVIDIA is still going to be supporting their uh old-fashioned configurations be

00:07:41.160 --> 00:07:49.879
they H100 GPUs in a PCIe form factor or

00:07:45.319 --> 00:07:53.919
their HGX H100 with up to eight SXM5

00:07:49.879 --> 00:07:56.840
GPUs each of these draws a massive 700

00:07:53.919 --> 00:08:03.319
watts making an RTX 4090 look like a child's plaything and supports NVLink

00:08:00.080 --> 00:08:08.000
between these GPUs and NVSwitch to

00:08:03.319 --> 00:08:09.759
additional servers this is the G593-SD0

00:08:08.000 --> 00:08:17.120
and Gigabyte was very proud of the fact that they are the first NVIDIA certified

00:08:12.400 --> 00:08:19.720
HGX H100 8-GPU server in a 5U chassis man

00:08:17.120 --> 00:08:22.560
that is a lot of compute in a tiny space Jake's in my ear here telling me I

00:08:21.159 --> 00:08:25.759
should pull one of the power supplies but if you've noticed it getting darker

00:08:24.280 --> 00:08:29.159
it's because they're actually shutting down the pre-show and uh they're trying

00:08:27.840 --> 00:08:33.760
to get us out of here but there is one more thing that we wanted to talk about

00:08:30.879 --> 00:08:38.919
where'd it go dang it Jake no oh my God oh my God okay well this is uh no wait

00:08:37.519 --> 00:08:43.000
this isn't the one I wanted okay it's a ConnectX-7 this is an even faster

00:08:40.959 --> 00:08:48.320
network card so this is probably the first NVIDIA developed Mellanox network

00:08:46.200 --> 00:08:52.880
card given that uh the acquisition was what about two years ago yeah ConnectX was

00:08:50.160 --> 00:08:59.959
already out yeah but NVIDIA didn't buy Mellanox just to make faster ConnectX

00:08:56.080 --> 00:09:02.920
cards no it was to make these

00:08:59.959 --> 00:09:07.760
this is a BlueField-3 so it has networking on it this is a 100 gigabit one

00:09:05.519 --> 00:09:13.480
but it's available at speeds up to 400 gigabit but what's really special about it

00:09:10.279 --> 00:09:16.600
is that it has up to 16 processing cores

00:09:13.480 --> 00:09:18.680
on it why you might ask well just like

00:09:16.600 --> 00:09:23.720
in the old days when we started offloading TCP/IP processing to our

00:09:21.360 --> 00:09:28.640
network cards rather than having our CPU handle them this is going to offload all

00:09:26.519 --> 00:09:33.279
kinds of interesting things like encryption of your network traffic or

00:09:30.880 --> 00:09:37.440
say for example handling managing your file system because when you're someone

00:09:35.160 --> 00:09:41.680
like an AWS and you want to squeeze as much revenue as possible out of every

00:09:39.839 --> 00:09:46.640
CPU in your data center you don't want it handling stupid BS that you could

00:09:44.040 --> 00:09:51.519
just offload to your network card so the idea here is to free up CPU resources

00:09:49.399 --> 00:09:55.920
that can be leased to customers by putting them onto the network card

00:09:53.560 --> 00:10:00.560
itself and this is especially true for software where the developer sells you a

00:09:57.800 --> 00:10:04.880
license per core that's why even though these are going

00:10:01.640 --> 00:10:08.480
to be wildly expensive a lot more than

00:10:04.880 --> 00:10:10.839
the 4060 Ti NVIDIA is going to sell shed

00:10:08.480 --> 00:10:15.800
loads of them just like I sold this segue to our sponsor Pulseway are you

00:10:14.079 --> 00:10:20.600
sick of feeling like a prisoner chained to a desk managing IT systems unleash

00:10:18.440 --> 00:10:24.360
your inner IT hero with Pulseway remote monitoring and management software

00:10:22.440 --> 00:10:27.680
Pulseway platform gives you the power to manage your IT infrastructure from

00:10:26.040 --> 00:10:31.720
anywhere even from the comfort of your own couch and with real-time alerts and

00:10:29.800 --> 00:10:35.480
notifications you can be the first to know about potential issues before

00:10:33.480 --> 00:10:38.959
anyone else on your team it's accessible through whatever device is close to you

00:10:37.120 --> 00:10:43.279
thanks to their convenient apps allowing you to control your IT systems like a

00:10:40.880 --> 00:10:47.040
boss even if you're lounging in your PJs so say goodbye to the boring routine of

00:10:44.959 --> 00:10:50.880
IT management and hello to the fun of being an IT hero with Pulseway advanced

00:10:48.959 --> 00:10:54.639
technology don't wait this is your chance to become a legend in the IT

00:10:52.399 --> 00:10:58.160
world just try Pulseway for free today and experience the power of simplified

00:10:56.399 --> 00:11:02.200
IT infrastructure management click the link below to get started if you guys

00:11:00.680 --> 00:11:06.880
enjoyed this video why don't you check out oh the petabyte of flash that was a good

00:11:05.320 --> 00:11:14.720
one well we're at the Gigabyte booth come on uh the g- one yeah g oh actually no

00:11:09.760 --> 00:11:18.160
new Whonnock new new new Whonnock 3 Whonnock 4 I

00:11:14.720 --> 00:11:18.160
mean damn it