WEBVTT

00:00:00.000 --> 00:00:06.680
If you're watching this right now, you're probably using at least one Arm

00:00:03.920 --> 00:00:10.800
CPU to do it. Or well, not an Arm CPU because Arm doesn't actually make CPUs.

00:00:09.520 --> 00:00:15.800
Or do they? That's the big news that they sponsored

00:00:13.200 --> 00:00:22.360
us down here to their Arm Everywhere event to announce. Behind me and in my

00:00:18.960 --> 00:00:25.800
hand is the Arm AGI CPU, built for

00:00:22.360 --> 00:00:28.640
performance, scale, and, as always,

00:00:25.800 --> 00:00:34.560
efficiency. Up to 136 Arm Neoverse V3 cores with 2 megabytes

00:00:31.400 --> 00:00:37.200
of level 2 cache each built on TSMC's 3

00:00:34.560 --> 00:00:41.720
nanometer process node, and they can run at up to 3.6 GHz, which

00:00:40.200 --> 00:00:47.600
right out of the gate raises some questions, doesn't it? Just 3.6 GHz? Can

00:00:45.000 --> 00:00:52.680
it like dynamically boost a single core way higher or something?

00:00:49.680 --> 00:00:55.480
No. And according to Arm, that's

00:00:52.680 --> 00:01:00.200
actually a key feature, not a bug. By eschewing SMT multithreading and the

00:00:58.240 --> 00:01:04.320
highly variable power consumption that's associated with constantly fluctuating

00:01:02.120 --> 00:01:09.680
clock speeds, not to mention designing a 12-channel DDR5 memory controller that

00:01:07.000 --> 00:01:15.800
can feed every individual core with a consistent 6 GB/s of bandwidth, Arm is

00:01:13.240 --> 00:01:22.000
ensuring that every core in the CPU will perform its best at all times and keep

00:01:19.520 --> 00:01:26.920
power consumption more consistent, which will allow data centers to design to how

00:01:24.120 --> 00:01:31.800
much power their racks will consistently consume rather than having to build in a

00:01:29.240 --> 00:01:35.920
buffer for how much they might consume at peak.
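
NOTE
A rough sanity check of the memory figures quoted above, as a minimal Python sketch.
It assumes all 136 cores draw their 6 GB/s share at the same time and that traffic
spreads evenly across the 12 DDR5 channels. Variable names are illustrative, not Arm's.
```python
# Figures as stated in the video; everything else is an assumption for illustration.
CORES_PER_CPU = 136
PER_CORE_BANDWIDTH_GB_S = 6   # "a consistent 6 GB/s" per core
DDR5_CHANNELS = 12
# Total bandwidth the memory controller has to sustain if every core is busy.
aggregate_gb_s = CORES_PER_CPU * PER_CORE_BANDWIDTH_GB_S
# Even split across channels (an assumption; real traffic is rarely this uniform).
per_channel_gb_s = aggregate_gb_s / DDR5_CHANNELS
print(f"aggregate: {aggregate_gb_s} GB/s, per channel: {per_channel_gb_s:.0f} GB/s")
```
Under those assumptions the controller has to sustain roughly 816 GB/s in total,
or about 68 GB/s per channel.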

00:01:33.240 --> 00:01:40.160
And that's huge considering that cooling and especially power are just about the

00:01:38.200 --> 00:01:45.040
hottest commodities in a world that is rapidly scaling data center

00:01:41.560 --> 00:01:47.840
infrastructure. Each AGI CPU has 96

00:01:45.040 --> 00:01:53.520
lanes of PCI Express Gen 6 with support for CXL 3.0 for deploying massive shared

00:01:50.520 --> 00:01:55.120
memory pools over PCIe, and Arm showed

00:01:53.520 --> 00:02:00.280
off node designs with their hardware partners that deployed up to two of

00:01:57.240 --> 00:02:02.600
these CPUs on a single motherboard.

00:02:00.280 --> 00:02:05.120
Super cool, but not exactly world-changing

00:02:04.040 --> 00:02:08.759
yet. To see the vision that led Arm to spend

00:02:07.720 --> 00:02:16.800
the last few years bringing this to life, you got to zoom out and look beyond the individual node to the rack

00:02:12.800 --> 00:02:19.040
level. This rack contains 30 two-node

00:02:16.800 --> 00:02:23.680
1P servers. So, for those keeping count at home, that's 8,160

00:02:21.920 --> 00:02:28.360
CPU cores. Okay, still not that big of a deal. I

00:02:25.920 --> 00:02:32.080
mean, dense CPU racks are already a thing.
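
NOTE
The 8,160-core figure can be checked with a short Python sketch. It assumes the rack
holds 30 servers with two single-socket (1P) nodes each, which is the layout that
matches the stated total at 136 cores per CPU. The function name is hypothetical.
```python
CORES_PER_CPU = 136  # "up to 136" cores, as stated in the video
def rack_cores(servers, nodes_per_server, cpus_per_node=1, cores_per_cpu=CORES_PER_CPU):
    """Total CPU cores in a rack of identical servers."""
    return servers * nodes_per_server * cpus_per_node * cores_per_cpu
# Assumed air-cooled layout: 30 servers x 2 nodes x 1 CPU per node.
air_cooled_cores = rack_cores(servers=30, nodes_per_server=2)
air_cooled_cpus = air_cooled_cores // CORES_PER_CPU
print(air_cooled_cpus, "CPUs,", air_cooled_cores, "cores")
```
That is 60 CPUs per rack, which lines up with the core count quoted in the video.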

00:02:29.560 --> 00:02:36.959
Well, here comes the big reveal. This sick error message hoodie is now

00:02:34.360 --> 00:02:40.880
available from lttstore.com. JK, okay. I mean, it is, but that's not

00:02:39.000 --> 00:02:48.320
the big reveal. The big reveal is that everything that I just told you fits in

00:02:42.760 --> 00:02:52.560
a standard OCP 36 kW air-cooled rack.

00:02:48.320 --> 00:02:54.440
Each AGI CPU draws just 300 W,

00:02:52.560 --> 00:03:00.640
a significant reduction compared to flagship x86 CPUs. So, when you throw

00:02:57.840 --> 00:03:06.320
liquid cooling at them, the numbers get frankly kind of ridiculous. In an OCP

00:03:03.080 --> 00:03:10.160
200 kW rack, Arm figures they can pack

00:03:06.320 --> 00:03:13.920
42 eight-node 1P systems for a grand

00:03:10.160 --> 00:03:13.920
total of 45,696

00:03:14.160 --> 00:03:22.239
cores and over a petabyte of RAM, all while consuming only about half of that

00:03:19.920 --> 00:03:25.519
total available power budget. They are pegging the bottom line

00:03:23.720 --> 00:03:30.600
performance per watt in the neighborhood of double compared to x86. And this is

00:03:28.600 --> 00:03:35.640
largely thanks to carrying less legacy cruft, but also thanks to architectural

00:03:33.239 --> 00:03:39.040
choices like using fewer chiplets to keep memory latency down, along with

00:03:37.720 --> 00:03:43.280
Arm's traditional strength in instructions per clock, and taking just

00:03:41.480 --> 00:03:49.880
a no silicon wasted approach to their design. With the cost and scarcity of power,

00:03:47.920 --> 00:03:55.880
that's a number that is going to perk up a lot of ears. But, why though? Everybody

00:03:53.240 --> 00:04:01.480
knows that CPUs aren't good at AI compared to GPUs or application specific

00:03:58.760 --> 00:04:06.160
neural processors. So, uh what's with the branding?
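
NOTE
A back-of-the-envelope check of the liquid-cooled rack figures quoted earlier, as a
minimal Python sketch. It counts CPU power only (300 W each) and ignores memory,
networking, storage, and cooling overhead, so treat the result as a floor rather than
a measured rack load. Variable names are illustrative.
```python
# Figures as stated in the video.
CORES_PER_CPU = 136
WATTS_PER_CPU = 300
SYSTEMS = 42              # eight-node 1P systems in the 200 kW rack
NODES_PER_SYSTEM = 8      # one CPU per node (1P)
RACK_BUDGET_WATTS = 200_000
cpus = SYSTEMS * NODES_PER_SYSTEM
cores = cpus * CORES_PER_CPU
cpu_watts = cpus * WATTS_PER_CPU
# Share of the rack power budget consumed by the CPUs alone.
budget_share = cpu_watts / RACK_BUDGET_WATTS
print(f"{cpus} CPUs, {cores} cores, {cpu_watts / 1000:.1f} kW ({budget_share:.0%} of budget)")
```
The CPUs alone come to roughly 100.8 kW, which is consistent with the "about half"
of the 200 kW budget mentioned in the video.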

00:04:03.600 --> 00:04:10.720
Arm met that question head-on. While GPUs and neural accelerators get

00:04:08.320 --> 00:04:14.920
all the attention, CPUs are still chugging along in the background

00:04:12.320 --> 00:04:20.000
coordinating tasks, with Arm estimating that a typical deployment today is going

00:04:17.160 --> 00:04:23.480
to have about 30 million cores per gigawatt.

00:04:21.480 --> 00:04:26.680
But, here's the thing. That's with humans handling most of the token

00:04:25.520 --> 00:04:31.120
requests. AI agents push requests much faster and

00:04:30.200 --> 00:04:35.680
um don't sleep. Meaning that your expensive

00:04:33.600 --> 00:04:39.840
AI accelerators can end up sitting around because the CPU coordinators

00:04:38.080 --> 00:04:43.320
can't keep up with all of those requests.

00:04:41.080 --> 00:04:48.720
So, Arm figures that that 30 million cores per gigawatt number could go up a

00:04:45.520 --> 00:04:51.560
lot in the head node next to the accelerator

00:04:48.720 --> 00:04:57.360
rack, as high as about four times as many. But, uh here's the thing. When

00:04:54.480 --> 00:05:02.080
these are doing all the actual AI work, nobody's going to want to spend more

00:04:58.760 --> 00:05:03.280
power budget on all of those CPUs.

00:05:02.080 --> 00:05:08.240
Well, that's where Arm comes in with their

00:05:05.320 --> 00:05:12.080
famously power efficient designs. Let's go to Nick from the lab to see

00:05:10.040 --> 00:05:15.440
this thing in action. Many of the demos were focused on the ease of porting

00:05:13.600 --> 00:05:19.360
software to Arm and the support they're building for developers, which makes a

00:05:17.120 --> 00:05:23.080
lot of sense, but isn't very visual. So, let's check out this one instead where

00:05:20.960 --> 00:05:28.640
they're encoding a 1080p video from H.264 to H.265 while running computer

00:05:26.160 --> 00:05:31.720
vision at the same time on the same CPU. Let's go take a look at the man behind

00:05:30.240 --> 00:05:38.640
the curtain. That's not a video recording. Arm actually had the stones

00:05:34.680 --> 00:05:41.400
to do it live bringing an actual server

00:05:38.640 --> 00:05:46.240
running the actual hardware here to the show floor. But, um

00:05:44.240 --> 00:05:50.920
awkward question. Doesn't all of this put Arm kind of in direct competition

00:05:48.880 --> 00:05:54.760
with their own customers? You know, the ones who license their IP and their

00:05:52.920 --> 00:06:00.080
compute subsystems, the guys who got them where they are today? Well, on

00:05:56.920 --> 00:06:02.800
paper, yes, um absolutely.

00:06:00.080 --> 00:06:06.680
But, from Arm's perspective, this is actually something that many of their

00:06:04.080 --> 00:06:12.200
customers were asking for. Expanding on that, Arm laid out how their roadmap

00:06:09.440 --> 00:06:16.000
and their policies account for how all three of their business models are going

00:06:14.080 --> 00:06:21.600
to go forward. And they're positioning this as a choice between IP licensing,

00:06:19.240 --> 00:06:26.080
compute subsystems licensing, and physical CPUs, or hey,

00:06:24.000 --> 00:06:29.720
why not some combination of all three? They'll gladly take your money any way

00:06:28.080 --> 00:06:33.160
you want to give it to them. Contact your local sales representative.

00:06:33.560 --> 00:06:39.880
Will is here. He can be reached afterwards. And it

00:06:37.400 --> 00:06:44.800
seems like that's the plan for the long haul. In a move that I don't think I've

00:06:42.160 --> 00:06:50.400
ever seen before, Arm stood up on stage and said the quiet part out loud, "This

00:06:47.560 --> 00:06:53.880
is just a safe first attempt. The best is yet to come with our second CPU due

00:06:52.560 --> 00:06:57.800
next year." Like, obviously, given the timelines of

00:06:56.280 --> 00:07:01.360
silicon development, but you almost never hear that from a company who

00:06:59.800 --> 00:07:06.080
probably wants you to buy the hardware they have today from partners like, you

00:07:03.480 --> 00:07:09.080
know, for example, Supermicro. Pretty wild.

00:07:07.640 --> 00:07:12.680
If you guys enjoyed this video, you might enjoy the one that we did at CES,

00:07:11.400 --> 00:07:19.120
also in partnership with Arm, highlighting some of the unexpected

00:07:14.680 --> 00:07:19.120
places that you can find Arm technology.
