WEBVTT

00:00:00.000 --> 00:00:05.200
If you're watching this right now, you're probably using at least one ARM CPU to do it.

00:00:05.200 --> 00:00:10.000
Or, well, not an ARM CPU, because ARM doesn't actually make CPUs, or do they?

00:00:10.800 --> 00:00:16.960
That's the big news that they sponsored us down here to their ARM Everywhere event to announce.

00:00:16.960 --> 00:00:25.760
Behind me, and in my hand, is the ARM AGI CPU, built for performance, scale, and, as always,

00:00:26.720 --> 00:00:33.120
up to 136 ARM Neoverse V3 cores with two megabytes of level 2 cache each,

00:00:33.120 --> 00:00:41.120
built on TSMC's 3nm process node, and it can run at up to 3.6GHz, which, right out of the gate,

00:00:41.120 --> 00:00:47.760
raises some questions, doesn't it? Just 3.6GHz? Can it, like, dynamically boost a single core

00:00:47.760 --> 00:00:55.120
way higher or something? No. And, according to ARM, that's actually a key feature, not a bug.

00:00:55.120 --> 00:01:00.720
By eschewing SMT multithreading and the highly variable power consumption that's associated

00:01:00.720 --> 00:01:06.720
with constantly fluctuating clock speeds, not to mention designing a 12-channel DDR5 memory controller

00:01:06.720 --> 00:01:12.800
that can feed every individual core with a consistent 6GB per second of bandwidth,

00:01:12.800 --> 00:01:20.480
ARM is ensuring that every core in this CPU will perform its best at all times and keep power consumption

00:01:20.560 --> 00:01:26.160
more consistent, which will allow data centers to design to how much power their racks will

00:01:26.160 --> 00:01:32.240
consistently consume rather than having to build in a buffer for how much they might consume at peak.
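
NOTE
A rough sanity check of the bandwidth figures quoted above, not ARM's published math. It multiplies the quoted 6 GB/s per core by the quoted 136 cores and splits the result across the 12 DDR5 channels. The function names are made up for illustration.
```python
# Figures quoted in the video: 136 cores, about 6 GB/s per core, 12 DDR5 channels.
CORES = 136
GB_S_PER_CORE = 6
DDR5_CHANNELS = 12
#
def aggregate_bandwidth_gb_s(cores: int = CORES, per_core: float = GB_S_PER_CORE) -> float:
    """Total memory bandwidth needed to feed every core at once, in GB/s."""
    return cores * per_core
#
def per_channel_bandwidth_gb_s(total: float, channels: int = DDR5_CHANNELS) -> float:
    """Share of that total each memory channel would have to carry, in GB/s."""
    return total / channels
#
total = aggregate_bandwidth_gb_s()
print(total, per_channel_bandwidth_gb_s(total))
```
Under these assumptions the controller has to deliver about 816 GB/s in total, or roughly 68 GB/s per channel.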

00:01:33.120 --> 00:01:38.640
And that's huge, considering that cooling and especially power are just about the hottest

00:01:38.640 --> 00:01:44.320
commodities in a world that is rapidly scaling data center infrastructure. Each AGI CPU has

00:01:44.320 --> 00:01:51.040
96 lanes of PCI Express Gen 6 with support for CXL 3.0 for deploying massive shared memory pools

00:01:51.040 --> 00:01:57.040
over PCIe and ARM showed off node designs with their hardware partners that deployed up to two

00:01:57.040 --> 00:02:05.680
of these CPUs on a single motherboard. Super cool, but not exactly world-changing yet. To see the

00:02:05.680 --> 00:02:10.160
vision that led ARM to spend the last few years bringing this to life, you gotta zoom out and look

00:02:10.160 --> 00:02:18.480
beyond the individual node to the rack level. This rack contains 30 two-node 1P servers, so for those

00:02:18.480 --> 00:02:26.400
keeping count at home, that's 8,160 CPU cores. Okay, still not that big of a deal. I mean,

00:02:26.400 --> 00:02:33.360
dense CPU racks are already a thing. Well, here comes the big reveal. This sick error message

00:02:33.360 --> 00:02:39.760
hoodie is now available from LTTstore.com. JK, okay, I mean, it is, but that's not the big reveal.

00:02:39.760 --> 00:02:46.480
The big reveal is that everything that I just told you fits in a standard OCP 36 kilowatt air

00:02:46.480 --> 00:02:55.040
cooled rack. Each AGI CPU draws just 300 watts, a significant reduction compared to flagship

00:02:55.040 --> 00:03:02.160
x86 CPUs. So when you throw liquid cooling at them, the numbers get frankly kind of ridiculous.
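
NOTE
A back-of-the-envelope check of the air-cooled rack figures above. It assumes the rack holds 30 two-node servers (60 CPUs), which is the layout that reproduces the quoted 8,160 cores, and it counts CPU power only. `rack_totals` is a hypothetical helper, not anything from ARM.
```python
CORES_PER_CPU = 136  # quoted in the video
WATTS_PER_CPU = 300  # quoted in the video
#
def rack_totals(servers: int, cpus_per_server: int) -> tuple[int, int]:
    """Return (total cores, total CPU watts) for a rack of identical servers."""
    cpus = servers * cpus_per_server
    return cpus * CORES_PER_CPU, cpus * WATTS_PER_CPU
#
# Assumed layout: 30 servers with 2 single-socket nodes each.
cores, cpu_watts = rack_totals(servers=30, cpus_per_server=2)
print(cores, cpu_watts / 1000)
```
Under that assumption the CPUs alone account for about 18 kW, half of the 36 kW rack budget.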

00:03:02.160 --> 00:03:10.640
In an OCP 200 kilowatt rack, ARM figures they can pack 42 eight-node 1P systems for a grand total

00:03:10.640 --> 00:03:20.320
of 45,696 cores and over a petabyte of RAM, all while consuming only about half of that total

00:03:20.320 --> 00:03:25.840
available power budget. They are pegging the bottom line performance per watt in the neighborhood of

00:03:25.840 --> 00:03:32.160
double compared to x86. And this is largely thanks to carrying less legacy cruft, but also

00:03:32.160 --> 00:03:37.120
thanks to architectural choices like using fewer chiplets to keep memory latency down,

00:03:37.120 --> 00:03:41.760
along with ARM's traditional strength in instructions per clock, and taking just a

00:03:41.760 --> 00:03:48.560
no-silicon-wasted approach to their design. With the cost and scarcity of power, that's a number that
silicon wasted approach to their design. With the cost and scarcity of power, that's a number that

00:03:48.560 --> 00:03:55.760
is going to perk up a lot of ears. But why though? Everybody knows that CPUs aren't good at AI,

00:03:55.840 --> 00:04:02.720
compared to GPUs or application specific neural processors. So what's with the branding?
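
NOTE
The liquid-cooled rack figures quoted earlier (42 eight-node systems in a 200 kW rack) can be checked with simple arithmetic. This sketch assumes 300 W per CPU, counts CPU power only, and treats "over a petabyte" as exactly 1000 TB. `liquid_rack_summary` is a hypothetical name.
```python
CORES_PER_CPU = 136          # quoted in the video
WATTS_PER_CPU = 300          # quoted in the video
RACK_BUDGET_WATTS = 200_000  # OCP 200 kW rack, quoted in the video
SYSTEMS, CPUS_PER_SYSTEM = 42, 8
#
def liquid_rack_summary() -> dict[str, float]:
    """Derive core count, CPU power, and per-CPU RAM for the quoted rack."""
    cpus = SYSTEMS * CPUS_PER_SYSTEM
    cpu_watts = cpus * WATTS_PER_CPU
    return {
        "cpus": cpus,
        "cores": cpus * CORES_PER_CPU,
        "cpu_kw": cpu_watts / 1000,
        "budget_fraction": cpu_watts / RACK_BUDGET_WATTS,
        "tb_ram_per_cpu": 1000 / cpus,  # assumption: exactly 1 PB (1000 TB) total
    }
#
print(liquid_rack_summary())
```
With these inputs the rack has 336 CPUs and 45,696 cores, and the CPUs draw about 100.8 kW, which lines up with the "about half" of the 200 kW budget mentioned in the video.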

00:04:03.520 --> 00:04:09.280
ARM met that question head-on. While GPUs and neural accelerators get all the attention,

00:04:09.280 --> 00:04:15.120
CPUs are still chugging along in the background, coordinating tasks, with ARM estimating that

00:04:15.120 --> 00:04:22.880
a typical deployment today is going to have about 30 million cores per gigawatt. But here's the thing,

00:04:22.880 --> 00:04:29.920
that's with humans handling most of the token requests. AI agents push requests much faster and

00:04:31.120 --> 00:04:36.480
don't sleep, meaning that your expensive AI accelerators can end up sitting around because

00:04:36.480 --> 00:04:43.280
the CPU coordinators can't keep up with all of those requests. So ARM figures that that 30 million

00:04:43.280 --> 00:04:49.360
cores per gigawatt number could go up a lot in the head node next to the accelerator rack,

00:04:49.360 --> 00:04:56.720
as high as about four times as many. But here's the thing, when these are doing all the actual AI

00:04:56.720 --> 00:05:03.760
work, nobody's going to want to spend more power budget on all of those CPUs. Well, that's where

00:05:03.760 --> 00:05:09.920
ARM comes in with their famously power efficient designs. Let's go to Nick from the lab to see

00:05:09.920 --> 00:05:14.560
this thing in action. Many of the demos were focused on the ease of porting software to ARM

00:05:14.560 --> 00:05:19.200
and the support they're building for developers, which makes a lot of sense, but isn't very visual,

00:05:19.200 --> 00:05:25.280
so let's check out this one instead, where they're encoding a 1080p video from H.264 to H.265

00:05:25.280 --> 00:05:29.760
while running computer vision at the same time on the same CPU. Let's go take a look at the man

00:05:29.760 --> 00:05:36.640
behind the curtain. That's not a video recording. ARM actually had the stones to do it live,

00:05:36.640 --> 00:05:44.640
bringing an actual server running the actual hardware here to the show floor. But awkward

00:05:45.440 --> 00:05:50.720
Doesn't all of this put ARM kind of in direct competition with their own customers, you know,

00:05:50.720 --> 00:05:55.200
the ones who license their IP and their compute subsystems, the guys who got them where they are

00:05:55.200 --> 00:06:03.360
today? Well, on paper, yes, absolutely. But from ARM's perspective, this is actually something

00:06:03.360 --> 00:06:09.360
that many of their customers were asking for. Expanding on that, ARM laid out how their roadmap

00:06:09.360 --> 00:06:15.040
and their policies account for how all three of their business models are going to go forward,

00:06:15.040 --> 00:06:21.120
and they're positioning this as a choice between IP licensing, compute subsystems licensing,

00:06:21.120 --> 00:06:27.600
and physical CPUs, or hey, why not some combination of all three? They'll gladly take your money

00:06:27.600 --> 00:06:31.040
any way you want to give it to them. Contact your local sales representative.

00:06:33.520 --> 00:06:39.280
Will is here. He can be reached afterwards. And it seems like that's the plan

00:06:39.280 --> 00:06:44.880
for the long haul. In a move that I don't think I've ever seen before, ARM stood up on stage and

00:06:44.880 --> 00:06:51.440
said the quiet part out loud. This is just a safe first attempt. The best is yet to come with our

00:06:51.440 --> 00:06:57.200
second CPU due next year. Like, obviously, given the timelines of silicon development,

00:06:57.200 --> 00:07:01.760
but you almost never hear that from a company who probably wants you to buy the hardware they have

00:07:01.760 --> 00:07:08.240
today from partners like, you know, for example, Supermicro. Pretty wild. If you guys enjoyed

00:07:08.240 --> 00:07:13.120
this video, you might enjoy the one that we did at CES, also in partnership with ARM, highlighting

00:07:13.120 --> 00:07:17.360
some of the unexpected places that you can find ARM technology.
