1
00:00:00,000 --> 00:00:06,680
If you're watching this right now, you're probably using at least one Arm

2
00:00:03,920 --> 00:00:10,800
CPU to do it. Or well, not an Arm CPU, because Arm doesn't actually make CPUs.

3
00:00:09,520 --> 00:00:15,800
Or do they? That's the big news that they sponsored

4
00:00:13,200 --> 00:00:22,360
us down here to their Arm Everywhere event to announce. Behind me and in my

5
00:00:18,960 --> 00:00:25,800
hand is the Arm AGI CPU, built for

6
00:00:22,360 --> 00:00:28,640
performance, scale, and as always

7
00:00:25,800 --> 00:00:34,560
efficiency. Up to 136 Arm Neoverse V3 cores with 2 megabytes

8
00:00:31,400 --> 00:00:37,200
of level 2 cache each, built on TSMC's 3

9
00:00:34,560 --> 00:00:41,720
nanometer process node, and they can run at up to 3.6 GHz, which

10
00:00:40,200 --> 00:00:47,600
right out of the gate raises some questions, doesn't it? Just 3.6 GHz? Can

11
00:00:45,000 --> 00:00:52,680
it like dynamically boost a single core way higher or something?

12
00:00:49,680 --> 00:00:55,480
No. And according to Arm, that's

13
00:00:52,680 --> 00:01:00,200
actually a key feature, not a bug. By eschewing SMT multithreading and the

14
00:00:58,240 --> 00:01:04,320
highly variable power consumption that's associated with constantly fluctuating

15
00:01:02,120 --> 00:01:09,680
clock speeds, not to mention designing a 12-channel DDR5 memory controller that

16
00:01:07,000 --> 00:01:15,800
can feed every individual core with a consistent 6 GB/s of bandwidth, Arm is

17
00:01:13,240 --> 00:01:22,000
ensuring that every core in the CPU will perform its best at all times and keep

18
00:01:19,520 --> 00:01:26,920
power consumption more consistent, which will allow data centers to design to how

19
00:01:24,120 --> 00:01:31,800
much power their racks will consistently consume rather than having to build in a

20
00:01:29,240 --> 00:01:35,920
buffer for how much they might consume at peak.

21
00:01:33,240 --> 00:01:40,160
And that's huge considering that cooling and especially power are just about the

22
00:01:38,200 --> 00:01:45,040
hottest commodities in a world that is rapidly scaling data center

23
00:01:41,560 --> 00:01:47,840
infrastructure. Each AGI CPU has 96

24
00:01:45,040 --> 00:01:53,520
lanes of PCI Express Gen 6 with support for CXL 3.0 for deploying massive shared

25
00:01:50,520 --> 00:01:55,120
memory pools over PCIe, and Arm showed

26
00:01:53,520 --> 00:02:00,280
off node designs with their hardware partners that deployed up to two of

27
00:01:57,240 --> 00:02:02,600
these CPUs on a single motherboard.

28
00:02:00,280 --> 00:02:05,120
Super cool, but not exactly world-changing

29
00:02:04,040 --> 00:02:08,759
yet. To see the vision that led Arm to spend

30
00:02:07,720 --> 00:02:16,800
the last few years bringing this to life, you got to zoom out and look beyond the individual node to the rack

31
00:02:12,800 --> 00:02:19,040
level. This rack contains 30 two-node

32
00:02:16,800 --> 00:02:23,680
1P servers. So, for those keeping count at home, that's 8,160

33
00:02:21,920 --> 00:02:28,360
CPU cores. Okay, still not that big of a deal. I

34
00:02:25,920 --> 00:02:32,080
mean, dense CPU racks are already a thing.

35
00:02:29,560 --> 00:02:36,959
Well, here comes the big reveal. This sick error message hoodie is now

36
00:02:34,360 --> 00:02:40,880
available from lttstore.com. JK, okay. I mean, it is, but that's not

37
00:02:39,000 --> 00:02:48,320
the big reveal. The big reveal is that everything that I just told you fits in

38
00:02:42,760 --> 00:02:52,560
a standard OCP 36 kW air-cooled rack.

39
00:02:48,320 --> 00:02:54,440
Each AGI CPU draws just 300 W,

40
00:02:52,560 --> 00:03:00,640
a significant reduction compared to flagship x86 CPUs. So, when you throw

41
00:02:57,840 --> 00:03:06,320
liquid cooling at them, the numbers get frankly kind of ridiculous. In an OCP

42
00:03:03,080 --> 00:03:10,160
200 kW rack, Arm figures they can pack

43
00:03:06,320 --> 00:03:13,920
42 eight-node 1P systems for a grand

44
00:03:10,160 --> 00:03:13,920
total of 45,696

45
00:03:14,160 --> 00:03:22,239
cores and over a petabyte of RAM, all while consuming only about half of that

46
00:03:19,920 --> 00:03:25,519
total available power budget. They are pegging the bottom line

47
00:03:23,720 --> 00:03:30,600
performance per watt in the neighborhood of double compared to x86. And this is

48
00:03:28,600 --> 00:03:35,640
largely thanks to carrying less legacy cruft, but also thanks to architectural

49
00:03:33,239 --> 00:03:39,040
choices like using fewer chiplets to keep memory latency down, along with

50
00:03:37,720 --> 00:03:43,280
Arm's traditional strength in instructions per clock, and taking just

51
00:03:41,480 --> 00:03:49,880
a no silicon wasted approach to their design. With the cost and scarcity of power,

52
00:03:47,920 --> 00:03:55,880
that's a number that is going to perk up a lot of ears. But, why though? Everybody

53
00:03:53,240 --> 00:04:01,480
knows that CPUs aren't good at AI compared to GPUs or application-specific

54
00:03:58,760 --> 00:04:06,160
neural processors. So, uh what's with the branding?

55
00:04:03,600 --> 00:04:10,720
Arm met that question head-on. While GPUs and neural accelerators get

56
00:04:08,320 --> 00:04:14,920
all the attention, CPUs are still chugging along in the background

57
00:04:12,320 --> 00:04:20,000
coordinating tasks with Arm estimating that a typical deployment today is going

58
00:04:17,160 --> 00:04:23,480
to have about 30 million cores per gigawatt.

59
00:04:21,480 --> 00:04:26,680
But, here's the thing. That's with humans handling most of the token

60
00:04:25,520 --> 00:04:31,120
requests. AI agents push requests much faster and

61
00:04:30,200 --> 00:04:35,680
um don't sleep. Meaning that your expensive

62
00:04:33,600 --> 00:04:39,840
AI accelerators can end up sitting around because the CPU coordinators

63
00:04:38,080 --> 00:04:43,320
can't keep up with all of those requests.

64
00:04:41,080 --> 00:04:48,720
So, Arm figures that that 30 million cores per gigawatt number could go up a

65
00:04:45,520 --> 00:04:51,560
lot in head nodes next to the accelerator

66
00:04:48,720 --> 00:04:57,360
rack as high as about four times as many. But, uh here's the thing. When

67
00:04:54,480 --> 00:05:02,080
these are doing all the actual AI work, nobody's going to want to spend more

68
00:04:58,760 --> 00:05:03,280
power budget on all of those CPUs.

69
00:05:02,080 --> 00:05:08,240
Well, that's where Arm comes in with their

70
00:05:05,320 --> 00:05:12,080
famously power-efficient designs. Let's go to Nick from the lab to see

71
00:05:10,040 --> 00:05:15,440
this thing in action. Many of the demos were focused on the ease of porting

72
00:05:13,600 --> 00:05:19,360
software to Arm and the support they're building for developers, which makes a

73
00:05:17,120 --> 00:05:23,080
lot of sense, but isn't very visual. So, let's check out this one instead where

74
00:05:20,960 --> 00:05:28,640
they're encoding a 1080p video from H.264 to H.265 while running computer

75
00:05:26,160 --> 00:05:31,720
vision at the same time on the same CPU. Let's go take a look at the man behind

76
00:05:30,240 --> 00:05:38,640
the curtain. That's not a video recording. Arm actually had the stones

77
00:05:34,680 --> 00:05:41,400
to do it live bringing an actual server

78
00:05:38,640 --> 00:05:46,240
running the actual hardware here to the show floor. But, um

79
00:05:44,240 --> 00:05:50,920
awkward question. Doesn't all of this put Arm kind of in direct competition

80
00:05:48,880 --> 00:05:54,760
with their own customers? You know, the ones who license their IP and their

81
00:05:52,920 --> 00:06:00,080
compute subsystems, the guys who got them where they are today? Well, on

82
00:05:56,920 --> 00:06:02,800
paper, yes, um absolutely.

83
00:06:00,080 --> 00:06:06,680
But, from Arm's perspective, this is actually something that many of their

84
00:06:04,080 --> 00:06:12,200
customers were asking for. Expanding on that, Arm laid out how their roadmap

85
00:06:09,440 --> 00:06:16,000
and their policies account for how all three of their business models are going

86
00:06:14,080 --> 00:06:21,600
to go forward. And they're positioning this as a choice between IP licensing,

87
00:06:19,240 --> 00:06:26,080
compute subsystems licensing, and physical CPUs, or hey,

88
00:06:24,000 --> 00:06:29,720
why not some combination of all three? They'll gladly take your money any way

89
00:06:28,080 --> 00:06:33,160
you want to give it to them. Contact your local sales representative.

90
00:06:33,560 --> 00:06:39,880
Will is here. He can be reached afterwards. And it

91
00:06:37,400 --> 00:06:44,800
seems like that's the plan for the long haul. In a move that I don't think I've

92
00:06:42,160 --> 00:06:50,400
ever seen before, Arm stood up on stage and said the quiet part out loud, "This

93
00:06:47,560 --> 00:06:53,880
is just a safe first attempt. The best is yet to come with our second CPU due

94
00:06:52,560 --> 00:06:57,800
next year." Like, obviously, given the timelines of

95
00:06:56,280 --> 00:07:01,360
silicon development, but you almost never hear that from a company who

96
00:06:59,800 --> 00:07:06,080
probably wants you to buy the hardware they have today from partners like, you

97
00:07:03,480 --> 00:07:09,080
know, for example, Supermicro. Pretty wild.

98
00:07:07,640 --> 00:07:12,680
If you guys enjoyed this video, you might enjoy the one that we did at CES,

99
00:07:11,400 --> 00:07:19,120
also in partnership with Arm, highlighting some of the unexpected

100
00:07:14,680 --> 00:07:19,120
places that you can find Arm technology.

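The rack-level numbers quoted in the video can be sanity-checked with a few lines of arithmetic. The Python sketch below is only a rough check, not Arm's own sizing method: it assumes every CPU has the full 136 cores and draws the quoted 300 W, it counts CPU power only (no memory, storage, networking, or fans), and it models the air-cooled rack as 30 two-node 1P servers, which is the configuration that reproduces the quoted 8,160 cores. The function and variable names are made up for illustration.

```python
# Figures quoted in the video; treating them as exact is an assumption.
CORES_PER_CPU = 136   # "up to 136" Neoverse V3 cores per CPU
WATTS_PER_CPU = 300   # quoted per-CPU power draw
GBPS_PER_CORE = 6     # quoted per-core memory bandwidth in GB/s


def rack_totals(systems, nodes_per_system, rack_budget_kw):
    """Return CPU count, core count, CPU power (kW), and budget share.

    Assumes one CPU per node (1P) and ignores all non-CPU power.
    """
    cpus = systems * nodes_per_system
    cpu_kw = cpus * WATTS_PER_CPU / 1000
    return {
        "cpus": cpus,
        "cores": cpus * CORES_PER_CPU,
        "cpu_kw": cpu_kw,
        "budget_share": cpu_kw / rack_budget_kw,
    }


# Air-cooled 36 kW rack, modeled as 30 two-node 1P servers (assumption).
air = rack_totals(systems=30, nodes_per_system=2, rack_budget_kw=36)

# Liquid-cooled 200 kW rack with 42 eight-node 1P systems.
liquid = rack_totals(systems=42, nodes_per_system=8, rack_budget_kw=200)

# Aggregate memory bandwidth per CPU implied by 6 GB/s per core.
per_cpu_bandwidth_gbps = CORES_PER_CPU * GBPS_PER_CORE

print(air["cores"], liquid["cores"], per_cpu_bandwidth_gbps)
print(f"liquid rack CPU power: {liquid['cpu_kw']} kW "
      f"({liquid['budget_share']:.0%} of budget)")
```

Under these assumptions the liquid-cooled rack's CPUs alone come to roughly half of the 200 kW budget, which lines up with the "about half" figure quoted in the video.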