1
00:00:00,000 --> 00:00:04,400
Although transistors are still shrinking, it's getting more and more difficult to pack

2
00:00:04,400 --> 00:00:09,200
as many of them on a chip as we'd like. Partial solutions to this, such as using chiplets

3
00:00:09,200 --> 00:00:13,840
to limit the silicon wasted by manufacturing defects, and stacking transistors on top of each other,

4
00:00:13,840 --> 00:00:17,360
have been in vogue for a while now, but it might not be too surprising

5
00:00:17,360 --> 00:00:22,400
that some manufacturers have decided to simply make the chips themselves bigger.

6
00:00:22,400 --> 00:00:27,120
When in doubt, supersize. Now, I'm not saying that your next computer

7
00:00:27,120 --> 00:00:30,160
might have a CPU that's so big it'll take up half the motherboard,

8
00:00:30,160 --> 00:00:33,360
but when you get away from personal computers and start looking at chips

9
00:00:33,360 --> 00:00:39,040
that we might see in data centers in the near future, you start seeing some pretty eye-watering stuff.

10
00:00:39,040 --> 00:00:43,040
We're talking about designs like the Wafer Scale Engine 2 from Cerebras,

11
00:00:43,040 --> 00:00:47,280
currently the largest chip in the world. Built on a seven nanometer process,

12
00:00:47,280 --> 00:00:51,840
it contains 850,000 cores

13
00:00:51,840 --> 00:00:56,800
and measures a whopping 21.5 centimeters, or 8.5 inches, on each side.

14
00:00:56,800 --> 00:01:00,640
That's more total area than 25 Ryzen desktop CPUs.

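For a rough sense of that comparison, here's a back-of-the-envelope check in Python. The ~40 mm AM4 package size used for the Ryzen figure is an assumption for illustration, not something stated in the video.

    # Back-of-the-envelope area comparison (illustrative only).
    wse2_side_cm = 21.5                # Wafer Scale Engine 2 edge length
    wse2_area = wse2_side_cm ** 2      # ~462 cm^2
    ryzen_side_cm = 4.0                # assumed ~40 mm AM4 package edge
    ryzen_area = ryzen_side_cm ** 2    # 16 cm^2 per CPU package
    print(wse2_area / ryzen_area)      # ~28.9, comfortably more than 25
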
15
00:01:00,640 --> 00:01:04,880
Perhaps unsurprisingly, a chip this big and with this many transistors,

16
00:01:04,880 --> 00:01:09,280
2.6 trillion, to be exact, requires a lot of power.

17
00:01:09,280 --> 00:01:12,800
The Wafer Scale Engine 2 sucks down 15 kilowatts,

18
00:01:12,800 --> 00:01:16,800
so if you were somehow able to drop this into your PC,

19
00:01:16,800 --> 00:01:20,000
you'd need 15 1000 watt power supplies

20
00:01:20,000 --> 00:01:23,440
just to keep it fed. And that's not even counting the rest of the system.

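The power-supply count is simple division; a minimal sketch in Python using the figures from the video:

    # How many 1000 W power supplies a 15 kW chip would need.
    chip_power_w = 15_000      # Wafer Scale Engine 2 power draw
    psu_capacity_w = 1_000     # one desktop power supply
    print(chip_power_w / psu_capacity_w)  # 15.0
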
21
00:01:23,440 --> 00:01:28,480
But despite this, the new design should actually result in power savings.

22
00:01:28,480 --> 00:01:32,800
You see, data centers and supercomputers that do artificial intelligence processing

23
00:01:32,800 --> 00:01:38,400
often have to use lots of separate chips, such as GPUs, spread across a large facility.

24
00:01:38,400 --> 00:01:43,760
Having the same amount of computing power on just one physical chip is far more power efficient,

25
00:01:43,760 --> 00:01:48,800
even if the power consumption rating of that chip is a lot higher than a typical GPU.

26
00:01:48,800 --> 00:01:52,880
But there are other advantages to this approach, besides just saving energy.

27
00:01:52,960 --> 00:01:57,760
You might be wondering why we aren't simply sticking a bunch of chiplets onto one package instead

28
00:01:57,760 --> 00:02:02,000
to make something like a really big version of an AMD EPYC processor.

29
00:02:02,000 --> 00:02:06,880
As versatile as chiplets have been, they still suffer from higher latency

30
00:02:06,880 --> 00:02:11,200
than one big monolithic processor. The interconnects that move data

31
00:02:11,200 --> 00:02:16,160
between chiplets, as quick as they may be, are still slower

32
00:02:16,160 --> 00:02:21,600
than if you physically put computing units directly adjacent to each other to form one big chip.

33
00:02:21,600 --> 00:02:26,560
Ultimately, this means that huge monolithic chips can process more data than a system

34
00:02:26,560 --> 00:02:30,400
with the same number of transistors spread out among multiple chips.

35
00:02:30,400 --> 00:02:34,720
And when you consider just how much data has to be processed for AI applications

36
00:02:34,720 --> 00:02:38,400
and scientific research, it makes a difference.

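To make the latency argument concrete, here's a toy Python model. The nanosecond figures are illustrative assumptions, not vendor specifications; the point is only that per-transfer overhead compounds at scale.

    # Toy model: time spent moving data on-die vs. across chiplet links.
    on_die_hop_ns = 1.0      # assumed core-to-core latency on one die
    chiplet_hop_ns = 10.0    # assumed latency when crossing chiplets
    transfers = 1_000_000    # data movements in a hypothetical workload
    print("monolithic:", transfers * on_die_hop_ns / 1e9, "seconds")
    print("chiplets:  ", transfers * chiplet_hop_ns / 1e9, "seconds")
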
37
00:02:38,400 --> 00:02:41,760
Wafer-scale technology has already drawn interest from diverse industries,

38
00:02:41,760 --> 00:02:46,320
including national intelligence and healthcare. But though it has some obvious advantages,

39
00:02:46,320 --> 00:02:51,200
that doesn't necessarily mean that it's the silver bullet to large-scale compute challenges.

40
00:02:51,200 --> 00:02:56,720
For instance, one big issue is the fact that these processors are designed to handle lots of data,

41
00:02:56,720 --> 00:03:02,160
so they also need access to a lot of memory, and designs with chiplets and larger amounts of memory

42
00:03:02,160 --> 00:03:05,360
built onto the same package may end up being more popular.

43
00:03:05,360 --> 00:03:09,600
This is similar to what Tesla has done with their new D1 chip, which was developed in part

44
00:03:09,600 --> 00:03:12,960
to help propel Tesla's self-driving AI technology.

45
00:03:12,960 --> 00:03:16,800
The D1 chip itself is much smaller than the Wafer Scale Engine,

46
00:03:16,800 --> 00:03:23,680
but Tesla has included over 11 gigabytes of high-speed SRAM in an arrangement of 25 D1 chips

47
00:03:23,680 --> 00:03:28,400
connected together to make a training tile that's bigger than your head.

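That 11 gigabyte figure checks out if each D1 carries roughly 440 megabytes of on-chip SRAM; a quick Python sanity check, where the per-chip figure is an assumption drawn from Tesla's public materials rather than the video:

    # Sanity check on the training tile's total SRAM (illustrative).
    sram_per_d1_mb = 440       # assumed on-chip SRAM per D1 chip
    d1_chips_per_tile = 25     # D1 chips per training tile, per the video
    print(sram_per_d1_mb * d1_chips_per_tile / 1000)  # ~11.0 GB
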
48
00:03:28,400 --> 00:03:33,760
And of course, making smaller chips reduces the amount of silicon you'll waste due to manufacturing defects,

49
00:03:33,760 --> 00:03:37,600
as we mentioned earlier. But regardless of whether a particular company

50
00:03:37,600 --> 00:03:40,640
is using wafer scale or an arrangement more like Tesla's,

51
00:03:40,640 --> 00:03:45,200
putting a large amount of silicon on one plane may end up becoming an industry trend.

52
00:03:45,200 --> 00:03:50,720
Because if America has taught us anything, it's that there's a deep human need to supersize.

53
00:03:50,720 --> 00:03:54,320
And if you feel the need, like the video or dislike the video,

54
00:03:54,320 --> 00:03:57,440
check out our other videos and comment below with video suggestions.

55
00:03:57,440 --> 00:04:00,640
We make videos here. Get your videos here at TechQuickie.

56
00:04:00,640 --> 00:04:02,320
Don't forget to subscribe and follow.

57
00:04:04,320 --> 00:04:05,040
See you later.
