WEBVTT

00:00:00.000 --> 00:00:04.400
Although transistors still keep on shrinking, it's getting more and more difficult to pack

00:00:04.400 --> 00:00:09.200
as many of them on a chip as we'd like. Partial solutions to this, such as using chiplets

00:00:09.200 --> 00:00:13.840
to reduce the rate of manufacturing defects, and stacking transistors on top of each other,

00:00:13.840 --> 00:00:17.360
have been in vogue for a while now, but it might not be too surprising

00:00:17.360 --> 00:00:22.400
that some manufacturers have decided to simply make the chips themselves bigger.

00:00:22.400 --> 00:00:27.120
When in doubt, supersize. Now, I'm not saying that your next computer

00:00:27.120 --> 00:00:30.160
might have a CPU that's so big it'll take up half the motherboard,

00:00:30.160 --> 00:00:33.360
but when you get away from personal computers and start looking at chips

00:00:33.360 --> 00:00:39.040
that we might see in data centers in the near future, you start seeing some pretty eye-watering stuff.

00:00:39.040 --> 00:00:43.040
We're talking about designs like the Wafer Scale Engine 2 from Cerebras,

00:00:43.040 --> 00:00:47.280
currently the largest chip in the world. Built on a seven nanometer process,

00:00:47.280 --> 00:00:51.840
it contains 850,000 cores

00:00:51.840 --> 00:00:56.800
and measures a whopping 21.5 centimeters, or 8.5 inches, on each side.

00:00:56.800 --> 00:01:00.640
That's more total area than 25 Ryzen desktop CPUs.
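
NOTE
Editor's aside: the "more than 25 Ryzen CPUs" claim checks out as rough arithmetic
if you measure against the 40 mm x 40 mm AM4 package (an assumption on our part;
the bare dies inside are much smaller). A minimal sketch in Python:
wse2_area_mm2 = 215 * 215                # 21.5 cm per side -> 46,225 mm^2
am4_package_mm2 = 40 * 40                # one Ryzen AM4 package -> 1,600 mm^2
print(wse2_area_mm2 / am4_package_mm2)   # ~28.9, i.e. "more than 25"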

00:01:00.640 --> 00:01:04.880
Perhaps unsurprisingly, a chip this big and with this many transistors,

00:01:04.880 --> 00:01:09.280
2.6 trillion, to be exact, requires a lot of power.

00:01:09.280 --> 00:01:12.800
The Wafer Scale Engine 2 sucks down 15 kilowatts,

00:01:12.800 --> 00:01:16.800
so if you were somehow able to drop this into your PC,

00:01:16.800 --> 00:01:20.000
you'd need fifteen 1,000-watt power supplies

00:01:20.000 --> 00:01:23.440
just to keep it fed. And that's not even counting the rest of the system.
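
NOTE
Editor's aside: the power-supply count is simple division; a quick sanity check
(a real build would leave headroom rather than run supplies at 100% load):
chip_power_w = 15_000                    # 15 kW for the Wafer Scale Engine 2
psu_capacity_w = 1_000                   # one 1,000 W desktop power supply
print(chip_power_w / psu_capacity_w)     # 15.0 supplies for the chip alone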

00:01:23.440 --> 00:01:28.480
But despite this, the new design should actually result in power savings.

00:01:28.480 --> 00:01:32.800
You see, data centers and supercomputers that do artificial intelligence processing

00:01:32.800 --> 00:01:38.400
often have to use lots of separate chips, such as GPUs, spread across a large facility.

00:01:38.400 --> 00:01:43.760
Having the same amount of computing power on just one physical chip is far more power efficient,

00:01:43.760 --> 00:01:48.800
even if the power consumption rating of that chip is a lot higher than a typical GPU.
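
NOTE
Editor's aside: a toy model of why one big chip can come out ahead; every number
below is an illustrative assumption, not a Cerebras or GPU-vendor figure:
gpus = 50                                # hypothetical cluster size
gpu_power_w = 400                        # hypothetical per-GPU draw
networking_w = 5_000                     # hypothetical switches, NICs, host CPUs
cluster_w = gpus * gpu_power_w + networking_w   # 25,000 W in this scenario
wse2_w = 15_000                          # the single chip's rated draw
print(cluster_w - wse2_w)                # 10,000 W saved in this toy comparison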

00:01:48.800 --> 00:01:52.880
But there are other advantages to this approach, besides just saving energy.

00:01:52.960 --> 00:01:57.760
You might be wondering why we aren't simply sticking a bunch of chiplets onto one package instead

00:01:57.760 --> 00:02:02.000
to make something like a really big version of an AMD EPYC processor.

00:02:02.000 --> 00:02:06.880
Well, as versatile as chiplets have been, they still suffer from more latency

00:02:06.880 --> 00:02:11.200
than one big monolithic processor. The little interconnects that move data

00:02:11.200 --> 00:02:16.160
between chiplets, as fast as they may be, are still slower

00:02:16.160 --> 00:02:21.600
than if you physically put computing units directly adjacent to each other to form one big chip.

00:02:21.600 --> 00:02:26.560
Ultimately, this means that huge monolithic chips can process more data than a system

00:02:26.560 --> 00:02:30.400
with the same number of transistors spread out among multiple chips.
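
NOTE
Editor's aside: a back-of-the-envelope latency model; both hop costs below are
assumed values chosen only to illustrate the monolithic advantage, not measurements:
on_die_hop_ns = 1.0                      # assumed core-to-core hop on one die
chiplet_hop_ns = 10.0                    # assumed hop across a chiplet link
hops = 1_000_000                         # data movements in some workload
print(hops * on_die_hop_ns)              # 1,000,000 ns if everything stays on-die
print(hops * chiplet_hop_ns)             # 10,000,000 ns if every hop crosses dies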

00:02:30.400 --> 00:02:34.720
And when you consider just how much data has to be processed for AI applications

00:02:34.720 --> 00:02:38.400
and scientific research, it makes a difference.

00:02:38.400 --> 00:02:41.760
Wafer-scale technology has already drawn interest from diverse industries,

00:02:41.760 --> 00:02:46.320
including national intelligence and healthcare, but though it has some obvious advantages,

00:02:46.320 --> 00:02:51.200
that doesn't necessarily mean that it's the silver bullet to large-scale compute challenges.

00:02:51.200 --> 00:02:56.720
For instance, one big issue is the fact that these processors are designed to handle lots of data,

00:02:56.720 --> 00:03:02.160
so they also need access to a lot of memory. Designs with chiplets and larger amounts of memory

00:03:02.160 --> 00:03:05.360
built onto the same package may end up being more popular.

00:03:05.360 --> 00:03:09.600
This is similar to what Tesla has done with their new D1 chip, which was developed in part

00:03:09.600 --> 00:03:12.960
to help propel Tesla's self-driving AI technology.

00:03:12.960 --> 00:03:16.800
The D1 chip itself is much smaller than the Wafer Scale Engine,

00:03:16.800 --> 00:03:23.680
but Tesla has included over 11 gigabytes of high-speed SRAM in an arrangement of 25 D1 chips

00:03:23.680 --> 00:03:28.400
connected together to make a training tile that's bigger than your head.
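
NOTE
Editor's aside: the 11 GB figure lines up with the reported ~440 MB of SRAM per
D1 chip (treat the per-chip number as approximate):
sram_per_d1_mb = 440                     # reported SRAM on one D1 chip
chips_per_tile = 25                      # D1 chips in one training tile
print(sram_per_d1_mb * chips_per_tile / 1000)  # ~11.0 GB per tile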

00:03:28.400 --> 00:03:33.760
And of course, making smaller chips reduces the amount of silicon you'll waste due to manufacturing errors,

00:03:33.760 --> 00:03:37.600
as we mentioned earlier. But regardless of whether a particular company

00:03:37.600 --> 00:03:40.640
is using wafer-scale chips or an arrangement more like Tesla's,

00:03:40.640 --> 00:03:45.200
putting a large amount of silicon on one plane may end up becoming an industry trend.

00:03:45.200 --> 00:03:50.720
Because if America has taught us anything, it's that there's a deep human need to supersize.

00:03:50.720 --> 00:03:54.320
And if you feel the need, like the video or dislike the video,

00:03:54.320 --> 00:03:57.440
check out our other videos and comment below with video suggestions.

00:03:57.440 --> 00:04:00.640
We make videos here. Get your videos here at TechQuickie.

00:04:00.640 --> 00:04:02.320
Don't forget to subscribe and follow.

00:04:04.320 --> 00:04:05.040
See you later.
