WEBVTT

00:00:00.000 --> 00:00:03.760
AI chips have suddenly become a big selling point for phones,

00:00:03.760 --> 00:00:08.800
but it might seem a little surprising that your smartphone, which already

00:00:08.800 --> 00:00:15.840
has serious limitations on power consumption and heat generation, can run something as seemingly complicated as AI.

00:00:15.840 --> 00:00:22.240
So how exactly do they pull this off? Well, these neural processing units, or NPUs,

00:00:22.240 --> 00:00:28.000
are quite a bit different from your phone's main CPU cores. Features like Apple's Neural Engine

00:00:28.000 --> 00:00:31.080
or the machine learning engine on a Google Tensor chip

00:00:31.080 --> 00:00:37.360
are highly optimized for AI tasks, but probably suck at pretty much anything else.

00:00:37.360 --> 00:00:42.440
It's kind of like how a GPU works. Although they are much better for rendering graphics

00:00:42.440 --> 00:00:46.360
than a more general-purpose CPU, you're not going to run your operating system off

00:00:46.360 --> 00:00:50.280
of your graphics card. They are embarrassingly parallel.

00:00:50.280 --> 00:00:56.380
A relatively small amount of die area, then, that is dedicated to AI can effectively run machine learning

00:00:56.380 --> 00:00:59.780
based tasks without sucking down too much power.

00:00:59.780 --> 00:01:04.140
But that doesn't answer the question of why there's such a push to put these chips in our phones

00:01:04.140 --> 00:01:08.100
in the first place. I mean, we hear so much about cloud AI,

00:01:08.100 --> 00:01:14.420
where neural networks run on powerful servers. So can't we just offload tasks like image optimization

00:01:14.420 --> 00:01:20.620
and voice recognition to the cloud? Well, the answer lies in how large and complex the AI models

00:01:20.620 --> 00:01:25.140
are that your device needs to use. Models for common smartphone AI features,

00:01:25.180 --> 00:01:30.140
such as voice recognition, facial recognition, and some kinds of image correction

00:01:30.140 --> 00:01:33.900
are often relatively small, meaning that they can be run on device

00:01:33.900 --> 00:01:38.420
on a limited amount of silicon. And if these functions can be run locally

00:01:38.420 --> 00:01:42.180
instead of in the cloud, it's generally better to do so.

00:01:42.180 --> 00:01:45.980
For example, if you use an Android phone's speech recognition

00:01:45.980 --> 00:01:52.020
button, you will wait around for your phone to send your speech to a server over the internet,

00:01:52.020 --> 00:01:57.260
wait for that server to figure out what you're trying to say, and then wait to get the results back to your phone.

00:01:57.260 --> 00:02:02.100
If you could get results right now, that would be a big selling point for a modern phone.

00:02:02.100 --> 00:02:05.180
So even though cloud hardware might be more powerful,

00:02:05.180 --> 00:02:08.180
the latency advantage of having a chip on your device

00:02:08.180 --> 00:02:12.220
makes this trade-off worth it. Not to mention that it helps protect your privacy

00:02:12.220 --> 00:02:15.780
by keeping as much of your data on your phone as possible.

00:02:15.780 --> 00:02:19.500
But when may it not make sense to rely on a phone's NPU?

00:02:19.500 --> 00:02:23.580
More advanced forms of generative AI aren't quite at the point

00:02:23.620 --> 00:02:29.260
where you can run them on a phone efficiently. And by generative AI, I mean artificial intelligence

00:02:29.260 --> 00:02:34.780
that can create new media. Think about the stories that get generated by ChatGPT

00:02:34.780 --> 00:02:37.980
or the AI art from services like Midjourney.

00:02:37.980 --> 00:02:43.340
Now, you probably don't expect to run an entire advanced image generation model on a phone,

00:02:43.340 --> 00:02:48.260
at least with NPUs the size they are now. But what about commonly touted features

00:02:48.260 --> 00:02:54.380
like Google's Magic Editor on its Pixel lineup? Well, Magic Editor appears to need an internet connection

00:02:54.380 --> 00:02:59.980
since the feature uses enough generative AI that the phone has to rely on cloud servers

00:02:59.980 --> 00:03:03.420
in order to give you the image you want in a reasonable amount of time.

00:03:03.420 --> 00:03:09.580
However, less demanding features, such as Live Translate, can run on device.

00:03:09.580 --> 00:03:14.860
Since the idea of AI-specific hardware on consumer devices is still relatively new,

00:03:14.860 --> 00:03:19.300
tech companies are still trying to figure out exactly where the sweet spot is

00:03:19.300 --> 00:03:23.340
in terms of which tasks can and should be done on device

00:03:23.340 --> 00:03:29.300
versus which ones should be offloaded to the cloud. In fact, lots of AI as a service type products

00:03:29.300 --> 00:03:32.620
don't yet have a clear pathway to monetization.

00:03:32.620 --> 00:03:36.460
Instead, it's more common for tech firms to roll the features out now,

00:03:36.460 --> 00:03:40.580
figure out how they work, and then jam them into their business model

00:03:40.580 --> 00:03:46.020
at some point down the line. This is actually part of the reason that the die areas of NPUs in phones

00:03:46.020 --> 00:03:50.220
are still relatively small. Hardware manufacturers would rather have

00:03:50.220 --> 00:03:53.620
enough inside the phone to enable AI features,

00:03:53.620 --> 00:03:57.140
but then figure out exactly what the use cases are

00:03:57.140 --> 00:04:00.700
before they dedicate more hardware to AI.

00:04:00.700 --> 00:04:03.820
You're also seeing this on the desktop and laptop side of things,

00:04:03.820 --> 00:04:09.060
with both AMD and Intel coming out with consumer processors that include NPUs.

00:04:09.060 --> 00:04:12.260
And the idea is that features like Windows Studio Effects

00:04:12.260 --> 00:04:15.420
will run on device so your video calls look a little bit nicer.

00:04:15.420 --> 00:04:18.700
But as time goes on, both PC and phone manufacturers

00:04:18.700 --> 00:04:22.220
are aiming to get more and more AI functions running locally.

00:04:22.220 --> 00:04:26.100
You're already seeing the push for this with how both Team Red and Team Blue

00:04:26.100 --> 00:04:31.780
have partnered with a number of outside software developers to make applications that can take advantage of their NPUs.

00:04:31.780 --> 00:04:35.220
While it remains to be seen what AI features will become mainstays,

00:04:35.220 --> 00:04:40.380
it's clear that your gadgets are going to have significantly more brain power going forward.

00:04:40.380 --> 00:04:42.660
For better or for worse.

00:04:45.700 --> 00:04:49.980
If you guys enjoyed this video, leave a like or dislike depending on how you feel.

00:04:49.980 --> 00:04:53.100
Check out our video on the hardware that runs ChatGPT

00:04:53.100 --> 00:04:56.100
if you're looking for something else to watch and leave a comment if you have a suggestion

00:04:56.100 --> 00:04:59.580
for a future video. And of course, don't forget to subscribe.
