WEBVTT

00:00:00.000 --> 00:00:03.920
Whether we're talking dollars in your bank account, items on a seafood buffet, or dates

00:00:03.920 --> 00:00:10.000
you've got lined up on Tinder, more is generally considered to be better. A sentiment that also

00:00:10.000 --> 00:00:15.120
seems to hold true with the number of cores in your computer's CPU. At least if you buy into the

00:00:15.120 --> 00:00:20.560
marketing. But hold on. Even though having many cores definitely gives you a boost in multi-threaded

00:00:20.560 --> 00:00:26.800
applications like rendering 3D animations, there are actually situations where more cores

00:00:26.800 --> 00:00:31.840
give no benefit whatsoever, or can even hurt your system's performance.
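
NOTE
Editor's sketch, not from the video: one rough way to see the claim above is Amdahl's law scaled by clock speed.
The function name, core counts, clock speeds, and parallel fractions below are all hypothetical, illustrative values, not measurements of any real CPU.
```python
def relative_throughput(cores, clock_ghz, parallel_fraction):
    # Amdahl's law speedup over a single core, scaled by clock speed.
    serial_fraction = 1.0 - parallel_fraction
    speedup = 1.0 / (serial_fraction + parallel_fraction / cores)
    return clock_ghz * speedup
# Hypothetical chips: fewer fast cores versus many slower cores.
for label, fraction in (("single-threaded", 0.0), ("highly parallel", 0.95)):
    few_fast = relative_throughput(cores=6, clock_ghz=4.5, parallel_fraction=fraction)
    many_slow = relative_throughput(cores=18, clock_ghz=3.5, parallel_fraction=fraction)
    print(label, "favors", "fewer, faster cores" if few_fast > many_slow else "more cores")
```
Under these assumed numbers, extra cores only pay off once most of the work can actually run in parallel.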

00:00:32.480 --> 00:00:38.080
But how could this be? Well, to start off with, the more cores you pack onto a CPU,

00:00:38.080 --> 00:00:43.520
the more power they need, and the more heat they generate. And remember that because CPU cores are

00:00:43.520 --> 00:00:49.760
crammed into a relatively small space, manufacturers end up working against some serious limits

00:00:49.760 --> 00:00:56.000
when it comes to thermal design power, or TDP. This means that to prevent the CPU from drawing too

00:00:56.160 --> 00:01:01.600
much power and producing too much heat, the individual cores have traditionally run at

00:01:01.600 --> 00:01:08.800
lower clock frequencies to improve efficiency. And even if the advertised boost clock for a CPU

00:01:08.800 --> 00:01:14.800
with lots and lots of cores can appear to be high, it's often the case that they cannot maintain

00:01:14.800 --> 00:01:19.840
these clocks for long periods of time, or that they only do it when you're running very light

00:01:19.840 --> 00:01:25.520
applications. So if you're using your computer mostly for applications where single-threaded

00:01:25.520 --> 00:01:33.760
performance matters more, such as games, that super-expensive 18-core CPU might actually yield

00:01:33.760 --> 00:01:40.880
you a worse experience than something cheaper. And if you go with a really high-core-count CPU,

00:01:40.880 --> 00:01:46.880
there's another wrinkle with how processors with that many cores access the system memory.

00:01:46.880 --> 00:01:53.520
You see, in some cases, these larger CPUs need to have their cores split into two groups or

00:01:53.520 --> 00:01:59.280
nodes of cores, with each group getting its own memory controller and segment of the physical

00:01:59.280 --> 00:02:06.720
memory in a scheme called non-uniform memory access, or NUMA. This is generally quicker than

00:02:06.720 --> 00:02:12.880
the opposite solution called uniform memory access or UMA, where all the cores share one

00:02:12.880 --> 00:02:19.520
big pool of memory. But here's the thing: a CPU that uses NUMA, which is better for latency-sensitive

00:02:19.520 --> 00:02:25.040
applications, can often struggle when running a single program that uses tons of threads.
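
NOTE
Editor's sketch, not from the video: a simple weighted-average model of why spreading one program across NUMA nodes can cost time.
The function name and the latency figures are hypothetical, chosen only for illustration.
```python
def average_latency_ns(local_ns, remote_ns, remote_fraction):
    # Weighted average of local and cross-node memory access times.
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns
# Hypothetical latencies for a node's own memory and the other node's memory.
LOCAL_NS, REMOTE_NS = 80.0, 130.0
stays_on_node = average_latency_ns(LOCAL_NS, REMOTE_NS, remote_fraction=0.0)
spans_nodes = average_latency_ns(LOCAL_NS, REMOTE_NS, remote_fraction=0.5)
print(stays_on_node, spans_nodes)
```
In this model, every access that has to reach the other node's memory pulls the average up, which is one reason a program may prefer to stay on a single node.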

00:02:26.320 --> 00:02:30.720
Because of the different memory access times between the nodes and the fact that each node

00:02:30.720 --> 00:02:35.120
would have to wait on the other one to finish working on the same data, highly multi-threaded

00:02:35.120 --> 00:02:40.480
programs like these often don't want to cross nodes, even if it would mean being able to take

00:02:40.480 --> 00:02:48.640
advantage of the entire CPU. So back to UMA then, right? No. Because one controller manages all the

00:02:48.640 --> 00:02:54.640
memory accesses to give every program equal time, rather than allowing access to the memory more

00:02:54.640 --> 00:03:03.200
directly as in NUMA, UMA has a built-in performance penalty that increases the more nodes your system

00:03:03.200 --> 00:03:09.280
has to manage. So using a CPU with separate groups of cores means you're going to be subjected to

00:03:09.280 --> 00:03:14.640
one of these drawbacks, and you're going to take a performance hit either way. And these are problems

00:03:14.640 --> 00:03:20.240
that you simply don't run into on smaller chips with fewer cores because you're not dealing with

00:03:20.240 --> 00:03:26.160
multiple nodes. But getting away from memory access, sometimes the cores themselves are even

00:03:26.160 --> 00:03:30.800
designed in a way that bottlenecks them the more of them you slap onto a chip. Do you remember

00:03:30.800 --> 00:03:37.200
how, before Ryzen, AMD processors seemed to be significantly slower than Intel's despite having

00:03:37.200 --> 00:03:44.000
more cores? Well, a big reason for this was that those old Bulldozer FX processors didn't use

00:03:44.080 --> 00:03:51.200
full cores. Instead, an FX CPU advertised as having eight cores would in reality have eight

00:03:51.200 --> 00:03:57.120
integer units but only four floating-point units that were shared between the eight cores. So if

00:03:57.120 --> 00:04:06.960
you don't know what a floating-point unit is, you can learn more about that right up here.
But the point is, you could think of these CPUs as missing four half-cores, which severely

00:04:06.960 --> 00:04:12.960
hampered their single-threaded performance in some key applications. Now, this design allowed AMD

00:04:12.960 --> 00:04:17.440
processors to handle more threads for a lower price, but it also meant that their real-world

00:04:17.440 --> 00:04:22.960
performance lagged way behind Intel's, and the only way AMD could try to compensate was to increase

00:04:22.960 --> 00:04:28.240
clock speeds, which increased heat output and contributed to AMD's reputation for hot-running

00:04:28.240 --> 00:04:34.720
CPUs for many years. So what's our bottom line then? Although both AMD and Intel are now using much

00:04:34.720 --> 00:04:40.160
wiser strategies for their many-core CPUs and clever boosting techniques to give them similar

00:04:40.240 --> 00:04:46.480
single-threaded performance to their less costly brethren, if the best sales pitch for a super

00:04:46.480 --> 00:04:52.160
premium product is that it doesn't suffer a performance penalty in the applications that you

00:04:52.160 --> 00:04:58.640
use, well, you'd better make sure you've got a use case for it before spending your hard-earned cash.

00:04:58.640 --> 00:05:04.400
And no, playing Fortnite and watching Techquickie definitely don't count. So thanks for watching,

00:05:04.400 --> 00:05:09.840
guys. Like, dislike, check out our other videos, and don't forget to subscribe.
