1
00:00:00,000 --> 00:00:03,920
Whether we're talking dollars in your bank account, items on a seafood buffet, or dates

2
00:00:03,920 --> 00:00:10,000
you've got lined up on Tinder, more is generally considered to be better. A sentiment that also

3
00:00:10,000 --> 00:00:15,120
seems to hold true with the number of cores in your computer's CPU. At least if you buy into the

4
00:00:15,120 --> 00:00:20,560
marketing. But hold on. Even though having many cores definitely gives you a boost in multi-threaded

5
00:00:20,560 --> 00:00:26,800
applications like rendering 3D animations, there are actually situations where more cores

6
00:00:26,800 --> 00:00:31,840
give no benefit whatsoever, or can even actually hurt your system's performance.

7
00:00:32,480 --> 00:00:38,080
But how could this be? Well, to start off with, the more cores you pack onto a CPU,

8
00:00:38,080 --> 00:00:43,520
the more power they need, and the more heat they generate. And remember that because CPU cores are

9
00:00:43,520 --> 00:00:49,760
crammed into a relatively small space, manufacturers end up working against some serious limits

10
00:00:49,760 --> 00:00:56,000
when it comes to thermal design power, or TDP. This means that to prevent the CPU from drawing too

11
00:00:56,160 --> 00:01:01,600
much power and producing too much heat, the individual cores have traditionally run their

12
00:01:01,600 --> 00:01:08,800
clock frequencies lower to improve efficiency. And even if the advertised boost clock for a CPU

13
00:01:08,800 --> 00:01:14,800
with lots and lots of cores can appear to be high, it's often the case that they cannot maintain

14
00:01:14,800 --> 00:01:19,840
these clocks for long periods of time, or that they only do it when you're running very light

15
00:01:19,840 --> 00:01:25,520
applications. So if you're using your computer mostly for applications where single-threaded

16
00:01:25,520 --> 00:01:33,760
performance matters more, such as games, that super-expensive 18-core CPU might actually yield

17
00:01:33,760 --> 00:01:40,880
you a worse experience than something cheaper. And if you go with a really high-core-count CPU,

18
00:01:40,880 --> 00:01:46,880
there's another wrinkle with how processors with that many cores access the system memory.

19
00:01:46,880 --> 00:01:53,520
You see, in some cases, these larger CPUs need to have their cores split into two groups or

20
00:01:53,520 --> 00:01:59,280
nodes of cores, with each group getting its own memory controller and segment of the physical

21
00:01:59,280 --> 00:02:06,720
memory in a scheme called non-uniform memory access, or NUMA. This is generally quicker than

22
00:02:06,720 --> 00:02:12,880
the opposite solution, called uniform memory access, or UMA, where all the cores share one

23
00:02:12,880 --> 00:02:19,520
big pool of memory. But here's the thing: a CPU that uses NUMA, which is better for latency-sensitive

24
00:02:19,520 --> 00:02:25,040
applications, can often struggle when running a single program that uses tons of threads.

25
00:02:26,320 --> 00:02:30,720
Because of the different memory access times between the nodes and the fact that each node

26
00:02:30,720 --> 00:02:35,120
would have to wait on the other one to finish working on the same data, highly multi-threaded

27
00:02:35,120 --> 00:02:40,480
programs like these often don't want to cross nodes, even if it would mean being able to take

28
00:02:40,480 --> 00:02:48,640
advantage of the entire CPU. So back to UMA then, right? No. Because one controller manages all the

29
00:02:48,640 --> 00:02:54,640
memory accesses to give every program equal time, rather than allowing access to the memory more

30
00:02:54,640 --> 00:03:03,200
directly as in NUMA, UMA has a built-in performance penalty that increases the more nodes your system

31
00:03:03,200 --> 00:03:09,280
has to manage. So using a CPU with separate groups of cores means you're going to be subjected to

32
00:03:09,280 --> 00:03:14,640
one of these drawbacks and you're going to take a performance hit either way. And these are problems

33
00:03:14,640 --> 00:03:20,240
that you simply don't run into on smaller chips with fewer cores because you're not dealing with

34
00:03:20,240 --> 00:03:26,160
multiple nodes. But getting away from memory access, sometimes the cores themselves are even

35
00:03:26,160 --> 00:03:30,800
designed in a way that bottlenecks them the more of them you slap onto a chip. Do you remember

36
00:03:30,800 --> 00:03:37,200
how, before Ryzen, AMD processors seemed to be significantly slower than Intel despite having

37
00:03:37,200 --> 00:03:44,000
more cores? Well, a big reason for this was that those old Bulldozer FX processors didn't use

38
00:03:44,080 --> 00:03:51,200
full cores. Instead, an FX CPU advertised as having eight cores would, in reality, have eight

39
00:03:51,200 --> 00:03:57,120
integer units but only four floating-point units that were shared between the eight cores. So if

40
00:03:57,120 --> 00:04:06,960
you don't know what a floating-point unit is, you can learn more about that right up here.
But the point is, you could think of these CPUs as having four half-cores that were missing, which severely

41
00:04:06,960 --> 00:04:12,960
hampered their single-threaded performance in some key applications. Now, this design allowed AMD

42
00:04:12,960 --> 00:04:17,440
processors to handle more threads for a cheaper price, but it also meant that their real-world

43
00:04:17,440 --> 00:04:22,960
performance lagged way behind Intel, and the only way AMD could try and compensate was to increase

44
00:04:22,960 --> 00:04:28,240
clock speeds, which increased heat output and contributed to AMD's reputation for hot-running

45
00:04:28,240 --> 00:04:34,720
CPUs for many years. So what's our bottom line then? Although both AMD and Intel are using much

46
00:04:34,720 --> 00:04:40,160
wiser strategies for their many-core CPUs and clever boosting techniques to give them similar

47
00:04:40,240 --> 00:04:46,480
single-threaded performance to their less costly brethren, if the best sales pitch for a super

48
00:04:46,480 --> 00:04:52,160
premium product is that it doesn't suffer a performance penalty in the applications that you

49
00:04:52,160 --> 00:04:58,640
use, well, you'd better make sure you've got a use case for it before spending your hard-earned cash.

50
00:04:58,640 --> 00:05:04,400
And no, playing Fortnite and watching Techquickie definitely don't count. So thanks for watching,

51
00:05:04,400 --> 00:05:09,840
guys. Like, dislike, check out our other videos, and don't forget to subscribe.
