{"video_id":"fp_E09N4oK60Z","title":"TQ: Are more cores actually BETTER?","channel":"Techquickie","show":"Techquickie","published_at":"2019-12-04T07:36:48.408Z","duration_s":310,"segments":[{"start_s":0.0,"end_s":3.92,"text":"Whether we're talking dollars in your bank account, items on a seafood buffet, or dates","speaker":null,"is_sponsor":0},{"start_s":3.92,"end_s":10.0,"text":"you've got lined up on Tinder, more is generally considered to be better. A sentiment that also","speaker":null,"is_sponsor":0},{"start_s":10.0,"end_s":15.12,"text":"seems to hold true with the number of cores in your computer's CPU. At least if you buy into the","speaker":null,"is_sponsor":0},{"start_s":15.12,"end_s":20.56,"text":"marketing. But hold on. Even though having many cores definitely gives you a boost in multi-threaded","speaker":null,"is_sponsor":0},{"start_s":20.56,"end_s":26.8,"text":"applications like rendering 3D animations, there are actually situations where more cores","speaker":null,"is_sponsor":0},{"start_s":26.8,"end_s":31.84,"text":"gives no benefit whatsoever, or can even actually hurt your system's performance.","speaker":null,"is_sponsor":0},{"start_s":32.48,"end_s":38.08,"text":"But how could this be? Well, to start off with, the more cores you pack onto a CPU,","speaker":null,"is_sponsor":0},{"start_s":38.08,"end_s":43.52,"text":"the more power they need, and the more heat they generate. And remember that because CPU cores are","speaker":null,"is_sponsor":0},{"start_s":43.52,"end_s":49.76,"text":"crammed into a relatively small space, manufacturers end up working against some serious limits","speaker":null,"is_sponsor":0},{"start_s":49.76,"end_s":56.0,"text":"when it comes to thermal design power or TDP. 
This means that to prevent the CPU from drawing too","speaker":null,"is_sponsor":0},{"start_s":56.16,"end_s":61.6,"text":"much power and producing too much heat, the individual cores have traditionally run their","speaker":null,"is_sponsor":0},{"start_s":61.6,"end_s":68.8,"text":"clock frequencies lower to improve efficiency. And even if the advertised boost clock for a CPU","speaker":null,"is_sponsor":0},{"start_s":68.8,"end_s":74.8,"text":"with lots and lots of cores can appear to be high, it's often the case that they cannot maintain","speaker":null,"is_sponsor":0},{"start_s":74.8,"end_s":79.84,"text":"these clocks for long periods of time, or that they only do it when you're running very light","speaker":null,"is_sponsor":0},{"start_s":79.84,"end_s":85.52,"text":"applications. So if you're using your computer mostly for applications where single-threaded","speaker":null,"is_sponsor":0},{"start_s":85.52,"end_s":93.76,"text":"performance matters more, such as games, that super-expensive 18-core CPU might actually yield","speaker":null,"is_sponsor":0},{"start_s":93.76,"end_s":100.88,"text":"you a worse experience than something cheaper. And if you go with a really high-core-count CPU,","speaker":null,"is_sponsor":0},{"start_s":100.88,"end_s":106.88,"text":"there's another wrinkle with how processors with that many cores access the system memory.","speaker":null,"is_sponsor":0},{"start_s":106.88,"end_s":113.52,"text":"You see, in some cases, these larger CPUs need to have their cores split into two groups or","speaker":null,"is_sponsor":0},{"start_s":113.52,"end_s":119.28,"text":"nodes of cores, with each group getting its own memory controller and segment of the physical","speaker":null,"is_sponsor":0},{"start_s":119.28,"end_s":126.72,"text":"memory in a scheme called non-uniform memory access or NUMA. 
This is generally quicker than","speaker":null,"is_sponsor":0},{"start_s":126.72,"end_s":132.88,"text":"the opposite solution called uniform memory access or UMA, where all the cores share one","speaker":null,"is_sponsor":0},{"start_s":132.88,"end_s":139.52,"text":"big pool of memory. But here's the thing, a CPU that uses NUMA, which is better for latency-sensitive","speaker":null,"is_sponsor":0},{"start_s":139.52,"end_s":145.04,"text":"applications, can often struggle when running a single program that uses tons of threads.","speaker":null,"is_sponsor":0},{"start_s":146.32,"end_s":150.72,"text":"Because of the different memory access times between the nodes and the fact that each node","speaker":null,"is_sponsor":0},{"start_s":150.72,"end_s":155.12,"text":"would have to wait on the other one to finish working on the same data, highly multi-threaded","speaker":null,"is_sponsor":0},{"start_s":155.12,"end_s":160.48,"text":"programs like these often don't want to cross nodes, even if it would mean being able to take","speaker":null,"is_sponsor":0},{"start_s":160.48,"end_s":168.64,"text":"advantage of the entire CPU. So back to UMA then, right? No. Because one controller manages all the","speaker":null,"is_sponsor":0},{"start_s":168.64,"end_s":174.64,"text":"memory accesses to give every program equal time, rather than allowing access to the memory more","speaker":null,"is_sponsor":0},{"start_s":174.64,"end_s":183.2,"text":"directly as in NUMA, UMA has a built-in performance penalty that increases the more nodes your system","speaker":null,"is_sponsor":0},{"start_s":183.2,"end_s":189.28,"text":"has to manage. So using a CPU with separate groups of cores means you're going to be subjected to","speaker":null,"is_sponsor":0},{"start_s":189.28,"end_s":194.64,"text":"one of these drawbacks and you're going to take a performance hit either way. 
And these are problems","speaker":null,"is_sponsor":0},{"start_s":194.64,"end_s":200.24,"text":"that you simply don't run into on smaller chips with fewer cores because you're not dealing with","speaker":null,"is_sponsor":0},{"start_s":200.24,"end_s":206.16,"text":"multiple nodes. But getting away from memory access, sometimes the cores themselves are even","speaker":null,"is_sponsor":0},{"start_s":206.16,"end_s":210.8,"text":"designed in a way that bottlenecks them the more of them you slap onto a chip. Do you remember","speaker":null,"is_sponsor":0},{"start_s":210.8,"end_s":217.2,"text":"how before Ryzen, AMD processors seemed to be significantly slower than Intel despite having","speaker":null,"is_sponsor":0},{"start_s":217.2,"end_s":224.0,"text":"more cores? Well, a big reason for this was that those old Bulldozer FX processors didn't use","speaker":null,"is_sponsor":0},{"start_s":224.08,"end_s":231.2,"text":"full cores. Instead an FX CPU advertised as having eight cores would in reality have eight","speaker":null,"is_sponsor":0},{"start_s":231.2,"end_s":237.12,"text":"integer units but only four floating point units that were shared between the eight cores. So if","speaker":null,"is_sponsor":0},{"start_s":237.12,"end_s":246.96,"text":"you don't know what a floating point unit is you can learn more about that right up here. But the point is you could think of these CPUs as having four half cores that were missing which severely","speaker":null,"is_sponsor":0},{"start_s":246.96,"end_s":252.96,"text":"hampered their single-threaded performance in some key applications. 
Now this design allowed AMD","speaker":null,"is_sponsor":0},{"start_s":252.96,"end_s":257.44,"text":"processors to handle more threads for a cheaper price but it also meant that their real-world","speaker":null,"is_sponsor":0},{"start_s":257.44,"end_s":262.96,"text":"performance lagged way behind Intel and the only way AMD could try and compensate was to increase","speaker":null,"is_sponsor":0},{"start_s":262.96,"end_s":268.24,"text":"clock speeds which increased heat output and contributed to AMD's reputation for hot-running","speaker":null,"is_sponsor":0},{"start_s":268.24,"end_s":274.72,"text":"CPUs for many years. So what's our bottom line then? Although both AMD and Intel are using much","speaker":null,"is_sponsor":0},{"start_s":274.72,"end_s":280.16,"text":"wiser strategies for their many-core CPUs and clever boosting techniques to give them similar","speaker":null,"is_sponsor":0},{"start_s":280.24,"end_s":286.48,"text":"single-threaded performance to their less costly brethren, if the best sales pitch for a super","speaker":null,"is_sponsor":0},{"start_s":286.48,"end_s":292.16,"text":"premium product is that it doesn't suffer a performance penalty in the applications that you","speaker":null,"is_sponsor":0},{"start_s":292.16,"end_s":298.64,"text":"use, well you'd better make sure you've got a use case for it before spending your hard-earned cash","speaker":null,"is_sponsor":0},{"start_s":298.64,"end_s":304.4,"text":"and no, playing Fortnite and watching Techquickie definitely don't count. So thanks for watching","speaker":null,"is_sponsor":0},{"start_s":304.4,"end_s":309.84,"text":"guys like dislike check out our other videos and don't forget to subscribe","speaker":null,"is_sponsor":0}],"full_text":"Whether we're talking dollars in your bank account, items on a seafood buffet, or dates you've got lined up on Tinder, more is generally considered to be better. A sentiment that also seems to hold true with the number of cores in your computer's CPU. 
At least if you buy into the marketing. But hold on. Even though having many cores definitely gives you a boost in multi-threaded applications like rendering 3D animations, there are actually situations where more cores gives no benefit whatsoever, or can even actually hurt your system's performance. But how could this be? Well, to start off with, the more cores you pack onto a CPU, the more power they need, and the more heat they generate. And remember that because CPU cores are crammed into a relatively small space, manufacturers end up working against some serious limits when it comes to thermal design power or TDP. This means that to prevent the CPU from drawing too much power and producing too much heat, the individual cores have traditionally run their clock frequencies lower to improve efficiency. And even if the advertised boost clock for a CPU with lots and lots of cores can appear to be high, it's often the case that they cannot maintain these clocks for long periods of time, or that they only do it when you're running very light applications. So if you're using your computer mostly for applications where single-threaded performance matters more, such as games, that super-expensive 18-core CPU might actually yield you a worse experience than something cheaper. And if you go with a really high-core-count CPU, there's another wrinkle with how processors with that many cores access the system memory. You see, in some cases, these larger CPUs need to have their cores split into two groups or nodes of cores, with each group getting its own memory controller and segment of the physical memory in a scheme called non-uniform memory access or NUMA. This is generally quicker than the opposite solution called uniform memory access or UMA, where all the cores share one big pool of memory. But here's the thing, a CPU that uses NUMA, which is better for latency-sensitive applications, can often struggle when running a single program that uses tons of threads. 
Because of the different memory access times between the nodes and the fact that each node would have to wait on the other one to finish working on the same data, highly multi-threaded programs like these often don't want to cross nodes, even if it would mean being able to take advantage of the entire CPU. So back to UMA then, right? No. Because one controller manages all the memory accesses to give every program equal time, rather than allowing access to the memory more directly as in NUMA, UMA has a built-in performance penalty that increases the more nodes your system has to manage. So using a CPU with separate groups of cores means you're going to be subjected to one of these drawbacks and you're going to take a performance hit either way. And these are problems that you simply don't run into on smaller chips with fewer cores because you're not dealing with multiple nodes. But getting away from memory access, sometimes the cores themselves are even designed in a way that bottlenecks them the more of them you slap onto a chip. Do you remember how before Ryzen, AMD processors seemed to be significantly slower than Intel despite having more cores? Well, a big reason for this was that those old Bulldozer FX processors didn't use full cores. Instead an FX CPU advertised as having eight cores would in reality have eight integer units but only four floating point units that were shared between the eight cores. So if you don't know what a floating point unit is you can learn more about that right up here. But the point is you could think of these CPUs as having four half cores that were missing which severely hampered their single-threaded performance in some key applications. 
Now this design allowed AMD processors to handle more threads for a cheaper price but it also meant that their real-world performance lagged way behind Intel and the only way AMD could try and compensate was to increase clock speeds which increased heat output and contributed to AMD's reputation for hot-running CPUs for many years. So what's our bottom line then? Although both AMD and Intel are using much wiser strategies for their many-core CPUs and clever boosting techniques to give them similar single-threaded performance to their less costly brethren, if the best sales pitch for a super premium product is that it doesn't suffer a performance penalty in the applications that you use, well you'd better make sure you've got a use case for it before spending your hard-earned cash and no, playing Fortnite and watching Techquickie definitely don't count. So thanks for watching guys like dislike check out our other videos and don't forget to subscribe"}