WEBVTT

00:00:00.000 --> 00:00:03.220
So you've finally gotten your hot little hands

00:00:03.220 --> 00:00:09.000
on a shiny new graphics card for your PC, and you're confident that by pushing it to its limits,

00:00:09.000 --> 00:00:13.520
you'll be able to run any game at mouth-watering settings.

00:00:15.560 --> 00:00:18.840
But it turns out there's a long-standing bottleneck

00:00:18.840 --> 00:00:21.920
that limits how many frames your GPU can push.

00:00:21.920 --> 00:00:25.920
And we're not talking about your CPU or the PCI Express slot.

00:00:25.920 --> 00:00:31.040
It's actually the GPU's pipeline, the set of steps that visual data has to go through

00:00:31.040 --> 00:00:35.260
in order to become a fully rendered image. But how does that pipeline bottleneck

00:00:35.260 --> 00:00:39.840
your gaming experience? Let's find out by learning how a graphics pipeline works.

00:00:39.840 --> 00:00:44.680
And we'd like to extend our warm thanks to Jianye Liu, Senior Program Manager at Microsoft,

00:00:44.680 --> 00:00:49.780
for helping us understand the problem and the solution that's starting to gain adoption.

00:00:51.360 --> 00:00:56.160
The first three steps of the pipeline, called the geometry pipeline, are really important

00:00:56.160 --> 00:01:01.200
and actually end up being where the bottleneck is. Step one is when the raw visual data,

00:01:01.200 --> 00:01:07.160
basically just numbers, plain numbers, like mama used to make them, goes through an input assembler,

00:01:07.160 --> 00:01:11.360
which takes the vertices of the triangles that ultimately make up a finished image

00:01:11.360 --> 00:01:14.760
and organizes them so that they're pointing at each other correctly.

00:01:14.760 --> 00:01:20.000
Step two takes this organized set of vertices and puts them through a vertex shader,

00:01:20.000 --> 00:01:24.320
which raises or lowers the vertices to create a 3D mesh of sorts.

00:01:24.320 --> 00:01:29.600
For example, the vertex shader can create a bumpy texture on a wall or ripples on a pond.

00:01:29.600 --> 00:01:35.360
The third step is a rasterizer. This is the bit that puts pixels inside each triangle

00:01:35.360 --> 00:01:39.200
to fill out the image. Now, there are other steps beyond these three,

00:01:39.200 --> 00:01:42.240
namely pixel shading, which gives each pixel

00:01:42.240 --> 00:01:47.080
the appropriate color and lighting, and the output merger, which puts different visual elements

00:01:47.080 --> 00:01:51.240
together, such as ensuring a character standing in front of a wall is displayed properly

00:01:51.280 --> 00:01:54.920
so you only see the character and not the wall behind them.
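
NOTE
Editorial aside, not part of the spoken audio. A minimal sketch of the depth test that the output merger step describes, assuming each candidate pixel (a "fragment") carries a depth value where smaller means closer to the camera. The function name and data layout are hypothetical, chosen only for illustration.
```python
# Hypothetical toy depth test: for each pixel, keep the fragment nearest the camera.
def merge_fragments(fragments, width, height):
    """fragments: iterable of (x, y, depth, color); smaller depth = closer."""
    depth_buffer = [[float("inf")] * width for _ in range(height)]
    color_buffer = [[None] * width for _ in range(height)]
    for x, y, depth, color in fragments:
        # Overwrite the pixel only if this fragment is closer than what is stored.
        if depth < depth_buffer[y][x]:
            depth_buffer[y][x] = depth
            color_buffer[y][x] = color
    return color_buffer
# The character (depth 2.0) hides the wall (depth 5.0) at pixel (0, 0).
frags = [(0, 0, 5.0, "wall"), (0, 0, 2.0, "character"), (1, 0, 5.0, "wall")]
image = merge_fragments(frags, width=2, height=1)
print(image)  # [['character', 'wall']]
```
Real GPUs do this in fixed-function hardware; the loop here only mirrors the idea of "closest fragment wins".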

00:01:54.920 --> 00:01:59.600
However, the big bottleneck we're talking about today involves those first three steps.

00:01:59.600 --> 00:02:03.720
But why are they such a bottleneck? I mean, the process seems pretty straightforward.

00:02:03.720 --> 00:02:08.920
Well, there are a few big reasons for this. One is that that first step, the input assembler,

00:02:08.920 --> 00:02:12.680
can only understand data that's organized in a very specific way.

00:02:12.680 --> 00:02:17.080
Typically, it can't accept compressed data that can be moved around more quickly.

00:02:17.080 --> 00:02:20.220
Or if a developer thinks of a more efficient way to organize their data,

00:02:20.220 --> 00:02:25.740
the input assembler simply won't be able to understand it. Two, there are actually several optional stages

00:02:25.740 --> 00:02:29.660
after the vertex shader. For example, one is called a geometry shader

00:02:29.660 --> 00:02:34.780
that can take a point and expand it out to a particular shape, such as a strand of hair.

00:02:34.780 --> 00:02:39.900
This is quicker than drawing a bunch of new triangles, but the geometry shader and other optional steps

00:02:39.900 --> 00:02:43.460
have been added over the years as games have become more complex.

00:02:43.460 --> 00:02:48.900
They're essentially glommed onto the pipeline in a rigid, sequential way that can't be processed

00:02:48.940 --> 00:02:53.860
in parallel, meaning they take longer. Three, because of this rigid sequence,

00:02:53.860 --> 00:02:57.740
you have to wait until you get to the rasterizer to start culling.

00:02:57.740 --> 00:03:01.780
(Sounds dangerous.) That is, throwing away unused triangles that are way off

00:03:01.780 --> 00:03:06.820
in the distance or obscured by an object on the screen. This is a problem because by the time the data

00:03:06.820 --> 00:03:11.740
has gotten to the rasterizer phase of the pipeline, the GPU has already done a ton of legwork

00:03:11.740 --> 00:03:14.900
rendering unnecessary triangles.

00:03:14.900 --> 00:03:20.100
Sounds like my typical Wednesday evening. So because the geometry pipeline is so inflexible,

00:03:20.100 --> 00:03:25.340
the solution isn't to retool it. It's to replace it completely.

00:03:25.340 --> 00:03:30.860
This is where mesh shading comes in, one of the biggest features in the DirectX 12 Ultimate API.

00:03:30.860 --> 00:03:34.300
Instead of having discrete steps before the rasterizer,

00:03:34.300 --> 00:03:39.500
the mesh shader is one stage that can do a few really cool things.

00:03:39.500 --> 00:03:42.940
First, the data that you feed into it can be much more arbitrary

00:03:42.940 --> 00:03:47.940
so it can understand compressed data and other data sets an old-school input assembler couldn't.

00:03:47.940 --> 00:03:52.500
You can't handle this new data, old man. Essentially, the mesh shader is almost

00:03:52.500 --> 00:03:57.180
like a mini programmable computer. So if a developer wants to accomplish some rendering task

00:03:57.180 --> 00:04:02.300
more efficiently, they can just code it in. Each of the processes above can also intelligently

00:04:02.300 --> 00:04:06.580
communicate with each other within a mesh shader. So instead of waiting to cull triangles

00:04:06.580 --> 00:04:11.300
so late in the process, it can be done earlier. And the geometry can even be arranged in a way

00:04:11.300 --> 00:04:15.540
to make culling easier, demanding even less GPU power.

00:04:15.540 --> 00:04:20.540
Oh, and we haven't even talked about the whole reason it's called mesh shading in the first place.

00:04:20.540 --> 00:04:23.540
It's a really fun term, meshlets.

00:04:23.540 --> 00:04:27.980
Instead of working on one triangle at a time, your GPU can instead work on meshes

00:04:27.980 --> 00:04:33.900
of multiple triangles in parallel. So instead of making a decision about culling one triangle,

00:04:33.900 --> 00:04:38.340
your GPU can instead do them in batches, throwing out data it doesn't need to process

00:04:38.340 --> 00:04:43.900
and saving precious resources. Older vertex shaders only saw a soup of points

00:04:43.900 --> 00:04:47.940
instead of an actual mesh, but mesh shaders are much smarter

00:04:47.940 --> 00:04:51.620
and they can know exactly what they're working with much earlier in the rendering process.
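
NOTE
Editorial aside, not part of the spoken audio. A minimal sketch of culling in batches, assuming each meshlet stores a bounding sphere and that visibility is judged only against a near and far depth range. The Meshlet fields and cull_meshlets are hypothetical names for illustration, not the DirectX 12 Ultimate API.
```python
from dataclasses import dataclass
@dataclass
class Meshlet:
    center_z: float      # depth of the bounding sphere's center (hypothetical layout)
    radius: float        # bounding sphere radius
    triangle_count: int  # number of triangles in this batch
def cull_meshlets(meshlets, near, far):
    """Keep meshlets whose bounding sphere overlaps the [near, far] depth range."""
    return [m for m in meshlets
            if m.center_z + m.radius >= near and m.center_z - m.radius <= far]
# One meshlet in view, one far beyond the far plane, one behind the camera.
scene = [Meshlet(10.0, 2.0, 64), Meshlet(500.0, 3.0, 64), Meshlet(-5.0, 1.0, 64)]
visible = cull_meshlets(scene, near=0.1, far=100.0)
kept = sum(m.triangle_count for m in visible)
total = sum(m.triangle_count for m in scene)
print(f"{len(visible)} of {len(scene)} meshlets kept, {kept} of {total} triangles")
```
One test per meshlet here discards 128 triangles without looking at any of them individually, which is the saving the narration describes.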

00:04:51.620 --> 00:04:56.300
So what does all of this mean? Well, because your GPU doesn't have to work

00:04:56.300 --> 00:05:00.500
as hard for each frame, it means faster frame rates and more detailed environments,

00:05:00.500 --> 00:05:03.660
ultimately the things we all want from our graphics cards.

00:05:03.660 --> 00:05:08.220
Currently, many game developers are looking at the best ways to implement mesh shading into their titles.

00:05:08.260 --> 00:05:12.100
Because it's such a customizable tool and the old geometry pipeline

00:05:12.100 --> 00:05:16.580
has been around for such a long time, it might take a few years before we see widespread adoption

00:05:16.580 --> 00:05:20.140
of mesh shading in popular games. The good news, though, is that there's already

00:05:20.140 --> 00:05:23.260
hardware support for it with newer consumer graphics cards.

00:05:23.260 --> 00:05:28.300
So by the time we see these games on the market, chances are we'll all have next-gen graphics cards

00:05:28.300 --> 00:05:31.380
that support these new features. Every one of us.

00:05:33.380 --> 00:05:35.940
Yes! I can't wait.

00:05:36.940 --> 00:05:41.180
Well guys, that's a Techquickie video. Not sure if you've ever seen one before,

00:05:41.180 --> 00:05:45.220
but that's what it's like. Hey, speaking of like, you wanna like the video

00:05:45.220 --> 00:05:48.980
if you liked it? Hey, do you have a like? Do you have a like to give out?

00:05:48.980 --> 00:05:52.900
Please, we love your likes. Also check out other videos,

00:05:52.900 --> 00:05:57.300
comment below with video suggestions, and don't forget to subscribe and follow Techquickie.

00:05:57.300 --> 00:05:58.140
Love you.
