Okay, so, we’ve got all our DirectX stuff set up to start rendering pretty pictures.
So it’s important, at this time, to talk about the pipeline that does the rendering. Unfortunately, it’s a beast:
Some of these can be simplified or ignored for now — but it’s important you understand this stuff. This is the core of rendering.
This stage is where we assemble (as in, gather together) our inputs (as in, our vertices and textures and stuff). The input-assembler stage knows what information needs to be associated with which vertices and shaders (does every vertex have an associated color, if it needs one? A UV position? Are we loading the textures each shader needs?). This stage makes sure to get that information from the CPU, and it passes that information in to the GPU for processing.
This stage does operations on vertices, and vertices alone. It receives one vertex with associated data, and outputs one vertex with associated data. It’s totally possible you don’t want to affect the vertex at all, in which case your vertex shader will just pass data through, untouched. One of the most common operations in the vertex shader is skinning, or moving vertices to follow a skeleton doing character animations.
HULL SHADER + TESSELLATOR + DOMAIN SHADER
These are new for DirectX 11, and a bit advanced, so I’m summarizing all these stages at once. The vertex shader only allows one output vertex per input vertex — you can’t end up with more vertices than you passed in. However, generating vertices on-the-fly has turned out to be very useful for algorithms like dynamic level of detail. So these pipeline stages were created. They allow you to generate new vertices to pass to further stages. The tessellation stages specifically are designed to create vertices that “smooth” the paths shaped by other vertices. For basic projects, it’s common to not use these stages at all.
Also fairly new, introduced in DirectX 10. This stage does operations on primitives — or triangles (also lines and points, but usually triangles). It takes as input all the vertices to build the triangle (and possibly additional vertices indicating the area around that triangle), and can output any number of vertices (including zero). In the sense that it operates on sets of vertices and outputs not-necessarily-the-same-amount of vertices, it’s similar to the hull shader / tessellator / domain shader. However, the geometry shader is different, because it can output less vertices than it received, and it allows you to create vertices anywhere, whereas tessellation can only create vertices along the path shaped by other vertices. Before DirectX 11, tessellation was done in the geometry shader, but because it was such a common use case, DX11 moved tessellation into its own special purpose (and significantly faster) pipeline stages. For basic projects, it’s common to not use this at all.
After running the geometry shader, you have the full set of vertices you want to operate on. The stream-output stage allows you to redirect all the vertices back into the input-assembler stage for a second pass, or copy them out to CPU for further processing. This stage is optional, and will not be used if you only need one pass to generate your vertices and don’t need the CPU to know what those vertices are (which, again, is probably the case for basic projects).
The GPU outputs a 1920×1080 (or whatever size) RGB image, but right now it only has a bunch of 3d vertex data. The rasterizer bridges that gap. It takes as input the entire set of triangle positions in the scene, information about where the camera is positioned and where it’s looking, and the size of the image to output. It then determines which triangles the camera would “see”, and which pixels they take up. This sounds easy, but is actually hard.
This stage works on each individual pixel of your image, and does things like texturing and lighting. Essentially, it’s where everything is made to look pretty. It receives input data about the vertices that compose the primitive that this pixel “sees”, interpolated to match the position of this pixel on the primitive itself. It then performs operations (such as “look up this corresponding pixel in a texture” or “light this pixel as though it were 38 degrees tilted and 3 meters away from an orange light”), and outputs per-pixel data — most notably the color of the pixel. Arguably, this is the most important stage of the entire DirectX pipeline, because this is where most an image’s prettiness comes from.
Although it seems like you’re done at this point, you can’t just take the pixel shader output and arrange it all to make an image. It’s possible for the pixel shader to compute multiple output colors for a single pixel — for instance, if a pixel “sees” an opaque object and a semi-transparent object in front of it, or if the pixel shader was given a far-away object to compute colors for but it later turned out that pixel’s “sight” was blocked by a closer object. This is a problem, since our final rendered frame can only contain one color per pixel. So that’s where the output merger stage comes in. You tell it how to handle differences in depth and alpha values (as well as stencil values, which you can set for fancy rendering tricks) between two outputs of the same pixel. It follows those rules and creates the one final image to draw to screen.
And there you go! It’s a lot, but this is all the steps the GPU takes to go from raw data to an output image. There is no magic, just these steps.