
DirectX Part 3: Vertices and Shaders

Let’s get some real 3D going.

PART ONE: DEFINING YOUR DATA AND FUNCTIONS

Everything in 3D is just a series of triangles, and triangles are just a series of vertices. Vertices must have 3-dimensional positions — it’s the only absolutely required information DirectX 11 needs — but they can have any number of additional traits. Normal vectors, colors (for vertex coloring), lighting information (per-vertex lighting), metadata, etc. So, before anything happens we have to tell DX11 what our vertex layout looks like — that is, what information defines a given vertex:


D3D11_INPUT_ELEMENT_DESC pVertexLayout[] =
{
   { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
   { "COLOR", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
   { "SOME_MORE_DATA", 0, DXGI_FORMAT_R32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};
UINT uiNumElements = ARRAYSIZE( pVertexLayout );

Most of these toggles are meaningless to beginners. The two important ones are semantic ("POSITION" and "SOME_MORE_DATA"), which is the variable name you’ll call on in shaders, and format (DXGI_FORMAT_R32G32B32_FLOAT and DXGI_FORMAT_R32_FLOAT), which defines how much / what type of data is associated with the named variable.

You can name your vertex variables anything you want, but some names (such as "POSITION") are reserved and must have certain formats associated with them.

In our pVertexLayout, the format for "COLOR" is 3 RGB floats — easy. "POSITION" is also 3 RGB floats — they’re actually going to be used as XYZ, the RGB nomenclature means nothing. "SOME_MORE_DATA" is just one float for playing with.
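
If it helps to see that layout as ordinary C++, here's a hypothetical struct that lines up with it byte-for-byte (the struct and its field names are purely illustrative; DX11 only ever sees the raw floats):

struct SimpleVertex
{
   FLOAT x, y, z;      // "POSITION": DXGI_FORMAT_R32G32B32_FLOAT
   FLOAT r, g, b;      // "COLOR": DXGI_FORMAT_R32G32B32_FLOAT
   FLOAT someMoreData; // "SOME_MORE_DATA": DXGI_FORMAT_R32_FLOAT
};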

Next, we’ll create the actual vertices to draw. It’s just going to look like raw numbers — only the pVertexLayout lets the GPU understand how to read the data.


FLOAT pVertexArray[] =
{
   0.0f, 0.5f, 0.5f,   1.0f, 0.0f, 0.0f,   0.2f,
   0.5f, -0.5f, 0.5f,   0.0f, 1.0f, 0.0f,   0.0f,
   -0.5f, -0.5f, 0.5f,   0.0f, 0.0f, 1.0f,   -0.2f
};

So, this defines three vertices:

  • a vertex located at (0.0, 0.5, 0.5) that’s colored red (1, 0, 0) and has a SOME_MORE_DATA of 0.2
  • a vertex located at (0.5, -0.5, 0.5) that’s colored green (0, 1, 0) and has a SOME_MORE_DATA of 0.0
  • a vertex located at (-0.5, -0.5, 0.5) that’s colored blue (0, 0, 1) and has a SOME_MORE_DATA of -0.2

Next, we'll write the shader file itself! This should be exciting for you, because this is sorta the heart of rendering. Create a new file and call it "shaders.hlsl" or something similar. Just keep the ".hlsl" extension. HLSL is a common shader-authoring language, and you're about to write a hello-world in it. Here it is:


struct VS_INPUT
{
   float4 vPosition : POSITION; // the layout only supplies x/y/z; the unfilled w defaults to 1.0
   float3 vColor : COLOR;
   float OffsetX : SOME_MORE_DATA;
};

struct VS_OUTPUT
{
   float4 vPosition : SV_POSITION; // the SV_ prefix marks this as a system value (see below)
   float3 vColor : COLOR;
};

VS_OUTPUT SimpleVertexShader( VS_INPUT Input )
{
   VS_OUTPUT Output;
   // pass the position through, nudged along x by our per-vertex SOME_MORE_DATA value
   Output.vPosition.x = Input.vPosition.x + Input.OffsetX;
   Output.vPosition.y = Input.vPosition.y;
   Output.vPosition.z = Input.vPosition.z;
   Output.vPosition.w = Input.vPosition.w;
   Output.vColor = Input.vColor;
   return Output;
}

float4 SimplePixelShader( VS_OUTPUT Input ) : SV_Target
{
   // output the interpolated vertex color at full alpha
   return float4( Input.vColor.r, Input.vColor.g, Input.vColor.b, 1.0 );
}

This is fairly simple, largely because DirectX does a lot of magic in the background. We define a vertex shader that receives a pre-defined VS_INPUT struct and outputs a VS_OUTPUT struct. That float myVal : SOMETHING construct means that we want myVal to magically receive the value SOMETHING that we define in our pVertexLayout description.

SOME_MORE_DATA is going to be placed in OffsetX in our VS_INPUT, and POSITION and COLOR will also be there. We’ll create a VS_OUTPUT, copy over position and color, and add our OffsetX to the position’s x value. (By the way, fun fact — instead of saying vPosition.{x,y,z,w}, you can say vPosition.{r,g,b,a} or vPosition.{[0],[1],[2],[3]} — they all compile the same. Use whichever nomenclature makes sense!)

That SV_POSITION in VS_OUTPUT means that it’s a SYSTEM VALUE. System values are hardcoded variables that get special treatment, and the ultimate vertex position is one such special variable.

Then, SimplePixelShader will magically receive that information and return a color to draw to screen (by writing it in SV_Target — the special variable that stores a final color for this pixel).

So that’s everything you need — you’ve defined what your vertices will look like, you’ve made some vertices, and you’ve written a shader to handle them and draw the triangle they form to screen. Now, you need to hook it all up.

PART TWO: MAKING THE GPU AWARE OF YOUR DATA AND FUNCTIONS

First, write a function to handle shader compiling. Note that the shaders.hlsl file we just wrote contains multiple shaders — a vertex shader and a pixel shader — and we’ll have to compile each separately.


#include <d3dcompiler.h> //lives in the Windows Kit includes, e.g. C:\Program Files (x86)\Windows Kits\8.0\Include\um\
#pragma comment( lib, "d3dcompiler.lib" ) //and link against the matching library

HRESULT CompileShaderFromFile(const WCHAR* pFileURI, const CHAR* pShaderName, const CHAR* pShaderModelName, ID3DBlob** ppOutBlob)
{
   DWORD dwShaderFlags = D3DCOMPILE_ENABLE_STRICTNESS;
   dwShaderFlags |= D3DCOMPILE_DEBUG;

   ID3DBlob* pErrorBlob = nullptr;

   HRESULT hr = D3DCompileFromFile( pFileURI, nullptr, D3D_COMPILE_STANDARD_FILE_INCLUDE, pShaderName, pShaderModelName, dwShaderFlags, 0, ppOutBlob, &pErrorBlob );

   if( pErrorBlob )
   {
      //on failure, the error blob holds the compiler's human-readable messages; print them before releasing
      OutputDebugStringA( (const CHAR*)pErrorBlob->GetBufferPointer() );
      pErrorBlob->Release();
   }

   return hr;
}

A lot of confusing toggles — par for the course. Pass in the path to shaders.hlsl in pFileURI, the name of the shader in pShaderName (e.g. "SimpleVertexShader"), and the name of the shader model to compile against in pShaderModelName (use "vs_5_0" for compiling vertex shaders, and "ps_5_0" for pixel shaders). The ppOutBlob returned is a handle to the compiled shader. Close your eyes to everything else.

Let’s use it to set up our vertex shader.


ID3DBlob* pVertexShaderBlob = nullptr;
ID3D11InputLayout* pInputLayout = nullptr;

CompileShaderFromFile( L"SimpleShaders.hlsl", "SimpleVertexShader", "vs_5_0", &pVertexShaderBlob );
m_pd3dDevice->CreateVertexShader( pVertexShaderBlob->GetBufferPointer(), pVertexShaderBlob->GetBufferSize(), nullptr, &m_pVertexShader );
m_pDeviceContext->VSSetShader( m_pVertexShader, NULL, 0 );

//pVertexLayout is the D3D11_INPUT_ELEMENT_DESC array from the top; pInputLayout is the GPU-side handle we get back
HRESULT hr = m_pd3dDevice->CreateInputLayout( pVertexLayout, uiNumElements, pVertexShaderBlob->GetBufferPointer(), pVertexShaderBlob->GetBufferSize(), &pInputLayout );

pVertexShaderBlob->Release();

m_pDeviceContext->IASetInputLayout( pInputLayout );

So we use our new function to compile the SimpleVertexShader, we create a handle to the compiled code (m_pVertexShader) that recognizes it as a vertex shader, and then we tell our D3DDeviceContext to use it. Cool!

Next, we call m_pd3dDevice->CreateInputLayout, to make the GPU aware of our pVertexLayout that we defined all the way at the top, and set it as our official vertex layout. Note that CreateInputLayout requires the vertex shader in addition to the vertex input layout — this is because it cross-checks the two to make sure pVertexLayout contains all the information m_pVertexShader asks for.

Next, we set up our pixel shader, almost the same as we set our vertex shader…


ID3DBlob* pPixelShaderBlob = nullptr;
CompileShaderFromFile( L"SimpleShaders.hlsl", "SimplePixelShader", "ps_5_0", &pPixelShaderBlob );
m_pd3dDevice->CreatePixelShader( pPixelShaderBlob->GetBufferPointer(), pPixelShaderBlob->GetBufferSize(), nullptr, &m_pPixelShader );

pPixelShaderBlob->Release();

m_pDeviceContext->PSSetShader( m_pPixelShader, NULL, 0 );

…And then we set our vertices…


D3D11_BUFFER_DESC bd;
ZeroMemory( &bd, sizeof(bd) );
bd.ByteWidth = sizeof(pVertexArray); //total size of the vertex data, in bytes
bd.Usage = D3D11_USAGE_DEFAULT;
bd.BindFlags = D3D11_BIND_VERTEX_BUFFER;
bd.CPUAccessFlags = 0; //no CPU access necessary

D3D11_SUBRESOURCE_DATA InitData;
ZeroMemory( &InitData, sizeof(InitData) );
InitData.pSysMem = pVertexArray; //Memory in CPU to copy in to GPU

ID3D11Buffer* pVertexBuffer;
m_pd3dDevice->CreateBuffer( &bd, &InitData, &pVertexBuffer );

// Set vertex buffer
UINT offset = 0;
UINT stride = 7 * sizeof(float); //how much each vertex takes up in memory -- the size of 7 floats, one each for position XYZ, color RGB, and our SOME_MORE_DATA
m_pDeviceContext->IASetVertexBuffers( 0, 1, &pVertexBuffer , &stride , &offset );

m_pDeviceContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );

Which, despite the line count, isn’t actually that scary! Our D3D11_BUFFER_DESC just says we want to allocate some memory on the GPU with size equal to the size of pVertexArray, to be used as a vertex buffer — it’s default behavior in every other way. Our D3D11_SUBRESOURCE_DATA tells the GPU where our vertex data lives on the CPU. We pass both structures in to m_pd3dDevice->CreateBuffer to copy that data to the GPU, then tell the GPU to use it as our VertexBuffer!

And now, finally, everything is set up. In your Render loop, call m_pDeviceContext->Draw( 3, 0 ); to draw 3 vertices. And will you look at that.
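
If you've lost track of where everything landed, here's a rough sketch of what that per-frame render function might look like (the member names are the ones these tutorials have been assuming, and the clear-and-present calls come from Part 2):

void Render()
{
   // wipe the back buffer so old frames don't bleed through
   float ClearColor[4] = { 0.0f, 0.0f, 0.0f, 1.0f };
   m_pDeviceContext->ClearRenderTargetView( m_pRenderTargetView, ClearColor );

   // draw 3 vertices from the currently-bound vertex buffer, starting at vertex 0
   m_pDeviceContext->Draw( 3, 0 );

   // flip the finished back buffer onto the monitor
   m_pSwapChain->Present( 0, 0 );
}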

It took hours, but damn it, that is your triangle.

[image: the rendered triangle]

DirectX Part 2.5: The Rendering Pipeline

Okay, so, we’ve got all our DirectX stuff set up to start rendering pretty pictures.

So it’s important, at this time, to talk about the pipeline that does the rendering. Unfortunately, it’s a beast:

[diagram: the ten-stage rendering pipeline]

Some of these can be simplified or ignored for now — but it’s important you understand this stuff. This is the core of rendering.

INPUT-ASSEMBLER

This stage is where we assemble (as in, gather together) our inputs (as in, our vertices and textures and stuff). The input-assembler stage knows what information needs to be associated with which vertices and shaders (does every vertex have an associated color, if it needs one? A UV position? Are we loading the textures each shader needs?). This stage makes sure to get that information from the CPU, and it passes that information into the GPU for processing.

VERTEX SHADER

This stage does operations on vertices, and vertices alone. It receives one vertex with associated data, and outputs one vertex with associated data. It’s totally possible you don’t want to affect the vertex at all, in which case your vertex shader will just pass data through, untouched. One of the most common operations in the vertex shader is skinning, or moving vertices to follow a skeleton doing character animations.

HULL SHADER + TESSELLATOR + DOMAIN SHADER

These are new for DirectX 11, and a bit advanced, so I’m summarizing all these stages at once. The vertex shader only allows one output vertex per input vertex — you can’t end up with more vertices than you passed in. However, generating vertices on-the-fly has turned out to be very useful for algorithms like dynamic level of detail. So these pipeline stages were created. They allow you to generate new vertices to pass to further stages. The tessellation stages specifically are designed to create vertices that “smooth” the paths shaped by other vertices. For basic projects, it’s common to not use these stages at all.

GEOMETRY SHADER

Also fairly new, introduced in DirectX 10. This stage does operations on primitives — or triangles (also lines and points, but usually triangles). It takes as input all the vertices to build the triangle (and possibly additional vertices indicating the area around that triangle), and can output any number of vertices (including zero). In the sense that it operates on sets of vertices and outputs a not-necessarily-equal number of vertices, it's similar to the hull shader / tessellator / domain shader. However, the geometry shader is different, because it can output fewer vertices than it received, and it allows you to create vertices anywhere, whereas tessellation can only create vertices along the path shaped by other vertices. Before DirectX 11, tessellation was done in the geometry shader, but because it was such a common use case, DX11 moved tessellation into its own special-purpose (and significantly faster) pipeline stages. For basic projects, it's common to not use this at all.

STREAM OUTPUT

After running the geometry shader, you have the full set of vertices you want to operate on. The stream-output stage allows you to redirect all the vertices back into the input-assembler stage for a second pass, or copy them out to the CPU for further processing. This stage is optional, and will not be used if you only need one pass to generate your vertices and don't need the CPU to know what those vertices are (which, again, is probably the case for basic projects).

RASTERIZER

The GPU outputs a 1920×1080 (or whatever size) RGB image, but right now it only has a bunch of 3d vertex data. The rasterizer bridges that gap. It takes as input the entire set of triangle positions in the scene, information about where the camera is positioned and where it’s looking, and the size of the image to output. It then determines which triangles the camera would “see”, and which pixels they take up. This sounds easy, but is actually hard.

PIXEL SHADER

This stage works on each individual pixel of your image, and does things like texturing and lighting. Essentially, it's where everything is made to look pretty. It receives input data about the vertices that compose the primitive that this pixel "sees", interpolated to match the position of this pixel on the primitive itself. It then performs operations (such as "look up this corresponding pixel in a texture" or "light this pixel as though it were 38 degrees tilted and 3 meters away from an orange light"), and outputs per-pixel data — most notably the color of the pixel. Arguably, this is the most important stage of the entire DirectX pipeline, because this is where most of an image's prettiness comes from.

OUTPUT MERGER

Although it seems like you’re done at this point, you can’t just take the pixel shader output and arrange it all to make an image. It’s possible for the pixel shader to compute multiple output colors for a single pixel — for instance, if a pixel “sees” an opaque object and a semi-transparent object in front of it, or if the pixel shader was given a far-away object to compute colors for but it later turned out that pixel’s “sight” was blocked by a closer object. This is a problem, since our final rendered frame can only contain one color per pixel. So that’s where the output merger stage comes in. You tell it how to handle differences in depth and alpha values (as well as stencil values, which you can set for fancy rendering tricks) between two outputs of the same pixel. It follows those rules and creates the one final image to draw to screen.

And there you go! It’s a lot, but this is all the steps the GPU takes to go from raw data to an output image. There is no magic, just these steps.

DirectX Part 2: Displaying, Like, Anything

So you just read the previous tutorial, you initialized Direct3D on your video card, and now you have pointers to these three items — a Swap Chain, a D3D Device, and a D3D Device Context. Well, in this tutorial, we’re going to display a color on screen! And it’s going to require using all three items.

Yes, that’s a lot of work for drawing a color to the screen — but it’s the groundwork for vertices and shaders and all that sexy stuff in the future. So let’s review.

D3D Device: This is an interface to the physical resources on your video card (i.e. GPU memory and processors). You’ll only have this one D3D Device to work with. Use it to allocate memory for textures, shaders, and the like.

D3D Device Context: This is an interface to the commands you want to give to the video card. As an analogy, the D3D Device is the company that builds the guitar, and the D3D Device Context is the musician that uses the guitar to play songs. You’ll only have this one context to work with (although pros use multiple contexts, known as “deferred” contexts). Things like passing in triangles, textures, and shader commands to generate a pretty picture are done through the D3D Device Context.

Swap Chain: This is an interface to the image you display on the monitor. So, it’ll contain a 1920×1080 texture that you draw on, to display on your 1920×1080 monitor. It’s called a swap chain because you’re actually drawing into 1 of 2 1920×1080 textures (you draw on one while the other is being displayed by the monitor), and then you swap those images when you’re done drawing. Since the swap chain directly controls what image is on the monitor, any time you want to see anything, you’ll use it.

Anyhow, let’s see how to draw a color to screen!

First, there’s some setup code you need to run once, after you initialize Direct3D:


ID3D11Texture2D* pBackBuffer = NULL;
m_pSwapChain->GetBuffer( 0, __uuidof( ID3D11Texture2D ), ( LPVOID* )&pBackBuffer );

m_pd3dDevice->CreateRenderTargetView( pBackBuffer, NULL, &m_pRenderTargetView );
pBackBuffer->Release();

m_pDeviceContext->OMSetRenderTargets( 1, &m_pRenderTargetView, NULL );

Again, let’s take it step by step.


ID3D11Texture2D* pBackBuffer = NULL;
m_pSwapChain->GetBuffer( 0, __uuidof( ID3D11Texture2D ), ( LPVOID* )&pBackBuffer );

This is simple enough: you're making pBackBuffer point to the buffer (texture) at index 0 of the swap chain. You can't grab the texture that's currently displayed to screen, so this is the only texture you have to deal with.

That __uuidof( ID3D11Texture2D ) bit looks confusing, but it's a fairly common setup here, so try to get comfortable with it! In order to future-proof the D3D APIs, rather than have GetBuffer(...) return a pointer to an ID3D11Texture2D (which will become obsolete come D3D 12), GetBuffer() writes out a raw void* pointer — but it guarantees that you can cast that pointer to whatever type you name in argument 2.


m_pd3dDevice->CreateRenderTargetView( pBackBuffer, NULL, &m_pRenderTargetView );
pBackBuffer->Release();

This code makes our D3D Device aware of the texture in our swap chain (which in turn will become the texture displayed in our monitor). Now, pBackBuffer and m_pRenderTargetView are two different interfaces to the same memory. They both modify the exact same image (which will eventually go to screen), but they expose different ways to modify it. Like an X-ray versus an infrared image of a human body — both offer different information about the same subject.

Calling Release() says that we don’t need to look at our frame texture in any of the methods exposed by pBackBuffer anymore. Our frame texture still exists in memory, but we can only view it through m_pRenderTargetView now. It’s a very good idea to Release() any handles you don’t need anymore.


m_pDeviceContext->OMSetRenderTargets( 1, &m_pRenderTargetView, NULL );

This says “Hey GPU! Render to this buffer”. Because we’re clever, we just made sure the buffer we’re rendering to is the one that gets displayed to screen — but it’s still not on screen yet! WE ARE SO CLOSE.

Fun fact: The “OM” stands for “Output Merger”, because it takes all the information you give the video card (which shaders/vertices/etc to use) and merges them all to create an output image.

AND NOW WE GET TO ACTUALLY DRAW. TO THE SCREEN.
In your update loop, or some loop that gets called every time you want to draw a new frame, include this code:


float ClearColor[4] = {
(float)rand() / RAND_MAX, //red
(float)rand() / RAND_MAX, //green
(float)rand() / RAND_MAX, //blue
1.0f //alpha
};
m_pDeviceContext->ClearRenderTargetView( m_pRenderTargetView, ClearColor );
m_pSwapChain->Present( 0, 0 );

Hey, simple!

Your ClearColor is a standard RGBA color — 0 is min, 1 is max. Now that we've set our DeviceContext to write to the SwapChain's back buffer, that clear command on m_pRenderTargetView clears our back buffer to ClearColor, and we just tell the SwapChain to present it.

THAT WAS EASY

GRAPHICS CARDS ARE SO FRIENDLY

DirectX Part 1: The “Initialize” Function

GUYS. I hate to break it to you, but normal programming on CPUs is for wimps. GPU programming is where you have to go to find fast cars / hot ladies. BUT THERE’S A PROBLEM: it’s hella hard to program for GPUs! Well. Until now, when I explain it all to you.

Since the dawn of computing, every line of code has run on your CPU, by default. Graphics cards only became a thing in the mid-80s, and even to this day, they aren’t really “standard” parts of a computer. What this means is that everything you want to run on a GPU has to be wrapped in APIs that very specifically say, “I want this to run on a GPU and it will be my fault if this computer has no GPU to run on”.

The two most common APIs to allow people to run code on graphics cards for the purpose of rendering pretty 3D scenes are DirectX and OpenGL. This is the first of many articles focusing on DirectX, although many of the concepts apply to OpenGL. Using the GPU to do non-rendering stuff, like cracking passwords, isn’t really DirectX’s strength and we aren’t gonna focus on it.

So, the entire scope of this article is the DirectX11 Initialize function. That’s a pretty small scope, but it’s a dense function, and it provides a great overview of what the API designers think you should care about re: your video card.

Anyhow, the Initialize function is called D3D11CreateDeviceAndSwapChain. More specifically, it's:

HRESULT D3D11CreateDeviceAndSwapChain(
   _In_ IDXGIAdapter *pAdapter,
   _In_ D3D_DRIVER_TYPE DriverType,
   _In_ HMODULE Software,
   _In_ UINT Flags,
   _In_ const D3D_FEATURE_LEVEL *pFeatureLevels,
   _In_ UINT FeatureLevels,
   _In_ UINT SDKVersion,
   _In_ const DXGI_SWAP_CHAIN_DESC *pSwapChainDesc,
   _Out_ IDXGISwapChain **ppSwapChain,
   _Out_ ID3D11Device **ppDevice,
   _Out_ D3D_FEATURE_LEVEL *pFeatureLevel,
   _Out_ ID3D11DeviceContext **ppImmediateContext
);

(_In_ and _Out_ are compile-time hints indicating whether a function parameter is input or output. _In_ parameters can only be input (read from but not written to), and _Out_ parameters can only be output (written to but not read from).)

ANYHOW WOW THAT’S A LOT OF PARAMETERS

The number of parameters is representative of DirectX’s overall API design — a design that assumes GPU programming is only for super-hardcore programmers. This is a self-fulfilling prophecy — the scariness of the DirectX API keeps everyone away who isn’t hardcore enough to handle reams of documentation — but it sucks, because GPUs are pretty mainstream now and many programmers could really add GPU skills to their arsenal. But that’s a topic for another time! Let’s cover these variables one-by-one.

_In_ IDXGIAdapter *pAdapter: “IDXGI” stands for “Interface to a DirectX Graphics Infrastructure”. Basically, a DirectX Graphics Infrastructure Adapter is anything that can handle displaying graphics. This includes video cards, integrated graphics chips on CPUs, or a CPU software renderer — if it can output images to screen, it’s a DXGI adapter. My dual-nVidia GTX560 machine has 3 adapters: one for each of my GTX560 cards, and one for the Microsoft Basic Render Driver that Microsoft falls back on if there are no video cards available. The value you pass in here will be the video card that DirectX gives you access to; pass in nullptr for the default video card (usually a pretty good guess).

_In_ D3D_DRIVER_TYPE DriverType: This is one of a pre-defined list of values, which lets you specify whether you want any commands passed through DirectX to go to the GPU hardware, or if you want to actually do a fake-out and emulate your commands in software. Chances are, if you’re using the graphics card, it’s because you want to make things go fast. So you want to use D3D_DRIVER_TYPE_HARDWARE. If you’re doing tricky enough stuff to warrant using another driver type, chances are, you’ll know about it.

_In_ HMODULE Software: This is only used if your D3D_DRIVER_TYPE above is “Software”, because this is a pointer to your software implementation of Direct3D that you want to use to debug stuff. Otherwise, it can be NULL. Again, if you’re doing tricky enough stuff that you need to pass a non-NULL value here, you’ll know about it.

_In_ UINT Flags: This is where you specify a bunch of low-level flags to modify behavior. If you’re only rendering from a single thread, or if you’re using the GPU for tasks that take lots of time to execute (i.e. cracking passwords instead of playing video games), or you want to debug stuff, there’s flags here that you may want to play with.

_In_ const D3D_FEATURE_LEVEL *pFeatureLevels: This is a pointer to the first element of an array of D3D_FEATURE_LEVEL items. "Feature Level" means "version of DirectX" — such as 9.1, 10.0, or 11.0. In general, you want the highest feature level you can get, and newer video cards offer support for newer versions of DirectX. You'll probably pass in an array like D3D_FEATURE_LEVEL featureLevels[] = { D3D_FEATURE_LEVEL_11_0, D3D_FEATURE_LEVEL_10_1 }; which means "Try for DX11.0 if possible, but if that fails, give me 10.1". Alternatively, you can just pass in nullptr and it'll set up whatever the highest DirectX feature level is that your adapter supports.

_In_ UINT FeatureLevels: The number of feature levels in the above pFeatureLevels array. lol at the old-school C in this API, instead of just passing a vector or something you pass in a pointer-to-array-start and length-of-array. If you pass nullptr for pFeatureLevels, just set this to 0.

_In_ UINT SDKVersion: Just pass in D3D11_SDK_VERSION. That’s seriously what the documentation tells you to do. No choice, no explanation. Thanks, API designers.

_In_ const DXGI_SWAP_CHAIN_DESC *pSwapChainDesc: Okay, so what is a swap chain? Well, it’s a collection of frame buffers (frame buffers being the 1920×1080 RGB image that gets presented to your 1920×1080 monitor — a buffer containing the entire frame to display on-screen). Anything you want displayed gets drawn to one of the frames in this collection, and then once you’re done drawing to that frame it gets displayed on-screen and you start drawing to the next frame buffer in the collection. This structure tells DirectX things like how many frame buffers to create, how big they should be, and how they’ll be used.

_Out_ IDXGISwapChain **ppSwapChain: This is a pointer to the output swap chain generated by DirectX. Every time you want to display your drawn image onto the monitor, you’ll have to call pSwapChain->Present(...). So hold on to this!

_Out_ ID3D11Device **ppDevice: This is a representation of DirectX running on your graphics card. Things like memory allocation and status checks relating to the whole GPU are done through the D3DDevice. Hold on to this, too!

_Out_ D3D_FEATURE_LEVEL *pFeatureLevel: This just confirms the version of DirectX that your graphics card is able to run. Quit out if it’s lower than you want it to be.

_Out_ ID3D11DeviceContext **ppImmediateContext: This is the thing you actually care about — the “context” to the D3DDevice (as in, this is how you use the DirectX wrapper that is now lying on top of your video card). The “Immediate” refers to the fact that any command you send through here gets executed on the graphics card immediately. You use the device context to set shaders and draw 3D geometry, which is pretty much the meat of rendering.
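
To tie all twelve parameters together, here's a minimal sketch of a call (assuming you already have a window handle in m_hWnd and member pointers to fill; all the names here are placeholders):

DXGI_SWAP_CHAIN_DESC sd;
ZeroMemory( &sd, sizeof(sd) );
sd.BufferCount = 1;                                // one back buffer (plus the front buffer on screen)
sd.BufferDesc.Width = 1920;
sd.BufferDesc.Height = 1080;
sd.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM; // standard 32-bit RGBA
sd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;  // we'll be drawing into it
sd.OutputWindow = m_hWnd;
sd.SampleDesc.Count = 1;                           // no multisampling
sd.Windowed = TRUE;

D3D_FEATURE_LEVEL featureLevel;
HRESULT hr = D3D11CreateDeviceAndSwapChain(
   nullptr,                  // pAdapter: default video card
   D3D_DRIVER_TYPE_HARDWARE, // DriverType: real GPU, please
   NULL,                     // Software: unused unless DriverType is software
   0,                        // Flags: nothing fancy
   nullptr, 0,               // pFeatureLevels / FeatureLevels: take the highest available
   D3D11_SDK_VERSION,        // SDKVersion: no choice!
   &sd,
   &m_pSwapChain, &m_pd3dDevice, &featureLevel, &m_pDeviceContext );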

So you came out of this with your swap chain, your D3DDevice, and your D3DDeviceContext. Cool! Next time, we’ll look at how to start using these items to — god forbid — draw a purple box on screen.

Going Down The List: RAM Stats Spec Sheet

So, we’ve already discussed CPUs, GPUs, and motherboards. But we haven’t discussed RAM! LET’S FIX THAT.

RAM stands for "Random Access Memory". The name, like many in computer hardware, is antiquated and emphasizes aspects of memory that nobody cares about anymore. Although almost every component in a computer has some amount of random-access memory attached, most people are referring to the "main memory" in computers when they talk about RAM. For this conversation, we're only talking about SDRAM, or Synchronous Dynamic RAM, which is the type of RAM used in desktop computers.

RAM only has two jobs: it holds a ton of data, and when the CPU asks for a specific piece of data, it finds that piece and returns it to the CPU as efficiently as possible (both in terms of bandwidth — megabytes transferred per second — and latency — time between the CPU asking for data and the RAM returning it). Although that’s not too much to do, there’s still plenty of jargon to analyze when discussing RAM.

We’re going to split this up into the EASY MODE terminology about RAM (self-evident information), and HARD MODE terminology (which requires understanding the nitty-gritty details of how RAM works).

EASY MODE

CAPACITY: This represents how much information can be stored in memory at once. More is better.

PIN COUNT: It’s the number of connections made between the RAM and the motherboard (if you count the little gold tabs at the bottom of your stick of RAM, that’s your pin count). Your motherboard only supports a certain type of RAM (usually 240-pin DDR3). Obnoxiously, two different generations of RAM can have the same pin count but not both be usable by the same motherboard, so make sure both pin count AND generation of DDR match between your RAM and your motherboard’s supported spec sheet.

DDR GENERATION (DDR1 vs DDR2 vs DDR3): Changes in generation represent major shifts in the internal design of RAM. Each generation is incompatible with the others, so you can’t install DDR2 RAM on a motherboard that supports DDR3. Later generations consume less energy and offer higher clock speeds (they also add more latency, but the higher clock speeds offset that). The “DDR” itself stands for “double data rate”, and it refers to the fact that all DDR memory can do two operations per clock cycle.

VOLTAGE: How much power it consumes. In general, DDR consumes 2.5V, DDR2 consumes 1.8V, and DDR3 RAM consumes 1.5V. You can buy RAM that doesn’t conform fully to this spec, but most RAM will match those numbers.

BUFFERED/REGISTERED RAM: Buffers/registers help the RAM during periods of prolonged access and keep it more stable, but they cost more money and add latency. You don't need them if you aren't working on a server.

DDR[X]-[Y] PC[X]-[Z]: You'll see this format appear on some RAM spec sheets. The "X" represents which generation of DDR SDRAM is used (it will always be the same as the number after "PC"). "Y" represents the effective clock speed, or, how many operations it can perform per second. (Note that this number is actually double the real clock speed, due to DDR's two-actions-per-cycle methodology.) Finally, "Z" represents the maximum theoretical transfer rate in megabytes/second, or, its max bandwidth. Since DDR RAM transfers 8 bytes per operation, Z is always just 8 * Y. For example, DDR3-1600 RAM is labeled PC3-12800, because 1600 mega-operations per second * 8 bytes = 12,800 MB/s.

HARD MODE

As you may know, RAM only thinks of data in terms of addresses — for instance, instead of asking RAM what is the value of myInt, you’d look up the address of myInt, see that it’s 0x0f3c, and then ask the RAM for the data located at 0x0f3c. Well, internally, RAM memory banks are stored as a 2-dimensional table. So, once it receives that instruction, the RAM may internally split address 0x0f3c into 0x0f and 0x3c, in order to create the instruction read the data at row 0x0f, column 0x3c. Memory is stored in row major order, meaning that you read memory along rows, not along columns. With that in mind, let’s look at how to determine RAM latency.
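
As a toy illustration of that split (real memory controllers slice addresses differently, so treat these bit positions as made up):

#include <cstdio>
#include <cstdint>

int main()
{
   uint16_t address = 0x0f3c;
   uint8_t row = address >> 8;    // 0x0f: the high byte picks the row
   uint8_t col = address & 0xff;  // 0x3c: the low byte picks the column
   printf( "read the data at row 0x%02x, column 0x%02x\n", row, col );
   return 0;
}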

TIMING: You’ll often see timing numbers that look like “A-B-C-D” or “A-B-C-D-E”. Each number represents the latency in performing certain operations. The smaller the numbers, the less latency, the better your RAM is. In order: A represents CAS Latency, B represents RAS to CAS delay, C represents RAS precharge delay, and D represents Row Active Time. If listed, E represents Command Rate. We’re going to define each term in a different order than it’s listed in the timing specs.

RAS AND CAS: These stand for "Row Address Strobe" and "Column Address Strobe". Basically, it means "this latency appears when we look at a different row" or "this latency appears when we look at a different column".

RAS PRECHARGE DELAY: Whenever you need to look at a new row of memory, you have to wait a certain number of clock cycles for that row to be prepared. If your current memory read is off the same row as your previous memory read, that row is already prepared and you don't need to pay this cost. Referred to sometimes as tRP.

RAS to CAS DELAY: This represents the number of clock cycles the RAM has to wait between defining which row to read and which column to read. Referred to sometimes as tRCD.

CAS LATENCY: This represents the number of clock cycles the RAM has to wait after the address row and column are specified before it gets the data in that row/column back to send out. Referred to sometimes as tCAS. This is the most well-known source of latency, but really, RAS to CAS is just as important. So pay attention to your entire timing specs, not just a separately-listed CAS Latency spec.

ROW ACTIVE TIME: This represents the number of clock cycles the RAM has to wait between activating a row and de-activating it in order to access a new row. Ideally, it should equal CAS latency + RAS-to-CAS delay + 2 clock cycles, or, the amount of time taken to read data from a row after it's activated plus two clock cycles to push out the memory to the CPU.

COMMAND RATE: Often not represented in the timing numbers, because it’s not that important. Represents the time between activating a memory chip and it being able to receive its first command.

LATENCY: Your actual latency between deciding you want memory at a certain address and receiving it will vary depending on what memory has been accessed previously, but at worst, it is equal to RAS precharge + RAS to CAS delay + CAS latency (or, the sum of the first three numbers in your timing specs). For example, 9-9-9-24 RAM has a worst-case latency of 9 + 9 + 9 = 27 clock cycles.

Motherboards: Where Are Their Ladyparts??

Continuing in a series of explanations of computer hardware, let’s look at motherboards! How do you tell motherboards apart? If you can hook all your hardware up to two motherboards, which one is better?

BASIC STUFF: THE MOTHERBOARD AS CONNECTIVE TISSUE

The most important thing about the motherboard is that it connects all the individual parts of your computer. The motherboard contains the wires that let data flow between CPU, GPU, RAM, HDD, your keyboard and mouse, etc. — it is the spinal cord of your computer.

When you’re buying a motherboard, the most important spec is the “socket type” of CPU that it supports. “Socket type” refers to the physical connection between the pins on the CPU and the sockets on the motherboard. A processor built for one socket type will physically not fit in a motherboard built to accept another socket type. Intel and AMD processors use different socket types. Furthermore, both companies create new socket types every few years, requiring motherboard upgrades to use the (presumably better) CPUs built to the new socket type. So, buying an Intel LGA 1155 socket-type motherboard locks you out of all AMD chips and all Intel chips older or newer than that socket type. Chances are, if you have to buy a new motherboard, it’s because new CPUs can’t work in your old one.

Number and type of expansion slots is the next most important thing. You may want to use many sticks of RAM or 2+ graphics cards, and some motherboards can’t support that. Furthermore, RAM, USB ports, hard drives, and GPUs are also built to standards that evolve over time. Although these standards don’t change as frequently as CPU chipsets (and Intel and AMD socket motherboards both support the same standards for all these things), you’ve still got to make sure that they’re supported.

Most desktop motherboards will support 240-pin DDR3 RAM, but different motherboards support different speeds of RAM (this is the 1066, 1333, 1600, etc.). Higher is better. Make sure to look up the number of memory slots too — fewer slots of RAM isn't a dealbreaker, but more slots are better. For instance, you can save money by buying 16GB of RAM as 4x 4GB sticks instead of 2x 8GB sticks.

You'll also see PCI ("Peripheral Component Interconnect") expansion slots listed on motherboard specs. These slots are where you insert specialized hardware as needed — audio cards, network cards, video cards, and even some SSDs use PCI slots. There are multiple standards of PCI, and again, bigger numbers (and the word "express") are better. Stuff that can afford to be slow, such as network cards, needs only plain PCI. Nice sound cards and PCI-based SSDs will want PCI Express, which allows faster data transfer. Video cards are the only things that really need PCI Express 2.0+, since they transfer absurd amounts of data.

There's other considerations like form factor (desktops are ATX), built-in audio/network/video cards (use expansion-slot cards if you can, but these get the job done), and HDD connectors (almost everything runs on SATA 6 Gb/s nowadays). These are all boring and don't change very much, so we're glossing over them.

GETTING FANCIER: THE MOTHERBOARD AS BRAIN

You can go out and buy a usable motherboard just based on the information above, and you’ll be fine. But stopping now is for losers! Motherboards are more than just the wires connecting components.

First, the motherboard contains the BIOS. BIOS stands for “Basic Input/Output System” (pretend like it means “Built-in Operating System” — Neal Stephenson’s idea — because that’s a better name). It’s a super-low-level system where you can see hardware stats and change them. This is the settings screen that you get when you hit F11 or Delete while booting, where you do things like RAID together hard drives, control voltage/timings, and specify to boot off HDD or CD. Honestly, 95% of motherboard manufacturers’ BIOS utilities feature 95% of the things you care about, so it’s not a factor in your motherboard purchase.

Less visibly, but more importantly, the motherboard contains the Northbridge and Southbridge chips. These chips manage communication between each part of your computer, and they are vital. The Northbridge manages access to high-importance / high-data-transfer-rate parts of the computer, like RAM and video cards in PCIe slots (also, the Northbridge is being phased out of existence, as more system-on-chips wrap the Northbridge into the CPU). Southbridges manage access to lower-importance / lower-data-transfer-rate peripherals in PCI, USB, or SATA slots (i.e. audio cards, keyboards/mice, slow hard drives).

Since a ton of data transfers happen that don’t involve the CPU at all (RAM <-> video card; USB stick <-> printer), and the CPU is usually held back by memory transfer speed anyway, the Northbridge and Southbridge play an important role: not slowing the CPU down by making it a middleman in these transfers. A good Northbridge and Southbridge are the caretakers that make your entire machine flow smoothly.

What's the difference between a good and a bad Southbridge? The Intel Z77 chipset can communicate at 5 Gb/s with 8 PCIe 2.0 lanes and 6 Gb/s with 6 SATA ports. However, Intel's H61 can only handle 6 PCIe 2.0 lanes, and only 4 SATA ports at 3 Gb/s — even if you installed a USB 3.0 card on an H61 motherboard, it wouldn't run at full speed. (In general, 'H' means 'budget' and 'Z' means 'performant' for Intel).

These chips are the things that make one motherboard more expensive than another with the same slots — and although a budget Northbridge/Southbridge can hold you if you aren’t pushing the boundaries of your CPU, certain PC builds will see stunning increases by swapping out the motherboard alone.

On The History of Programming

We’re starting this post with a Zen Koan. Here it is, from The Gateless Gate, a 13th century Chinese compilation.

A monk asked the master to teach him.
The master asked, “Have you eaten your rice?”
“Yes, I have,” replied the monk.
“Then go wash your bowl”, said the master.
With this, the monk was enlightened.

Cool! We’ll get back to this.

If you are a programmer, and you care about programming, then you should study the history of programming. In fact, for most professionals, I’d argue that studying the history of programming is a better hour-per-hour investment than studying programming itself.

See, the history of programming is different from most other histories. While gravity existed before Newton, and DNA existed before Watson and Crick, programming is literally nothing but the sum of peoples’ historical contributions to programming. It is a field built from nothingness by human thought alone. It is completely artificial.

This leads to a useful fact: every addition to the world of programming that gained enough traction to be in use today was created to solve a problem. FORTRAN was created to make assembly coding faster on IBM mainframes. BASIC built on FORTRAN and allowed the same code to be run on different computers. C wasn't actually based off BASIC, but it kept BASIC's portability and instead focused on easier, more readable programming. C++ came after C and added object-oriented programming, allowing code re-use across tasks. Java expanded on C++, removing the need to recompile for different computer architectures.

Your favorite programming language is influenced by the contributions and thought patterns of every language before it.

By understanding the problems that birthed each new language, you can appreciate the solutions they offer. Java’s virtual machine was (and is) a Big Deal, the entire reason that Java exists. Learning Java without learning (or at least understanding) C++ robs that internalization from you. And don’t just learn about the languages that inspired your preferred language — learn about the offshoots and derivatives of languages you’re interested in, too. Each language highlights the deficiencies and tradeoffs of its parents and children, and learning Java can be just as useful to a C++ programmer as learning C++ could be to a Java programmer.

Even if you refuse to leave your favorite language, knowledge of its ancestors and children can make you recognize a language’s strengths and weaknesses, and tailor your program to match. And that is a valuable skill.

See, trying to master a programming language without understanding the context it arose from is like trying to understand a Zen Koan without understanding the context in which it’s meant to be read. You can’t.

Video Cards Have So Many Stats!

If you research video cards, because you’re buying one or something, you’re gonna see a TON of stats. And let’s be honest, you won’t understand all of them. This blog post will fix that problem! Maybe. Hopefully.

This is pretty much an info dump of all stats mentioned in NewEgg, AnandTech, and TomsHardware listings. Stats are split up by general category, whether they affect the video card EXIST AS A HUNK OF METAL, or MOVE DATA AROUND, or DO CALCULATIONS.

THESE MAKE THE VIDEO CARD EXIST AS A HUNK OF METAL

MANUFACTURING PROCESS: Measured in nanometers. This measures how small the semiconductors in the video card are (semiconductors are the building blocks of, like, all electronic devices). The smaller the semiconductors, the less heat/electricity they consume, and the more you can pack on a card.

TRANSISTOR COUNT: Transistors are made of semiconductors, so transistor count is inversely proportional to manufacturing process. Again, more transistors = more better.

THERMAL DESIGN POWER (TDP): Measured in watts. Measures how much power the video card expects to consume. Most overclocking software lets you increase wattage beyond TDP, but you’ll need to upgrade the stock fans to dissipate the extra heat, and you probably won’t get as good performance as just buying a card with a greater TDP. TDP should be close to load power, or how much power the card consumes when running Crysis or something. Most video cards have TDPs in the 200W range — which, for the record, is beastly, 3x+ the power of a good x64 CPU.

THESE MAKE THE VIDEO CARD MOVE DATA AROUND

PHYSICAL INTERFACE: The physical part that hooks into the motherboard and lets data move between your video card and your motherboard. Whatever your video card's interface is, make sure your motherboard has a slot of that interface type. New video cards are usually PCIe 2.0 x16 (which can transfer 8 gigabytes / second) or PCIe 3.0 x16 (15 GB/s!). Transfer speeds of 15 GB/s may sound like overkill for a 4 GB video game, but in addition to textures, etc., the computer is sending a LOT of data about game state to the GPU 30 times a second, so it's needed.

RAMDAC: Stands for “Random Access Memory Digital-to-Analog Converter”. It takes a rendered frame and pushes pixels to your monitor to display. The DAC isn’t used if you’re using digital interfaces for your monitor (like HDMI), and the information held in the RAM isn’t used in modern full-color displays. So everything about the name ‘RAMDAC’ is outdated. Most RAMDACs run at 400MHz, which means it can output 400 million RGB pixel sets per second, enough to drive a 2560×1600 monitor at 97fps. Probably good enough for you.

MEMORY SIZE: How much data the video card can store in memory. Although video cards can communicate with the main computer and therefore save/load data in the computer’s RAM / hard drive, memory that resides inside the video card can be accessed with less latency and higher bandwidth. Bandwidth is one of the biggest bottlenecks (and therefore one of the most important measures) for graphics cards.

MEMORY TYPE: Probably GDDR 2/3/4/5. ‘DDR’ stands for “double data rate”, because DDR memory performs transfers twice per clock cycle. The ‘G’ stands for ‘Graphics’ — since memory access patterns differ between GPUs (who want lots of data / can wait for it) and CPUs (who want little data / can’t wait for it), GDDR memory and computer DDR memory went down separate upgrade paths. Higher numbers represent new architectures that allow more memory transfers per clock cycle.

MEMORY INTERFACE: Measured in bits. Represents how much data is carried per individual data transfer.

MEMORY CLOCK: Measured in MHz/GHz. Represents how many memory-transfer cycles occur per second (although more than one memory-transfer can occur per cycle). Sometimes you’ll see “effective memory clock” listed, which means “real clock speed * number of memory transfers per clock cycle afforded by our memory type”.

MEMORY BANDWIDTH: How many bytes of data can be transferred between the memory on the graphics card and the GPU itself, per second. Measured as effective memory clock * interface size in bytes (the memory type's transfers-per-cycle are already folded into the effective clock). For example, a hypothetical card with a 4000MHz effective memory clock and a 256-bit (32-byte) interface has 4000MHz * 32 bytes = 128 GB/s of bandwidth. This is one of the most important numbers for comparing graphics cards.

THESE MAKE THE VIDEO CARD DO CALCULATIONS

CORE CLOCK: Measured in MHz/GHz. Represents how many computation cycles occur per second. If you see references to the shader clock — it’s tied to the core clock.

BOOST CLOCK: Measured in MHz/GHz. If your GPU detects that it’s running at full capacity but not using much power (which happens when you’re not using all parts of the card — i.e. GPGPU computing that doesn’t render anything, or poorly optimized games), it’ll overclock itself until it consumes the extra power. It may overclock itself to a frequency below or above boost clock frequency, based on how little power it’s using, so boost clock is a nebulous measurement. Although only Nvidia uses the term ‘Boost Clock’, AMD offers the same controls, called ‘PowerTune’.

SHADER CORE COUNT: Sometimes called "CUDA cores" for Nvidia or "Stream Processors" for ATI. It's the number of cores, similar to processor count in CPUs. Note that, compared to CPUs, GPUs generally have 100x the cores at 0.3x the clock speed (this is still, obviously, a win). That architecture makes GPUs ideal for running the same operation thousands of times on thousands of different sets of data, which is exactly what video games need (for instance, figuring out where every vertex in a 3d model is located on screen).

TEXTURE UNITS: Also called texture mapping units or TMUs. Video games have 3D models and need to apply textures to them. However, there are many different issues that arise when you try texturing a model at an angle, or far away, or super-close up. When your GPU asks for a given pixel in a given texture, the texture unit handles that request and solves all these problems before passing it back to the GPU. More texture units means you can look up more textures per second!

TEXTURE FILL RATE: Measured as number of texture units * core clock speed. Represents the number of pixels in textures that the card can lookup every second. If you’re playing a game where every object on screen is textured (which is most games), this should be higher than the resolution of your screen * desired framerate, because one on-screen pixel can be determined by many textures (i.e. diffuse + specular + normal).

ROPs: Stands for “Raster Operators”. These units receive the final color value for a given pixel and write it to the output image, to be passed to the RAMDAC to be rendered on your monitor.

PIXEL FILL RATE: Measured as number of ROPs * core clock speed. Represents the number of pixels that can be written to the output image to display on your monitor, per second. This also has to be higher than screen resolution * desired framerate, because one on-screen pixel can be determined by the output color of many 3d models (i.e. looking at a mountain through a semi-transparent pane of glass requires 2 ROPs per pixel, one for the mountain, one for the pane of glass in front of it). If you see “fill rate”, it usually refers to this instead of texture fill rate.

FLOPs: Measured in megaflops/gigaflops. Stands for "floating point operations per second", and represents how many times a video card can multiply/divide/add/subtract two floating point numbers per second (most video card calculations are done with floats instead of integers).

Whew! So that’s a pretty intense crash course in video card specs. Hope that helps!

C++: When should you use copy constructors?

LET’S TALK COPY CONSTRUCTORS.

Well, first, let’s do a refresher course.

1. Copy constructors are when you construct a new instance of a class by giving it an already-constructed instance of that class to clone. MyClass otherMyClass = thisMyClass; and MyClass otherMyClass(thisMyClass); are the two ways to call copy constructors. At the end of copy construction, otherMyClass is theoretically the same as thisMyClass.

2. If you don't explicitly create a copy constructor for your class (the signature to use in your header is MyClass(const MyClass& copyFromMe) ) then C++ will always default-create one. This auto-generated default copy constructor just calls the copy constructors of all member variables. The default copy constructor for primitive types (int, float, etc.) is just a data-copy.

3. Default copy construction of pointers is a shallow copy — you copy the pointer, but not the data it points to.

4. Copy constructors can be called implicitly. When you pass-by-value into a function, what that actually means is “copy-construct clones of the variables you want to pass in”. void DoStuffWith(MyClass thisMyClass) implicitly does a copy-construct on thisMyClass every time you call it.

5. (A sidebar, but useful to know) MyClass otherMyClass = thisMyClass; and MyClass otherMyClass(thisMyClass);, despite looking different, both call your defined MyClass(MyClass& copyFromMe) function. MyClass otherMyClass = thisMyClass; doesn’t call your operator=(MyClass& copyFromMe) function because that function is meant to be used on already-initialized instances of your class (copy assignment, not copy construction). MyClass otherMyClass; otherMyClass = thisMyClass; will default-construct otherMyClass and then call operator=.
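
Here's a quick sketch of point 5 in action, with hypothetical print statements so you can see which function actually runs:

#include <cstdio>

class MyClass
{
public:
   MyClass() { printf( "default constructor\n" ); }
   MyClass( const MyClass& copyFromMe ) { printf( "copy constructor\n" ); }
   MyClass& operator=( const MyClass& copyFromMe ) { printf( "operator=\n" ); return *this; }
};

int main()
{
   MyClass a;      // default constructor
   MyClass b = a;  // copy constructor, despite the '='
   MyClass c( a ); // copy constructor
   MyClass d;      // default constructor...
   d = a;          // ...and THEN operator=
   return 0;
}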

So: when should you use copy constructors? Well, only when you really know what you’re doing, and you probably shouldn’t anyhow.

Let’s see by example, using the two classes below. They both hold a lot of memory; one statically allocates it, one dynamically allocates it.


class MyStaticAllocMem {
public:
   int m_iBuffer[1920*1080];
};

class MyDynamicAllocMem {
public:
   MyDynamicAllocMem() { m_pBuffer = new int[1920*1080]; }
   ~MyDynamicAllocMem() { delete[] m_pBuffer; }

   int* m_pBuffer;
};

void ActOnStaticAllocMem( MyStaticAllocMem staticMemValCpy ) { /* do stuff */ }
void ActOnDynamicAllocMem( MyDynamicAllocMem dynamicMemValCpy ) { /* do stuff */ }

int main()
{
   MyStaticAllocMem staticMem;
   MyDynamicAllocMem dynamicMem;

   ActOnStaticAllocMem( staticMem );
   ActOnDynamicAllocMem( dynamicMem );

   return 0;
}

When you call ActOnStaticAllocMem( staticMem ), the copy constructor generates a clone of staticMem to use inside the function. This means it stack-allocates another 1920*1080 ints and copies over all 7.9MB of data. This is very, very slow. Plus, if one stack-allocated MyStaticAllocMem is enough to make you worry about stack overflow, well, now you have two!

Still, your program can copy-construct static-allocated memory and survive. It'll run significantly slower and be more prone to certain errors, but it runs. The same cannot be said about copy-constructing dynamic-allocated memory.

When you call ActOnDynamicAllocMem( dynamicMem ), the copy constructor generates a clone of dynamicMem to use inside the function. This only stack-allocates the memory for one pointer, which isn’t scary — what’s scary is this:

  • When ActOnDynamicAllocMem returns, its function-scope variable dynamicMemValCpy goes out of scope, calling dynamicMemValCpy‘s destructor
  • Because the default copy constructor is a shallow copy, dynamicMemValCpy has a pointer that points to the same location as dynamicMem back in main()
  • dynamicMemValCpy dutifully deletes its m_pBuffer, and the function returns, leaving you back at main() with dynamicMem pointing to unallocated data
  • The next time you do anything with dynamicMem, the world explodes.

There are plenty of workarounds to these issues, of course. You could use reference-counted pointers, or you could pass everything by reference or const-reference or by pointer (you really should do that). But it’s so easy to accidentally write void MyFunc(MyClass val) instead of void MyFunc(const MyClass& val), and the compiler will never complain.
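
(If you do need a real clone, here's roughly what a hand-written deep-copy constructor for MyDynamicAllocMem could look like; it goes inside the class, and needs #include <cstring> for memcpy:)

MyDynamicAllocMem( const MyDynamicAllocMem& copyFromMe )
{
   m_pBuffer = new int[1920*1080]; // allocate our own buffer...
   memcpy( m_pBuffer, copyFromMe.m_pBuffer, 1920*1080*sizeof(int) ); // ...and deep-copy the data into it
}

You'd want a matching operator= as well (the classic "Rule of Three"), or plain assignment will still shallow-copy.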

Now, there are times where copy constructors are super-helpful — you may legitimately want a throwaway clone of a given object. Then, you can mess with the clone’s data as much as you want, and there are no repercussions after the function exits. Copy-constructing a class with sizeof(myClass) less than sizeof(void*) may also be faster than passing a pointer or reference (I totally haven’t tested that, though).

And if you don’t want to use copy construction, you can disallow it on a per-class basis using private: MyClass(const MyClass&).

But still, given how easy it is to implicitly copy construct, and given the amount of ways implicit copy construction can kill you / the amount of engineering needed to ensure it doesn’t, I’m surprised it’s implemented in C++ as an opt-out feature instead of an opt-in feature.

TL;DR — I do not like copy constructors.

Conditional Probability: PROBABLY Awesome

Forget computer architecture, I want to talk about statistics.

LET’S SAY that you invent a test that determines whether someone is in the Illuminati, and it’s 99.9% accurate. Now, let’s say 0.1% of the world’s population is actually in the Illuminati. Someone takes the test, and it claims they’re Illuminati. What are the odds they’re really a member of a shadow government?

Well, gee, it’s like a 99.9% chance, right? Or, 99.8% or something to account for inaccuracy? No. It’s 50%, as good as a coin flip. So, congratulations. Your 99.9%-accurate test is worthless. And if you want to know why, look to conditional probability.

See, until the 1700s, statistics wasn't much further along than "lol coin flips". Then, in 1763, an essay by Bayes was published (posthumously) that asked (and answered) a new question: how do statistics change if the probability of an event depends on the probability of another event? Like, the probability of a test saying you're in the Illuminati depends on the probability of you actually being in the Illuminati.

This whole sub-branch of statistics is called conditional probability — as in, the probability something happens given the condition that something else happens. It’s great, because the math involved is simple, but some of the results throw our human intuitions for a loop.

Statistics textbooks usually start abstracting out from here and it gets easy for your eyes to glaze over, but as I said, the math is simple:

ARE YOU ILLUMINATI?                  WHAT DOES THE TEST SAY?   CHANCE OF HAPPENING
TOTALLY, SHADOWS AND STUFF (0.1%)    DAMN DUDE (99.9%)         0.0999%
TOTALLY, SHADOWS AND STUFF (0.1%)    NAH BRAH (0.1%)           0.0001%
WHAT? NO (99.9%)                     DAMN DUDE (0.1%)          0.0999%
WHAT? NO (99.9%)                     NAH BRAH (99.9%)          99.8001%

What are the chances you’ll be told you’re Illuminati correctly? (0.1% chance you’re Illuminati * 99.9% chance test calls it) = 0.0999% chance.

What are the chances you’ll be told you’re Illuminati incorrectly? (99.9% chance you’re boring as shit * 0.1% chance the test is wrong) = 0.0999%, or, the same damn likelihood.
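
If you'd rather make the computer do the arithmetic, here's the same calculation as a tiny program (the numbers come straight from the scenario above):

#include <cstdio>

int main()
{
   double pIlluminati = 0.001; // 0.1% of people really are Illuminati
   double pTestRight = 0.999;  // the test is 99.9% accurate

   // P(test says Illuminati) = true positives + false positives
   double pPositive = pTestRight * pIlluminati + (1.0 - pTestRight) * (1.0 - pIlluminati);

   // Bayes: P(Illuminati | positive test) = P(positive | Illuminati) * P(Illuminati) / P(positive)
   printf( "%f\n", pTestRight * pIlluminati / pPositive ); // prints 0.500000
   return 0;
}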

IN CONCLUSION: if any of you think I’m part of the Illuminati, no matter how much evidence you have, you’re probably wrong. Hint hint.