Before we start, I’m going to make a distinction between a change and a deployment. You can’t have the latter without the former, so for this article, I’m going to concentrate on the characteristics of a good change.
Yes, Captain Pedantic, I know you can do a no-op deployment of the same artifacts currently running in production — I’m not talking about those. There’s a whole realm of thinking around deployments that I’m not going to cover here in this quick blog post.
What’s in a Good Change?
A good change has three primary characteristics:
Simple. It’s easy for humans to reason about. You’re adding a feature, or you’re taking one away. You’re removing an unused code path. Things like that. Not “we modified a core code path to do something a little different under some vague set of conditions.”
The simplicity of your change is often a reflection of the elegance of your system’s design. It depends on the strength of your system’s contract with your users. What does your system do? Does it do that, and only that? Are there unsupported code paths because you were in a hurry (or lazy) while building it? Did you bolt something onto it that should be in a completely different system?
A simple change is like a simple function: it changes one thing.
Measurable. We understand the impacts of this change. Our monitoring is set up to observe its effects. Better yet, our monitoring was set up before the change, so we have a baseline to compare against. We should be able to tell right away when our change has taken effect, and how well it’s doing. We should know the exact instant when it’s NOT going well. Our customers should not find that for us.
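To make the baseline idea concrete, here is a minimal sketch in Python. It assumes you already collect an error rate per time window; the function name, the sample numbers, and the 0.01 tolerance are all made up for illustration, not taken from any particular monitoring tool.

```python
from statistics import mean


def error_rate_regressed(baseline, after_change, tolerance=0.01):
    """Return True if the average error rate after the change exceeds
    the pre-change average by more than `tolerance`.

    `baseline` and `after_change` are lists of error rates (0.0 to 1.0),
    one per monitoring window. The tolerance is an illustrative default.
    """
    return mean(after_change) - mean(baseline) > tolerance


# Hypothetical samples: error rates observed before and after a change.
baseline = [0.010, 0.012, 0.011]
healthy = [0.011, 0.010, 0.012]
broken = [0.050, 0.080, 0.065]

print(error_rate_regressed(baseline, healthy))
print(error_rate_regressed(baseline, broken))
```

Without the pre-change samples there is nothing to compare against, which is why the monitoring has to exist before the change does. A real check would also want enough samples to rule out noise; this one only compares averages.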
Erasable. Things go wrong. Often. Cloud computing has the wonderful side effect of emergent system properties. We don’t know what we don’t know. On top of that, humans suck at predicting complex systems. We didn’t evolve to be good at this stuff.
So assume things are going to go wrong with your fancy new change. The best change is the one you can erase. Real fast.
Solve for: “Oh my God, put it back to where it was, before you touched it.”
A good change is one that doesn’t involve heroics to undo.
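One common way to get there (not the only one, and not something this post prescribes) is to ship the new behavior behind a flag, so that undoing it is a one-line switch rather than a redeploy. The sketch below is a toy: the `FLAGS` dictionary, the flag name, and both pricing functions are hypothetical stand-ins for whatever your change actually touches.

```python
# Hypothetical in-memory flag store. In practice this would live in
# config or a flag service; a dict keeps the sketch self-contained.
FLAGS = {"new_pricing": False}


def legacy_price(cents):
    """Existing behavior: add a 20% markup (integer cents)."""
    return cents * 120 // 100


def new_price(cents):
    """The change being rolled out: a 15% markup instead."""
    return cents * 115 // 100


def price(cents):
    """Route to the new code path only while the flag is on."""
    if FLAGS["new_pricing"]:
        return new_price(cents)
    return legacy_price(cents)


print(price(10000))           # old behavior
FLAGS["new_pricing"] = True   # roll the change out
print(price(10000))           # new behavior
FLAGS["new_pricing"] = False  # put it back to where it was
print(price(10000))           # old behavior again
```

The old code path stays in place until the new one has proven itself, so putting it back involves no heroics. This only works for changes that don't alter stored data in a one-way fashion; those need their own undo plan.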
That’s it. Three things. Simple, measurable, erasable. It’s all about reducing the risk of system instability.
If your change is missing one of these three qualities, then that risk increases. Missing two? High risk.
Missing all three? You’re going to break your system. You likely won’t know when you broke it, how you broke it, or, and this is the most critical part, how to fix it in a hurry.
Think of it this way: good changes protect your system from you.