MYTH BUSTING

Agile metrics that underwhelm

Brad Hipps

11-21-2021

Look, broadly speaking, Agile has been a force for good. The Agile Manifesto was essential to demolishing the plodding, box-checking practices and artifact bloat that had come to define software development. The success of the Manifesto speaks for itself.

In the two decades since, tooling and automation have put Agile principles on steroids. Depending on whom you ask, these tools have either increased the speed at which teams work, or reimposed the kind of ideological adherence to process that Agile meant to explode.

(Every revolution eats its young.)

What hasn't advanced much is the way we understand engineering work: how we measure it, how we explain it, how we get better.

So, I'm not here to bury Agile. I would like to bury a few Agile metrics, though.

Three things that make a metric useful

Not that long ago, I listened to a consultant talk about software engineering OKRs. He listed a bunch of key-result-style measures meant to help teams improve in categories like quality and stability.

Predictably, he trotted out some of the Agile staples (sketched in code after the list):

  • Velocity: Average story points completed in a sprint.

  • Burndown: Rate of story point completion over a sprint.

  • Story point variance: Variance of points completed across all sprints.
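
For concreteness, here's a minimal sketch of how these three are computed. It's plain arithmetic; the sprint numbers are invented for illustration, and only Python's standard library is used.

    from statistics import mean, pvariance

    # Hypothetical data: story points completed in each of six sprints.
    points_per_sprint = [21, 18, 24, 19, 30, 17]

    # Velocity: average story points completed per sprint.
    velocity = mean(points_per_sprint)

    # Story point variance: spread of completed points across all sprints.
    variance = pvariance(points_per_sprint)

    # Burndown: points remaining after each day of a single sprint,
    # given (invented) points completed per working day.
    committed = 21
    completed_per_day = [0, 2, 3, 0, 5, 4, 3, 2, 1, 1]
    burndown = []
    remaining = committed
    for done in completed_per_day:
        remaining -= done
        burndown.append(remaining)

    print(f"Velocity: {velocity:.1f} points/sprint")  # 21.5
    print(f"Variance: {variance:.1f}")                # 19.6
    print(f"Burndown: {burndown}")                    # hits 0 when all committed work lands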

All of these sound reasonable. Certainly, we're familiar with them. And it may (?) be fair to say they're better than nothing.

But they don't really get us where we want to go.

Why not? To be useful, a metric should be:

  1. Discoverable: The metric shouldn’t require hours of work or an army of people building macros and combing spreadsheets. If it does, it'll soon collapse under its own weight.

  2. Actionable: The metric should make obvious what needs attention. If a metric starts any kind of debate about what it means and what to do next, it flunks.

  3. Consequential: The metric should actually move the needle, showing not only us in engineering but the broader business how things are going.

Given this rubric, let's put these three metrics—velocity, burndown, story point variance—to the test.

Death by story point

Story points assign an estimate of effort to work without resorting to units of time, like hours. Essentially, story points abstract away the time element of an estimate. This was intended as a feature, not a bug: story points were created to short-circuit debates between engineers and stakeholders about how long something would take.

But we've all lived with the practical effect of this. Without a common unit of measure, a story “point” is worth whatever a team says it’s worth. This makes story points unusable as shared currency.

As Ron Jeffries, an originator of story points, writes, “If [different teams] look at two stories that seem the same, and one team says it’s a two and the other one says it’s a six, that’s just not very interesting, and it’s certainly not a useful way to compare teams.”

Bullseye.

Given how squishy “points” are, any productivity metric built on them is dead on arrival. You certainly can't (or shouldn't!) use them to understand how one team compares to another.

But strip out the comparison use case. Can't we at least use story points to understand a single team's productivity, in isolation?

Not really. Points are whatever the team members agree they are at a given moment. If a team averaged 20 points per sprint last quarter and 18 this quarter, what does that mean? Are they losing efficiency? Or has there been a change in the people, or in their interpretation of a point?
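
To make the ambiguity concrete, here's a toy example (every number invented): the same seven stories delivered in consecutive quarters, sized under slightly different team calibrations of a "point," produce exactly this kind of velocity drop with no change in actual output.

    # Hypothetical: the same seven stories delivered in two quarters,
    # sized under slightly different shared understandings of a "point."
    sizes_last_quarter = [5, 3, 3, 2, 2, 3, 2]  # totals 20 points
    sizes_this_quarter = [5, 3, 2, 2, 2, 2, 2]  # same stories, re-sized: 18 points

    # Identical output, "declining" velocity.
    print(sum(sizes_last_quarter), sum(sizes_this_quarter))  # 20 18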

There's one other slight problem with story points. They mean nothing to the business. The business understands work delivered, whether by release, feature, or bug fix. Which means that this unit of measure we (engineering) are counting and tracking isn't much help in showing our contributions to the broader business. It might as well be cryptocurrency.

Going back to our rubric above, velocity, burndown, and story point variance are discoverable, yes. But they don’t clear the bar for being actionable or consequential.

Granted, it’s easy to be a critic. The real question is: If these Agile metrics fall short as reliable indicators of engineering productivity, what should we use instead?

Here are three to start with.