INSIDE SOCRATIC

Unstuck: finding flow in software engineering

Brad Hipps

2-5-2022

The cumulative flow diagram is famous for illustrating engineering flow efficiency—that is, how smooth our work throughput is. It’s famous for a reason. The CFD is a pretty ingenious use of an area graph to show, at a glance(ish), if and where work is getting bottlenecked.

I like CFDs. They look cool. But to tell the truth, I’m never quite sure what to do with the information.

The problem is that they can’t tell me why work is bottlenecked. There are no diagnostics. A CFD may work to spot problems; it doesn’t work to fix them.

Flow efficiency (re)considered

The principle of flow is straightforward. Work begins in some waiting state (e.g. To Do), and moves through whatever phases you use (In Development, Testing, etc.) until it’s done. The ideal flow is a simple left-to-right pass through our phases, as smoothly as possible.

What I want to understand is, how much of my work is in that idealized flow state—moving from left to right on our board, barely a pause between handoffs, the baton passing from person to person and then across the finish line like some glorious Olympic relay—and how much isn’t, um, in that exalted state. And why.

What gets in the way of Olympic flow? Actually, our interruptors are pretty common. There are four. Four interrupted states that work can tumble into:

Deprioritized: work that we started but then stopped, usually in favor of some higher priority.
Rework: work that goes backward—that is, from right to left—on our board. Think, for example, work in a test phase that’s returned to a development phase for refinements or fixes.
Blocked: usually, waiting for some other piece of work to finish, or maybe for someone to make a decision.
Idle: as in, that’s all we know, man. Nothing’s happening...
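
For the code-minded, here is one rough way to picture those four states. It's a minimal Python sketch, not how Socratic actually classifies work. The phase names, the `blocked` and `deprioritized` flags, the `days_since_activity` field, and the five-day idle threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

# Hypothetical board phases, ordered left to right.
PHASES = ["To Do", "In Development", "Testing", "Done"]


class State(Enum):
    FLOWING = "flowing"
    DEPRIORITIZED = "deprioritized"
    REWORK = "rework"
    BLOCKED = "blocked"
    IDLE = "idle"


@dataclass
class Task:
    history: List[str]            # phases visited, oldest first
    blocked: bool = False         # assumed flag: waiting on other work or a decision
    deprioritized: bool = False   # assumed flag: set aside for higher-priority work
    days_since_activity: int = 0  # assumed field: days since anything happened


def classify(task: Task, idle_after_days: int = 5) -> State:
    """Return the task's current state under the assumptions above."""
    if task.blocked:
        return State.BLOCKED
    if task.deprioritized:
        return State.DEPRIORITIZED
    # Rework: the most recent move went right to left on the board.
    if len(task.history) >= 2:
        prev, curr = task.history[-2], task.history[-1]
        if PHASES.index(curr) < PHASES.index(prev):
            return State.REWORK
    # Idle: nothing has happened for a while (threshold is arbitrary).
    if task.days_since_activity >= idle_after_days:
        return State.IDLE
    return State.FLOWING
```

Only rework can be read straight off the board movement here. The other three states need some extra signal, which is why the sketch leans on explicit flags and an activity timer.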

Now, we know the reality. All of these interrupted states are a natural part of software engineering. (In fact, would you trust an organization that didn’t have any rework? Everything was always perfect, first time out?) The aim isn’t—can’t be—zero interruptions. It’s really just being able to quantify the amount of interruption, so you can see any spikes, and take action.

Making the invisible visible

With Socratic, flow efficiency is derived from the movement of tasks on the board. As work moves forward, backward, or stalls out, we surface in real time the impact on flow.

In the example below, I can see that the share of tasks in an interrupted state has crept above 30 percent. (The dark blue represents tasks in a flowing state. The brighter colors represent interrupted states.) Thirty percent of work in an interrupted state feels... high.

At a click, I can spot which interrupted state(s) need attention. In this case, deprioritized tasks are the chief offender: 21 percent. This gives me a kind of early warning indicator. We need to keep an eye on the amount of work we spin up, only to see it superseded by something else.
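
The arithmetic behind a breakdown like this is just counting. Here is a small Python sketch, assuming each task carries a single state label. The sample data is invented: only the 21 percent deprioritized figure comes from the example above, and the split among the other interrupted states is made up.

```python
from collections import Counter


def state_shares(states):
    """Map each state to its percentage of all tasks, rounded to one decimal."""
    total = len(states)
    if total == 0:
        return {}
    counts = Counter(states)
    return {state: round(100 * n / total, 1) for state, n in counts.items()}


# Invented sample of 100 tasks (only the deprioritized share mirrors the example).
sample = (
    ["flowing"] * 69
    + ["deprioritized"] * 21
    + ["rework"] * 5
    + ["blocked"] * 3
    + ["idle"] * 2
)

shares = state_shares(sample)
interrupted = round(100 - shares["flowing"], 1)  # everything that is not flowing
```

With this sample, `interrupted` comes out at 31.0 and `shares["deprioritized"]` at 21.0.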

But to properly understand flow, we’re interested not only in the quantity of work interrupted, but the time involved. That is: the amount of active work time that’s spent in a good, flowing state versus time spent in interrupted states.
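
Put as a formula, time-based flow efficiency is flowing time divided by total time across all states. A minimal Python sketch, assuming we already have hours tallied per state (the state names and the numbers are hypothetical):

```python
def flow_efficiency(hours_by_state):
    """Percent of total tracked time that was spent in the flowing state."""
    total = sum(hours_by_state.values())
    if total == 0:
        return 0.0
    return 100 * hours_by_state.get("flowing", 0) / total


# Hypothetical tally of hours for one week.
week = {"flowing": 63, "deprioritized": 20, "rework": 8, "blocked": 5, "idle": 4}
efficiency = flow_efficiency(week)
```

Here `efficiency` is 63.0. Whether idle time should count toward the total is a judgment call; this sketch counts it.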

We can take that a step further. What really interests us is knowing how our efficiency is changing over time. That is: how does the amount of time work spends in some interrupted state compare to, say, a month ago?

To understand this, I use Socratic Trends.

In this example, I want to see how our flow efficiency over the past week compares with the prior four weeks:

It’s an encouraging picture. Over the past four weeks, our efficiency, meaning the percent of work time spent in a good, flowing state, has risen 30 percentage points, from 63 percent (not great) to 93 percent (maybe unsustainably good)! Sure enough, our biggest improvement came from reducing time on tasks that were subsequently deprioritized.
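
A quick note on the arithmetic, since a move from 63 to 93 can be read two ways. The Python sketch below (function names are my own) separates the change in percentage points from the relative change:

```python
def point_change(before, after):
    """Difference between two percentages, in percentage points."""
    return after - before


def relative_change(before, after):
    """Change expressed as a percent of the starting value."""
    return 100 * (after - before) / before


points = point_change(63, 93)       # 30 percentage points
relative = relative_change(63, 93)  # roughly 47.6 percent
```

So going from 63 to 93 is a gain of 30 points, or a relative improvement of about 48 percent over where we started.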

With Socratic, the idea is to use data to make the invisible visible. Good flow is essential to good engineering. But “seeing” it is hard. Having a way to surface how well work is flowing, to spot outliers and take action before things really start to pile up, and to be able to demonstrate to your teams and other stakeholders how efficiency is changing—all of this is a way of leveling up how software gets built.