Intro: Did History End with Agile?

Maybe this rings a bell…

It’s quarterly planning time. You and your peers huddle online with a shared spreadsheet, roughing out the primary objectives and/or key product roadmap items, discussing priorities and dependencies, guesstimating size of work and engineering capacity.

At last consensus is reached. The chief goals of the quarter are clear. There exists some back-of-the-napkin math around scope and capacity. It’s time for more detailed planning. Engineering takes over, breaking work down in the task system, assigning story points, packing sprints, and so forth.

So. Work that began life as, say, an OKR in a Google Sheets row has splintered into hundreds of issues across tens of sprints in Jira. Now the real questions begin…

  • “How long will it take to finish?” Hard to know. We’ll poll some folks for their gut feel. Story points sure won’t tell us—they were designed to obscure the time aspect in the first place!
  • “Are we making good progress?” Well, there are lots of tickets assigned, and plenty of them have been started. Does that count as progress?
  • “What’s at risk and why?” I’ve asked around, and summarized what I heard on this slide. See the red, yellow and green circles. (I hope this actually reflects reality…)
  • “How are teams doing?” Hm. Does story point velocity mean anything?
  • “Where can we get better?” Define ‘better’. As compared to what? What are the benchmarks?

    The Jira-Agile industrial complex has resulted in lots of jobs for Scrum Masters and Jira architects, but somehow hasn’t made engineering’s life much easier.

    It hasn’t materially improved the way we work—not since Ye Olden Days of waterfall, anyway.

    It hasn’t changed a perception that engineering is a closed-off, opaque operation, a secret society within the business, requiring specialized acts of information extraction and dubious translations for non-technical stakeholders. (Try saying “story point velocity” in an e-staff meeting, then watch everyone’s eyes roll back in their heads.)

    Things don’t need to be this way. They shouldn’t be this way. And if you’re looking for a better way, you’ve come to the right place.

    This guide looks at how we can graduate from empty ceremonies and make-work jobs, which have held sway over so much of engineering in the last two decades. Spoiler: it’s not so much about newfangled methods. (Ours borrow heavily from tried-and-true Lean.) It’s about supercharging these methods with data science and AI. The result is a modern way for engineering teams to work—one that’s simpler, faster, and easier.


    Value to Customers (Or: Finding Your Flow)

    When it comes to software engineering, we hold to three principles:

  • The work of software engineering should be driven by objectives. An “objective” is anything that adds value for your customer.
  • Engineering’s job is to move objectives from idea to working software as efficiently as possible. Think of this as flow: how well do our objectives flow from beginning to end? How do we find and unblock the areas where flow is interrupted?
  • Continuous improvement—knowing, and being able to show, what the organization does well and what needs attention—is fundamental to healthy, motivated engineering teams, and strong relationships with business stakeholders.

    To understand how well we’re working, what we really want to know is: How well do objectives move from start to finish? How long does it take on average to get these objectives prioritized and built? Where are the bottlenecks in our workflow, and how do we eliminate them?

    Put another way: How do we move value into customer hands, from request to delivery, as quickly and efficiently as possible?

    Broadly speaking, this means a few things for the way we plan and build…

    Objectives should respond to actual customer demand

    We want a short list of objectives at any given time, stack ranked by priority. We’re not interested in a bottomless backlog packed with every theoretically bright idea we’ve ever had. Backlogs aren’t parking lots. We want to be a value-delivery machine, not an idea-generation machine.

    It’s not about how much you start, but how much you finish

    We care about how much work we finish, not how much work we start. Here you’re thinking, Thanks for the tip, Einstein. But the truth is, most organizations prioritize starting work. That is, there’s a strong pressure on engineering teams to get going on something that’s been deemed a priority. This involves negotiation with those teams and their resources to evaluate who can take on what, when.

    The norm in most organizations is a lot of top-down planning, which is then “pushed” out onto engineering leaders and their teams, with the mandate to get going on as much as possible, as quickly as possible. In practice, this means lots of task creation, bloating team backlogs, and whole bunches of assigned tasks for every team member.

    We all know the comfort of being able to confidently report, “Work has started on [X].” But it can be a false comfort, and set up false expectations. Just because work has started doesn’t mean it’s going to be finished any time soon. Still, in the moment, it sounds better than saying, “We haven’t started that yet.”

    At Socratic, our topmost objectives are “pulled” by engineering teams into the development cycle as they have capacity to work them. In other words: the teams decide when they’re ready for more work, and pull it accordingly. The focus is on completing work as soon and as efficiently as possible—speed and efficiency being two key measures we surface in Socratic. We don’t let new work begin until sufficient work has been finished and the team has capacity. This is because…

    By doing less at once, you get more done

    How do you know when enough work has been finished, and new work can be taken on? Through the use of Work in Progress (WIP) limits.

    With WIP limits, you establish a ceiling on the amount of active work any one person (or team) can have at a time. Only when a person’s workload falls below the WIP limit can new work be pulled. The point is to let people finish what’s on their plates before adding more. WIP limits recognize a central truth: namely, that trying to do too much at once only leads to less actually getting done.

    Why this “pull” model? Because people and teams get more done, more efficiently, when they aren’t asked to spin too many plates at once. When they can focus on finishing one thing before starting the next. When they can focus attention on a few things instead of spreading it across many.
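
    The pull rule can be sketched in a few lines. This is a minimal illustration, not how Socratic implements it: the `Person` class, the `pull_next` helper, and the limit of two are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    name: str
    wip_limit: int = 2  # hypothetical value; the point is to pick one and keep it
    active: List[str] = field(default_factory=list)

    def can_pull(self) -> bool:
        # New work may be pulled only while active work is below the limit.
        return len(self.active) < self.wip_limit


def pull_next(person: Person, queue: List[str]) -> Optional[str]:
    """Move the top-priority item from the queue to the person, if allowed."""
    if not queue or not person.can_pull():
        return None
    item = queue.pop(0)  # the queue is stack ranked: index 0 is top priority
    person.active.append(item)
    return item


queue = ["objective A", "objective B", "objective C"]
dev = Person("dev", wip_limit=2)

# Three attempts to pull; the third is refused until something is finished.
pulled = [pull_next(dev, queue) for _ in range(3)]
print(pulled)
```

    Nothing is ever assigned to the person from outside. Work moves only when `can_pull` says there is room, which is the whole difference between pull and push.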

    Truth (and transparency) in data

    Now, the above doesn’t really work if you can’t answer the primary questions we started with. Namely:

  • How well do objectives move from start to finish?
  • How long does it take on average to get these objectives prioritized and built?
  • Where are the bottlenecks in our workflow, and how do we eliminate them?

    This is what Socratic is for: to answer these questions (and their correlations) instantly and automatically, without you having to search, dig, debate, spelunk and spreadsheet.

    In place of opaque, ceremonial measures (see e.g. story point velocity), we favor actionable data that can be understood by engineers and non-engineers alike. That means things like cycle time, throughput, and flow. More on this in Chapter 4.

    Leaning on Lean

    If you’re familiar with Lean thinking, or value stream management, you’ve been reading the above and saying to yourself, Boy, this sure sounds familiar. And you’re right. These are Lean principles, through and through.

    There are a few things we like about Lean, and the kanban / pull model:

  • It emphasizes throughput—actual work finished—instead of starting lots of stuff and declaring progress.
  • It empowers people (and their teams) to keep work flowing. This is because the decision to begin new work rests with them, based on their WIP limit. Everyone becomes an owner of good flow.
  • It helps keep people sane by allowing them to focus on completing what’s on their plates before picking up more work.
  • It favors a pragmatic way of understanding—and showing—work health, using actionable measures that anyone can understand: cycle time, throughput, and flow.
  • There’s less management overhead. A manager’s job is concentrated on helping to relieve bottlenecks in flow and grooming the To Do queue, not on evaluating capacity or deciding who should take on what work. The team members take care of that themselves.

    Are there limitations to the pull model? Sure. It works best in projects or environments where just about any team member can take on any of the work. Because the next priority item is taken on by whoever has capacity next, the pull model presumes a pretty seamless fit between the types of work needed and the skills of every member of the team. (We’ve seen some customers address this by using task labels or even dedicated workstreams to separate, say, general full-stack development work from more specialized work, like mobile app development.)

    But truthfully, the biggest “limitation” may be that the pull model is a change from the “push” model familiar to so many organizations.

    In the push model, a bunch of work is planned, based largely on assumptions that customers will want it. That work is then assigned out—pushed—to development team members, after negotiations with those teams over how quickly the work can start. The pull model is a different way to work, one that depends on educating and building up trust with business stakeholders. A good deal of that trust stems from being able to show, in real time, how well your work is flowing, and to identify and solve bottlenecks.

    The good news is: this is where we come in. Socratic gives you the data you need, right out of the box, to show everyone how work is moving, where the bottlenecks are, and why. It’s the kind of visibility and transparency that trust and partnership are made of.


    Making Flow a Reality

    Enough with broad principles. Let’s look at an example: us.

    As mentioned earlier, we organize our product ideas according to what will add value to customers. These value-adding ideas are captured in Socratic as objectives.

    We organize our objectives in a plan. Our plan is called, straightforwardly enough, “Product Roadmap.” Within this plan, we define the work phases each objective will move through. This represents our end-to-end product development workflow—essentially, how customer value moves from an idea or request to delivered software.

    The big picture

    Following are the work phases on our Product Roadmap. Note that when deciding your phases, it pays to make your “wait” phases explicit. A wait phase may exist anywhere that the work of the prior phase(s) is complete, and the objective is just waiting to be pulled by the next available person or team.

    One example in our world is “Ready for Build.” Objectives move here when all upstream work (prioritization, design, etc.) is complete. As the phase name suggests, these objectives are good to go for engineering.

    The goal is good flow from start to finish. Among other things, this means that work should spend as little time as possible in any wait phase. Socratic’s data shows us average time by phase, and how these averages are changing over time. This allows us to spot rising bottlenecks, take some action, and then use the same data to verify the action had the intended effect.

    1. Backlog

    Objectives still awaiting prioritization. We aim to have no more than three objectives per person in backlog at any given time. With a team of ten engineers, this means no more than 30 objectives total in backlog.

    This raises the question, But what if you have more than 30 mind-blowing ideas? Easy. If we’re going to add one, one has to come out. Again, Work-in-Progress (WIP) limits matter. They’re there to help us stay focused on throughput (on actually completing work) rather than documenting every idea under the sun and then pushing to start on as many as possible.

    The specific WIP limit isn’t as important. The real magic is in picking a limit, and sticking to it.

    One other note. Our objectives are generally of a size that we have a single developer for each. If our objectives were larger, such that we had the whole team working on each, we would set our WIP limit accordingly, i.e. no more than three objectives total in the backlog. Your WIP limit should match whatever unit of delivery (a person or a team) makes sense for you.

    2. Prioritized

    Objectives prioritized for development. Meaning, we’ve done whatever analysis and preparatory work is needed, and have decided these are the objectives we want to deliver next. Naturally, their order on the board corresponds to priority.

    Our WIP limit for the Prioritized phase is twenty objectives. That’s two prioritized objectives per person. As always, the specific limit you choose is less important than abiding by it.

    3. Design

    Prioritized objectives are pulled for design as the team has capacity. Depending on the nature of the objective, the design work may include a conceptual design prototype that we show to users for feedback. Other objectives can go straight to tactical design. Yet other objectives can skip the design phase altogether.

    We haven’t yet seen a need to formalize these as independent phases (e.g. Conceptual Design, User Review, Tactical Design), though we may down the road. Why? The trigger would be a need for more granularity in understanding time in phase. We use Socratic Trends to analyze bottlenecks in workflow (more on this later). If our Design phase were to surface as a bottleneck, we might decide that breaking it into its piece parts—conceptual, user validation, tactical—would help to diagnose the nature of the bottleneck, and suggest the changes to improve.

    But so far, so good.

    4. Ready for Build

    Just what the name suggests. We’re design-complete and ready for development. As always, objectives are stack ranked in the phase. The WIP limit here is twenty (two per person). As engineers free up, they pull the topmost objective for further definition and development.

    5. Build

    Generally we want no more than ten objectives in development at a time—that is, one per person. Occasionally two objectives are closely enough related that it makes sense for one person to work both simultaneously. But generally we aim for each person to complete an objective before pulling the next.

    6. User Validation

    An optional phase, depending on the objective. Sometimes, an objective introduces a feature that we first provide to a subset of users for feedback. For instance, when we introduced GitLab integration to Socratic, we feature-flagged the capability to certain GitLab users for feedback on how it performed “in the wild.”

    7. Done

    Okay, no WIP limits here. The W is no longer IP!
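
    The phase limits above all follow from the same arithmetic: a per-person figure multiplied by team size. The sketch below shows that calculation and a simple over-limit check. The dictionary layout and function names are hypothetical, and `None` marks phases for which no limit is stated above.

```python
TEAM_SIZE = 10

# Per-person limits as described for each phase; None means no stated limit.
PER_PERSON_LIMITS = {
    "Backlog": 3,
    "Prioritized": 2,
    "Design": None,
    "Ready for Build": 2,
    "Build": 1,
    "User Validation": None,
    "Done": None,
}


def phase_limits(team_size, per_person):
    """Scale each per-person limit to a total for the whole team."""
    return {
        phase: None if limit is None else limit * team_size
        for phase, limit in per_person.items()
    }


def over_limit(counts, limits):
    """Return the phases whose current objective count exceeds the limit."""
    return [
        phase
        for phase, count in counts.items()
        if limits.get(phase) is not None and count > limits[phase]
    ]


limits = phase_limits(TEAM_SIZE, PER_PERSON_LIMITS)

# Hypothetical snapshot of the board.
current_counts = {"Backlog": 32, "Prioritized": 18, "Design": 6, "Build": 10}
print(over_limit(current_counts, limits))
```

    If objectives were sized for a whole team rather than one person, the same code works with `TEAM_SIZE` set to the number of teams.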

    The work of the workstream

    The above is how larger units of value, objectives, make their way into customer hands. This is a matter of objectives organized in a plan.

    The detailed work within each objective (its tasks) is organized in one or more workstreams. Just as objectives follow the phases of a plan, tasks follow the phases of a workstream.

    Once the team pulls an objective, the first order of business is to break down the work into the tasks required for its development. We have two primary workstreams, “Design” and “Product”, where tasks and their workflow are defined.

    Of course with Socratic, all the detailed work in the workstream is reflected automatically back into the objectives of the plan. We can see how each objective is progressing, how scope is changing, throughput, speed, flow efficiency and the like—instantly and automatically. In this way, our plan isn’t some point-in-time thing now gathering dust, but a living, breathing reflection of the daily realities of engineering.

    A few additional notes on our process…

    No estimates

    Estimating, whether by story point, Fibonacci number, or wet finger in the air, is a titanic waste of engineering time.

    With the pull model, what we care about is the speed at which tasks move from start to finish—that is, their cycle time. Socratic surfaces this automatically. We also build a forecast to complete, based on historical actuals: meaning, when someone wants to know when something will be finished, we have more than just gut feel behind our answer.

    For our Product workstream, we generally like to see our average cycle time per task at seven days. (Other workstreams, like Design and Go-to-market, have different targets. Different work, different workflows, different expectations. That’s natural!) If we notice our average creeping up, that’s a cue for us to break work into smaller units. This is where we want engineering brainpower focused: figuring out how to decompose work into its most logical, granular elements, not on guesstimating arbitrary, entirely invented “points” that mean little to anyone.
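
    To make the idea concrete, here is a rough sketch of cycle time and a forecast built from historical actuals. The dates are invented, and the forecast formula (remaining tasks divided by historical throughput) is an assumption made for illustration. It is not a description of how Socratic computes its forecast.

```python
from datetime import date
from statistics import mean

TARGET_DAYS = 7  # the Product workstream target mentioned above

# Hypothetical (started, finished) dates for recently completed tasks.
completed = [
    (date(2023, 3, 1), date(2023, 3, 6)),
    (date(2023, 3, 2), date(2023, 3, 10)),
    (date(2023, 3, 6), date(2023, 3, 14)),
]


def cycle_time_days(started, finished):
    """Elapsed calendar days from start to finish."""
    return (finished - started).days


# The three tasks took 5, 8 and 8 days.
avg_cycle_time = mean(cycle_time_days(s, f) for s, f in completed)
over_target = avg_cycle_time > TARGET_DAYS


def forecast_days(remaining_tasks, tasks_done, days_observed):
    """Naive forecast: remaining work at the historical completion rate."""
    return remaining_tasks * days_observed / tasks_done


print(avg_cycle_time, over_target)
print(forecast_days(remaining_tasks=12, tasks_done=6, days_observed=14))
```

    No estimate was entered by anyone here. Both numbers come from timestamps the task system already has.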

    Personalized WIP limits

    For each person working in the Product workstream, we aim to have no more than two active tasks at any given time. An active task is anything not in Backlog or To Do. Tasks are unassigned whenever there’s no more work remaining on the task—typically all that remains is deployment. In this way, finished-but-not-deployed tasks don’t count towards anyone’s WIP.
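
    The counting rule is simple enough to write down. In this sketch the task records and phase names are hypothetical, and excluding "Done" from the count is an added assumption (finished work would normally be unassigned anyway).

```python
from collections import Counter

PERSONAL_WIP_LIMIT = 2
# Phases that do not count as active. "Done" is an assumption added here.
NOT_ACTIVE = {"Backlog", "To Do", "Done"}

# Hypothetical board snapshot.
tasks = [
    {"id": 1, "phase": "In Development", "assignee": "ana"},
    {"id": 2, "phase": "Testing", "assignee": "ana"},
    {"id": 3, "phase": "To Do", "assignee": "ana"},
    {"id": 4, "phase": "In Development", "assignee": "ben"},
    # Finished but not yet deployed: unassigned, so it counts for nobody.
    {"id": 5, "phase": "Testing", "assignee": None},
]


def active_wip(tasks):
    """Count active tasks per assignee, skipping unassigned tasks."""
    return Counter(
        t["assignee"]
        for t in tasks
        if t["assignee"] is not None and t["phase"] not in NOT_ACTIVE
    )


def has_capacity(person, tasks):
    return active_wip(tasks)[person] < PERSONAL_WIP_LIMIT


print(active_wip(tasks))
print(has_capacity("ana", tasks), has_capacity("ben", tasks))
```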

    Flow is the name of the game

    With Socratic, flow efficiency is derived from the movement of tasks on the board. As work moves forward, backward, or stalls out, we surface in real time the impact on flow. This is shown on the health card as follows:

  • Rework: that is, tasks that went backward in the workflow, either from a later active phase (e.g. "Testing") to a prior one (e.g. "In Development"), or from the Done phase back to an active phase.
  • Deprioritized: this is any task that moves from an active phase back to a waiting phase (e.g. Backlog or To Do).
  • Tasks that become blocked.

    Limit the backlog to 30 tasks, max

    There’s more than magic behind this figure. We’ve used Socratic data to analyze the correlation between backlog size and average cycle time for tasks. The analysis shows that, on average, teams with backlogs of fewer than 30 tasks deliver three times faster (!) than teams whose backlogs have become bloated, um… craplogs.

    Git integration is our friend

    As you’d expect, we use Socratic’s git integration to automate the movement of all tasks with coding involved. This also lets us see, via Socratic Trends, how things like pull request merge time affect our flow and cycle time.

    Yes to changelogs

    Keeping a regular changelog is a good practice. (Here's ours.) A changelog demonstrates, to both customers and team members, the pace of fixes, enhancements and innovation.

    Generally each of our changelogs is headlined by at least one objective delivered, along with other fixes and “tiny wins” (see below). See how to create a changelog in Socratic here.

    What about sprints?

    Sprints, at their finest, provide three things:

  • Smaller time horizons make it simpler to sort what to do next and how much can get done;
  • Regular interaction with stakeholders to show realized progress;
  • Regular review of what's working and what needs improvement.

    Stripped to that essence, we like sprints.

    The trouble is, all too often that essence gets lost. Instead, sprints become make-work generators, vehicles for hours of empty ceremony. Hours of story pointing and sprint packing, all of which sucks up engineering time, little of which provably moves value into customer hands as efficiently as possible.

    For plenty of engineering teams, sprints aren’t much more than a default choice. We don’t want to go back to waterfall, so what other option is there? This is understandable. Sprints are something everyone understands by this point, “agile” is a nice word, and aren’t burndown charts better than nothing? (Actually, no.)

    Put another way, the pursuit of “Agile-capital-A” frequently just reimposes the kind of ideological adherence to process that the Manifesto writers meant to explode. (Every revolution eats its young…) If this describes your world, the methods laid out in this guide—combined with the data surfaced natively by Socratic—are a worthy alternative to try.

    If sprints are working for you, and you just want better intelligence and an end to empty ceremony: we provide that too.

    Make room for tiny wins

    We’re fans of Joel Califa’s concept of Tiny Wins. Probably you do something similar—looking for small changes with outsized impacts, keeping eyes peeled for the proverbial low-hanging fruit.

    We like tiny wins as a complement to objective-driven development. It reminds us value doesn’t have to come as part of something big(ger).

    So, what’s the best way to ensure tiny wins are a living part of your development? We create a monthly recurring objective to capture all tiny wins (as well as other generally recurring work, like bug fixes) that we’ve prioritized for that month. The advantage to this approach is that it gives visibility to the investment. Anyone can see the objective and know tiny wins are one of the priorities of engineering.

    What’s nice is that Socratic’s data lets you see the time invested in Tiny Wins. In Trends, you can see what work was delivered as a result, and whether the size of investment and results are changing over time, and whether they match your expectations.


    How Should Engineering be Measured?

    Continuous improvement—measuring what we as engineers do well and what needs attention—drives healthy teams and good relationships with stakeholders.

    The trouble is, “continuous improvement” is nebulous. What to measure? And how?

    As a general rubric, we favor actionable data that can be understood by engineers and non-engineers alike. Furthermore, any good metric should be:

  • Discoverable: The metric shouldn’t require hours of work and/or an army of people to build queries and data lakes. If it does, it'll soon collapse under its own weight.
  • Actionable: The metric should make obvious what needs attention. If a metric starts any kind of debate about what it means and what to do next, it flunks.
  • Consequential: We want things that actually move the needle, that show not only to ourselves but to the broader business how things are going.

    This stuff is Socratic’s reason for being. This is our bread and butter.

    Demystifying what matters

    If software engineering has historically resisted measurement, that may be due in part to a feeling that there’s no single, common goal to measure ourselves against. Sales, for example, has dollars. Marketing has leads. Finance has profitability.

    What do we have?

    Consider the answer this way. The job of engineering is to turn ideas into software. Framed this way, there are really three things we want to know:

  • How many ideas do we deliver? (Throughput)
  • How fast do we deliver them? (Speed)
  • How well do we do it? (Efficiency)

    If we get a lot done, well and at good speed, we know (and can show) that we’re doing our job.

    What represents an “idea”? In software engineering, it’s tasks and objectives. These give us our common unit of measure—the equivalent of “dollars” or “leads.”

    With this in mind, let's look at how we use Socratic Trends to understand throughput, speed, and efficiency for any body of work.


    Throughput

    Obviously, the starting variable for throughput is knowing how many tasks and/or objectives we completed in a given period. But that variable, by itself, is fairly meaningless. Is ten good? A hundred? A thousand?

    What we're really interested in is the amount of work completed, relative to the amount of new work raised. Ten tasks completed is actually a good number, if the number of new tasks raised over the same period was, say, seven. This tells us that delivery is outpacing demand: we’re keeping ahead of the number of new ideas being requested.
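
    As a quick sketch of that comparison, the snippet below puts completed work next to newly raised work for a period. The function names and the weekly figures are made up for illustration. Only the ten-versus-seven case comes from the text above.

```python
def net_flow(completed, raised):
    """Positive means delivery is outpacing demand for the period."""
    return completed - raised


def delivery_ratio(completed, raised):
    """Completed work per unit of new work; above 1.0 means we are keeping ahead."""
    if raised == 0:
        return float("inf") if completed > 0 else 1.0
    return completed / raised


# The example from the text: ten tasks completed, seven raised.
print(net_flow(10, 7), round(delivery_ratio(10, 7), 2))

# Hypothetical weekly (completed, raised) pairs to show a trend.
weeks = [(10, 7), (8, 9), (6, 12)]
falling_behind = [i for i, (done, new) in enumerate(weeks) if net_flow(done, new) < 0]
print(falling_behind)
```

    The raw count of ten says little by itself. The same ten looks very different in a week where fifteen new tasks arrived.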

    But—can we really use tasks and objectives as the unit to measure throughput? Aren't there so many variations in their size and complexity, that you're essentially mixing apples and oranges and bicycles?

    Paradoxically, the answer to both questions is Yes.

    How? The law of large numbers (LLN). Over the course of enough tasks, the inevitable differences in complexity among them basically come out in the wash.
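
    A small simulation illustrates the point. It assumes task sizes drawn at random from a hypothetical set of values, then compares how much the average size varies between small batches and large ones. Real task sizes will not follow this distribution, so read it as a sketch of the averaging effect only.

```python
import random
from statistics import mean, pstdev

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical task sizes, in days of effort.
SIZES = [1, 2, 3, 5, 8, 13]


def batch_means(batch_size, batches=200):
    """Average task size for each of many randomly drawn batches."""
    return [
        mean(random.choice(SIZES) for _ in range(batch_size))
        for _ in range(batches)
    ]


# How much does the average task size vary from batch to batch?
spread_small = pstdev(batch_means(5))    # batches of 5 tasks
spread_large = pstdev(batch_means(500))  # batches of 500 tasks

print(f"spread across 5-task batches:   {spread_small:.2f}")
print(f"spread across 500-task batches: {spread_large:.2f}")
```

    With only a handful of tasks, one large item skews the average. Across hundreds, the average barely moves, which is why a plain task count becomes a usable unit.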


    Speed

    In software engineering, we've developed a strong allergy to saying how long something is likely to take. Instead, we signal effort through coded language like Fibonacci numbers or story points. The whole point of these secret handshakes is to remove the time element. Why? As Agile Manifesto contributor Ron Jeffries once put it, apologetically, “to obscure the time aspect, so that management wouldn’t be tempted to misuse the estimates.”

    Resist the urge to use these opaque abstractions. The business doesn't understand them. (They're not exactly crystal clear among engineers either.)

    Instead, Socratic determines the actual average time, measured in days elapsed, required to complete tasks and objectives. In plain terms, this is the average time to deliver, or cycle time—something everyone can understand.

    With Trends, we can see how our cycle time is changing period over period. As a rule, we like to see our cycle time average not more than seven days. If we notice a significant upward change, our eyes go immediately to the diagnostic metrics included as a part of cycle time—notably merge time, efficiency, and phase time (see next).

    In our analysis of anonymized work activity of hundreds of Socratic beta users, we found that the average cycle time for teams was nine days. Of this, merge time for pull requests made up nearly a third (2.9 days). This wait time is low-hanging fruit. If you find your own merge times are growing, try breaking pull requests into smaller units. Less time to review will also mean less reluctance to review.


    Efficiency

    An efficient engineering team is one whose work moves from start to finish with a minimum of interruption. This is flow efficiency, as mentioned in the previous chapter. Socratic derives flow efficiency based on the movement of tasks:

  • Rework: that is, the backward movement of tasks, e.g. from a test phase back into a development phase;
  • Deprioritized: tasks that we got started on, and then had to backlog in favor of new, higher priority work;
  • Tasks that become blocked.

    For efficiency, we want to know how much of our total active work time is spent productively (that is, in a normal flowing state) versus time spent in any of the above exception states.
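
    Two of the exception states listed above, rework and deprioritized, can be recognized from the direction of a task's move on the board. The following is a minimal sketch under assumed phase names. The workflow list and the `classify` function are hypothetical, and blocked tasks are left out because blocking is a flag on a task rather than a move between phases.

```python
# Hypothetical workstream phases, in board order.
WORKFLOW = ["Backlog", "To Do", "In Development", "Testing", "Done"]
WAITING = {"Backlog", "To Do"}
ACTIVE = {"In Development", "Testing"}


def classify(src, dst):
    """Label a single move of a task from phase src to phase dst."""
    if WORKFLOW.index(dst) >= WORKFLOW.index(src):
        return "normal"  # forward movement (or no movement)
    if dst in ACTIVE and (src in ACTIVE or src == "Done"):
        return "rework"  # back to an earlier active phase, or reopened
    if src in ACTIVE and dst in WAITING:
        return "deprioritized"  # started, then put back in a queue
    return "other"  # backward moves the text above does not define


moves = [
    ("To Do", "In Development"),
    ("Testing", "In Development"),
    ("Done", "Testing"),
    ("In Development", "To Do"),
]
print([classify(src, dst) for src, dst in moves])
```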

    Assume a project has absorbed 100 total work days so far. If the time spent in nonproductive states is only 20 days, we probably feel pretty good: our efficiency, by this measure, is 80 percent. But if that number were 50 days, it would mean that half our total time so far has been eaten up by blocked or idled tasks, reworking tasks, or burning time on things that fell out of priority. Something is off.

    Generally, we like our own flow efficiency to be no less than 80 percent.
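
    The arithmetic behind those figures is shown below as a small sketch. The function and constant names are hypothetical. The numbers are the ones from the example above.

```python
TARGET_EFFICIENCY = 0.80  # our own floor, as stated above


def flow_efficiency(total_days, exception_days):
    """Share of total work time spent in a normal flowing state."""
    if total_days <= 0:
        raise ValueError("total_days must be positive")
    return (total_days - exception_days) / total_days


def meets_target(total_days, exception_days):
    return flow_efficiency(total_days, exception_days) >= TARGET_EFFICIENCY


# The two cases from the text: 20 and 50 nonproductive days out of 100.
print(flow_efficiency(100, 20), meets_target(100, 20))
print(flow_efficiency(100, 50), meets_target(100, 50))
```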

    As a reminder of the “hidden tax” that inefficient flow exerts on software teams, the aforementioned analysis saw that tasks averaged 19.6 percent of their duration in non-flowing exception states such as blocked, rework, or deprioritized. When combined with merge times that consumed nearly another third of cycle time (2.9 of roughly nine days), this means more than half (51.9%) of the average task cycle time was spent in waiting or exception states.

    By using Trends to measure our flow efficiency, and to spot inefficiencies that may creep in over time, we have a clear path to faster, more efficient “idea delivery.”

    Scrutinizing flow efficiency is especially useful when we want to diagnose the cause(s) of any bottlenecks in our workflow. To spot bottlenecks, we use the phase time analysis surfaced in Trends. This shows whether any work phases are slowing (at either a plan or workstream level). We trust that WIP limits will keep work moving; we rely on phase time analysis to prove it.
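
    A phase time comparison of this kind can be sketched as follows. The sample data, the 1.5x threshold, and the function name are all hypothetical. The idea is only to compare average days in phase across two periods and flag the phases that have slowed noticeably.

```python
from statistics import mean

# Hypothetical days-in-phase samples for two consecutive periods.
phase_days = {
    "Design": {"previous": [3, 4, 2], "current": [3, 3, 4]},
    "Ready for Build": {"previous": [1, 2, 1], "current": [4, 5, 3]},
    "Build": {"previous": [6, 7, 5], "current": [6, 6, 7]},
}


def slowing_phases(data, threshold=1.5):
    """Flag phases whose average time grew by at least `threshold` times."""
    flagged = []
    for phase, periods in data.items():
        before = mean(periods["previous"])
        after = mean(periods["current"])
        if before > 0 and after / before >= threshold:
            flagged.append((phase, before, after))
    return flagged


for phase, before, after in slowing_phases(phase_days):
    print(f"{phase}: {before:.1f} -> {after:.1f} days on average")
```

    In this made-up data the flagged phase is a wait phase, the kind of place where a bottleneck tends to show up first.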

    Adding it together

    Each of these measures is useful on its own. But they're best when used as a cooperative group. What really interests us is the check-and-balance among them.

    For example, if we're working efficiently and at good speed, but our throughput isn't keeping pace with demand, the implication is clear. We need more people—or less demand. What's nice is that the data make the case for us.

    On the other hand, if we see our efficiency is falling off, to the point of impacting speed or throughput, "more people" isn't the answer. Instead we're going to dig in on the choke points: which exception states are on the rise, and why?

    In these cases, the collective measures become essential for understanding how we work. Is too much demand overwhelming the machine? Is there some recurring inefficiency in the way we operate? The data help to surface the what, why, and where.

    Cadence for continuous improvement

    Socratic Trends surfaces all of the above metrics for any body of work. We consult Trends on a weekly basis to understand how a given objective, as well as the plan as a whole, is evolving. In these weekly check-ins, we’re most interested in understanding how demand (i.e. new tasks) has been trending: are we seeing a lot of scope growth? Is work still being completed at the same rate, or is the objective going stale?

    On a monthly basis, we use Trends to evaluate how we’re doing as an organization. The entire point of Trends is to show us how we’re doing against our historical averages in each key metric area. This is the kind of data that’s essential to any kind of continuous improvement, and which in the days before Socratic we found maddeningly difficult to get.

    We especially like Trends for evaluating how some initiative is (or isn’t) helping us to work better. It allows us to see the before and after effects on productivity. In 2022, for example, we used Trends to benchmark the impact of a 4-day work week. You can read more on the results here.