SMART PRACTICES

Using AI to forecast software projects

Robert Crabbs

1-19-2024

I’ve seen more approaches to software project estimation than I can recall. Story points. T-shirt sizes. Bottom-up estimation. Top-down estimation. Parametric estimation. Three-point estimation… These approaches all have one thing in common. They’re disguises for a whole lot of guessing.

What other choice is there? Tell business stakeholders, “It’ll be ready when it’s ready”? Probably not. So we’re stuck trying to come up with reasonable timeframes. But trying to forecast how long a given project is likely to take is a form of fortunetelling. You assemble some experienced minds, gather their opinions, and hope for the best.

That hope isn’t often rewarded. An Oxford University study of 5,392 large-scale tech projects found an average overrun of 92 percent. In other words, projects took nearly twice as long as planned. The problem isn’t that planners are reckless or stupid. The problem is that building software involves more complexities, variables, and contingencies than any mind can possibly account for.

Mercifully, this is a problem almost tailor-made for AI.

How AI engages

The AI we’re talking about here isn’t the generative kind—it’s the other AI, the one that ingests rafts of data to identify patterns and make suggestions or predictions. Let’s consider, in the simplest terms, how human and machine intelligence might collaborate to create a forecast.

Humans provide two things: an approximate scope of work, and the people assigned to deliver it. Knowing what’s to be done and who will do it, it becomes the job of AI to forecast the time required.

To do this, one essential ingredient remains: a reliable record of how long past work has taken to complete—our historical actuals. Normally, this would be the end of the conversation, because few organizations have anything approaching such a record. But happily, Socratic captures this automatically, including a detailed snapshot of every task at each point in its lifecycle, the work phases it passed through, and how long it spent in each.

With scope, people, and historical actuals in hand, we have our starting point. Let’s look at ways AI may be brought to bear.

1st Generation: Predictive analytics

An initial version of AI-driven forecasting might behave as follows.

The model considers:

  • The number of tasks identified (i.e., the scope);

  • How many of the tasks have been assigned (versus those still awaiting assignment), and to whom;

  • The status of each task: is the work still in backlog? Has it been started? If started, how far along is it?

  • The historical actual average time to complete a task, by person and task state. For example, the machine must know that for Person A, the average time to complete a task is (say) 14 days when the task is in backlog, but drops to seven days once the task moves from backlog to active work.

  • The historical weekly average throughput rate of the team(s) assigned the work.
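To make this concrete, here’s a minimal sketch (in Python) of how a first-generation model might combine these inputs into a single number. The names and figures are purely illustrative, not Socratic’s actual implementation.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Task:
        assignee: Optional[str]   # None means not yet assigned
        state: str                # e.g. "backlog" or "active"

    # Historical actuals (illustrative): average days to finish a task,
    # keyed by (person, current task state).
    AVG_DAYS_TO_COMPLETE = {
        ("alice", "backlog"): 14.0,
        ("alice", "active"): 7.0,
        ("bob", "backlog"): 10.0,
        ("bob", "active"): 5.0,
    }
    UNASSIGNED_DEFAULT_DAYS = 12.0   # fallback for tasks with no assignee yet
    TEAM_THROUGHPUT_PER_WEEK = 6.0   # historical tasks completed per week

    def naive_forecast(tasks):
        """Return a single forecasted duration, in days, for the remaining work."""
        people = {t.assignee for t in tasks if t.assignee}
        # Effort view: expected remaining days per task, summed, then spread
        # across the people actually assigned to the work.
        effort_days = sum(
            AVG_DAYS_TO_COMPLETE.get((t.assignee, t.state), UNASSIGNED_DEFAULT_DAYS)
            for t in tasks
        )
        effort_estimate = effort_days / max(len(people), 1)
        # Throughput view: how many calendar days to burn down this many tasks.
        throughput_estimate = len(tasks) / TEAM_THROUGHPUT_PER_WEEK * 7
        # Take the more pessimistic of the two perspectives.
        return max(effort_estimate, throughput_estimate)

    scope = [Task("alice", "active"), Task("bob", "backlog"), Task(None, "backlog")]
    print(f"Forecast: {naive_forecast(scope):.1f} days")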

If the math of this sounds too straightforward to require more than a spreadsheet, remember that each of these variables is in a constant state of change.

The number of tasks identified can and will change over the course of the project, especially early on as scope comes into focus.

Assignees will change too, and different people may work the same task at different points. (For example, a designer passing the task to a developer once design work is complete.)

Throughput rates are living things as well. Teams speed or slow their rates of delivery based on personnel changes, technology changes, and a host of other variables not visible to the naked eye.

Safe to say that even this “initial” approach to intelligent forecasting is light years beyond how most companies forecast today. But there are limitations. With this approach, we see:

  • A single forecasted end date, instead of a probable date range, which suggests a level of precision that simply doesn’t exist.

  • Some unrealistic assumptions about the way people will deliver the work. For instance, the model assumes all assignees are instantly available to begin work, and that they will work only on these tasks, with no competing commitments to other projects.

  • No accounting for complexities like blocking relationships among tasks, or how variables like the time of year (think: summer, or holiday season) impact throughput.

Luckily, machine learning is up to these challenges…

2nd Generation: Monte Carlo simulation

For the uninitiated, Monte Carlo is a statistical technique used to model complex systems across a wide range of fields—traffic control, stock predictions, and nuclear reactors, to name a few. The idea is to use the law of large numbers to predict how a complex system might behave.

Monte Carlo runs a myriad of scenarios using our best empirical data and knowledge of the "rules of physics" for the system at hand. By gathering results from enough randomized scenarios, we can not only get an idea of what is most likely to occur, but also visualize the range of possible outcomes.

What does this look like in the realm of software project forecasting? The basic steps and inputs are as follows:

(1) Set up N tasks with random (but representative) durations.

(2) Set up a pool of M people to complete the tasks.

(3) Assign the N tasks to the M people.

(4) Determine the completion dates for each task.

(5) Construct a burndown chart from these dates.

(6) Repeat steps 1-5 Q times.

Humans provide N; the machine provides the rest. With Socratic, this includes the randomly sampled task durations needed for the first step, drawn from your organization’s historical actuals. We’ve set the number of model runs (Q) to 1,000. (The more runs you do, the more statistical significance you gain, but it also starts to absorb some serious compute cycles. In our own analysis, a thousand runs strikes a nice balance between system responsiveness and forecast reliability.)

Simply put, this model reveals all of the possible durations if the same body of work were delivered by the same people a thousand times over. This range of outcomes is reflected as a date range rather than a single date.

(You can see a lighter-weight version of this in action with our public forecaster. The full version, which includes personalized historical actuals, is available to users of Socratic.)
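For the curious, here’s a bare-bones sketch of the simulation loop described above. It stands in a made-up sampling distribution for Socratic’s historical actuals, but the shape of the algorithm (steps 1 through 6) is the same.

    import random

    N_TASKS = 40     # scope: the number of tasks (supplied by humans)
    M_PEOPLE = 5     # the pool of people available to complete them
    Q_RUNS = 1000    # number of simulated deliveries

    def sample_task_duration():
        # Stand-in for sampling from historical actuals; here, a lognormal
        # distribution with a median of roughly five working days.
        return random.lognormvariate(1.6, 0.6)

    def simulate_once():
        # Treat each person as a lane; each task goes to whoever frees up first.
        free_at = [0.0] * M_PEOPLE
        completion_times = []
        for _ in range(N_TASKS):
            lane = free_at.index(min(free_at))
            free_at[lane] += sample_task_duration()
            completion_times.append(free_at[lane])
        # Sorting completion_times would give one run's burndown chart (step 5);
        # for the forecast we only need when the last task finishes.
        return max(completion_times)

    durations = sorted(simulate_once() for _ in range(Q_RUNS))
    p50 = durations[int(0.50 * Q_RUNS)]
    p85 = durations[int(0.85 * Q_RUNS)]
    print(f"Median: {p50:.0f} working days; 85% of runs finish within {p85:.0f}")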

With Monte Carlo simulation, we’re able to absorb complexities the previous, more deterministic algorithm can’t. These include nuances like:

  • The rate at which tasks fall idle;

  • The number and relationship of blocked tasks;

  • The amount of rework involved before a task is completed;

  • The amount of inevitable scope creep in a project;

  • The impact of holidays on people’s availability;

  • The participation of people with specialized skills.
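To give a flavor of how such nuances enter the picture, a couple of them (rework and holidays) could be layered onto the simulation sketch above with only small changes. The probabilities below are invented for the sake of example.

    import random

    REWORK_PROBABILITY = 0.2   # assumed chance a "done" task bounces back once
    HOLIDAY_FACTOR = 0.6       # assumed throughput multiplier during holidays

    def sample_duration_with_nuances(base_sampler, in_holiday_period=False):
        days = base_sampler()
        # Rework: some fraction of tasks need another (shorter) pass to stick.
        if random.random() < REWORK_PROBABILITY:
            days += 0.5 * base_sampler()
        # Holidays: the same work simply takes longer in calendar terms.
        if in_holiday_period:
            days /= HOLIDAY_FACTOR
        return days

    # Drop-in replacement for sample_task_duration() in the sketch above, e.g.:
    #   free_at[lane] += sample_duration_with_nuances(sample_task_duration, True)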

Plainly, this approach represents a big leap from the previous model. Rather than a purely deterministic algorithm, it operates as a sort of semi-supervised one, combining our domain knowledge of how work gets done with observations of real outcomes.

3rd Generation: Deep learning

Neural nets are powerful AI tools used to recognize complex patterns in “unstructured” input data. Think of an MP3 file: a digitized time series of recorded sound waves. There are clearly patterns in the data—the human ear can easily differentiate between hard rock and blues, after all. But it’s very difficult to instruct a machine in what precisely makes the difference. Neural nets get around this problem by deducing thousands, if not millions, of interacting rules to arrive at a conclusion. None of those rules need to be intuitive or grounded in anything “physical” — it's simply whatever statistical relationships maximize the accuracy of the network.

A main limitation of neural nets is that they need very large datasets to learn from. In forecasting terms, this means we need lots and lots of completed projects we can point to and mark as having been timely, early, or late, so the model can begin to discover the patterns behind each. While we build up this volume of training data, Monte Carlo is the way.
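To give a sense of what that might eventually look like, here’s a toy sketch using scikit-learn’s neural-network classifier on synthetic project data. The features, labelling rule, and numbers are all invented for illustration; a real model would be trained on actual project histories.

    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)

    # Synthetic stand-in for a library of completed projects. Features are
    # invented: [task count, team size, share of blocked tasks, rework rate].
    X = rng.uniform([10, 2, 0.0, 0.0], [200, 12, 0.5, 0.6], size=(2000, 4))

    # Invented labelling rule: heavier blocking and rework push projects late.
    pressure = X[:, 2] + X[:, 3] + rng.normal(0, 0.1, 2000)
    y = np.where(pressure < 0.3, "early", np.where(pressure < 0.7, "timely", "late"))

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0),
    )
    model.fit(X_train, y_train)
    print(f"Held-out accuracy: {model.score(X_test, y_test):.2f}")

Note that nothing in the fitted network tells you why a given project lands in the “late” bucket, which is exactly the tradeoff discussed next.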

Granted, there’s another disadvantage to this kind of advanced forecasting: the model can’t tell you how it arrived at its conclusion. This is inherent to most state-of-the-art machine learning techniques. As a general rule of thumb, the more complex the model, the more effects it can account for, but the less explainable its conclusions will be. Its forecasts are likely to be richer; why or how is a mystery.

But isn’t this a tradeoff most project owners would gladly accept?