Musings of a Software Architect: Estimating and why it is so hard

In this essay I want to consider the topic of estimating in software development. As any software engineer knows, there is a constant pressure to provide estimates for how long a task will take and it is notoriously difficult to come up with decent (by which I mean reasonably accurate) estimates.

In order to understand this, I want to consider what an estimate is, what the limitations on an estimate are and to touch on some of the techniques that can help.

What is an estimate?
An estimate is a prediction of the future. The most common form of estimate required in software engineering is how long it will take to do a task. Depending on the context and the sophistication of the organization, the estimate may be in terms of man-days or elapsed time and may be loaded to some extent.

Why is estimating hard?
Predicting the future is hard in general. If it were easy then stock market analysts and astrologers would be out of business. In the field of software engineering, some tasks are routine and some are novel. The difference is crucial. If you treat all tasks as novel then it can be impossible to move forwards, but if you treat all tasks as routine (often the assumption) then you fail to take into account the real difficulties and you are almost certain to come badly unstuck.

Routine tasks are those that we have done before and that we have a reasonable chance of successfully doing again. They are unlikely to be totally routine but the probability of surprises is low.

Novel tasks are those that involve a high level of creativity or problem solving. We don't know how to do them before we start and so our prediction of how much work will be involved or how risky they are is not likely to be accurate.

There is a host of techniques to help with estimating routine tasks. These have been analyzed extensively and are covered by books and training courses. It is still possible to make mistakes but the probability and the consequences are manageable with some planning.

There is a much smaller range of techniques to help with novel tasks but there is some hope.

Limitations to Estimating
A naive view of estimating is to expect an estimate to be a single figure that should be more or less accurate. For example, if an engineer estimates that it will take five days to program the Wurlitzer component then we might expect the Wurlitzer component to be complete five working days after he starts. If it takes six days then we will view that as a 20% over-run and so on.

The first catch is how well we have defined the task that has been estimated. Does the task include code review and rework? How about unit or integration testing? How about documentation? This type of problem can be overcome simply by taking pains to be clear about the task definition. The easiest way to achieve this is to work out (and publish) a set of organizational norms, so that engineers and project managers know what they are talking about without having to argue about it or write lengthy project documents.

The second catch is that a single figure gives no measure of the risk. If five days is the most likely time for completion then what is the probability of going over by one day, or two, and so on? A task that has a likely time of five days but a worst case of ten days is significantly different from one with a likely time of five days but a worst case of 20 days. A standard way to represent this is to use 3-part estimates. For each task, estimate the time taken in the best case (if nothing goes wrong), the likely case (assuming normal levels of disruption and problems) and the worst case (assuming lots of problems). As an aside, the real worst case is always infinite; what we are really after is a 95% estimate, a figure within which we expect to finish in 95% of cases.

There are formulae to combine the best, likely and worst case figures to give a single figure for planning purposes but I find the worst case estimate useful simply because of the thinking that it triggers. If you can get an engineer to avoid adding a standard percentage to all estimates to get the worst case then the interesting tasks are those for which the worst case is much larger than the likely case. Look at these tasks more closely to see if there are risks or uncertainties that could be eliminated.
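As an illustration of the kind of formula mentioned above, here is a minimal sketch in Python. It assumes the weighted average commonly associated with PERT, (best + 4 x likely + worst) / 6, which is only one of several possible formulae and not necessarily the one any given organization uses. The class name, the task names, the best-case figures and the threshold of 2.5 are all made up for the example; the likely and worst figures echo the five-day tasks discussed earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreePartEstimate:
    """A best / likely / worst estimate for one task, in days."""
    name: str
    best: float
    likely: float
    worst: float

    def expected(self) -> float:
        # Assumed formula: the PERT-style weighted average.
        return (self.best + 4 * self.likely + self.worst) / 6

    def risk_ratio(self) -> float:
        # How much larger the worst case is than the likely case.
        return self.worst / self.likely


def risky_tasks(tasks, threshold=2.5):
    """Return the tasks whose worst case is at least `threshold` times the likely case.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    return [t for t in tasks if t.risk_ratio() >= threshold]


if __name__ == "__main__":
    # Hypothetical tasks: same likely time, very different worst cases.
    tasks = [
        ThreePartEstimate("Wurlitzer component", best=4, likely=5, worst=10),
        ThreePartEstimate("Wurlitzer integration", best=4, likely=5, worst=20),
    ]
    for task in tasks:
        print(task.name, round(task.expected(), 2), task.risk_ratio())
    print("Look more closely at:", [t.name for t in risky_tasks(tasks)])
```

The single planning figure is a convenience. The more useful output is the list of tasks flagged for a closer look, since those are the ones where some investigation might remove a risk.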

The other value of 3-part estimates, apart from helping to identify risky tasks, is that they make explicit that there is a margin of error on any estimate: it is not a guaranteed figure. It can be interesting to consider how the margin of error changes over time. Theoretically, the margin of error is linked to the degree of knowledge or uncertainty about a task.

Initially, we have a very limited amount of knowledge about a task (certainly if it is novel). As we progress through the task (or through other tasks that affect it), we gain more relevant knowledge and the margin of error should decrease (when the task is complete the margin of error should be zero!). This is why I believe strongly in re-estimating during a project and a task to get better estimates as we go along.

Estimating routine tasks
I will not spend many words on this subject because there is such a rich body of work already available. The key techniques are to use historical data and check-lists and to identify causes of variability. Measures of task size such as function points or lines of code can be a valuable starting point and measures of variability such as engineer experience or the complexity of the other components to be interfaced with can be useful.
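To make the idea of using historical data concrete, here is a rough sketch in Python. It assumes that a team has recorded the size (in function points) and the actual duration of past routine tasks, and it derives a best, likely and worst figure for a new task from the lowest, median and highest historical days-per-function-point rates. The history figures and the function name are invented for the example, and a real scheme would also adjust for sources of variability such as engineer experience.

```python
from statistics import median

# Hypothetical history of routine tasks: (size in function points, actual days).
HISTORY = [(20, 10.0), (10, 6.0), (40, 18.0), (30, 15.0)]


def estimate_from_history(size_fp, history):
    """Return a best / likely / worst estimate in days for a task of `size_fp`.

    Uses the minimum, median and maximum days-per-function-point seen in
    `history`. This is an illustrative assumption, not a standard method.
    """
    if not history:
        raise ValueError("no historical data to estimate from")
    rates = [days / size for size, days in history]
    return {
        "best": min(rates) * size_fp,
        "likely": median(rates) * size_fp,
        "worst": max(rates) * size_fp,
    }


if __name__ == "__main__":
    # Estimate a new, hypothetical 24 function point task.
    print(estimate_from_history(24, HISTORY))
```

With only a handful of past tasks the range is crude, but it is at least grounded in what happened rather than in optimism.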

Even with routine tasks, estimates will never be 100% accurate. Unforeseen problems and variations can trip up the most experienced engineer. However, with some effort, the 3-part estimates can cover most of the tasks.

If a project includes a mixture of routine and novel tasks then it is worth minimizing the risks on the routine tasks just on principle but the majority of the uncertainty will come from the novel tasks.

Estimating novel tasks
With novel tasks, it is difficult to predict the work based on history so we need an alternative approach.

The only real way I know to approach a novel task is to do some up-front design. This is aimed at eliminating some of the areas of uncertainty and trying to break the task down into smaller parts that are more easily understood. It is important to understand that you need to do some work before you can start to come up with the estimate. It is not reasonable to ask for an estimate for a poorly understood task without allowing for some analysis.

The trick is to decide how much analysis is required before coming up with the estimate: too little and the estimate will be dangerously unreliable, too much and the work will never start.

Depending on the types of uncertainty, you might try prototyping but this does not replace the need for design work.