Types of estimates
2024-Apr-13

There are at least 4 "types" of estimates.
Bottom line up-front, I think there are at least 4 types of estimate:
- Delivery date
- Cycle time
- Dev-time
- Relative
I only intend to define these here, not advocate for any specific type.
Delivery date
Estimates like "we'll have it to you by 2024-Oct-01" are delivery date estimates. They're pinned to a specific date in the calendar and most often given to customers and used for sequencing "resources" (i.e. people). That is, if a staff engineer is expected to finish project A by a given date, then that's roughly when they can start work on project B.
These incorporate many sources of uncertainty, such as how long the work will take, who will be available to work on it, what projects are ahead of it in the queue, and so on.
This is closely related to the concept of "lead time", which is the duration between receiving the request and the customer getting the goods. A delivery date estimate is effectively a lead-time estimate with a fixed start date, so I don't differentiate between the two.
Cycle time
An estimate like "this will take 5 days from start to finish" is a cycle time estimate. It's not pinned to a specific date in the calendar, and unlike lead time it ignores any time spent waiting for someone to start work on it. It does not care that it'll take 2 devs to write it, 1 to review it, and 1 to deploy it. Some tools offer this number directly as the duration between a ticket entering the InProgress state and the Done state.
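For concreteness, here's a minimal sketch of that calculation with made-up ticket timestamps; the field names are hypothetical, not any particular tool's API:

```python
from datetime import datetime

# Hypothetical ticket history: when each ticket entered InProgress and Done.
tickets = {
    "TICKET-1": {"in_progress": datetime(2024, 4, 1, 9, 0), "done": datetime(2024, 4, 5, 17, 0)},
    "TICKET-2": {"in_progress": datetime(2024, 4, 2, 9, 0), "done": datetime(2024, 4, 3, 12, 0)},
}

for ticket_id, t in tickets.items():
    cycle_time = t["done"] - t["in_progress"]
    print(f"{ticket_id}: {cycle_time.days} days, {cycle_time.seconds // 3600} hours")
```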
Like delivery date estimates, cycle time estimates incorporate sources of uncertainty, such as time spent waiting for a reviewer or deployer to get to the task.
Dev-time
Dev-time is the first type of estimate that makes the cost of parallel work explicit. This is the sum of the hours that we expect the devs to work on a given ticket. For example, if it takes 3 developers 5 hours each to complete a ticket [1], then the ticket "cost" 15 dev-hours.
This ignores both time spent waiting for the work to start and hand-off time between developers. Its main source of uncertainty is how much work a given developer can "actually" do in one hour, and it assumes some degree of fungibility between staff.
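As a tiny sketch of the arithmetic, using the made-up numbers from the example above:

```python
# Hypothetical hours each developer spent on one ticket.
hours_per_dev = {"alice": 5, "bob": 5, "carol": 5}

dev_time = sum(hours_per_dev.values())
print(f"{dev_time} dev-hours across {len(hours_per_dev)} developers")  # 15 dev-hours across 3 developers
```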
Relative
Here, you might say that ticket A is twice the "size" of ticket B. There's some idealised ticket with size "1" against which all others are measured. This type of estimate includes story points and task points. In some tellings, it also includes t-shirt sizes. I've heard folks say that these estimates should include "risk" in the estimate.
For these types of estimates, there's uncertainty between the estimates themselves; e.g. is task A twice the "size" of B, or is it only 1.5x the size with some extra thrown in to account for "risk"? Risk is also often not well defined in these discussions. If you only use specific multiples (e.g. Fibonacci numbers), then you introduce quantization error. Despite defining these relative to an idealised task, team members usually develop private definitions that add to the overall uncertainty.
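As a sketch of the quantization error, with made-up "true" sizes snapped to the nearest allowed point:

```python
# Snap a hypothetical "true" relative size to the nearest allowed Fibonacci point.
fibonacci_points = [1, 2, 3, 5, 8, 13]

def quantize(size: float) -> int:
    return min(fibonacci_points, key=lambda p: abs(p - size))

for true_size in [1.4, 4.2, 6.8, 10.6]:
    estimate = quantize(true_size)
    print(f"true size {true_size:>4} -> estimate {estimate} (error {estimate - true_size:+.1f})")
```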
Despite these uncertainties, most teams find they finish a predictable [2] number of story points per week [3].
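As a sketch of what "computed limits" means here (see footnote 2), using made-up weekly totals and the standard scaling constant for an XmR (individuals) chart:

```python
# Natural process limits for weekly completed story points (XmR / individuals chart).
weekly_points = [21, 18, 24, 19, 22, 20, 23, 17]

mean = sum(weekly_points) / len(weekly_points)
moving_ranges = [abs(b - a) for a, b in zip(weekly_points, weekly_points[1:])]
avg_moving_range = sum(moving_ranges) / len(moving_ranges)

# 2.66 is the usual scaling factor for individuals charts.
upper = mean + 2.66 * avg_moving_range
lower = max(0.0, mean - 2.66 * avg_moving_range)
print(f"Routine variation: roughly {lower:.1f} to {upper:.1f} points per week")
```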
Estimate confidence
All of the above estimates incorporate uncertainty but do not make their uncertainty explicit. For example, "I'm 50% confident we can deliver to the customer by 2024-Oct-01" is a different estimate to "I'm 85% confident we can deliver to the customer by 2024-Oct-01". Personally, I would hesitate to give either estimate to a customer, but I'd prefer the latter (assuming they cost the same to produce).
This speaks to calibration. An estimator is calibrated when their stated probabilities consistently match what actually happens. That is, things they give 50% confidence happen roughly 50% of the time, things at 80% confidence happen 80% of the time, and so on.
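A minimal sketch of checking calibration against a log of past estimates; the outcomes below are invented:

```python
from collections import defaultdict

# Hypothetical log of (stated confidence, whether the estimate came true).
past_estimates = [
    (0.5, True), (0.5, False), (0.5, True), (0.5, False),
    (0.8, True), (0.8, True), (0.8, False), (0.8, True), (0.8, True),
]

outcomes_by_confidence = defaultdict(list)
for confidence, came_true in past_estimates:
    outcomes_by_confidence[confidence].append(came_true)

for confidence, outcomes in sorted(outcomes_by_confidence.items()):
    hit_rate = sum(outcomes) / len(outcomes)
    print(f"stated {confidence:.0%} -> observed {hit_rate:.0%} over {len(outcomes)} estimates")
```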
Confidence can easily change over time as developers learn more about the problem or their processes. For example, the depth of the queue, a task's position in the queue, and the stability of the queue order all affect a delivery date estimate.
Moving between different estimate types
We can convert most of these estimates into one another, but we have to accept model error when we do.
For example, if we know a task's position in our queue, the average cycle time, and the number of developers on the team, then we can create a delivery date estimate. We may have much lower confidence in the delivery date estimate, but we get it for free.
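A sketch of that conversion, with every number below made up:

```python
from datetime import date, timedelta

queue_position = 4          # tasks ahead of this one in the queue, including it
avg_cycle_time_days = 5.0   # average calendar days from start to finish per task
developers = 3              # rough number of tasks the team works on in parallel

# Crude model: the team drains the queue roughly `developers` tasks at a time.
calendar_days = queue_position * avg_cycle_time_days / developers
delivery_estimate = date.today() + timedelta(days=calendar_days)
print(f"Rough delivery date estimate: {delivery_estimate}")
```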
Similarly, given a task's relative size we can create rough estimates for the dev-time or cycle time. Dev-time can be roughly turned into cycle time by dividing by the expected number of developers and the number of hours we think a developer can work per day.
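As a sketch of the dev-time to cycle time conversion, again with made-up numbers:

```python
dev_time_hours = 15        # e.g. 3 developers x 5 hours each
developers = 3
focus_hours_per_day = 5    # hours of real work we assume a developer gets per day

# Assumes the developers work perfectly in parallel (the model error noted below).
cycle_time_days = dev_time_hours / (developers * focus_hours_per_day)
print(f"Rough cycle time: {cycle_time_days:.1f} days")  # 1.0 days
```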
As mentioned, this introduces model errors. For example, converting dev-time to cycle time as above assumes all of the developers work perfectly in parallel. If you only need a very imprecise estimate, this may be preferable to interrupting someone while they're working.
Why does this matter?
As a rule of thumb, having shared operational definitions helps a team communicate. If a project manager asks "how long will this take?", I answer "3 days", and I meant "cycle time" but they understood "lead time", then I'm probably going to burn some customer trust when I don't deliver the goods 3 days from now.
Similarly, if we use delivery dates to calculate return on invested capital, we will inflate the apparent cost: a significant amount of time may be spent on higher-value projects before anyone starts on the given project, and that will throw off decisions about which projects have the highest business value.
Conclusion
There are different types of estimates, each with different strengths and weaknesses. If you're having trouble with estimates, it's worth checking you're all talking about the same thing.
Footnotes
[1] And they all work for the full 5 hours.
[2] Here, I mean predictable in the same sense that we use in statistical process control: the measure shows only routine variation, so it will consistently fall within the computed limits. Predicting more narrowly than those limits is usually folly.
[3] This is less remarkable than it first seems, since most teams also deliver a predictable number of stories per week, so E[StoryPoints/Week] = E[StoryPoints/Story] * E[Stories/Week] (i.e. they're sort-of measuring the same thing).