The emperor has no clothes – why do we estimate?

Estimation is wasteful

In Hans Christian Andersen’s tale, The Emperor’s New Clothes, two weavers present the emperor with a new suit of clothes that they claim is made of fine fabric invisible to anyone who is unfit for their position or who is “hopelessly stupid”. But there is actually no suit. For fear of being seen as stupid, no one wants to speak up in case they are the only one who cannot see the suit. While the emperor is showing off his new suit to the townspeople in a parade, a child cries out, “But he isn’t wearing anything at all!” Now that the boy has broken the ice, everyone else chimes in that they cannot see the suit either.

The expression “the emperor has no clothes” is now used in contexts where people are widely acclaimed and admired, but where others question whether what they have created has any value. A similar expression is “the elephant in the room”, used when a major problem or controversial issue is obviously present but avoided as a subject of discussion because it is more comfortable to ignore. Essential to both phrases is the willingness and courage to engage in a difficult discussion that questions the status quo.

Spending many hours on estimation has long been the status quo – but is it really necessary? Spoiler alert: you can forecast without estimating.

Real-world experiences of “the emperor has no clothes”

My first real-world experience of “the emperor has no clothes” occurred when I started my career as a software engineer back in the 1980s. We worked on huge, complex projects for a defense contractor that took about 4 years to get through the software development lifecycle – from requirements through deployment. All teams followed the waterfall process, whereby we were allowed to do nothing but requirements for an entire year. I am not exaggerating! After creating reams of documentation and enduring several days of grueling meetings to prove to the customer that we understood the requirements, we were allowed to proceed to the design phase. We then created even more documentation over the next year and, again following a series of meetings, were given permission to code. After one year of coding, we finally attempted to integrate and test everything together at the end. I probably do not need to tell you the outcome of this story. Needless to say, it was not pretty – we could not even get the software to build, let alone integrate and test it. After many projects following this waterfall process and living through the associated pains, people started to speak up. It took a huge shift in mindset for folks to realize that the way we had always done things could be done differently – could be done better.

There are plenty of other real-world examples of the emperor having no clothes, but I wanted to focus this article on estimation. Today, the majority of people assume we must estimate. After all, how else would we be able to give a status update on whether we are on schedule and within budget? At the same time, most would agree that estimation is not accurate. And to take it one step further – I would argue that estimation is actually wasteful.

Relative estimation in the real world

Today, the most common way agile teams who choose to estimate do so is by playing planning poker using relative estimation. The concept behind relative estimation is a good one – at least compared to prior estimation techniques. Since we do not really know how much work a team can produce in a given time period, we do not estimate in absolute terms like hours to complete an item of work. Instead, we pick a small item as a baseline and relatively size all items against that baseline and against each other. After the team has operated for a certain period of time, it measures how many relative points of work it completes on average over a given time period – say 2 weeks. This is known in Scrum lingo as the team’s velocity, which it can then use to predict the future.

If the team sizes everything in a release backlog and divides that total by the velocity, then (assuming nothing changes in the release backlog) it can forecast the release date. This also assumes that team members do not change, since otherwise the velocity will change too.

Teams typically size using a modified Fibonacci sequence of 1, 2, 3, 5, 8, 13, 20, 40, 100. You probably already know that the rationale behind this is to reflect that larger numbers are even less precise than smaller ones. So, most teams will only work on items sized 8 or less, and items sized 13 and above are considered epics that must be decomposed before being worked on. Some teams treat 20 and above as epics. Now, while a 1 is not 1 day and a 5 is not 5 days (i.e., they are relative sizes, not absolute), since all work must complete within a 2-week window for teams running 2-week sprint cycles, a 1 has to be about 1 day and an 8 has to be about 8 days – or else they could not complete within the 2-week sprint.

On most teams I have seen, sizing converges over time. That is, the majority of items wind up being sized as 3’s and 5’s. There will be a few items that are very small and can complete in a day or two, and a few stragglers that will take the entire sprint and be sized as an 8. But the majority will be sized as a 3 or a 5. This is depicted in the following diagram.

While most teams do not go back and measure the precision of their estimates, the majority will probably agree that many of the items sized as a 3 took slightly longer than expected and should have been a 5, and many sized as a 5 took slightly less time than expected and should have been sized as a 3.

Relative estimation is wasteful – you can forecast without estimating

We are finally getting to the point of all this. And that is: when teams need to forecast, counting items completed over a given time period is just as accurate as sizing items and counting relative points completed. Relative estimation is wasteful. The following picture depicts a team who started a 2-week sprint with 8 items totaling 30 points and completed 7 of the 8 items, worth 25 points.

Here is that same team who did not bother sizing, started the 2-week sprint with the same 8 items, and completed the same 7 items.

In the first example, if the team wants to forecast, they would first size all the items in their release backlog. Let’s assume it contains 50 items totaling an estimated 190 points. They then divide that by their 25-point velocity to predict it will take 7.6 sprints before they can release. They round up to forecast that 8 sprints, or 16 weeks, are needed to release.

In the second example, the team does not bother sizing the 50 items at all. They take the same 50 items in their release backlog and divide by 7 (the average number of items completed over a 2-week period), yielding 7.14 two-week periods. Rounding up, they also arrive at a release forecast of 8 sprints, or 16 weeks.
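Both calculations are simple enough to check in a few lines of Python. The figures below are the illustrative numbers from this article (50 items, 190 points, a 25-point velocity, a throughput of 7 items per sprint), not measured data:

```python
import math

SPRINT_WEEKS = 2

# Method 1: story-point velocity (requires sizing every backlog item)
backlog_points = 190   # estimated total points for the 50 items
velocity = 25          # points completed per sprint, on average
sprints_by_points = math.ceil(backlog_points / velocity)   # ceil(7.6) = 8

# Method 2: throughput (just count completed items -- no sizing needed)
backlog_items = 50
throughput = 7         # items completed per sprint, on average
sprints_by_count = math.ceil(backlog_items / throughput)   # ceil(7.14) = 8

print(sprints_by_points * SPRINT_WEEKS)  # 16 weeks
print(sprints_by_count * SPRINT_WEEKS)   # 16 weeks
```

Counting items (throughput) yields the same 16-week forecast as summing points (velocity), with none of the sizing effort.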

Both teams come up with the same forecast

Both teams will come up with the same forecast of 16 weeks. Neither is precise, and many things can change over those 16 weeks: requirements changing based on feedback, team members changing, technical unknowns – the list goes on and on. So it raises the question of why we bother to forecast at all. And for teams who can continuously release, the question is even more pressing. But that is a topic for yet another blog.

Assuming you do need to come up with a forecast, then at least in the second example the team did not spend countless hours playing planning poker to size their items. Their forecast comes for free, by simply counting items done over a given time period. Most Kanban tools will give you this metric, known as throughput. All that time spent doing relative sizing can instead be used by the team to do something productive, like coding and testing, which in turn means needing less time to release in the first place. Or they could take all that accumulated wasted time and periodically go out for some beers as a team.

The #noestimates movement is the start of questioning the status quo

While many teams agree that the sizing effort is not providing any value, it is still a minority that is speaking out. Most teams simply continue to go through the motions, estimating and wasting time. Some do not realize they are wasting time, and others do not want to disturb the elephant in the room. The #noestimates Twitter movement is the equivalent of the boy who cried, “But he isn’t wearing anything at all!” That was a great start and created some stir in the industry. But just as it took a while for teams to slowly migrate away from the waterfall model to iterative development, I think we are still a ways out before the majority views estimation as an activity of the past.

4 thoughts on “The emperor has no clothes – why do we estimate?”

  1. Without points and velocity, how should a team decide on number of stories/tasks to bring into a sprint? I guess trial and error at the start of a project.

    Surely teams need to have a way of saying “last sprint we did this much work, next sprint we’ll aim for this much”.
    Without any concept of how much effort (points) a story might involve, how does a team ensure they are bringing in the appropriate amount of work for a sprint?

    1. Good question. I will answer in two ways. First, if a team is using Kanban then this is not an issue, since Kanban teams do not plan the amount of work to bring into a sprint. There are no sprints / no timeboxes. Rather, the team uses a continuous flow model and pulls in work as long as there is capacity and no bottlenecks. Please see my other blog https://kanbanmentor.com/how-kanban-helps-visualize-and-remove-your-bottlenecks/ for more information. Having said that, if a team is using Scrum, they may still apply the concept of no-estimation by basing their decision on how much work to bring into the next sprint on the number of stories historically completed (as opposed to story points).

  2. In the Product Backlog, the less prioritized items are generally big in size and have to be further broken down. So, in that case, by counting the number of items delivered, we might run into the issue of underestimating, because lower down the product backlog the items are generally bigger in size, and hence we might not deliver a similar number of items every sprint.

    1. I agree that there is a scenario where a team wants to create a very large backlog, and taking the time up front to decompose all items into small ones would be wasteful. The team should not work on any items unless they have been broken down into smaller chunks. Decomposition may be a step in their Kanban workflow to ensure all items are ready to be worked on. They still do not need story point estimation, and the team can still use story counts to do forecasting, since they are only using counts from completed stories (which are by definition small).

      Now, in the extreme case where someone wants to do a very coarse-grained forecast for the entire backlog, including the larger chunks that have not been decomposed, my recommendation is to T-shirt size the large chunks. In this case, the team would size all Product Backlog Items as S-M-L. Small items are the ones the team could actually work on (i.e., taking less than 2 weeks to complete); these are the higher priority items. Medium is a lower priority, larger item that takes less than 1 month to complete. Large is a lower priority, larger item that takes less than 3 months to complete. Again, the team will NEVER work on any items unless they have been decomposed to a Small. They only use the Medium and Large sizes to do a very coarse-grained “rough” forecast for very large product backlogs (when and if necessary). The team still uses story count for their forecast; they simply multiply Medium stories by 2 and Large by 6.

      As an example of this very rough forecast, let’s assume a Product Backlog contains 50 Small items, 20 Medium items, and 10 Large items. Let’s assume your team completes about 5 items each week. That is, 5 is the team’s story count throughput. The rough forecast to complete the entire backlog would be 30 weeks – computed as: (50/5)+(20*2)/5+(10*6)/5 = 10+8+12 = 30.

      I would strongly recommend this technique only as a very last resort – only when you have a very large backlog, it would waste too much time to decompose everything, and for whatever reason a very rough forecast is required for the entire backlog. I would try to minimize the number of Large items; obviously, the more Mediums and Larges, the less accurate the result (it is an extremely rough estimate). Since this forecast can easily be abused, I would avoid doing it at all except in organizations mature enough not to abuse it. The number is simply a very rough number and no detailed plans should be made based on it. But it is an option and makes sense under certain conditions.
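The rough T-shirt forecast in the reply above can be sketched in a few lines of Python. The weights (a Medium counts as 2 Smalls, a Large as 6 Smalls) and the throughput of 5 items per week are the assumptions from the reply's example:

```python
import math

# Weight each T-shirt size in "Small-equivalents" (assumption from the reply:
# a Medium counts as 2 Smalls, a Large as 6 Smalls).
WEIGHTS = {"S": 1, "M": 2, "L": 6}

def rough_forecast_weeks(backlog, throughput_per_week):
    """Very rough release forecast: total Small-equivalents / weekly throughput."""
    small_equivalents = sum(WEIGHTS[size] * count for size, count in backlog.items())
    return math.ceil(small_equivalents / throughput_per_week)

# 50 Small, 20 Medium, and 10 Large items; the team completes ~5 items per week
weeks = rough_forecast_weeks({"S": 50, "M": 20, "L": 10}, throughput_per_week=5)
print(weeks)  # 30 weeks, matching (50/5) + (20*2)/5 + (10*6)/5 = 10 + 8 + 12
```

As the reply stresses, treat this number as an extremely rough indicator, not something to build detailed plans on.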
