Thursday, May 16, 2013

Story Points: Why are they better than hours?

[Figures: Traditional Estimation Funnel; Microsoft Story Point accuracy]

Story points give more accurate estimates, they drastically reduce planning time, they more accurately predict release dates, and they help teams improve performance. Hours give worse estimates, introduce large amounts of waste into the system, handicap the Product Owner's release planning, and confuse the team about what process improvements really worked.

Interesting new research has become available. The Standish Group has updated their findings on project success rates based on analysis of the last decade of data with tens of thousands of data points. In addition, Microsoft has new research findings showing that agile estimation is astoundingly more accurate than traditional project estimation. See:

Scrum + Engineering Practices:  Experiences of Three Microsoft Teams
Laurie Williams, Gabe Brown, Adam Meltzer, Nachiappan Nagappan
IEEE Best Industry Paper award winner

Many people who have managed projects with hours have a hard time understanding why story points are better. They have failed to understand some fundamental data that has been published for over 60 years in the industry literature as well as the latest research.

First, let's look at the latest data on project failures. Failure rates are increasing for IT projects during the current disruption of the global financial system. The latest Standish Group analysis shows that agile projects have three times the success rate of traditional projects. Jim Johnson now recommends agile practice be used universally on all projects.


In fact the latest Forrester Group research shows that:
Common Project Management Metrics Doom IT Departments to Failure

The venture capitalists I work with say they have never seen a correct GANTT chart in a board meeting. When they dig down into the problem they say none of their management teams knew the velocity of their teams before they implemented Scrum. Not knowing the velocity of production of the teams is the root cause of 100% failure of release plans to be accurate in their board meetings.

The stability of a user story is critical for planning. A three point story today is three points next year and is a measurable part of the product release for a Product Owner. The hours to do a story depend on who is doing it and what day that person is doing it. This changes every day. The GANTT chart assumes a fixed number of hours for some fictitious person (who often is not the person to implement) and assumes fixed dependencies (which are always changing). A study of 80 multimillion dollar projects at GSI Commerce (now owned by eBay) showed that the best experts in the company were totally incapable of estimating how much time a project would take by the people who actually implemented it.

You would think these data would cause people to change their behavior but many companies seem to prefer to continue to fail and be acquired or go bankrupt rather than improve their project management techniques.

Rand Corporation research in the 1940s showed clearly that humans are not good at estimating hours, and practical experience repeatedly confirms the research. They recommended the Delphi approach to estimation, which was adopted in software development as the Wideband Delphi technique. The same technique is now embedded in the practice called Planning Poker for agile teams.

The management metric for project delivery needs to be a unit of production. Production is the precondition to revenue and companies say they are in business to grow revenue and margins (even though in project planning they often do the opposite). At least a venture capital group is clear that it is all about the money and money comes from velocity of production combined with quality of the product. Hours are expense and should be reduced or eliminated whenever possible.

The best data on individual developer performance comes from Yale University and has been reported previously on this blog. The best developer on a project takes one hour to complete a task while the worst developer takes 10 hours (within a project) or 25 hours (across projects). For teams the difference is an order of magnitude greater. Larry Putnam's published data show that an hour for the most productive team turns into 2000 hours for the least productive team.

Hours completed tell the Product Owner nothing about how many features he can ship and when he can ship them.

The important metric is the number of story points the team can deliver per unit of calendar time. The points per sprint is the velocity. Therefore we estimate everything in points for the Product Owner so that he can create a release roadmap based on team velocity and adjust the plan if velocity changes.
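As a rough illustration of how a release roadmap follows from velocity, here is a minimal sketch. The function name and the backlog and velocity numbers are hypothetical, chosen only to show the arithmetic, and it assumes velocity stays constant over the forecast.

```python
import math


def sprints_to_finish(backlog_points, velocity_per_sprint):
    """Whole sprints needed to burn down the backlog at a constant velocity."""
    if velocity_per_sprint <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(backlog_points / velocity_per_sprint)


# Hypothetical numbers, for illustration only.
backlog = 240  # story points remaining in the release

print(sprints_to_finish(backlog, 30))  # 8 sprints at 30 points per sprint
print(sprints_to_finish(backlog, 40))  # 6 sprints if velocity rises to 40
```

When velocity changes, rerunning the same calculation gives the adjusted plan.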

The way we do story point estimation gives better estimates than hourly estimates as they are more accurate and have less variation. A CMMI Level 5 company determined that story point estimation cuts estimation time by 80%, allowing teams to do more estimation and tracking than a typical waterfall team. A telecom company noticed that estimating story points with planning poker was 48 times faster than waterfall estimation practices in the company and gave as good or better estimates.

Story points are therefore faster, better, and cheaper than hours and the highest performing teams completely abandon any hourly estimation as they view it as waste that just slows them down.

For a complete breakdown of the points vs. hours debate see Scrum Inc.'s webinar on the topic.

85 comments:

Grant (PG) Rule said...

Hi Jeff,
I'm with you 100% of the way… until you get to the bit where you say "The metric of importan[ce] is the number of story points".

Sorry, but story points are just a way of disguising 'planned workhours'… or 'estimated workhours' if you prefer.

We have known for nearly three decades how to measure the output of the software process… ie. the quantity of functionality required or delivered. aka the 'amount of information processing'. You may not like it, but this is what functional size measurement is all about. It's why Allan Albrecht invented Function Points back in 1977/8.

The need for a measure of the output of the software process has changed little since the '70s. However, nowadays the COSMIC FSM Method provides a means to measure the output functionality that is a big improvement over the old FPA methods. Ref: < www dot cosmicon dot com>

COSMIC easily integrates with user stories and, using Data Movement Sequence Diagrams, can help to visualise the user's functional requirements. This visualisation is a boon, as it also contributes to communication between the user and the developer (and product owner).

If you'd like to discuss this further, I'd be happy to hear from you.

Best regards,
Grant (PG) Rule < g.rule @ SMSexemplar dot com >

Jeff Sutherland said...

I agree that Function Points are the appropriate metric for comparison across teams and technologies. A CMMI Level 5 company I do research with has spent two years trying to implement COSMIC function points and still doesn't feel like they have it working properly. Agile teams have neither the time nor the need to do this. Story points are effective as a team based metric. The interesting data is acceleration of velocity and they provide a baseline to accelerate from.

Michel said...

@Grant "Sorry, but story points are just a way of disguising 'planned workhours'… or 'estimated workhours' if you prefer."

You are suggesting that story points are something real, like some time units. But they represent a relative measurement.

And because there is no direct relationship to 'hours', they compensate for variance in work. That's another reason they 'work' better than absolute units of time for estimating...

Rajaram Parimi said...

We have adopted another variant of Story Point and refer to it as a "Complexity Point (CP)". This way it is used to reflect the relative complexity and cost of implementing various features in our application framework.

Our basis of a CP estimate is an "Ideal Developer Day", i.e. 8 hours of focussed work per day. Further, this is normalized for the skill level of the various developers within a team. For a developer with the relevant experience in the application framework and the technology, an 8 hour task is pegged as 1 CP of estimate. Further, the velocity measures the "Actual Developer Days" consumed per CP of estimate. We peg @ 2 Actual Developer Days per CP. Different teams have a velocity ranging between 2.0 ~ 2.5.

The velocity of the team is the basis for the capacity calculation per sprint. The plan level indicates the available capacity specific to each team's velocity. During execution, each team focusses on retaining the plan level and trying to improve its velocity. The predictability with this approach is around 75% to 85%, i.e. 7 out of 8 teams make their plan level.

In our case the CP is a planning tool. By working out a proper velocity per team based on their history, a predictable plan level can be achieved.
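One possible reading of the capacity calculation described in this comment is sketched below. The function name, team size, and sprint length are hypothetical. Only the 2.0 to 2.5 "Actual Developer Days per CP" range comes from the comment itself.

```python
def plan_level_cp(developers, sprint_days, actual_days_per_cp):
    """Complexity Points a team can plan for one sprint.

    actual_days_per_cp is the team's velocity in the comment's sense:
    Actual Developer Days consumed per CP of estimate.
    """
    available_developer_days = developers * sprint_days
    return available_developer_days / actual_days_per_cp


# Hypothetical team: 5 developers, 10 working days per sprint.
print(plan_level_cp(5, 10, 2.0))  # 25.0 CP
print(plan_level_cp(5, 10, 2.5))  # 20.0 CP
```

Under this reading, a lower number of actual days per CP means more capacity per sprint.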

Jeff Sutherland said...

The fundamental point of this blog item is that relating hours to story points causes a huge impediment for the team. If the team is generating continuous process improvement, the hours per story point will be continually decreasing. Assuming a number of hours per story point makes it impossible for the team to show they have improved and fixes in their mind that there is a reasonable number of hours per story point.

The only reasonable amount of hours for a story point is zero and developers should be constantly trying to get there. For the highest performing teams I have worked with they have reduced 10 hours per story point to one and some of them have reduced it to 30 minutes. For them I say why not 15 minutes?

Klas said...

How do you know that the decrease in story points is actually an increase in productivity? It might as well be inflation in the estimates? Especially if the team is measured on their velocity.

As Goldratt put it:
"If you tell me how you measure me, I'll tell you how I'll behave"

Jeff Sutherland said...

My venture companies typically have two stable reference stories for 1,2,3,5,8,13,20 points. It is easy to see where stories fit. The Product Owner can help keep people honest on estimation.

The problem for our teams is not inflating estimates. As they go faster they estimate similar stories to have fewer points unless they are very careful to have stable reference stories. For some of them we know they are going twice as fast and velocity has not changed. This is an impediment because they cannot tell when a process improvement has helped.

For those teams that lie about estimates and fudge numbers, we generally don't hire them in startups (or get rid of them when they don't produce).

Klas said...

I think having reference stories will help. Could you share some of yours?

Also, as a project progresses, dependencies increase and cost of change (or time to implement) usually increases. This is why XP focuses a lot on refactoring. This may mean that the reference stories will have to be reevaluated, or new ones might have to be written to reflect the current architecture/infrastructure, right? Or how do you handle that?

Also, I liked the following solution when opting for using hours:
http://frank.vanpuffelen.net/2007/08/scrum-utilization-vs-velocity.html

He mentions a "focus-factor". Using that would also help the team know when they are removing impediments, right? Or do you see any benefits of story points over that solution?

DanielKr said...

"For the highest performing teams I have worked with they have reduced 10 hours per story point to one and some of them have reduced it to 30 minutes. For them I say why not 15 minutes?"

That sounds implausible unless, of course, it is the effect of the team going from being beginners to being proficient in the tools and platforms they use. So, how big is such a story point, and what did these teams do to increase their productivity by more than an order of magnitude?

Daniel said...

Very little of this posting supports the thesis. Most of it, while interesting, is irrelevant to the question of “story points” versus hours.

“Story points” and hours must be convertible (with a different exchange rate per team and presumably varying over time), since the actual amount of work is the same either way. Which unit one chooses does not change how people use their time, does not change the pace at which they work, does not change the amount of time available for them to work, and does not affect the fact that work is a temporal activity.

Hence the only explanation for better results using “story points” (if they are indeed better) is that there is some common human bias against thinking in hours specifically. That is ALL the debate can legitimately be about, yet nothing in this apology for “story points” addresses that.

The only support in the blog posting is the claim that a telecom firm found that their estimation was 48x faster using “story points” over hours. But is that really what happened? Did they use poker games before, when estimating in hours? Did they use a Fibonacci-like set of buckets before, when they were estimating in hours? Or did all of that change as well? If so, the switch to “story points” says nothing reliable.

Until someone conducts a reasonably well-controlled study in which only “story points” versus hours is allowed to vary, the matter remains one of ideology or superstition.

Jeff Sutherland said...

I just finished a Scrum metrics presentation at Agile 2010 which some people thought was the best at the conference. It showed the importance of story points and fixing the reference. Eight metrics were presented and six of them are comparable across teams. Actual team data was presented showing a team going to 1600% of initial velocity. We already have similar data for half a dozen teams and will be collecting more.

Let me reiterate once again. Those with hours have these problems:

1. They don't know their velocity and project plans are 100% wrong according to our investors on the boards of OpenView Venture Partners portfolio companies.

2. Because they don't know velocity they cannot show clear demonstration of improved performance to management.

3. Without good velocity measurement they cannot implement a performance improvement and know that it really worked.

Hours are an extremely bad practice and should be abandoned. I don't even teach hours any more in Scrum training as I find teams get confused and find it difficult to execute Scrum well.

And finally, hours introduce extra waste into the team. It takes a lot of time to do the estimates. They are not as accurate as story points. And teams using hours tend not to improve as fast as those using points.

Daniel said...

1, 2, 3. That is because they haven’t measured their velocity, not because they’re using hours.

The blog posting also makes such statements as, “The best developer on a project takes one hour to complete a task while the worst developer takes 10 hours”, as if this somehow changes when one measures with “story points” instead. It doesn’t. It’s non sequitur. While it may be true, it does not support the thesis in any way.

This is why I state that most of the blog posting does not support the thesis. It’s a lot of stuff in favor of the whole scrum package, but nothing that actually demonstrates the importance of “story points” specifically. It certainly does not address my curiosity about why “story points” over hours.

If the thesis is that velocity is a “quantity of production per unit time” and that therefore you must assess the quantity of production independently of time, then my question is, “What are your units for ‘quantity of production’?” You do not have many answers here. We do not equate the unit of production to the number of lines of code in a “story point”, to the dollar value of the “story point”, or to the number of fan letters accrued by the developer. No. We conduct sprints of a fixed duration, and, as the scrum team refines its estimation abilities, the number of “story points” completed during a sprint is expected to settle into something constant. Therefore the quantity of production is the amount of time it takes. That should be no surprise; when we estimate “story points”, we estimate how “large” a task is compared to other tasks. “Large” means… how much time it will take, and nothing else.

“Story points”. Hours. It’s the same thing, and nothing in the blog posting demonstrates otherwise. You’ll need some psychological study showing why the brain squirms into incompetence at the very mention of “hours” in order to make a case for “story points”.

Jeff Sutherland said...

One more time.

Story points are based on a reference story that is a unit of measure that the product owner understands. Hours do not work for release planning. That's why over 65% of projects worldwide fail.

The reference is independent of team skill, knowledge, or capability.

There is sufficient research in the academic literature to demonstrate this and practical experience will show the same thing if you try it.

Steve said...

Jeff,

I agree with your thoughts about story points for the product backlog.

Do you still teach teams to estimate the sprint backlog tasks in hours for the burndown charts?

Steve

Jeff Sutherland said...

I and some other members of the Scrum Foundation now teach that you can do estimation in three ways.

1. Backlog in story points and no tasking (hyperproductive teams)
2. Backlog in story points and tasks in story points (best current practice as teams are getting better)
3. Backlog in story points and tasks in hours (the old way)

The latest version of the Scrum Guide says only that work should be broken down into pieces of a day or less. Hyperproductive teams know the probability of stories being completed successfully based on size in points and they act in accordance with this data.

Jeff Sutherland said...

I consulted a product owner of a hyperproductive team and she says they only estimate points for product backlog items. Sometimes they do tasking but never estimate tasks. This almost always shows them that the story is not small enough or clear enough, so the best practice for hyperproductive teams is small stories with no tasks.

For enabling specifications, this team uses epics: a brief description of a large story which spawns many smaller stories that are executable.

It is critical for a hyperproductive team to have a higher level view than a single story.

Steve said...

Thanks for providing details about the three ways to estimate.

Creating hourly estimates for sprint tasks felt like a form of work in progress, so I'm going to have our Scrum teams start to estimate tasks in story points.

Barry Chase said...

So I get the point that Story Points do not equate to Hours. My company is heavily pursuing SCRUM. However, I still have to provide my reporting estimates in our project tracking repository as Effort Months, which is 160 hours per month per person.

So at some point, despite utilizing the Agile process, I have to convert not only my Story Points to hourly overall estimate, but also report my hours spent on a given project.

This seems like I am being asked to do what you indicate should not be done. Am I missing something ?

Jeff Sutherland said...

If your management asks for reports that slow people down, is that good management? If you have to provide such reports, doesn't that make you question the competency of the management? Maybe you should look for a company with more competent management. General Motors went bankrupt. Have you ever analyzed why? What does Toyota measure and why do they measure it?

Steve said...

Jeff,

I have a follow up question related to hyperproductive teams:

1. Backlog in story points and no tasking (hyperproductive teams)

So if there is no tasking and user stories are small, is it typical for developers to select a user story to work on and complete the story by themselves? Each developer in most cases would have all the skills to deliver a completed user story.

Thanks,

Steve

Jeff Sutherland said...

When stories are small one developer can often complete them by themselves. However, pairing will usually help in at least half the cases.

Manish said...

Jeff - Teams who do not task the story or who do not estimate the tasks, how do they develop their burn down charts?

Also I do not see it as practical for one developer to complete the story, as somebody has to test it as well, or for some reason it might need the involvement of another developer.

We used to have Hours estimation for tasks and that used to show on Vertical axis against the time.

Ming said...

Hi Jeff, first of all I apologize if my comments came a bit late. It's 2012 and the article was written early 2010.

1. Backlog in story point and no tasking (hyperproductive teams)
2. Backlog in story points and tasks in story points (best current practice as teams are getting better)
3. Backlog in story points and tasks in hours (the old way)

I intend to bring a team to a hyperproductive state. Can you advise how I should sum the remaining work as I elevate the practices to 2 and eventually 1?

For practice 2 it's not so tough to imagine in my head. If the story point worth is 3, then I split 1, 1, 1 each task. If the story is 5, perhaps 2 and 3 points each task. Possibly, one task assigned to me and the other for another colleague.
But for practice 1, I really appreciate your advice.

Jeff Sutherland said...

Teams estimate stories in points. As their stories get smaller they no longer need to do tasking. Multiple people will work on a story. The fastest teams tend to get all small stories about the same size and can just count number of stories done for the burndown. I've seen one study show that counting number of stories gave the same results as estimating them. Of course stories need to be good ones and small to achieve this.

Ming said...

A hyper productive team does not use tasking but their stories are small. When you say small stories, you meant based on a stable reference story isn't it?
As opposed to hyper productive team estimating story on their superb performance.

Jeff Sutherland said...

Stories are independent of time. Faster teams get more points per time. By small stories I mean stories average a day or less for the team. So as their performance improves they will be able to take larger stories. However, in the case of Scrum Inc., the stories were stable at 1, 2, and 3 points for a year while the team improved 500%. At the beginning of the year a point took a couple of hours. By the end of the year it was averaging about 20 minutes. So this team did not tend to take larger stories into the sprint and consistently broke stories down so that they could be executed quickly with no dependencies.
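The arithmetic behind those numbers can be checked with a small sketch. It assumes "a couple of hours" means about 120 minutes per point, which is an interpretation and not a figure stated in the comment. The function name is hypothetical.

```python
def speedup_factor(minutes_per_point_before, minutes_per_point_after):
    """How many times faster the team delivers one point."""
    return minutes_per_point_before / minutes_per_point_after


# Assumption: "a couple of hours" is taken as 120 minutes.
factor = speedup_factor(120, 20)
improvement_percent = (factor - 1) * 100

print(factor)               # 6.0
print(improvement_percent)  # 500.0
```

Going from roughly two hours to 20 minutes per point is a sixfold speedup, which is consistent with an improvement of about 500% while the point scale itself stays fixed.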

Ming said...

Just to confirm my understanding. I would like to use sample stories from sales quotation software.


======Part A Big Stories=======
* As sales manager, manage inventory items for his team to make quotation
* System administrator can assign privileges to user groups to control access to sales function


======Part B Small Stories=====
* As sales manager, delete inventory items so items cannot be used in sales quotation
* As salesman, have privilege to view quotation screen but not edit them


======Part C Small Stories=====
* Code connection to connect with SQL Server Express
* Create an xml document


When you say small stories in hyper-productive team, you mean refining Part A to something like Part B for estimation?

You don't mean stories from Part A eventually worded like Part C in the hyper productive team isn't it?

Jeff Sutherland said...

Part C does not have stories. They are technical tasks that have no user and no value statement. These are not small stories. Part B has larger stories broken down from the users point of view.

Ming said...

So in an increasingly productive team, their user stories look increasingly like Part B.
One year before, their user stories were more like Part A.

Please correct me if I am wrong.
For hyper productive Scrum team, their skill to groom stories (INVEST criteria) is more important than learning techniques to estimate accurate hours? In fact, techniques to estimate hours are irrelevant.

Jeff Sutherland said...

For hyperproductive teams, hours are irrelevant. They are good at breaking stories down into smaller stories. Stories tend to all be small and about the same size, in which case counting stories is as good as estimating them in points.

Ming said...

Jeff,

If it's small stories, wouldn't there be traces of working code in production.

For large stories, if the functionality is not in production, then the most of the functionality is not there.

When hyperproductive team work with small stories, are there any differences how their product owner decide what feature to release?

Jeff Sutherland said...

We are talking about high performing teams here and many of your concerns are no longer applicable for such teams. What is the velocity of your teams and how much has it improved over time? At the last company where I was CTO we released at the end of every sprint. The Product Owner decided what to turn on so that it was visible to customers. All code in progress was always in the production build to verify that it would not break anything, even features that were partially complete. This was for enterprise software that had to be installed within a hospital server farm. For web applications, there are companies today going to continuous deployment multiple times a day.

Ming said...

In reality, I want the privilege to join such a team. While waiting, I am just trying to imagine this team.

I imagine if stories are small, they are like threads. Threads are weaved together to form a piece of cloth. Following my analogy, wouldn't threads of functionality appear here & there but the users surprised by these appearances.

But you also said,
"The Product Owner decided what to turn on..."

This team is not only hyper productive, their technical ability is masterful!

blazespinnaker said...

The problem is once a team knows its velocity, they know that 3 story points = 1 day of work = 8 business hours, so 1 story point = 8 hours / 3. Lol.

Story points or hours, it's all the same thing. Just some mental trickery we're trying to play. The key thing isn't points or hours, but try to get people to think relatively by saying "will this task take the same amount of time as that task".

Playing this game of words seems kind of arrogant to me, frankly.

Jeff Sutherland said...

While at any point in time the team will be burning a certain number of points per hour, the goal is to constantly increase the points per hour, at least 10% per sprint if the team is implementing the Scrumming the Scrum pattern. Furthermore, a recent paper by Microsoft shows estimation error on the order of 10-20% with points vs. 400% for hours in the early stage of a project. Please read the research on the Delphi method and why the RAND corporation told DOD not to use hours. If you don't believe the research nothing can help you.

Claudia said...

Hi Jeff,

If the team is burning more story points per hour each sprint, it means that their velocity increases over sprints. Do they commit each sprint more than for the previous one?

Thank you,
Claudia

Jeff Sutherland said...

I now use yesterday's weather as a standard practice for my Scrum teams. This plays into a pattern discovered by OpenView Venture Partners. Teams that tend to finish early accelerate faster. It's all about acceleration, not velocity. Continuous improvement is fundamental. A flat lined team is living in a "Happy Bubble." The Scrum Patterns Group is working on a pattern for that. The ScrumMaster needs to pop the happy bubble.

Big_J said...

Hi Jeff,

We are having a robust discussion in our scrum team about calculating our velocity.
As normal for new teams, we have had a couple of stories that have failed. We have taken them into our next sprint.
Now, do we re-estimate the velocity based on what's left to do or do we just finish them off. An example is a 13 pointer that only is 80% complete. If we take the 13 points into the next sprint and finish it early, it is like points in the bank.
If we were to re-estimate, it would come out to a three pointer.
Now this is important as we will be working out velocity to estimate a finish date for the project.
If we take the 13 points, I feel we get a fairer indication of our true velocity, when averaged over three sprints, but an artificial acceleration for that sprint.
Is there a right and wrong way to do this?
Cheers,
Cain

Jared said...

@Big_J:
You should count none of the points for incomplete stories towards your point total for Sprint A. Count them all in Sprint B when the story is complete.

It might seem draconian but the velocity per sprint is not as important as the average over multiple sprints. This tends to flatten out the peaks and valleys you get when you fail to complete stories and take them into the next sprint.
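A minimal sketch of that counting rule, using made-up sprint data: each story is a (points, done) pair, the 13-point story is unfinished in Sprint A and finished in Sprint B, and the other story sizes are hypothetical.

```python
def sprint_velocity(stories):
    """Sum the points of stories finished in the sprint.

    Unfinished stories contribute nothing, however far along they are.
    """
    return sum(points for points, done in stories if done)


def average_velocity(velocities, window=3):
    """Average of the most recent sprint velocities."""
    recent = velocities[-window:]
    return sum(recent) / len(recent)


# Hypothetical sprints for illustration only.
sprint_a = [(5, True), (8, True), (3, True), (13, False)]  # 13-pointer 80% done
sprint_b = [(13, True), (5, True), (8, True), (5, True)]   # 13-pointer finished
sprint_c = [(8, True), (8, True), (5, True), (3, True), (1, True)]

velocities = [sprint_velocity(s) for s in (sprint_a, sprint_b, sprint_c)]
print(velocities)                    # [16, 31, 25]
print(average_velocity(velocities))  # 24.0
```

Sprint A looks low and Sprint B looks high, but the three-sprint average smooths the carry-over out.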

Murray B said...

We've trialled both story points and hours for numerous projects here and have settled on points. Hours tended to be an encouragement for managers to ask why an estimate didn't match actual time (they never match exactly of course). Story points are a neat abstraction although for estimation purposes we use 1 story point is half a day so there is at least a gut feel if the budget is large enough. (developers think in terms of half days for estimating).

Murray B said...

Hi Jeff, good article.

We've trialled both story points and ideal hours for our projects here and the best results were from a bit of a hybrid.

Most of our work is fixed price so we need to know from the outset how long full project delivery is estimated to take, and for that we use an "exchange rate" of 1 story point = half a day. Most developers (especially those who have not worked on agile projects before) tend to think in terms of days anyway for their estimates, even when there is a reference story to refer to. This exchange rate helps us work out the cost to the client.

But once the project begins, we deal in story points all the way and resist any attempt by managers to refer back to the original association - progress is measured using story points on the burn down chart only.

cmonthenet said...

Jeff, I was not able to find the Microsoft Research article that you referred to but would be interested in reading it. Could you please provide a title or better yet a link to the article/paper. Thank you.

Jeff Sutherland said...

Scrum + Engineering Practices: Experiences of Three Microsoft Teams (see blog item for link)
Laurie Williams, Gabe Brown, Adam Meltzer, Nachiappan Nagappan
IEEE Best Industry Paper award winner

Captain said...

Jeff I'm afraid I agree with the points that Daniel made. I don't think you're reading what he said carefully enough. A quick example.

I have 2 identical teams of 3 working let's say 40 hours a week. Team A works in hours, Team B in story points.

There are 3 tasks, one big task and 2 equally smaller tasks.

Team A estimates they can get task 1 done in 60 hours, task 2 in 30 hours and task 3 in 30 hours.

Team B estimates 20 story points for task 1, and 10 a piece for task 2 and 3.

They both complete the work in the same time: 120 hours for Team A and 40 story points for Team B.

Do you see where I'm going here? The only difference between the two is a relative factor, the story-point/time factor. A total of 120 hours and 40 story points means that 1 story point is equivalent to 3 hours of work when the sprint's workload is taken into account.

The two measure exactly the same thing. In the end each is just a number that a developer has estimated, so the two can be related to each other.

Now I do believe story points are better; as with most things in computing, relative tends to cause fewer problems than absolute. But there is no denying that, at the end of the day, choosing hours over story points will not make a project any more likely to fail. It's the processes around it that will make or break a software project.
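
The arithmetic in the example above can be checked with a short Python sketch. The task figures are the ones from the comment; the dictionary and function names are illustrative assumptions only.

```python
# The same three tasks, estimated two ways (numbers from the comment above).
hours_estimates = {"task 1": 60, "task 2": 30, "task 3": 30}
point_estimates = {"task 1": 20, "task 2": 10, "task 3": 10}


def hours_per_point(hours, points):
    """Implied conversion factor for one sprint: total hours / total points."""
    return sum(hours.values()) / sum(points.values())


factor = hours_per_point(hours_estimates, point_estimates)
print(factor)  # 3.0
```

The factor only describes this one sprint; it says nothing about whether it will hold in later sprints.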

Jeff Sutherland said...

At any point in time there is a relationship between points and hours. However, that is constantly changing as the team gets better. Hyperproductive teams abandon the notion of hours as it slows them down. I have seen no exceptions to this.

grimaldi said...

Interesting, though controversial, article! I am forced to agree with Daniel though. Ultimately, there will be a conversion factor that gets developed between story points and hours. And to say that this is not possible because velocity will always be increasing does not make any sense to me. There is only so much process improvement that one can achieve for any software development team before it levels off (while still preserving a modicum of work/life balance as well)! Granted, there will be new tools and techniques that come to the fore periodically (which will provide a sustained boost in productivity) but those will be on an intermittent/infrequent basis.

IMHO, after implementing this on numerous projects, one way to introduce some "relativity" into the hours estimation technique (and make it more flexible) is to use a weighted average for each "story" (i.e., task). So for Story A, the developers might say that the optimistic time is 4 hours, the expected time is 8 hours, and the pessimistic time is 12 hours. You then compute the weighted average using the formula (1×Opt + 4×Exp + 1×Pess)/6. In this case, we can glean that the developers seem to have a pretty good feel for how long the task will take, as the weighted average comes out equal to the expected value of 8 hours. But for Story B, the developers come up with Opt = 8, Exp = 16, and Pess = 40. This tells us that there is more uncertainty associated with this task, which is reflected in the resulting weighted average of 18.67 hours being greater than the expected value.

Curious as to what your thoughts are on this! :^)
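
The weighted average described in the comment above can be sketched in a few lines of Python. The function name is an illustrative assumption; the weights and the two sets of inputs are taken from the comment.

```python
def three_point_estimate(optimistic, expected, pessimistic):
    """Weighted average (1*Opt + 4*Exp + 1*Pess) / 6, as given in the comment."""
    return (optimistic + 4 * expected + pessimistic) / 6


story_a = three_point_estimate(4, 8, 12)   # tight range around the expected value
story_b = three_point_estimate(8, 16, 40)  # long pessimistic tail

print(round(story_a, 2), round(story_b, 2))  # 8.0 18.67
```

The gap between the weighted average and the expected value (none for Story A, about 2.67 hours for Story B) is what signals the extra uncertainty.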

Jeff Sutherland said...

At any given point in time there is a direct relationship between points and hours, and it changes, just like the time to drive to work changes every day depending on how fast you go. We know the average peak velocity for great teams is 15 function points per developer per month. I don't think anyone posting on this list has reached that state, so don't worry about it. Just know that your velocity can increase if you make more process improvements. Great teams always think they can stretch further, so thinking you are there is a clear sign of dysfunction. When Toyota's Japanese management visited the first Toyota plant in the U.S. after it had been up and running for six months, they saw that the teams had not shut down the line often enough to fix problems. The senior Japanese manager went to the andon cord and pulled it, stopping the entire production line. Gathering the workers together, he gave them a lecture on "No Problem is Problem!!!".

Santosh Kasula said...

Hi Jeff,

I took CSM course in July@Boston. I have to work towards creating small user stories, so that I don't need to break them into tasks. But we already have a backlog that we estimated in terms of story points. Now I am trying to estimate my tasks in terms of story points. Should the total of tasks' story point equal to estimated story point for the user story? Because when I divide a story point into tasks and take the smallest task as the relative estimation point, the total of our tasks' story point is differing from the estimated story point for user story.

Thoughts for the Day said...

Hi Jeff,
I must say I still coach teams on story points for estimating user stories and hours for estimating tasks. The process has worked very well for the teams and has really helped them understand or see when they are overcommitting during a sprint planning session. It has also helped teams understand that estimates are estimates and not written in stone.
For teams new to Scrum, I think this mix of hours and story points eases their transition into Scrum. For more practiced teams, though, I will suggest that we move away from the hours practice and keep everything based upon points, and see what they think.
Paul

Mihai Marinescu said...

Hi Jeff,
I am PM and ScrumMaster for a small team (3 people), and we chose to use Story Points for User Stories and ideal hours for the Technical Tasks that belong to a User Story. I admit it's cumbersome for us to work in two units, and my developers tend to equate 1 story point with 1 ideal day of work. I keep saying that it's not the way to do it. I also see that logging time is not going as well as I expected. The reason we use ideal hours for technical tasks is that we want to track a burndown for the sprint we are working on. Reading your article, I see that you suggest using story points for everything and that we don't need ideal hours. I would like to try that, but I see problems like these:

1. I have a small team, which means we would need a lot of User Stories in a Sprint to actually have a usable sprint burndown chart.
2. Because of 1, we need to split the User Stories to be very small, and it's hard for me to convince the developers to do that, since each story must be end-to-end functionality that brings something useful for the customer. I don't know how others are doing this, but I know what the books are saying. The reality is very different from the books, IMO.
3. Books say "keep your stories independent", so when you split features into 1-day User Stories it becomes VERY hard, if not impossible, to avoid dependencies. How do people solve this problem? I see this problem in our team, and that is with User Stories which are larger than 1 day of work. I can imagine what would happen if we went to 1-day User Stories.
4. User Stories must be written by a Product Owner. How do you teach a Product Owner to split User Stories into smaller ones and to abstract from the fact that the user story alone will not give any value to the customer without also implementing User Story A, B, etc.?

/Mihai

Jeff Sutherland said...

At ScrumInc, we started a team which now is seven. Two people are half time so 6 FTE equivalents. Our velocity varies sprint to sprint but runs about 200 points and increasing. The average story size is 2 points so there are 100 stories in the sprint. We have 1.5 product owners and you have a ScrumMaster trying to be Product Owner which causes problems. Our team members have to help our product owners a lot by writing stories. When we started we were only doing 40 points so only needed 20 stories in a sprint and you are a third our size so only need 6-7 stories in a sprint until you go 500% faster when you need about 32 stories in a sprint. You have a long list of questions so I'm going to answer in detail in the ScrumLab.

jpartogi said...

I still don't understand what you mean in this part:
"Hours completed tell the Product Owner nothing about how many features he can ship and when he can ship them."

If 1 week has 40 working hours per person, that means a 2 week Sprint would have 80 working hours per person. If 1 feature takes 4 hours to complete wouldn't the Product Owner also know when he will get the feature?

Jeff Sutherland said...

The Product Owner does not know they can complete a point in 4 hours unless he knows the velocity of the team. The point is to get the Product Owner the velocity. Velocity is measured in units of features delivered. Over 50% of people doing Scrum do not know their velocity and cannot give a reliable release date. This is a huge impediment in that it cripples continuous improvement and makes it impossible to go hyperproductive. It also leads to serious negative side effects that we observe in our company assessments of Scrum implementations. In particular, it facilitates managers micromanaging employees and derails self-organization.

Fabrice Aimetti said...

Hello Jeff,

Your post is great! I have translated it into French:
Points de Story - Pourquoi sont-ils préférables aux heures ?

Regards,
Fabrice

chronologist said...

@Grant (PG) Rule: Sorry that the argument of FPs always keeps coming back when anyone talks about SPs. I developed my thoughts around this here: http://chronologist.com/blog/2011-10-08/function-points-are-fantasy-points/ and here: http://chronologist.com/blog/2011-10-11/story-points-are-not-function-points/.

@ All those confusing SPs with time: Mike Cohn explained this very eloquently several times. SPs are a measure of effort, not of time. I will run 100m in 30 seconds... while Usain Bolt will do that in less than 10 seconds. SPs are like those 100m.

@Jeff: Thanks for valuing SPs. There are even more interesting ways of using SPs, especially when connecting to the financial estimates thereof. See my chapter in "Agility Across Time and Space: Implementing Agile Methods in Global Software Projects" (Springer, 2010). It would be interesting to hear your opinion on that.

Paraic Hegarty said...

I totally get the difference between story points and hours and the big psychological difference between estimating using one rather than the other.

However, we measure our sprint efficiency for every sprint and track its moving average. In the sprint we finished on Friday we had 1.12, up from 1.02 the previous sprint - although our 3 month moving average remains 1.08.

We were asked to cost a change request and, of course, we have an hourly rate card. We estimated in story points and used a 20% allowance for budgetary purposes. I don't believe I did anything 'wrong' in using our sprint efficiency to convert story point estimates to man hour estimates for costing purposes.

Did I?

Jeff Sutherland said...

At any point in time you can divide the hours in the sprint by the sprint velocity to get the current hours per point. However, that will change over time based on team performance. In a paper we published on distributed Scrum, the hours per point went up when half the team moved back to India, because the Indian half of the team started assigning tasks and stopped self-organizing. When this was fixed, the hours per point went back down. Later the teams started working overtime, which always causes more defects, and fixing them drives up the hours per point. Ideally, your hours per point are going down 10% sprint to sprint because you are implementing the pattern "Scrumming the Scrum." See scrumplop.org.

Mario Moreira said...

I love this discussion. It highlights what I believe to be the primary culprit in estimation discussions. IMHO, as long as we continue to use the word "estimate", people bring their traditional mindset and attempt to bring hours, days, weeks, etc. into the story points discussion. As Jeff points out, it's not about schedule, it's about functionality (which I term scope). This is why I strongly advocate dropping the word "estimate/estimation" completely from the Agile lexicon and going with the term "size". Why? Size refers to functionality or scope, while estimate is typically aligned with schedule. So in effect, they are measuring two different things. You can read more about this in my blog article at: http://cmforagile.blogspot.com/2012/10/size-matters-using-size-instead-of.html Cheers!

The Cranky Ex-Manager said...

For me the problem with "hours" is what happens when I finish early -- it happens rarely but it happens. If I estimated 10 hours and it only took 8, then I think "I've got 2 hours to spare -- time for a break!" but if the estimate was 10 points and I finished early, I'm likely to pick up the next work item, giving the team a productivity boost.

As a side effect, I don't suffer from "student syndrome" wherein I wait until the last minute to start working on the next story.

Mark H said...

In my experience there are fundamental flaws using either points or ideal hours, so what works best for a project tends to be contextual for that team.

A big problem with points is that when you have a multi-disciplinary team, a "reference story" quickly loses its value. If each story - encapsulating an entire piece of business functionality - requires input from an IA, a designer and a developer, then you have 3 sets of people all trying to compare against a reference which may have varying degrees of input required per person. E.g. If the reference story is to build a sitemap for a website, that may be trivial for the designer but quite complex for both the IA and the developer... whereas "website footer" may require almost no effort for the developer and IA, but is several days of work for a designer. You end up with 3 distinct points of reference for story points, which means the total points for each story is meaningless.

Another problem with points is that useful project planning is practically impossible until you've completed at least one sprint and know what your velocity is - anything else is pure guess work. And the thing about planning is that people tend to want the plans ahead of time, and they want it to relate to time and/or money. That's why they're called plans.

Estimating in hours also has problems; as Jeff has said, as a team picks up momentum they tend to get more productive. Some of this gain can be hidden when estimating in hours, because people naturally adjust their expectations of effort as they know more about the problem... so hour-estimates tend to go down at the same rate that efficiency goes up, leading to a levelling of recorded velocity. This doesn't affect story points if done properly - the key there being that "if".

Based on my experience, the lesser of these two evils is to use hours, or at least to map points to hours. The vast majority of businesses need to plan, they need to do that in advance and they need real-world metrics in those plans. Hour estimates have flaws, but they're workable and can be easily explained to most people - in practice the business sponsors don't tend to care much about velocity if they can see that work is being completed at a reasonable pace.

Points may work well for continuous, ongoing work which is fairly homogeneous (a support desk, for instance), but for time-boxed projects involving multiple disciplines - especially if each project is covering new ground - they are much harder to use effectively.

And lastly, saying "Maybe you should look for a company with more competent management" is effectively admitting that an approach is useless to many businesses, and is arrogant in the extreme. If your response to "Please help me plan this project" is to say "I won't support that way of planning", that is also an admission of failure. You may be able to pick and choose the companies you work with based on their management style, but most people don't have that luxury. Asserting that an agile doctrine is more important than established business planning, which is essentially what you're saying, is the very definition of the tail wagging the dog. Whatever happened to "Individuals and interactions over processes and tools"?

Dan said...

What is wrong with function points?

Why do we need a new metric for sizing projects? IMHO the story points technique is a digest of FP with limited use, restricted to a single team.
Using FP is not as easy as using SP but if you learn how to use them there is a good ROI.

Chris Fortuin said...

My 3 simple rules for estimating:
1. Avoid wasting time in decomposing work to increase accuracy
2. Take whatever unit of measurement you prefer (ie. story points, hours)
3. Be consistent in your estimates

Next, execute the next phase/iteration of your project and compare your baseline estimates with your actuals.

Finally, use the great equalizer to convert your estimates in whatever unit of measurement to actual money/time:
> In Waterfall it's the Cost Performance Index (CPI) and Schedule Performance Index (SPI) as part of Earned Value Management (EVM);
> In Agile we re-invented the wheel and called it Velocity.

Adri said...

Hi Jeff, how do you measure the budget based on the story points since the work is not based in hours?

Adri said...

Hi Jeff, how do you estimate the budget using story points, since resource costs are based on man-hours?

Jeff Sutherland said...

Budgeting in points is faster, easier, and more accurate than in hours. Once you have a number on points per sprint for a team, you know the cost of a team sprint. A product release takes a specific number of team sprints based on the product backlog. Systematic, a CMMI Level 5 company, does almost 100% fixed price projects with Scrum. They can estimate up front at 20% of the cost of submitting a waterfall proposal and they consistently deliver at half their waterfall price with 40% fewer defects. Go to the link on the right side of the blog "Jeff Sutherland's Papers" and download the Scrum and CMMI papers to get an idea of how they do this.
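
The budgeting logic in the reply above (cost per team sprint, times the number of sprints the backlog needs) can be sketched as follows. This is a minimal illustration, not Systematic's actual method; the function name and every number in the usage lines are hypothetical.

```python
import math


def release_forecast(backlog_points, velocity, cost_per_sprint):
    """Return (sprints needed, total cost) for a backlog at a given velocity.

    velocity is points per sprint; a partial sprint is counted as a whole one.
    """
    sprints = math.ceil(backlog_points / velocity)
    return sprints, sprints * cost_per_sprint


# Hypothetical figures: 400-point backlog, 40 points per sprint,
# and a team that costs 50,000 per sprint.
sprints, cost = release_forecast(backlog_points=400, velocity=40, cost_per_sprint=50_000)
print(sprints, cost)  # 10 500000
```

Note that no hours appear anywhere: the only inputs are the backlog size in points, the measured velocity, and what the team costs per sprint.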

Chad Allen said...

While I'm a believer in story point estimations, I wanted to point out that the Microsoft article you reference does not provide evidence that points are better than hours: "The teams used Planning Poker to estimate the person hours required to complete functionality within an iteration." The article posits that estimating small stories using planning poker is an effective process, but it is very explicit about the fact that the teams used hours and not points. I see this article referenced a lot with respect to point estimation, and while it has a lot of good evidence that SCRUM is effective, it doesn't support any arguments for using story points vs. hours. You reference some other literature that describes the effectiveness of points vs. hours. Can you share that here?

Russell Smith said...

Hi Jeff

I would like to ask about two things you have said. In the article:

"A three point story today is three points next year.."

and in one of your responses:

"It's all about acceleration, not velocity. Continuous improvement is fundamental. A flat lined team is living in a "Happy Bubble." The Scrum Patterns Group is working on a pattern for that. The ScrumMaster needs to pop the happy bubble."

I recently started a thread in the Scrum Practitioners LinkedIn group (see here: http://linkd.in/11f2pE1) where I asked the following:

"My teams use Poker Planning and the Fibonacci series in order to size stories. They're reasonably experienced with it now and are able to agree on a size for each story relatively quickly and with confidence. Recently we had a story to size and it was almost exactly the same as a story from a sprint some time ago. My expectation was that it would get the same number of points (5) but it didn't, it got 3 because the team said that they were now more familiar with the technology than they were when they sized the last one, and therefore the effort would be less. It was physically the same amount of work, but the time it would take would be shorter because they'd need less 'thinking/figuring out' time.

I suggested that it should be the same size, and the increase in the team's velocity would reflect the fact that their experience/knowledge with the technology had increased.

Then I thought - what is the velocity figure for? Is it so we can measure the team's performance, in which case it should increase over time? Or is it something that we'd like to remain stable so that the PO (and any other stakeholder) can use it to forecast release dates (amongst other things)?"


Until coming across your article my mind had settled on the latter. In fact, even after reading your article my mind hasn't changed, although I am beginning to doubt my conclusion.

Are you able to expand on what you have said about this so that I can better understand?

Thanks!

Jeff Sutherland said...

The whole point of velocity is to see it increase because the team can do stories faster. Reference stories must be stable to achieve this. If you do not have stable references, velocity is meaningless. All your historical data becomes worthless. You can't tell if the team is getting better and you can't tell if a process improvement increased or decreased velocity.
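
A small Python sketch may make the stable-reference argument concrete. The 5-point and 3-point sizes come from the question above; the sprint capacity and the hours per story are hypothetical numbers chosen only for illustration.

```python
CAPACITY_HOURS = 60  # hypothetical team capacity per sprint


def sprint_velocity(points_per_story, hours_per_story, capacity_hours=CAPACITY_HOURS):
    """Points completed in one sprint of identical stories."""
    stories_done = capacity_hours // hours_per_story
    return stories_done * points_per_story


# Early sprint: the story is sized at 5 points and takes 10 hours.
early = sprint_velocity(points_per_story=5, hours_per_story=10)

# Later sprint: the team is faster (6 hours per story).
later_stable = sprint_velocity(points_per_story=5, hours_per_story=6)   # size kept at 5
later_resized = sprint_velocity(points_per_story=3, hours_per_story=6)  # re-sized to 3

print(early, later_stable, later_resized)  # 30 50 30
```

With the reference held stable, velocity rises from 30 to 50 and the improvement is visible. With the story re-sized to 3 points, velocity stays at 30 and the improvement disappears from the data.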

ScrummyScrum said...

I tend to agree with Daniel and his posts. I also don't think you can directly associate the fact that 65% of projects worldwide fail because they use a measurement unit of "hours". I would say that the primary reason for this staggering failure rate is due to communication problems and introducing confusion of a unit of measure (whether it be "story points" or "hours") contributes to these communication problems.

Love Scrum and have been using it successfully for 4.5 years now and do not see the failure rates that we should expect based upon the content of this blog posting.

Story points are based on a reference story that is a unit of measure that the product owner understands. Hours do not work for release planning. That's why over 65% of projects worldwide fail.

Jeff Sutherland said...

Good point. Actually 86% of waterfall projects worldwide fail and it is not all because of hours, although hours give much worse estimates and contribute to the high failure rate.

Let's reiterate that the error rate of hourly estimates is much higher than points and it can take 8 hours to estimate something using hours that can be estimated in 10 minutes with points.

So if better estimates and 50 times less effort are required for points and you continue to insist on hours, there is no hope in convincing you of anything.

Unknown said...

Hi Jeff,

Just one small question: what is the best practice to use at the initiation point of a project? For example, in our company, when starting a new project, the management is entitled to get an initial estimate for the whole project or for the first 1 or 2 releases. How can this be determined by the Scrum Team at the start?

Tom said...

Hello Jeff,

Would you say that it is important that the estimates for a team's set of reference stories are all made at the same time? Why?

(Apologies if the answers are obvious and/or well known.)

Thanks!

Jeff Sutherland said...

Comments on a couple of good questions.

1. Estimating fixed price projects (all estimates up front) is commonly required. You just create a proper product backlog and estimate it. Stories close in are small and stories further out are in large chunks. Systematic, a CMMI 5 company, does almost all fixed price projects, and their data show this strategy eliminates 80% of the total project management cost for a project while giving as good or better estimates.

2. Estimates are estimates for the team to get a story done (not an individual) and reference stories need to be stable over time. We review our reference stories every quarter and make sure that they are stable. A liter of water is a liter today and a liter tomorrow no matter how fast you drink it. To plan we need a velocity that is based on a stable reference.

Pedro Ferreira said...

Hey Jeff.

I completely agree with you about the value added by estimating in Story Points. With the historical data from the past, you can give with a high degree of confidence, the amount of SP that a team produces in a Sprint.

I have now (re)started to estimate in SP using the Fibonacci series (the team felt more comfortable estimating in hours, on tasks that had been drilled down too deep), and I have a question.

We use the smallest story as a "ruler" that has 2 Story Points. So far so good. But when we start facing big values, like 13 SP or 20 SP, we humans struggle to compare properly. Do you believe that it is OK to double-check an estimate against another story that was sized a few moments ago?

Imagine that the team estimates:

Story 1 = 1 SP
Story 2 = 8 SP
Story 3 = 13 SP

As a sanity check, is it valid to question whether Story 3, which is 13 SP in size, is actually a little bit less than 2x the work of Story 2?

It is not a question of changing the Story Point reference (1 SP), but of comparing two pieces of work.

Andrew Webster said...

I have a question: How do you measure accuracy of estimation on a sprint-by-sprint basis?
I'm trying to use the "Hyperproductive" metrics, and while 100% accuracy is easy to define, I'm not clear what is meant by say, 80% accuracy. Does that mean that all stories in a sprint have been completed within +/- 20% of their original estimate? What algorithm do you use?

Jeff Sutherland said...

The 95% confidence interval on a story is about 50% to 200%. This means if a story is two points, most of the time it will be between 1 point and 4 points. The average at the end of a sprint will have about 20% error (check your statistics book). So you should expect your velocity to fluctuate plus or minus 20%. W. Edwards Deming taught us not to try to control normal error as it leads to erratic system behavior.
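
One way to read the "check your statistics book" remark is the familiar square-root rule: if the errors on individual stories are independent and the stories are similar in size (both are assumptions made here, not statements from the comment), the relative error of a sprint total shrinks with the square root of the number of stories. The function name and the 0.5 single-story error below are hypothetical.

```python
import math


def relative_error_of_total(single_story_error, story_count):
    """Relative error of a sum of independent, similar-sized story estimates."""
    return single_story_error / math.sqrt(story_count)


# Hypothetical single-story relative error of 0.5 (i.e. 50%).
for n in (1, 4, 25):
    print(n, relative_error_of_total(0.5, n))
```

Under these assumptions a sprint of 25 stories has one fifth of the relative error of a single story, which is why velocity fluctuates far less than individual story estimates do.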

Lisa Zentz said...

Personally, a is one epic: bullet 1. Bullet 2 is a rule. Both B's are subs under the Epic which will have additional sub tasks ( prod driven/technical debt). I am new to urge thread but assume this extends beyond academia into quantifiable execution.

As a result, like Daniel, I anxiously await a rational and tactical explanation of story points vs. hours. Also, is there more information on how agile/story points more effectively met CMMI 5 requirements? I knew we weren't blazing a trail - thank you.

Jeff Sutherland said...

There are several CMMI 5 papers you can find by clicking on "Jeff Sutherland's Papers" on the right side of this blog homepage. Systematic does virtually all fixed price projects at CMMI Level 5 and is the only CMMI 5 company with Scrum as the standard process. They cut 80% of project management costs out of projects by using product owner, product backlog, and story points for estimation. They consistently deliver projects at half the waterfall price.

Dale said...

I really like this discussion.

There is one part of it that I have found not to work well in practice. Jeff says in the main post, "A three point story today is three points next year..." If the story point is a team's estimation of how hard something is for them to implement, then as the team gets more skilled the same story should get easier. Over time the story point estimations should drop for the same complexity, right? If you have turnover and green developers coming in, this may average out for your group. If you are a company that focuses on long-term employee support, then the number of points for a given story should drop. The points should also drop as more and more very similar stories are executed by the team, right? Or am I missing something here?

I have found velocity to be greatly abused by some VCs and executive management. They do not understand the relative nature of velocity based on the relative nature of story points. If you are trying to invest in company A vs. company B one having a velocity of 100 compared to 200 is meaningless. I have met several executives that would find all sorts of ways to artificially inflate velocity because, "it will be easier to attract investors." I cringe when I hear these statements.

I would like to hear someone - preferably with data - speak to why story points for a given story should not decrease as a team increases its abilities. If my experience of this is normal, then "improvement" can result in more throughput without a proportional increase in velocity.

Jeff Sutherland said...

Grant,

Many people have the fundamental misconception you articulate in your note. Story points are a measure of feature delivery. As the team gets faster, because reference stories are stable, velocity increases. The distance in miles from my house to OpenView Venture Partners in Boston, where I do Scrum training, is the same yesterday, today, and tomorrow. It usually takes 45 minutes to an hour. If I get the timing exactly right, it can take less than 20 minutes, a 300% improvement in velocity. There has been no change in points.