They're not perfect (nothing is), but they're actually pretty good. Every task has to be completable within a sprint; if it's not, you break it down until you have parts that you expect are. Everyone has to unanimously agree on how many points a particular story (task) is worth.

The process of coming to unanimous agreement is the difficult part, and where the real value lies. Someone says "3 points", and someone points out they haven't thought about how it will require X, Y, and Z. Someone else says "40 points", and when asked to explain, it turns out they misunderstood the feature entirely. After somewhere between 2 and 20 minutes, everyone has tried to think through all the gotchas and all the ways it might be done more easily, and you come up with an estimate.

History tells you how many points you usually deliver per sprint, and after a few months the team usually gets accurate to within +/- 10% or so, since underestimation on one story gets balanced by overestimation on another.
It's not magic. It prevents you from estimating things longer than a sprint, because it assumes that's impossible. But it does ensure that you're constantly delivering value at a steady pace, and that you revisit the cost/benefit tradeoff of each new piece of work at every sprint, so you're not blindsided by everything being 10x or 20x slower than expected after 3 or 6 months.
Sorry if this comes across as rude, but this is how I keep repeatedly being told story points work.
If you look at all those properties together, story points are completely useless.
The only time it makes sense is when you have a SHARED understanding of the smallest point AND you can translate it to time. When you do that, story points are useful. But at that point they have become time, so there is no reason to use points.
I’d like to disagree on that one. A single story point shouldn’t be translated to time; it should reflect the relative complexity between tasks (i.e., a 7 is harder than a 3, and so on).
You could assign relative complexity based on a number of things:
- number of integrations to other systems
- is the area well known to the team?
- is the code well tested?
- is CI/CD set up?
- do we need a lot of alignment, or can we just get started?
- etc.
So you’re not estimating time, but complexity or hardness.
Then, supposing you have a stable team, you can go back six months and find out “we do on average 90 points per month”, or similar.
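To make the arithmetic concrete, here is a minimal sketch of that velocity calculation. The monthly numbers are hypothetical (chosen so the average comes out to the 90 points mentioned above), not from any real team.

```python
# Hypothetical history: points delivered in each of the last six months.
monthly_points = [84, 95, 88, 92, 97, 84]

# Velocity is just the historical average; the team's estimate accuracy
# comes from under- and over-estimates balancing out across stories.
velocity = sum(monthly_points) / len(monthly_points)
print(f"average velocity: {velocity:.0f} points/month")  # prints 90
```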
An estimate is composed of two parts:

- Time
- Risk of being wrong
When you do what you just said, "I am not estimating time, I'm estimating risk", a range like "this will take between 1 and 3 days" gives you both: the risk (complexity, hardness), which is represented by the gap, and the time: how long it takes.
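A toy sketch of that claim: a range estimate can be split back into the two components. The function name and the midpoint/width interpretation are illustrative assumptions, not anything defined in this thread.

```python
def describe_estimate(low_days: float, high_days: float) -> dict:
    """Split a range estimate into its time and risk components."""
    return {
        "expected_days": (low_days + high_days) / 2,  # the time part
        "risk_spread_days": high_days - low_days,     # the gap = the risk part
    }

# "This will take between 1 and 3 days"
print(describe_estimate(1, 3))
```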
When a non engineer asks for an estimate, they usually mean one of these two things:
1. How long will it take?
2. Have you had experience with something similar before?
The second one can also come through the question "how challenging do you think that is?", to which we answer "easy but long", "hard" (never done it), or things like that. That's easier to answer, but doesn't translate to dates.

For the first one, you CANNOT use what you just described, since it doesn't represent time, so you cannot give dates in any form.
That's the purpose of story points and planning poker. They don't represent time guarantees or delivery dates. That's not a bug, it's a feature.
They represent estimated effort, with a recognition that uncertainty is generally roughly proportional to size. That's why point values are usually restricted to an approximate Fibonacci sequence (or sometimes doubling). Often it will be limited to 1, 2, 3, 5, 8, 13, 20, where stories aren't allowed to be larger than 20 -- if they are, you need to break them apart.
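The restricted scale and the "too big, split it" rule can be sketched like this. This is my own illustration of the convention described above; the function name and the rounding-up behavior are assumptions, not a standard.

```python
# The allowed values described above: an approximate Fibonacci scale, capped.
POINT_SCALE = [1, 2, 3, 5, 8, 13, 20]

def snap_to_scale(raw_estimate: int) -> int:
    """Round a raw estimate up to the nearest allowed point value.

    Anything above the cap is rejected: the story must be broken apart
    rather than estimated as a single unit.
    """
    for value in POINT_SCALE:
        if raw_estimate <= value:
            return value
    raise ValueError(f"{raw_estimate} points: too big, break the story apart")

print(snap_to_scale(4))   # between 3 and 5, so it becomes 5
```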
So to be clear, when you say that estimates are composed of two parts -- time and risk -- planning poker intentionally conflates them into a single number. If a task is both large and truly high-risk, then it should be broken into a research story and an implementation story, where the size of the research story can be better estimated, and then implementation waits for the next sprint depending on what research tells us. If it's small and high-risk, then you basically just "try", and accept that it might not get delivered, and then revisit in the next sprint whether it's worth trying again and if it needs to be revised.