And yes, when there's a task that one person happens to know most, people will often defer to them. But that in itself is educational, as the experienced dev explains why the given task is easy/hard. And every task is different, so the person you're deferring to will be different, and you still often get the two or three people who know the task best disagreeing until they hash it out, etc. And very often it's another person who can point out "well it would be that easy except three sprints ago I did x and so you'll now need to do y...". And of course plenty of tasks really are brand-new so everyone's figuring it out together.
If you're really not having actual discussions around complexity in planning poker, then the facilitator/lead/manager might be doing it wrong. You do have to create an environment where people are expected to speak up and disagree, to demonstrate that this is welcomed and expected and rewarded, and not just some kind of checkbox exercise where the most senior dev gives their estimation and everyone agrees. This is also a reason why it's literally done with cards where everyone is forced to put their number on the table at the same time, so that you don't wind up with some senior person always going first and then everyone else just nodding and agreeing.
How are you estimating the points if not by thinking about how hard the task is for you and how long it's going to take you?
And then another matter is that points don't correlate with who later takes the work. If you have 5 seniors and 3 juniors and the average estimate on a task is a 3, but the task falls to a junior, they will take longer, as is expected for their experience.
https://www.atlassian.com/agile/project-management/estimatio...
Points are not intrinsic or objective attributes, like the sky being blue. The scale is arbitrarily chosen by any given team, and relative to past work. But a common reference point is that 1 point is the "smallest" feature worth tracking (sometimes 1/2), and 20 points is usually the largest individual feature a team can deliver in a sprint. So it's common for teams to be delivering something between e.g. 50 and 200 points per sprint. Teams very quickly develop a "feel" for points.
> And then another matter is that points do not correlate to who later takes that work.
Yes, this is by design. Points represent complexity, not time. An experienced senior dev might tend to deliver 30 points per sprint, while a junior dev might usually deliver 10. If a team swaps out some junior devs for senior devs, you will expect the team to deliver more points per sprint.
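To make that concrete, here's a toy sketch with entirely made-up per-developer velocities (the names and numbers are illustrative assumptions, not anything prescribed by the process):

```python
# Hypothetical per-developer velocities, in points per sprint.
velocities = {"senior_1": 30, "senior_2": 28, "junior_1": 10, "junior_2": 12}
team_velocity = sum(velocities.values())
print(team_velocity)  # 80

# Swap a junior for another senior: no individual story's point value
# changes, but the points the team delivers per sprint goes up.
velocities.pop("junior_2")
velocities["senior_3"] = 30
print(sum(velocities.values()))  # 98
```

The stories stay the same "size" either way; only how many of them the team gets through per sprint changes.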
I hear this often, but I've never met someone for whom points didn't eventually turn into a measurement of time - even using the exact process you're describing.
I think any process that's this hard to implement should be considered bad by default, barring some extraordinary proof of efficacy.
The goal isn't to avoid time estimation completely, that would be crazy. People estimate how many points get delivered per sprint and sprints have fixed lengths of time. You can do the math, you're supposed to.
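A sketch of that math, with made-up numbers (the velocity history, sprint length, and backlog size are all assumptions for illustration):

```python
SPRINT_LENGTH_DAYS = 10            # a two-week sprint, in business days
recent_velocities = [42, 48, 45]   # points delivered in the last three sprints

# Expected velocity: just average the recent sprints.
expected_velocity = sum(recent_velocities) / len(recent_velocities)

# Translate a 90-point backlog slice into calendar time.
backlog_points = 90
sprints_needed = backlog_points / expected_velocity
days_needed = sprints_needed * SPRINT_LENGTH_DAYS
print(expected_velocity, sprints_needed, days_needed)  # 45.0 2.0 20.0
```

That's the whole conversion: points never carry time directly, but velocity plus a fixed sprint length gets you to a date.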
The point is that points avoid a false sense of precision: >>46748310
The process is quite easy to implement. And it does wind up with extraordinary efficacy gains on a lot of teams, that's the whole reason why it's so popular. But you do have to actually learn about it. Here:
https://www.atlassian.com/agile/project-management/estimatio...
> An experienced senior dev might tend to deliver 30 points per sprint
Seems a bit ironic that complexity doesn't measure time, but then we're measuring how much complexity someone can deliver on average in a given amount of time. Isn't complexity directly proportional to uncertainty factors, and therefore inversely proportional to confidence of time to completion?
Basically, yup. It takes a few sprints to start to establish a meaningfully reliable sense of velocity, and the estimation accuracy is why planning poker takes a couple of hours of real discussion over feature complexity, rather than just a few minutes of superficial guesses. But the end result is a far more accurate ability to estimate what a team can reliably deliver in a sprint, and is really good at bringing stakeholders down to earth in terms of what can actually realistically be delivered.
> Seems a bit ironic that complexity doesn't measure time but then we are measuring how much complexity can someone deliver on average on a given time.
What's ironic? And no, it's not about "someone", it's about the team. Different people on the team will be able to deliver different numbers of points depending on their skill, experience, etc. This is a major reason for not using time -- it actively recognizes that different people take different amounts of time, that things like sick days and meetings are taken into account, etc.
> Isn't complexity directly proportional to uncertainty factors
Yes, this is an explicit assumption of the Fibonacci-style points usually used.
> and therefore inversely proportional to confidence of time to completion?
Yes, which is precisely why stories over a certain size are disallowed (the feature must be broken up into parts), and why sprints are measured in a very small number of weeks -- to avoid the accumulation of too much uncertainty.
The Fibonacci sequence of point values has wound up just being a lot simpler for most people, as it encapsulates both size and error, since error tends to grow proportionally with size.
I.e. nobody is arguing over whether it's 10h +/- 1h, versus 12h +/- 1h, versus 12h +/- 2h, versus 11h +/- 3h. It's all just 5 points, or else 8 points, or else 13 points. It avoids discussion over any more precision than is actually reliably meaningful.
Otherwise, it would be impossible to have 20-point stories done in a 10-business-day sprint -- under the usual assumption that a single person is responsible for the whole story!
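The bucketing described above can be sketched in a few lines. The 2-hours-per-point scale here is an arbitrary assumption purely for illustration, since every team calibrates its own:

```python
FIB_POINTS = [1, 2, 3, 5, 8, 13, 20]

def to_points(rough_hours, hours_per_point=2.0):
    """Snap a rough hour guess to the nearest Fibonacci bucket."""
    raw = rough_hours / hours_per_point
    return min(FIB_POINTS, key=lambda p: abs(p - raw))

# 10h, 11h, 12h: the fine distinctions people would argue over vanish.
print([to_points(h) for h in (10, 11, 12)])  # [5, 5, 5]
```

All three guesses land in the same bucket, which is the whole point: the scale refuses to record more precision than the estimate actually has.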
For the teams I've been on, a point has usually been more like a third of a day or half a day, i.e. 2-3 hours of uninterrupted concentration, and the 1/2 point card is used rarely. Sounds like you've probably used 1/2 point stories a lot more...
But this is why points are arbitrary. Each team decides whatever precise scale it wants. And it really depends on the type of work you're doing too -- whether the smallest stories tend to be things that are day-sized or things that are 2-hour sized.
2-12d conveys a very different story than 6-8d. Are the ranges precise? Nope, but they're useful in conveying uncertainty, which is something that gets dropped in any system that collapses estimates to a single point.
That said, people tend to just collapse ranges, so I guess we all lose in the end.
In agile, 6-8d is considered totally reasonable variance, while 2-12d simply isn't permitted. If that's the level of uncertainty -- i.e. people simply can't decide on points -- you break it up into a small investigation story for this sprint, then decide for the next sprint whether it's worth doing once you have a more accurate estimate. You would never just blindly decide to do it or not if you had no idea if it could be 2 or 12 days. That's a big benefit of the approach, to de-risk that kind of variance up front.
Having implemented it myself, I agree it is easy to implement. My argument is that it is overly difficult to maintain. My experience is that incentives to corrupt the point system are too high for organizations to resist.
Funnily enough - I work closely with a former director of engineering at Atlassian (the company whose guide you cite) and he is of the opinion that pointing had become "utterly dishonest and a complete waste of time". I respect that opinion.
If you have citations on pointing being effective I'd be very interested. I consider myself reasonably up to date on the SWE productivity literature and am not aware of any evidence to that effect.
That's just too slow for business in my experience though. Rightly or wrongly, they want it now, not in a couple of sprints.
So what we do is we put both the investigation and the implementation in the same sprint, use the top of the range for the implementation, and re-evaluate things mid-sprint once the investigation is done. Of course this messes up predictability and agile people don't like it, but they don't have better ideas either on how to handle it.
Not sure if we're not agile enough or too agile for scrum.
Why? It's not like it was some fad that didn't work. When things work, organizations tend to stick with them.
I'm not saying it never happens, but the whole reason for the planning poker process is to surface the things that might turn a 3 point story into a 13 point story, with everyone around the table trying to imagine what could go wrong.
You should not be getting 2-12 variance, unless it's a brand-new team working on a brand new project that is learning how to do everything for the first time. I can't count how many sprint meetings I've been in. That level of variance is not normal for the sizes of stories that fit into sprints.
I think it often depends a lot on who the stakeholders are and what their priorities are. If the particular feature is urgent then of course what you describe is common. But when the priority is to maximize the number of features you're delivering, I've found that the client often prefers to do the bounded investigation and then work on another feature that is better understood within the same sprint, then revisit the investigation results at the next meeting.
But yes -- nothing prevents you from making mid-sprint reevaluations.
I'm not aware of any citations, just like I'm not aware of any citations for most common development practices. It seems to be justified more in a practical sense -- as a team or business, you try it out, and see if it improves productivity and planning. If so, you keep it. I've worked at several places that adopted it, to huge success, solving a number of problems. I've never once seen a place choose to stop it, or find something that worked better. If you have a citation that there is something that works better than points estimation, then please share!
It's just wisdom of the crowds, or two heads are better than one. Involving more people in making estimates, avoiding false precision, and surfacing disagreement -- how is that not going to result in higher-quality estimates?
If you don't strictly work on a Sprint schedule, then I think it's reasonable to have high variance estimates, then as soon as you learn more, you update the estimate.
I've seen lots of different teams do lots of different things. If they work for you and you're shipping with reliable results then that's excellent.
Stepping back - my experience is that points are solving a problem good organizations don't have.
The practice I see work well is that a senior person comes up with a high-level plan for a project with confidence intervals on timeline and quality and has it sanity-checked by peers. Stakeholders understand the timeline and scope to be an evolving conversation that we iterate on week by week. Our rough estimates are enough to see when the project is truly off-track, and we can have a discussion about timelines and resourcing.
I just don't see what points do for me other than attempt to "measure velocity". In principle it's a metric that's useful for upper management, but the moment they treat it as a target, engineers juice their numbers.
Make sure you account for how often someone comes back from working on a 3-point story and says "actually, after getting started on this it turned out to be four 3-point tasks rather than one, so I'm creating new tickets." Or "my first crack at solving this didn't work out, so I'm going to try another approach."
Granted, they're point estimates not time estimates, but it's the same idea -- what was our velocity this sprint, what were the tickets that seemed easier than expected, what were the ones that seemed harder, how can we learn from this to be more accurate going forwards, and/or how do we improve our processes?
Your tone suggests you think you've found some flaw. You don't seem to realize this is explicitly part of sprints.
I'm describing my experiences with variances based on many, many, many sprints.
On the one hand, they simply can't. They're a measurement of effort, and a junior dev will take more time to finish a story than a senior dev will. On the other hand, at the sprint velocity level, yes of course they're supposed to be equivalent to time, in the sense that they're what the team expects to be able to accomplish in the length of a sprint. That's not dishonest, that's the purpose.
> The practice I see work well is that a senior person comes up with a high level plan for a project with confidence intervals on timeline and quality and has it sanity checked by peers... I just don't see what points do for me other than attempt to "measure velocity".
Right, so what happens with what you describe is that you're skipping the "wisdom of the crowds" part: estimation is done too quickly and not in enough depth, you wind up significantly underestimating, management keeps pushing the senior person to reduce their estimates because there's no process behind them, and planning suffers because you're trying to order the backlog based on wishful thinking rather than good information.
What points estimation does is provide a process that aims to increase accuracy which can be used for better planning, in order to deliver the highest-priority features faster, and not waste time on longer features that go off track where nobody notices for weeks. Management can say, "can't they do it faster?", and you can explain, "we have a process for estimation and this is it." It's not any single employee's opinion, it's a process. This is huge.
> but the moment they treat it as a target engineers juice their numbers.
How? Management doesn't care about points delivered, they care about features delivered. There's nothing to "juice". Points are internal to a team, and used with stakeholders to measure the expected relative size of tasks, so tasks can be reprioritized. I've never seen sprint velocity turn into some kind of management target, it doesn't even make sense. I mean, I'm sure there's some dumb management out there that's tried it. But what you're describing isn't my experience even remotely.