They're not perfect (nothing is), but they're actually pretty good. Every task has to be completable within a sprint. If it's not, you break it down until you have a part that you expect is. Everyone has to unanimously agree on how many points a particular story (task) is worth. The process of coming to unanimous agreement is the difficult part, and where the real value lies. Someone says "3 points", and someone points out they haven't thought about how it will require X, Y, and Z. Someone else says "40 points" and they're asked to explain and it turns out they misunderstood the feature entirely. After somewhere from 2 to 20 minutes, everyone has tried to think about all the gotchas and all the ways it might be done more easily, and you come up with an estimate. History tells you how many points you usually deliver per sprint, and after a few months the team usually gets pretty accurate to within +/- 10% or so, since underestimation on one story gets balanced by overestimation on another.
It's not magic. It prevents you from estimating things longer than a sprint, because it assumes that's impossible. But it does ensure that you're constantly delivering value at a steady pace, and that you revisit the cost/benefit tradeoff of each new piece of work at every sprint, so you're not blindsided by everything being 10x or 20x slower than expected after 3 or 6 months.
For instance someone says a ticket is two days' work. For half the team that could be four days because people are new to the team or haven't touched that codebase, etc. But because the person who knows the ticket and context well enough says 2, people tend to go with what they say.
We end up having less of those discussions you describe to come to an agreement that works more on an average length of time the ticket should take to complete.
And then the org makes up new rules that SWEs should be turning around PRs in less than 24 hours and if reviews/iterating on those reviews takes longer than two days then our metrics look bad and there could be consequences.
But that's another story.
And yes, when there's a task that one person happens to know most, people will often defer to them. But that in itself is educational, as the experienced dev explains why the given task is easy/hard. And every task is different, so the person you're deferring to will be different, and you still often get the two or three people who know the task best disagreeing until they hash it out, etc. And very often it's another person who can point out "well it would be that easy except three sprints ago I did x and so you'll now need to do y...". And of course plenty of tasks really are brand-new so everyone's figuring it out together.
If you're really not having actual discussions around complexity in planning poker, then the facilitator/lead/manager might be doing it wrong. You do have to create an environment where people are expected to speak up and disagree, to demonstrate that this is welcomed and expected and rewarded, and not just some kind of checkbox exercise where the most senior dev gives their estimation and everyone agrees. This is also a reason why it's literally done with cards where everyone is forced to put their number on the table at the same time, so that you don't wind up with some senior person always going first and then everyone else just nodding and agreeing.
How are you estimating the points if not thinking about how hard the task is for you and how long is it going to take you?
And then another matter is that points do not correlate to who later takes that work. If you are 5 seniors and 3 juniors and average on a task being a 3, but the task falls to a junior, they will take longer as is expected for his experience.