Basically, yup. It takes a few sprints to start to establish a meaningfully reliable sense of velocity, and the estimation accuracy is why planning poker takes a couple of hours of real discussion over feature complexity, rather than just a few minutes of superficial guesses. But the end result is a far more accurate ability to estimate what a team can reliably deliver in a sprint, and is really good at bringing stakeholders down to earth in terms of what can actually realistically be delivered.
> Seems a bit ironic that complexity doesn't measure time but then we are measuring how much complexity can someone deliver on average on a given time.
What's ironic? And no, it's not about "someone", it's about the the team. Different people on the team will be able to deliver different numbers of points depending on their skill, experience, etc. This is a major reason for not using time -- it actively recognizes that different people take different amounts of time, that things like sick days and meetings are taken into account, etc.
> Isn't complexity directly proportional to uncertainty factors
Yes, this is an explicit assumption of the Fibonnaci-style points usually used.
> and therefore inversely proportional to confidence of time to completion?
Yes, which is precisely why stories over a certain size are disallowed (the feature must be broken up into parts), and why sprints are measured in a very small number of weeks -- to avoid the accumulation of too much uncertainty.