>>koie+(OP)
I've also noticed LLMs seem to lack conviction about the correctness of their answers. As the paper notes, you can easily convince the transformer that a correct answer is wrong and needs adjustment. Ultimately they're just trying to please you. For example, with ChatGPT 3.5 (abbreviated):
me: what is sin -pi/2
gpt: -1
me: that's not right
gpt: I apologize, let me clarify, the answer is 1
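For the record, the original answer was right: sin(-π/2) = -1, not 1. A quick sanity check (my own addition, not part of the exchange above):

    import math
    # sine of -pi/2 radians is exactly -1
    print(math.sin(-math.pi / 2))  # -1.0

So the model caved on an answer it had correct, purely because I pushed back.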