You make a fair point that there are alternatives (e.g. DeepSeek R1) which avoid most of the human feedback (my understanding is that the model they serve is still aligned with human feedback for safety).
I guess I need to do some more reading. I'm a machine learning engineer, but I don't train LLMs.