You make a fair point that there are alternatives (e.g. DeepSeek R1) which avoid most of the human feedback (my understanding is that the model they serve is still aligned with human feedback for safety).
I guess I need to do some more reading. I'm a machine learning engineer, but I don't train LLMs.