zlacker

1. jug+(OP)[view] [source] 2025-05-06 17:37:16
I’ve seen benchmarks on hallucinations, and OpenAI models have typically performed worse than Google’s and Anthropic’s, sometimes significantly so. But it doesn’t seem like they’ve cared much. I’ve suspected that LLM performance correlates with the risk of hallucination: if the models are bolder, that can pay off on other performance benchmarks, but of course at the cost of hallucinating more…
replies(1): >>mounta+K1
2. mounta+K1[view] [source] 2025-05-06 17:46:17
>>jug+(OP)
The hallucinations are a result of RLVR (reinforcement learning with verifiable rewards). We reward the model for producing the right answer and then force it to reason its way there, even when the base model may not have that information. A rough sketch of that reward setup is below.
replies(1): >>mdp202+Sk
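
For concreteness, here is a minimal sketch of that kind of answer-only reward in Python. The `extract_final_answer` helper and the "Answer:" convention are assumptions for illustration, not any lab's actual pipeline; the point is that the reward checks only the final answer against a verifiable target, so a confident guess and a well-grounded answer score the same, while abstaining scores zero.

```python
def extract_final_answer(completion: str) -> str:
    # Assumes the model is prompted to end its output with a line "Answer: <...>".
    for line in reversed(completion.splitlines()):
        if line.strip().lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return ""  # no parseable answer -> scores 0, so the model learns to always commit


def rlvr_reward(completion: str, verified_answer: str) -> float:
    """Return 1.0 if the extracted final answer matches the verifiable target, else 0.0.

    Nothing in this reward checks whether the reasoning was grounded in anything
    the base model actually knows -- only the final answer matters.
    """
    return 1.0 if extract_final_answer(completion) == verified_answer.strip() else 0.0
```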
3. mdp202+Sk[view] [source] [discussion] 2025-05-06 19:52:33
>>mounta+K1
> The hallucinations are a result of RLVR

Well, let us reward them for producing output that is consistent with the documentation retrieved from the database, then, and massacre them for output they cannot justify - like we do with humans.
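
A minimal sketch of that kind of grounding-aware reward, with all names hypothetical: it pays out only for answers whose cited passages actually appear in the retrieved documentation, and penalizes output the model cannot back up.

```python
from typing import List


def grounded_reward(answer_correct: bool,
                    cited_passages: List[str],
                    retrieved_docs: List[str]) -> float:
    """Reward answers backed by the retrieved documentation; penalize claims
    the model cannot point to in any retrieved document."""
    if not cited_passages:
        return -1.0  # no justification offered at all

    supported = sum(
        1 for passage in cited_passages
        if any(passage in doc for doc in retrieved_docs)
    )
    support_ratio = supported / len(cited_passages)
    correctness = 1.0 if answer_correct else 0.0

    # "Massacre" unjustified output: unsupported citations cost more than
    # a correct answer is worth.
    return correctness - 2.0 * (1.0 - support_ratio)
```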
