1. mounta+(OP) 2025-05-06 17:46:17
The hallucinations are a result of RLVR (reinforcement learning with verifiable rewards): we reward the model for the final answer and then force it to reason about how to get there, even when the base model may not have that information.
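
A minimal sketch of the setup being described, assuming a verifier that scores only the final answer (all names here are illustrative): the reward never inspects whether the intermediate reasoning is grounded, so confident fabrication that lands on the right answer is reinforced.

    def rlvr_reward(completion: str, gold_answer: str) -> float:
        """Answer-only verifiable reward: 1.0 for a matching final answer, else 0.0."""
        # Assume the completion ends with a line like "Answer: <value>".
        final_line = completion.strip().splitlines()[-1]
        answer = final_line.removeprefix("Answer:").strip()
        # Nothing above the final line is checked -- ungrounded reasoning
        # that still reaches the right answer earns full reward.
        return 1.0 if answer == gold_answer else 0.0

    print(rlvr_reward("Confident but unsupported steps...\nAnswer: 42", "42"))  # 1.0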
replies(1): >>mdp202+8j
2. mdp202+8j 2025-05-06 19:52:33
>>mounta+(OP)
> The hallucinations are a result of RLVR

Well, let us then reward them for producing output that is consistent with selected documentation retrieved from a database, and massacre them for output they cannot justify - like we do with humans.
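
A minimal sketch of such a grounded reward, assuming hypothetical retrieve and supports helpers (illustrative names, not any particular library's API): each claim consistent with retrieved documentation earns reward, and each claim the model cannot justify draws a heavy penalty.

    from typing import Callable, List

    def grounded_reward(
        claims: List[str],                     # claims extracted from the model's output
        retrieve: Callable[[str], List[str]],  # hypothetical: fetch candidate passages per claim
        supports: Callable[[str, str], bool],  # hypothetical: does a passage support the claim?
        penalty: float = -5.0,                 # heavy negative reward for unjustified claims
    ) -> float:
        score = 0.0
        for claim in claims:
            passages = retrieve(claim)
            if any(supports(p, claim) for p in passages):
                score += 1.0       # consistent with documentation: rewarded
            else:
                score += penalty   # cannot be justified: "massacred"
        return score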
