zlacker

1. throwa+(OP) 2025-01-22 00:09:53
That's essentially what R1 Zero is showing:

> Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT.
