zlacker

1. Philpa+(OP) 2025-01-21 23:16:13
That's not really true - the current generation, as in "of the last three months", uses reinforcement learning to synthesize new training data for itself: https://huggingface.co/deepseek-ai/DeepSeek-R1-Zero
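The general shape, in toy form (my own sketch of the recipe, not DeepSeek's actual pipeline - the toy_model_sample and verifier functions and the arithmetic prompts are made up for illustration): sample a few completions per prompt, keep only the ones a programmatic checker accepts, and fold the survivors back in as new training data.

    import random

    def toy_model_sample(prompt):
        # stand-in for sampling a completion from the current model
        a, b = map(int, prompt.split("+"))
        return str(a + b + random.choice([-1, 0, 0, 1]))  # sometimes wrong on purpose

    def verifier(prompt, answer):
        # programmatic check; exact for math-style prompts
        a, b = map(int, prompt.split("+"))
        return answer == str(a + b)

    prompts = [f"{random.randint(0, 99)}+{random.randint(0, 99)}" for _ in range(1000)]
    synthesized = []
    for p in prompts:
        for _ in range(4):                      # a few rollouts per prompt
            completion = toy_model_sample(p)
            if verifier(p, completion):         # keep only verified rollouts
                synthesized.append((p, completion))
                break

    print(f"kept {len(synthesized)} self-generated training examples")

The kept pairs are then just more fine-tuning data, produced by the model itself rather than scraped.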
replies(2): >>XorNot+h4 >>bandra+c5
2. XorNot+h4 2025-01-21 23:44:49
>>Philpa+(OP)
Right but that's kind of the point: there's no way forward which could benefit from "moar data". In fact it's weird we need so much data now - e.g. my son, in learning to talk, hardly needs to have read the complete works of Shakespeare.

If it's possible to produce intelligence from just ingesting text, then current tech companies have all the data they need from their initial scrapes of the internet. They don't need more. That's different to keeping models up to date on current affairs.

replies(2): >>throwa+d8 >>YetAno+eN
3. bandra+c5 2025-01-21 23:49:51
>>Philpa+(OP)
It worked well for the Habsburg family; what could go wrong?
4. throwa+d8 2025-01-22 00:09:53
>>XorNot+h4
That's essentially what R1 Zero is showing:

> Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT.
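In toy form the mechanics look like this (my own sketch, not DeepSeek's code - the real GRPO objective also adds a clipped update ratio and a KL penalty against a reference model, and the single-question softmax "policy" here is purely illustrative): sample a group of answers, score each with a verifiable reward, and push probability toward the ones that beat the group average. There are no supervised labels anywhere, only the checker.

    import numpy as np

    rng = np.random.default_rng(0)
    num_answers, correct = 10, 7           # pretend answer 7 is the verifiably correct one
    logits = np.zeros(num_answers)         # the "policy" being trained, no SFT warm-up
    lr, group_size = 0.5, 8

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    for step in range(200):
        probs = softmax(logits)
        group = rng.choice(num_answers, size=group_size, p=probs)   # group of rollouts
        rewards = (group == correct).astype(float)                  # verifiable 0/1 reward
        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)   # group-relative advantage
        grad = np.zeros_like(logits)
        for a, a_adv in zip(group, adv):
            grad += a_adv * (np.eye(num_answers)[a] - probs)        # grad of log pi(a) w.r.t. logits
        logits += lr * grad / group_size                            # policy-gradient ascent

    print("P(correct) after pure-RL training:", round(float(softmax(logits)[correct]), 3))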

5. YetAno+eN 2025-01-22 05:56:42
>>XorNot+h4
o3 at its high-compute setting requires thousands of dollars to solve a single medium-complexity problem like an ARC task.
replies(1): >>artifi+g33
6. artifi+g33 2025-01-22 22:26:45
>>YetAno+eN
Light bulbs used to be expensive too; so did nails.