I'm not sure why I've never heard of this being done; it would be a good use of GPUs in between training runs.
If it's possible to produce intelligence just by ingesting text, then the current tech companies already have all the data they need from their initial scrapes of the internet. They don't need more. That's different from keeping models up to date on current affairs.
EVERY YouTube video?? Even the 9/11 truther videos? The Sandy Hook conspiracy videos? Flat earth? Even the blatantly racist ones? That would be some bad training data without some pruning.
> Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT.
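For context on what "purely through RL" means here: instead of supervised fine-tuning on labeled reasoning traces, you sample a group of completions per prompt, score each with a verifiable reward (e.g., does the final answer match?), and reinforce the above-average ones. A toy GRPO-style sketch of that loop (every function here is an illustrative stub, not DeepSeek's actual code):

```python
import random

def sample_completion(prompt):          # stub: stands in for model.generate()
    return prompt + " ... answer: " + str(random.randint(0, 9))

def reward(completion, gold_answer):    # verifiable reward: no SFT labels needed
    return 1.0 if completion.endswith("answer: " + gold_answer) else 0.0

def grpo_step(prompt, gold_answer, group_size=8):
    # Sample a group of completions and score them against the checkable answer.
    completions = [sample_completion(prompt) for _ in range(group_size)]
    rewards = [reward(c, gold_answer) for c in completions]
    # Group-normalized advantages: above-average completions get pushed up.
    mean = sum(rewards) / group_size
    std = (sum((r - mean) ** 2 for r in rewards) / group_size) ** 0.5 or 1.0
    advantages = [(r - mean) / std for r in rewards]
    # In a real trainer these advantages would weight the policy-gradient loss
    # on each completion's token log-probs; here we just report them.
    return list(zip(completions, advantages))

for c, a in grpo_step("What is 3 + 4?", "7"):
    print(f"advantage={a:+.2f}  {c}")
```

The point is that the only supervision signal is the checkable answer; the chain-of-thought in between is never labeled.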
Part of the reason kids need less material is that they aren't just listening; they're also able to run experiments to see what works and what doesn't.
but also the myriad hardcore private repositories of many high-tech US enterprises hacking on amazing shit (mine included) :)