> There is undoubtedly a lot of useful knowledge on internet platforms, however, most of that knowledge remains unsystematized and largely undiscoverable, meaning that contribution to the totality of human knowledge by these platforms is infinitesimal, which is further drowned by cat and porn videos.
Precisely that. Which is why I often argue, that for 99%+ of the content in the training data, its marginal contribution to the training process - itself infinitesimal in isolation - is still by far the most value that content will ever bring to the world.