So I imagine the training data is a big pile of Reddit comments, Twitter messages, and libgen and arxiv PDFs
So there is some shit, but also painstakingly encoded knowledge (i.e. writing), and yeah, it is miraculous that LLMs are right as often as they are
It has phenomenal recall. I just asked it about "SmartOS", something I knew about, vaguely, in ~2012, and it gave me a pretty darn good answer. On that particular subject, I think it probably gave a better answer than anyone I could e-mail, call, or text right now
It was significantly more informative than Wikipedia - https://en.wikipedia.org/wiki/SmartOS
But I still find it easy to stump it and get it to hallucinate, which makes it seem dumb
It is like a person with good manners, a lot of memory, and a knack for comparisons (although you have to verify, which is usually fine)
But I would not say it is "smart" at coming up with new ideas or anything
I do think a key point is that a "text calculator" is doing a lot of work ... i.e. summarization and comparison are extremely useful things. They can accelerate thinking
The most recent numbers from libgen itself are 2.4 million non-fiction books and 80 million science journal articles. The LibGen database The Atlantic published in 2025 has 7.5 million books.[0] The publishing industry estimates that many more books are published each year. As of 2010, Google counted over 129 million books in existence[1]. At best, an LLM like Llama will have 20% of all books in its training set (rough arithmetic below).
0. https://www.theatlantic.com/technology/archive/2025/03/libge...
1. https://booksearch.blogspot.com/2010/08/books-of-world-stand...
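For scale, here is a quick back-of-envelope using only the figures cited above. This is a hedged sketch, not a measurement; nobody outside the labs knows what book corpus a given model actually trained on.

    # Back-of-envelope check of the "at best 20%" claim, using only the numbers cited above.
    libgen_nonfiction  = 2_400_000    # libgen's own non-fiction book count
    atlantic_libgen_db = 7_500_000    # books in The Atlantic's 2025 LibGen database [0]
    google_total_2010  = 129_000_000  # Google's 2010 count of books in existence [1]

    upper_bound = 0.20 * google_total_2010  # the "at best 20%" figure in absolute terms
    print(f"20% of all books  ~ {upper_bound / 1e6:.1f} million")                       # ~25.8 million
    print(f"LibGen share      ~ {atlantic_libgen_db / google_total_2010:.1%}")          # ~5.8%
    print(f"libgen non-fiction ~ {libgen_nonfiction / google_total_2010:.1%}")          # ~1.9%

Even the 20% upper bound is roughly 26 million books, several times the LibGen snapshot, so actual coverage is plausibly far lower than that.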