Obituary for Cyc

>>todsac+(OP)
I would argue that Lenat was at least directionally correct in understanding that sheer volume of data (in Cyc's case, rules and facts) was the key to eventually achieving useful intelligence. I have to confess that I once criticized the Cyc project for creating an ever-larger pile of sh*t and expecting a pony to emerge, but that's sort of what has happened with LLMs.

>>vannev+14
That’s hilarious, but at least Llama was trained on libgen, an archive of most books and publications by humanity, no? Except for the ones that were never digitized, I guess.

So there is probably a big pile of Reddit comments, Twitter messages, and libgen and arXiv PDFs, I imagine.

So there is some shit, but also painstakingly encoded knowledge (i.e., writing), and yeah, it is miraculous that LLMs are right as often as they are.

>>chubot+ca
libgen is far from an archive of "most" books and publications, not even close.

The most recent numbers from libgen itself are 2.4 million non-fiction books and 80 million science journal articles. The Atlantic's database published in 2025 has 7.5 million books.[0] The publishing industry estimates that many books are published each year. As of 2010, Google counted over 129 million books.[1] At best, an LLM like Llama will have 20% of all books in its training set.

0. https://www.theatlantic.com/technology/archive/2025/03/libge...

1. https://booksearch.blogspot.com/2010/08/books-of-world-stand...

>>crater+xx
On libgen.mx they claim to have 33,569,200 books and 84,844,242 articles.
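
For anyone who wants to sanity-check the percentages being thrown around, here is a rough sketch that divides each book count quoted in this thread by Google's 2010 figure. All the numbers are taken as quoted and not verified; the labels and the helper name `share_of_all_books` are made up for illustration. Since Google's count was "over 129 million" and is from 2010, these shares are, if anything, overestimates.

```python
# Figures as quoted in this thread; none are independently verified here.
GOOGLE_2010_BOOK_COUNT = 129_000_000  # "over 129 million books" (Google, 2010)

libgen_book_counts = {
    "libgen non-fiction (self-reported)": 2_400_000,
    "The Atlantic database (2025)": 7_500_000,
    "libgen.mx claim": 33_569_200,
}


def share_of_all_books(count, total=GOOGLE_2010_BOOK_COUNT):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: share_of_all_books(n) for label, n in libgen_book_counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}% of Google's 2010 count")
```

Which count is the right numerator is exactly what is being disputed here, so the sketch keeps all three side by side rather than picking one.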
