zlacker

[return to "Obituary for Cyc"]
1. vannev+14[view] [source] 2025-04-08 19:44:13
>>todsac+(OP)
I would argue that Lenat was at least directionally correct in understanding that sheer volume of data (in Cyc's case, rules and facts) was the key to eventually achieving useful intelligence. I have to confess that I once criticized the Cyc project for creating an ever-larger pile of sh*t and expecting a pony to emerge, but that's sort of what has happened with LLMs.
2. chubot+ca[view] [source] 2025-04-08 20:26:50
>>vannev+14
That’s hilarious, but at least Llama was trained on libgen, an archive of most books and publications by humanity, no? Except for the ones that were never digitized, I guess

So there is probably a big pile of Reddit comments, Twitter messages, and libgen and arXiv PDFs in there

So there is some shit, but also painstakingly encoded knowledge (i.e., writing), and yeah, it is miraculous that LLMs are right as often as they are

3. crater+xx[view] [source] 2025-04-08 23:44:14
>>chubot+ca
libgen is far from an archive of "most" books and publications, not even close.

The most recent numbers from libgen itself are 2.4 million non-fiction books and 80 million science journal articles. The Atlantic's database published in 2025 has 7.5 million books.[0] The publishing industry estimates that hundreds of thousands of new books are published each year. As of 2010, Google counted over 129 million books.[1] At best, an LLM like Llama will have 20% of all books in its training set (rough arithmetic below).

0. https://www.theatlantic.com/technology/archive/2025/03/libge...

1. https://booksearch.blogspot.com/2010/08/books-of-world-stand...
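
For scale, a quick back-of-envelope check of those figures in Python (a rough sketch only; the 129M total is Google's 2010 count, so the real denominator is higher and these percentages are upper bounds):

    # Coverage of the cited collections against Google's 2010 book count.
    TOTAL_BOOKS = 129_000_000  # Google Books estimate, 2010 [1]

    collections = {
        "libgen non-fiction (own count)": 2_400_000,
        "The Atlantic's LibGen database (2025)": 7_500_000,
    }

    for name, count in collections.items():
        print(f"{name}: {count / TOTAL_BOOKS:.1%} of all books")
    # -> libgen non-fiction (own count): 1.9% of all books
    # -> The Atlantic's LibGen database (2025): 5.8% of all books

Even with generous headroom for other book sources in the training mix, 20% is an optimistic ceiling.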

4. UltraS+k51[view] [source] 2025-04-09 07:20:05
>>crater+xx
On libgen.mx, they claim to hold 33,569,200 books and 84,844,242 articles.