Sad to hear:
a. of his passing
b. that CYC didn't eventually meet its goals
Discussed >>37354000 (172 comments)
The contracts at the time were mostly skunkworks/internal to the client companies, so not usually highly publicized. A couple examples are mentioned on their website: https://cyc.com/
They're actively working on this, with the goal of ultimately building a language-independent representation[0] of ordinary encyclopedic text. Much like a machine translation interlanguage, but something that would be mostly authored by humans, not auto-generated from existing natural-language text. See https://meta.wikimedia.org/wiki/Abstract_Wikipedia for more information.
[0] Of course, there are some very well-known pitfalls to this general idea: what's the true, canonical language-independent representation of nimium saepe valedīxit ("he/she/it said farewell too often" - Latin leaves the subject entirely unexpressed)? So this should probably be understood as mostly language-independent, enough to be practically useful.
Doug Lenat – I was positively impressed with Wolfram Alpha - >>510579 - March 2009 (17 comments)
And of course, recent and related:
Doug Lenat has died - >>37354000 - Sept 2023 (170 comments)
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
https://static.googleusercontent.com/media/research.google.c... and https://norvig.com/chomsky.html
In short, Norvig concludes there are several conceptual approaches to ML/AI/Stats/Scientific analysis. One is "top down": teach the system some high level principles that correspond to known general concepts, and the other is "bottom up": determine the structure from the data itself and use that to generate general concepts. He observes that while the former is attractive to many, the latter has continuously produced more and better results with less effort.
I've seen this play out over and over. I've concluded that Norvig is right: empirically based probabilistic models are a cheaper, faster way to answer important engineering and scientific problems, even if they are possibly less satisfying intellectually. Cheap approximations are often far better than hard-to-find analytic solutions.
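To make the "cheap approximation" point concrete, here's the classic toy illustration (my example, not Norvig's): estimating pi by random sampling. No derivation needed, just data and counting - and the answer is good enough for many engineering purposes.

```python
import random

def estimate_pi(n: int, seed: int = 0) -> float:
    """Monte Carlo estimate of pi: fraction of random points in the
    unit square that fall inside the quarter circle, times 4."""
    rng = random.Random(seed)
    inside = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
                 for _ in range(n))
    return 4.0 * inside / n

print(estimate_pi(100_000))  # close to 3.14159
```

The analytic value is of course known here, which is exactly why it makes a good benchmark: the sampling approach gets within a fraction of a percent with a few lines of code.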
For example, on the wikidata help page, they talk about the height of Mount Everest:
https://www.wikidata.org/wiki/Help:About_data#Structuring_da...
Earth (Q2) (item) → highest point (P610) (property) → Mount Everest (Q513) (value)
and Mount Everest (Q513) (item) → instance of (P31) (property) → mountain (Q8502) (value)
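Those two statements are easy to mirror in code. A minimal sketch (plain Python dicts, not Wikidata's actual API or data model), using the Q/P identifiers from the help page:

```python
# (item, property) -> value, using Wikidata's Q/P identifiers.
triples = {
    ("Q2", "P610"): "Q513",    # Earth -> highest point -> Mount Everest
    ("Q513", "P31"): "Q8502",  # Mount Everest -> instance of -> mountain
}

labels = {
    "Q2": "Earth", "P610": "highest point", "Q513": "Mount Everest",
    "P31": "instance of", "Q8502": "mountain",
}

def highest_point_of(item: str) -> str:
    """Follow the 'highest point' (P610) property from an item."""
    return triples[(item, "P610")]

peak = highest_point_of("Q2")
print(labels[peak])                    # Mount Everest
print(labels[triples[(peak, "P31")]])  # mountain
```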
So that's all fine, but it misses a lot of context. These facts might be true of the real world right now, but they won't always be true. Even in the not-so-distant past, the height of Everest was lower, because of tectonic plate movement. Maybe in the future it will go higher still due to tectonics, or maybe it will go lower due to erosion.

Context awareness gets even more important for facts like "the iPhone is the best-selling phone". That might be true right now, but it certainly wasn't true back in 2006, before the phone was released.
Context also comes in many forms, which can be necessary for useful reasoning. For example, consider the question: "What would be the highest mountain in the world, if someone blew up the peak of Everest with a bomb?" This question isn't about the real world, right here and right now, it is about a hypothetical world that doesn't exist.
Going a little further afield, you may want to ask a question like "Who is the best captain of the Enterprise?". This might be about the actual US Navy carrier CVN-65 named "Enterprise", the planned CVN-80, or the older CV-6 Enterprise which fought in WW2. Or maybe the relevant context is "Star Trek", and we're in one of several fictional worlds instead, each with a completely different set of facts.
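One way to capture all of the above is to key facts by an explicit context rather than assuming a single world. A toy sketch (all names and values below are purely illustrative, not a real knowledge base):

```python
# Facts are relative to a (world, time) context, never absolute.
facts = {
    ("real-world", 2006): {"best-selling phone": "a Nokia model"},
    ("real-world", 2023): {"best-selling phone": "iPhone"},
    ("star-trek-tos", None): {"captain of the Enterprise": "James T. Kirk"},
    ("star-trek-tng", None): {"captain of the Enterprise": "Jean-Luc Picard"},
}

def query(world, time, question):
    """Answer a question relative to an explicit context; returns None
    when the knowledge base says nothing about that context."""
    return facts.get((world, time), {}).get(question)

# Same question, different contexts, different answers:
print(query("real-world", 2006, "best-selling phone"))
print(query("real-world", 2023, "best-selling phone"))
print(query("star-trek-tos", None, "captain of the Enterprise"))
print(query("star-trek-tng", None, "captain of the Enterprise"))
```

Wikidata approximates some of this with qualifiers (e.g. "point in time") on statements, but the hypothetical-world and fictional-world cases need something richer, closer to CYC's notion of microtheories.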
I think some ability to deal with uncertainty (as with Probabilistic Graphical Models) is also necessary for practical applications of this technology. We may be dealing with a mix of "objective facts" (well, let's not get into a discussion about the philosophy of science) and other facts that we may not be so certain about.
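The simplest version of that idea is to attach a degree of belief to each fact and propagate it through inference. A sketch (illustrative numbers, and a naive independence assumption that a real PGM would relax):

```python
# Degree of belief in each stored fact, between 0 and 1.
beliefs = {
    "Everest is the highest mountain": 0.99,
    "Everest's listed height is current": 0.90,
}

def conjunction(*statements):
    """P(all statements hold), assuming independence - the crudest
    possible treatment; a real PGM would model the dependencies."""
    p = 1.0
    for s in statements:
        p *= beliefs[s]
    return p

p = conjunction("Everest is the highest mountain",
                "Everest's listed height is current")
print(round(p, 3))  # 0.891
```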
It seems to me that a successful symbolic reasoning system will be very, very large and complex. I'm not at all sure how such knowledge should even be represented, never mind the issue of trying to capture it all in digital form.
https://dspace.mit.edu/handle/1721.1/14257?show=full
It's quite interesting.
How much iron does a steel mill need this year? Well, that depends on how many customers they'll get, which depends on what price they sell steel at.
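That circularity (demand depends on price, price depends on demand) doesn't have to be a dead end: if each relationship is a reasonably well-behaved function, you can solve for a consistent answer by fixed-point iteration. A sketch with entirely made-up numbers:

```python
def demand(price: float) -> float:
    """Tons of steel sold at a given price (customers buy less as
    price rises). Illustrative functional form only."""
    return 1000.0 / price

def price_for(quantity: float) -> float:
    """Price charged at a given volume (unit cost falls a little with
    scale). Illustrative functional form only."""
    return 50.0 + 5000.0 / (quantity + 100.0)

q = 10.0  # initial guess for quantity sold
for _ in range(50):
    q = demand(price_for(q))  # iterate until supply and demand agree

print(round(q, 2), round(price_for(q), 2))  # consistent quantity and price
```

At the fixed point the quantity implied by the price equals the quantity that produced it, which is exactly the "it depends on what it depends on" knot in the question above.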
https://www.youtube.com/watch?v=3wMKoSRbGVs
Lenat thought CYC and neural nets could be complementary, with neural nets providing right brain/fast thinking capability, and CYC left brain/slow (analytic/reflective) thinking capability.
It's odd to see Lenat discuss CYC the way he does - as if 40 years on everything was still going well despite it having dropped off the public radar twenty years ago.
There's also a lengthy Lex Fridman interview with Doug Lenat, from just a year ago, here:
https://www.youtube.com/watch?v=3wMKoSRbGVs
It seems as if the "common sense expert system" foundation of CYC (the mostly unstated common knowledge behind all human communication) was basically completed, but what has failed to materialize is any higher-level comprehensive knowledge base and reasoning system (i.e. some form of AGI) built on top of this.
It's not clear from the outside whether anyone working at Cycorp still really believes there is a CYC-based path to AGI, but regardless it seems not to be something that's actively being funded and worked on, and 40 years on it's probably fair to say it's not going to happen. It seems that Cycorp stays alive by selling the hype and winning contracts to develop domain-specific expert systems, based on the CYC methodology and toolset, that really have little reliance on the "common sense" foundations they are nominally built on top of.