Then again, I felt the same way when I studied it at university almost 20 years ago. It was pretty obviously a pipe dream then, too.
I think what really stopped Cyc from gaining wider traction is its closed nature[0]. People do use Princeton WordNet, which you can get for free, even though it's a mess in many respects. The issue and mentality here are similar to commercial Common Lisp implementations, and the underlying culture is similar (old-school 80s AI). These projects were shaped by a mindset that major progress in computing would happen through huge government grants and plans[1]. However you interpret the last 30 years, that hasn't exactly been true. It's possible that all these companies earn money for their owners, but they have no industry-wide impact.
I was half-tempted once or twice to use something like Cyc in a project, but it would probably have been too much organizational hassle. If it turned out to be commercial, I wouldn't want to be dependent on someone's licensing and financial whims, especially when that can be avoided.
[0] There was OpenCyc for a time, but it was scrapped.
[1] Compare https://news.ycombinator.com/item?id=20569098
Tangential question: is there a standard language for "knowledge", the way mathematics is our standard language for "computation"?
Is part of our brain essentially a compiler from human language to an internal representation of "knowledge" that leads to consciousness?
[Edit] Here's a wider overview: https://en.wikipedia.org/wiki/Knowledge_representation_and_r...
I still think the potential of lambda calculus in knowledge representation and logical deduction is high and under-represented in research.
Just theorizing, but I think a large part of the problem is the difficulty in interfacing this knowledge base with manual, human entry. Another pitfall is the difficulty in determining strange or unanticipated logical outcomes, and developing a framework to catch or validate these.
There were some big positives. Everyone there is very smart and depending on your tastes, it can be pretty fun to be in meetings where you try to explain Davidsonian ontology to perplexed business people. I suspect a decent fraction of the technical staff are reading this comment thread. There are also some genuine technical advances (which I wish were more publicly shared) in inference engine architecture or generally stemming from treating symbolic reasoning as a practical engineering project and giving up on things like completeness in favor of being able to get an answer most of the time.
There were also some big negatives, mostly structural ones. Within Cycorp different people have very different pictures of what the ultimate goals of the project are, what true AI is, and how (and whether) Cyc is going to make strides along the path to true AI. The company has been around for a long time and these disagreements never really resolve - they just sort of hang around and affect how different segments of the company work. There's also a very flat organizational structure which makes for a very anarchic and shifting map of who is responsible or accountable for what. And there's a huge disconnect between what the higher ups understand the company and technology to be doing, the projects they actually work on, and the low-level day-to-day work done by programmers and ontologists there.
I was initially pretty skeptical of the continued feasibility of symbolic AI when I went in to interview, but Doug Lenat gave me a pitch that essentially assured me that the project had found a way around many of the concerns I had. In particular, they were doing deep reasoning from common sense principles using heuristics and not just doing the thing Prolog often devolved into where you end up basically writing a logical system to emulate a procedural algorithm to solve problems.
It turns out there's a kind of reality distortion field around the management there, despite their best intentions - partially maintained by the management's own steadfast belief in the idea that what Cyc does is what it ought to be doing, but partially maintained by a layer of people that actively isolate the management from understanding the dirty work that goes into actually making projects work or appear to. So while a certain amount of "common sense" knowledge factors into the reasoning processes, a great amount of Cyc's output at the project level really comes from hand-crafted algorithms implemented either in the inference engine or the ontology.
Also the codebase is the biggest mess I have ever seen by an order of magnitude. I spent some entire days just scrolling through different versions of entire systems that duplicate massive chunks of functionality, written 20 years apart, with no indication of which (if any) still worked or were the preferred way to do things.
Almost nobody is really working on AGI, and this is the main issue. A notable counterexample is John Carmack's recent pivot.
- At least right now, we have a good amount of common-sense information about the world (I don't know when "last time" was for you).
- That said, we have a lot of highly specialized knowledge in various domains, so if you took a random sample of the knowledge base (KB) it may not be as common-sense-centric as you'd hope. But the KB is also incredibly large, so that doesn't mean we don't have much common-sense, just that we have even more other stuff.
- Often for contracts we get paid to construct lots of domain-specific knowledge, even if the project also uses the more general knowledge, so this biases the distribution some.
- Information that's already well-taxonomized is low-hanging fruit for this kind of system; its representation doesn't take nearly as much extra thought and consideration, so it's a faster process, which also biases the distribution some.
Even Wikidata mostly looks like that, despite being intended quite clearly as a "general purpose" knowledge base. Mostly because this sort of information is easily extracted from existing, referenced sources. The "general purpose" character of it all comes into play wrt. linking across those specialized domains.
"Alive loves Bob"
What do you know? Nothing. Was it Alice who said she loves Bob, or was it Bob who said it is Alice who loves him? Maybe Carol saw the way Alice looks at Bob and concluded she must love him. What is love anyway? How exactly is the love Alice has for Bob quantitatively different from my love of chocolate? It might register similar brain activity in an MRI scan, and yet we humans recognise them as qualitatively different.
A knowledge base is useless if you can't judge whether a fact is true or false. The semantic web community's response to this problem was to introduce a provenance ontology, but every attempt to reason over statements about statements seems to go nowhere. IMHO you can't solve the problem of AGI without also having a way for a rational agent to embody its thoughts in the physical world.
Would a human dislike touching an incandescent bulb while the electric lamp is powered on?
Yes.
?HUMAN dislikes being a performer in the ?TOUCHING.
• Embodied agents dislike performing acts that cause them discomfort.
• ?HUMAN is an embodied perceptual agent.
• ?HUMAN is a human.
• Every human is an embodied perceptual agent.
• ?HUMAN deliberately performed ?TOUCHING.
• ?TOUCHING causes some discomfort.
• Touching something that is too hot to touch causes pain.
• The quantity range pain includes all points in some discomfort.
• ?PART is too hot to touch.
• When an incandescent bulb is on, it is too hot to touch.
• ?PART is an incandescent bulb.
• ?PART’s current state is powered on.
• When a lamp with a bulb is on, so is the bulb.
• ?PART is a physical part of ?DEVICE.
• ?PART is a physical part of ?DEVICE.
• ?PART is a physical part of ?PART.
• ?DEVICE’s current state is powered on.
• ?PART is a light bulb.
• ?PART is an incandescent bulb.
• Every incandescent bulb is a light bulb.
• ?DEVICE is an electric lamp.
• ?PART was affected during ?TOUCHING.
We have a couple thousand of these, which we've aimed to make as diverse as possible.

1) How did they manage to make money for so long to keep things afloat? I'm guessing through some self-sustainable projects like the few business relationships listed in the wiki?
2) What's the tech stack like? (Language, deployment, etc)
Wikidata is also worth considering for that task. It is:
* Directly linked from Wikipedia [1]
* The data source for many infoboxes [2]
* Seeded with data from Wikipedia
* More active and better integrated with its community
* Larger in total number of concepts
Wikidata also has initiatives in lexicographic data [3] and images [4, 5].
On the subject of Cyc: the CycL "generalization" (#$genls) predicate inspired Wikidata's "subclass of" property [6], which now links together Wikidata's tree of knowledge.
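To make the P279 point concrete, here's a minimal sketch of querying that property through Wikidata's public SPARQL endpoint (Python with the requests library; Q5 is the item for "human", and the query simply asks for its direct "subclass of" superclasses):

    # Minimal sketch: ask Wikidata's SPARQL endpoint for the direct
    # P279 ("subclass of") superclasses of Q5 ("human").
    import requests

    ENDPOINT = "https://query.wikidata.org/sparql"
    QUERY = """
    SELECT ?super ?superLabel WHERE {
      wd:Q5 wdt:P279 ?super .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    """

    resp = requests.get(
        ENDPOINT,
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "genls-example/0.1"},  # WDQS expects a UA
    )
    for row in resp.json()["results"]["bindings"]:
        print(row["super"]["value"], "-", row["superLabel"]["value"])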
---
1. See "Wikidata" link at left in all articles, e.g. https://en.wikipedia.org/wiki/Knowledge_base
2. https://en.wikipedia.org/wiki/Category:Infobox_templates_usi...
3. https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/...
4. https://www.wikidata.org/wiki/Wikidata:Wikimedia_Commons/Dev...
5. See "Structured data" tab in image details on Wikimedia Commons, e.g. https://commons.wikimedia.org/wiki/File:Mona_Lisa,_by_Leonar...
6. https://www.wikidata.org/wiki/Property_talk:P279#Archived_cr...
So you could incorporate some kind of cooling rate, then change the above to "When an incandescent bulb has been on x of the last y minutes it is too hot to touch".
This all seems just impossibly complicated (not that I can think of something simpler!) - am I missing anything?
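To make the "x of the last y minutes" idea concrete, here's a toy sketch of such a time-windowed rule; the function name, thresholds, and data model are all made up for illustration and have nothing to do with how Cyc represents this:

    # Toy sketch of a time-windowed "too hot to touch" rule: the bulb is
    # judged too hot if it was powered on for at least `x` of the last
    # `y` minutes. Illustrative only, not Cyc's representation.
    def too_hot_to_touch(on_intervals, now, x=2.0, y=10.0):
        """on_intervals: list of (start, end) minutes during which the bulb was on."""
        window_start = now - y
        minutes_on = 0.0
        for start, end in on_intervals:
            overlap = min(end, now) - max(start, window_start)
            if overlap > 0:
                minutes_on += overlap
        return minutes_on >= x

    # The bulb ran from minute 0 to minute 9; at minute 10 it has been on
    # for 9 of the last 10 minutes, so it's still considered too hot.
    print(too_hot_to_touch([(0, 9)], now=10))  # True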
1) One-off contracts, sometimes with ongoing licenses, from various large organizations who have use cases for inference. We did a lot of government contracting for a while; now we mostly stay in the private sector.
2) An in-house dialect of Common Lisp that compiles to Java (it predates Clojure). Deployment is still fairly ad hoc, but we build containers.
I will say also that our focus on "common sense" means we make deliberate choices about where to "bottom out" the granularity with which we represent the world; otherwise we'd find ourselves reasoning about literal atoms in every case. We generally try to target the level at which a human might conceive of a concept; humans can treat a collection of atoms as roughly "a single object", but then still apply formal logic to that object and its properties (and "typical" properties). In one sense it isn't a perfect representation, but in another sense it strikes the right balance between perfect and meaningful.
I worked at an AI company before, and it was the latter.
Is there doubt as to whether a neuron can be represented computationally?
The categories of projects that I was familiar with were basically proof of concept work for companies or government R&D contracts. There are lots of big companies that will throw a few million at a long-shot AI project just to see if it pays off, even if they don't always have a very clear idea of what they ultimately want or a concrete plan to build a product around it. Sometimes these would pay off, sometimes they wouldn't but we'd get by on the initial investment for proof of concept work. Similarly, organizations like DARPA will fund multiple speculative projects around a similar goal (e.g. education - that's where "Mathcraft" came from IIRC) to evaluate the most promising direction.
There have been a few big hits in the company's history, most of which I can't talk about. The hits have basically been in very circumscribed knowledge domains where there's a lot of data, a lot of opportunity for simple common sense inferences (e.g. if Alice worked for the ABC team of company A at the same time Bob worked for the XYZ team of company B and companies A and B were collaborating on a project involving the ABC and XYZ teams at that same time, then Alice and Bob have probably met) and you have reason to follow all those connections looking for patterns, but it's just too much data for a human to make a map of. Cyc can answer questions about probable business or knowledge relationships between individuals in large sets of people in a few seconds, which would be weeks of human research and certain institutions pay a high premium for that kind of thing.
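The Alice/Bob pattern is essentially a join over time-overlapping employment and collaboration records. Here's a crude Python sketch of that inference over a made-up toy data model (nothing like Cyc's actual representation or scale):

    # Toy sketch of the "probably met" pattern: two people probably met if
    # their teams collaborated while both were employed on those teams.
    # Data model and names are made up for illustration.
    employments = [
        # (person, company, team, start_year, end_year)
        ("Alice", "A", "ABC", 2010, 2014),
        ("Bob",   "B", "XYZ", 2012, 2016),
        ("Carol", "B", "XYZ", 2017, 2019),
    ]
    collaborations = [
        # (company1, team1, company2, team2, start_year, end_year)
        ("A", "ABC", "B", "XYZ", 2013, 2015),
    ]

    def overlaps(a_start, a_end, b_start, b_end):
        return max(a_start, b_start) <= min(a_end, b_end)

    def probably_met():
        for p1, c1, t1, s1, e1 in employments:
            for p2, c2, t2, s2, e2 in employments:
                if p1 >= p2:
                    continue
                for cc1, ct1, cc2, ct2, cs, ce in collaborations:
                    same = {(c1, t1), (c2, t2)} == {(cc1, ct1), (cc2, ct2)}
                    if same and overlaps(s1, e1, cs, ce) and overlaps(s2, e2, cs, ce):
                        yield (p1, p2)

    print(list(probably_met()))  # [('Alice', 'Bob')]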
2) Oh god. Get ready. Here's a 10k foot overview of a crazy thing. All this is apparent if you use OpenCyc so I feel pretty safe talking about it. Cyc is divided into the inference engine and the knowledge base. Both are expressed in different custom LISPy dialects. The knowledge base language is like a layer on top of the inference engine language.
The inference engine language has LISPy syntax but is crucially very un-LISPy in certain ways (way more procedural, no lambdas, reading it makes me want to die). To build the inference engine, you run a process that translates the inference code into Java and compiles that. Read that closely - it doesn't compile to JVM bytecode, it transpiles to Java source files, which are then compiled. This process was created before languages other than Java targeting the JVM were really a thing. There was a push to transition to Clojure or something for the next version of Cyc, but I don't know how far it got off the ground because of 30 years of technical debt.
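To illustrate what "translates the inference code into Java" means in the source-to-source sense, here's a toy sketch that turns a tiny s-expression into Java source text that could then be handed to javac; it's purely illustrative and bears no resemblance to Cycorp's real translator:

    # Toy illustration of transpiling a Lisp-ish s-expression to Java
    # *source text* (not JVM bytecode), which would then be compiled by javac.
    def to_java_expr(form):
        if isinstance(form, (int, float)):
            return str(form)
        op, *args = form
        java_args = [to_java_expr(a) for a in args]
        if op in ("+", "-", "*"):
            return "(" + (" " + op + " ").join(java_args) + ")"
        raise ValueError("unknown operator: " + str(op))

    def to_java_class(name, form):
        return (
            "public class " + name + " {\n"
            "    public static long run() {\n"
            "        return " + to_java_expr(form) + ";\n"
            "    }\n"
            "}\n"
        )

    # (* (+ 1 2) 4)  ->  Java source for a class whose run() returns ((1 + 2) * 4)
    print(to_java_class("Generated", ("*", ("+", 1, 2), 4)))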
The knowledge base itself is basically a set of images running on servers that periodically serialize their state in a way that can be restarted - individual ontologists can boot up their own images, make changes and transmit those to the central images. This model predates things like version control and things can get hairy when different images get too out of sync. Again, there was an effort to build a kind of git-equivalent to ease those pains, which I think was mostly finished but not widely adopted.
There are project-specific knowledge base branches that get deployed in their own images to customers, and specific knowledge base subsets used for different things.
If I had to guess what it's been actually used for, I'd wager it's money laundering or counter-terrorism type stuff; it's fairly well suited to finding connections between people and entities given a large data-set, and unlike many ML models, it can tell you why it thinks someone is suspicious, which might be needed for justifying further investigation. This is a completely wild-ass guess though so take with a giant grain of salt.
That was my motivation for writing Hode[1], the Higher-Order Data Editor. It lets you represent arbitrarily nested relationships, of any arity (number of members). It lets you cursor around data to view neighboring data, and it offers a query language that is, I believe, as close as possible to ordinary natural language.
(Hode has no inference engine, and I don't call it an AI project -- but it seems relevant enough to warrant a plug.)
Two general questions, if you don't mind:
1. How would you characterize the relationship between the politics and structure in your company?
2. Do you feel that the layer of people actively isolating the top embodied the company's culture?
My own Hode, described in an earlier comment[2], makes it easy for anyone who speaks some natural language to enter and query arbitrary structured data.
[1] https://en.wikipedia.org/wiki/Attempto_Controlled_English
Consider that humans learn though having bodies to explore the world with, while forming a variety of social relations to learn the culture. Which is very different from encoding a bunch of rules to make up an intelligence.
Cyc doesn't do anything Bayesian like assigning specific probabilities to individual beliefs - IIRC they tried something like that and it had the problem where nobody felt very confident about attaching any particular precise number to priors and also the inference chains can be so long and involve so many assertions that anything less than 1 probability for most assertions would result in conclusions with very low confidence levels.
As to what they actually do, there are a few approaches.
I know that for one thing, there are coarse grained epistemic levels of belief built into the representation system - some predicates have "HighLikelihoodOf___" or "LowLikelihoodOf___" versions that enable very rough probabilistic reasoning that (it's argued - I have no position on this) is actually closer to the kind of folk-probabilistic thinking that humans actually do.
Also Cyc can use non-monotonic logic, which I think is relatively unique for commercial inference engines. I'm not going to give the best explanation here, but effectively, Cyc can assume that some assertions are "generally" true but may have certain exceptions, which makes it easy to express a lot of facts in a way that's similar to human reasoning. In general, mammals don't lay eggs. So you can assert that mammals don't lay eggs. But you can also assert that statement is non-monotonic and has exceptions (e.g. Platypuses).
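A crude way to picture the "generally true, with exceptions" idea is a default rule plus an explicit exception list; this is just a sketch of default reasoning in Python, not Cyc's actual non-monotonic machinery:

    # Toy sketch of default ("generally true") rules with exceptions,
    # roughly in the spirit of non-monotonic reasoning.
    defaults = {
        # the conclusion holds for members of the class unless they're
        # listed as exceptions
        "lays-eggs": {"class": "mammal", "holds": False,
                      "exceptions": {"platypus", "echidna"}},
    }
    isa = {"cow": "mammal", "platypus": "mammal"}

    def lays_eggs(animal):
        rule = defaults["lays-eggs"]
        if isa.get(animal) == rule["class"]:
            if animal in rule["exceptions"]:
                return not rule["holds"]   # an exception flips the default
            return rule["holds"]
        return None  # no opinion

    print(lays_eggs("cow"))       # False: mammals generally don't lay eggs
    print(lays_eggs("platypus"))  # True: a known exception to the default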
Finally, and this isn't strictly about probabilistic reasoning, but it helps represent other kinds of non-absolute reasoning: knowledge in Cyc is always contextualized. The knowledge base is divided up into "microtheories", contexts in which assertions are taken to hold as both true and relevant - very little is assumed to be always true across the board. This allows them to represent a lot of different topics, conflicting theories, or even fictional worlds - there are various microtheories used for reasoning about events in popular media franchises, where the same laws of physics might not apply.
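And the microtheory idea, caricatured: every assertion lives in a context, and a query is answered relative to a context, so mutually inconsistent "worlds" can coexist in one knowledge base. Again, a toy Python sketch with made-up context names, not how Cyc's microtheory inheritance actually works:

    # Caricature of context-scoped ("microtheory") assertions: the same
    # question can get different answers depending on which context you
    # ask it in.
    kb = {
        "RealWorldMt":        {("can-fly", "human"): False},
        "SuperheroFictionMt": {("can-fly", "human"): True},
    }

    def ask(context, query):
        return kb.get(context, {}).get(query)

    print(ask("RealWorldMt", ("can-fly", "human")))         # False
    print(ask("SuperheroFictionMt", ("can-fly", "human")))  # True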
This type of thing usually comes through unplanned breakthroughs. You can't discover that the earth revolves around the sun just by paying tons of money to researchers and asking them to figure out astronomy. All that would get you would be some extremely sophisticated epicycle-based models.
Personally, I believe that AI is possible (hard AI thesis) and that computationalism with multiple realizability is right, since none of the philosophical arguments against hard AI and computationalism have convinced me so far. But there are as many opinions on that as there are people working on it.
1. You have to solve the interaction problem (how does the mind interact with the physical world?)
2. You need to explain why the world is not physically closed without blatantly violating physical theory / natural laws.
3. From the fact that the mind is nonphysical, it does not follow that computationalism is false. On the contrary, I'd say that computationalism is still the best explanation of how human thinking works even for a dualist. (All the alternatives are quite mystical, except maybe for hypercomputationalism.)
The physics of bridges is well known. That is basically a solved problem. Human consciousness/intelligence is an open problem, and may never be solved.
Computers have not superseded humans in mathematical research. That is way beyond anything that we can program into a computer. Computers are better at computation, which is not the same thing.
The degree to which it's effective seemed to me to be a case-by-case thing. While working there I tended to suspect that Cyc people underestimated the degree to which you could get a large fraction of their results using something like Datomic and it was an open question (to me at least) whether the extra 10% or whatever was worth how much massively more complicated it is to work with Cyc. I might be wrong though, I kind of isolated myself from working directly with customers.
One issue is just that "useful" always invites the question "useful to whom?"
Part of the tension of the company was a distinction between their long term project and the work that they did to pay the bills. The long term goal was something like, to eventually accumulate enough knowledge to create something that could be the basis for a human-ish AI. Whether that's useful, or their approach to it was useful, is a matter for another comment. But let's just say, businesses rarely show up wanting to pay you for doing that directly, so part of the business model is just finding particular problems that they were good at (lots of data, lots of basic inference required using common sense knowledge) that other companies weren't prepared to do. Some clients found Cyc enormously useful in that regard, others were frustrated by the complexity of the system.
Here are some common-sense questions in English that I would love to see the system answer.
- Is a dog owner likely to own a rope-like object? (Yes, they likely own a leash.)
- Does the average North American own more than 1 shoestring? (Yes, most people have at least 2 shoes, and most shoes have shoestrings.)
- Is it safer to fly or to travel by car?
I understand that any practical system of this kind would have to be very coarse, but even at the coarse level, does it have any kind of "error bar" indicator, to show how "sure" it is of the possibly incorrect answer? And can it come up with pertinent questions to narrow things down to a more "correct" answer?
Here's what I'll say: the degree of isolation between different mindsets and disagreement (which was typically very amicable if it was acknowledged at all) is emblematic of the culture of the company. There are people there with raaadically different ideas of what Cyc is for, what it's good at, and even about empirical things like how it actually works. They mostly get along; sometimes there's tension. Over the years, the Cyc that's actually implemented has drifted pretty far from the Cyc that people like Doug Lenat believe in, and the degree to which they're willing or able to acknowledge that seems to drift around, often depending on factors like mood. Doug would show up and be very confused about why some things were hard, because he just believes that Cyc works differently than it does in practice; people had project deadlines, so they often implemented features via hacks to shape inference or hand-built algorithms to deliver answers that Doug thought ought to be derived from principles via inference. Doug thinks way more of what Cyc does is something it effectively learned to do by automatically deriving a way to solve the general form of a problem, rather than a programmer up late hand-coding things to make a demo work the next day, and the programmers aren't going to tell him because there's a demo tomorrow too and it's not working yet.
Of course, I don't recall them mentioning any of the more dystopian things it could be (and sounds like has been) used for :/.
* https://news.ycombinator.com/item?id=21784105
On second thought, it might have been an Alan Kay presentation. I couldn't find that either but looking I did find this interesting Wired article from 2016:
https://www.wired.com/2016/03/doug-lenat-artificial-intellig...
Are you leaving the reason unsaid, or am I in fact reading your argument correctly: "We don't understand consciousness, and we don't understand quantum, therefore it is likely consciousness relies on quantum." There's already plenty of mystery in an ordinary deterministic computation-driven approach to intelligence.
The latter thing sounds like something Doug Lenat has wanted for years, though I think it mostly comes up in cases where the information available is ambiguous, rather than unreliable. There are various knowledge entry schemes that involve Cyc dynamically generating more questions to ask the user to disambiguate or find relevant information.
2. If the world is not physically closed then physical theory and natural laws are not violated, since they would not apply to anything beyond the physical world.
3. True, but if the mind can be shown to perform physically uncomputable tasks, then we can infer the mind is not physical. In which case we can also apply Occam's razor and infer the mind is doing something uncomputable as opposed to having access to vast immaterial computational resources.
Finally, calling a position names, such as 'mystical', does nothing to determine the veracity of the position. At best it is counter productive by distracting from the logic of the argument.
A thoughtful perspective that is useful for my own understanding.
Thank you.
2) Do you think it's possible that Cyc would lead to AI advances that are impressive to the layman like AlphaGo or GPT-2?
I was referring to why software that parses the semantics of Wikipedia articles and makes them queryable through natural-language questions is something that humanity isn't able to build.
2) Where can I find a complete explanation of why Cyc hasn't yet been enough to build true natural language understanding, and which technical difficulties still need to be solved? Examples would be welcome.
3) It would be really nice if you showed progress in real time and allowed the community to contribute intellectually. You could make a GitHub repository without the code, but where we could see all the technical issues by tag, so we could follow the discussions and eventually share useful knowledge with you in order to accelerate progress.
> if the mind can be shown to perform physically uncomputable tasks
That's true. Many people have tried that and many people believe they can show it. Roger Penrose, for example. These arguments are usually based on complexity theory or the Halting Problem and involve certain views about what mathematicians can and cannot do. As I've said, I've personally not been convinced by any of those arguments.
Your mileage may differ. Fair enough. Just make sure that you do not "know the answer" already when starting to think about the problem, because that's what many people seem to do when they think about these kinds of problems, and it's a pity.
> calling a position names, such as 'mystical', does nothing to determine the veracity of the position. At best it is counter productive by distracting from the logic of the argument.
That wasn't my intention, I use "mystical" in this context in the sense of "does not provide any better understanding or scientifically acceptable explanation." Many of the (modern) arguments in this area are inferences to the best explanation.
By the way, correctly formulated computationalism does not presume physicalism. It is fully compatible with dualism.
Really when it comes to practical applications using Cyc, there are three alternatives to consider and only two of them actually exist.
1. There are custom domain specific solutions, involving tailored (limited) inference engines and various kinds of smart databases.
2. There's Cyc.
3. There's a hypothetical future Cyc-like inference system that isn't burdened by 30 years of technical debt.
I personally suspect that some of Cycorp's clients would do better with domain-specific solutions because they don't realize how much of their problem could be solved that way and how much of the analysis coming from Cyc is actually the result of subject matter experts effectively building domain-specific solutions the hard way inside of Cyc. With a lot of Cycorp projects, it's hard to point your finger at exactly where the "AI" is happening.
There are some domains where you just need more inferential power and to leverage the years and years of background knowledge that's already in Cyc. Even then I sometimes used to wonder about the cost/effort effectiveness of using something as powerful and complicated as Cyc when a domain-specific solution might do 90% as well with half the effort.
If someone made a streamlined inference engine using modern engineering practices with a few years of concentrated work on making it usable by people who don't have graduate degrees in formal logic, and ported the most useful subset of the Cyc knowledge base over, that math would change dramatically.
1) We often use HOL. CycL isn't restricted to first-order logic, and we often reason by quantifying over predicates (a rough illustration is at the end of this comment).
2) I don't know where you could read an explanation of it, other than the general problem that NLU is hard. It is something people at the company are interested in, though, and some of us think Cyc can play a big role in NLU.
Moreover, having models of things that are interesting and relevant to humans seems pretty important for any system that interacts with humans.
And it always seemed reasonable that any system that aims to use natural language should be able to represent the meaning of the sentences it uses in a clear and understandable format.
Also "organizing the world's information" should make it usable in an automated fashion based on semantic models.
2) How far are we from real self-evolving cognitive architectures with self-awareness features? Is it a question of years or months, or is it an already solved problem?
3) Does it make sense to use embeddings like https://github.com/facebookresearch/PyTorch-BigGraph to achieve better results?
4) Why did Cycorp decide to limit communication and collaboration with the scientific community / AI enthusiasts at some point?
5) Did you try to solve GLUE / SUPERGLUE / SQUAD challenges with your system?
6) Does Douglas Lenat still contribute actively to the project?
Thanks
I could be sold on the idea that Cyc or something Cyc-like could be a piece of the puzzle for AGI.
I say "Cyc-like" because my personal opinion is that the actual Cyc system is struggling under 30-odd years of rapidly accruing technical debt and while it can do some impressive things, it doesn't represent the full potential of something that could be built using the lessons learned along the way.
But the longer I worked there the more I felt like the plan was basically:
1. Manually add more and more common-sense knowledge and extend the inference engine
2. ???
3. AGI!
When it comes to AI, the questions for me are basically always: what does the process by which it learns look like? Is it as powerful as human learning, and in what senses? How does it scale?
The target is something that can bootstrap: it can seek out new knowledge, creatively form its own theories and test them, and grow its own understanding of the world without its knowledge growth being entirely gated by human supervision and guidance.
The current popular approach to AI is statistical machine learning, which has improved by leaps and bounds in recent years. But when you look at it, it's still basically just more and more effective forms of supervised learning on very strictly defined tasks with pretty concrete metrics for success. Sure, we got computers to the point where they can play out billions of games of Chess or Go in a short period of time, and gradient descent algorithms to the point where they can converge to mastery of the tasks they're assigned much faster - in stopwatch time - than humans. But it's still gated almost entirely by human supervision - we have to define a pretty concrete task and set up a system to train the neural nets via billions of brute force examples.
The out-of-fashion symbolic approach behind Cyc takes a different strategy. It learns in two ways: ontologists manually enter knowledge in the form of symbolic assertions (or set up domain-specific processes to scrape things in), and then it expands on that knowledge by inferring whatever else it can given what it already knows. It's gated by the human hand in the manual knowledge acquisition step, and in the boundaries of what is strictly implied by its inference system.
In my opinion, both of those lack something necessary for AGI. It's very difficult to specify what exactly that is, but I can give some symptoms.
A real AGI is agentive in an important sense - it actively seeks out things of interest to it. And it creatively produces new conceptual schemes to test out against its experience. When a human learns to play chess, they don't reason out every possible consequence of the rules in exactly the terms they were initially described in (which is basically all Cyc can do) or sit there and memorize higher-order statistical patterns in play through billions of games of trial and error (which is basically what ML approaches do). They learn the rules, reason about them a bit while playing games to predict a few moves ahead, play enough to get a sense of some of those higher-order statistical patterns, and then they do a curious thing: they start inventing new concepts that aren't in the rules. They notice the board has a "center" that it's important to control, they start thinking in terms of "tempo" and "openness" and so on. The end result is in some ways very similar to the result of higher-order statistical pattern recognition, but in the ML case those patterns were hammered out one tiny change at a time until they matched reality, whereas in the human there's a moment where they did something very creative, had an idea, and went through a kind of phase transition where they started thinking about the game in different terms.
I don't know how to get to AI that does that. ML doesn't - it's close in some ways but doesn't really do those inductive leaps. Cyc doesn't either. I don't think it can in any way that isn't roughly equivalent to manually building a system that can inside of Cyc. Interestingly, some of Doug Lenat's early work was maybe more relevant to that problem than Cyc is.
Anyway that's my two cents. As for the second question, I have no idea. I didn't come up with anything while I worked there.
I think people like Rodney Brooks are of the belief you need to start with robots that learn their environment and build up from there.
1. Manually add more and more common-sense knowledge and extend the inference engine
2. ???
3. AGI!
That's the same impression I had in the early days of expert systems. I once made the comment, "It's not going to work, but it's worth trying to find out why it won't work." I was thinking that rule-based inference was a dead end, but maybe somebody could reuse the knowledge base with something that works better.
Lol. It really sounds like none of the projects need Cyc. Sounds like the model is to bait smart engineers to work at an engineery company and then sell engineering consulting to companies who would never be able to land their own smart engineers.
That's a benefit of having a VM with JIT.
Which is laughably small in retrospect. I wonder what current estimates are.
Just like the Encyclopædia Britannica found its match in Wikipedia, so CYC will find its match in something open. The engine - if the comments here are to be believed as still currently relevant - is a core that may still be relevant plus a huge number of domain-specific hacks. Let's hope sooner or later CYC management comes to their senses and revives OpenCyc.
> > we have no reason to believe intelligence relies on [as-yet mysterious aspects of quantum physics]
you wrote
> We actually do have reason to believe that ...
and later clarified
> [some true premises], therefore there may be unknown physics involved in consciousness, and those unknown physics may not be computable.
Saying something could be is different from saying we have reason to believe it. There may be a soul. Absent convincing evidence of the soul, though, we shouldn't predicate other research on the idea that it exists.
More generally, the fact that currently humans are the only entity observed doing X does not mean you need to understand humans to understand X.
It's difficult to evaluate CYC given Cycorp's secrecy. Domingos called it something like a colossal failure, which is hard to believe, but it's hard to argue given how little info gets out of Cycorp. So does Cycorp have any interest in changing this? Do they care about improving their reputation in the AI community?
How would you compare Cyc (both the knowledge base and the inference engine) with current efforts in the semantic web? Is OWL/RDF just too simple to encode the kind of logic you think common sense needs?
Some people think that reasoning is the same. If we had a database of enough common-sense facts, there would be a tipping point where it becomes useful.
If we do build AI, maybe we'll never know if it's conscious. You can't know whether any other human is conscious, either. But you can know whether they make you laugh, or cry, or learn, or love. The knowable things are good enough.
I know the Lucas Gödel incompleteness theorem type arguments. Whether successful or not, the counterarguments are certainly fallacious. E.g. just because I form a halting problem for myself does not mean I am not a halting oracle for uncomputable problems.
But I have developed a more empirical approach, something that can be solved by the average person, not dealing with whether they can find the Gödel sentence for a logic system.
Also, there is a lot of interesting research showing that humans are very effective at approximating solutions to NP-complete problems, apparently better than the best known algorithms. While not conclusive proof in itself, such examples are very surprising if there is nothing super-computational about the human mind, and less so if there is.
At any rate, there are a number of lines of evidence I'm aware of that makes the uncomputable mind a much more plausible explanation for what we see humans do, ignoring the whole problem of consciousness. I'm just concerned with empirical results, not philosophy or math. As such, I don't really care what some journal's idea of the burden of proof is. I care about making discoveries and moving our scientific knowledge and technology forward.
Additionally, this is not some academic speculation. If the uncomputable mind thesis is true, then there are technological gains to be made, such as through human in the loop approaches to computation. Arguably, that is where all the successful AI and ML is going these days, so that serves as yet one more line of evidence for the uncomputable mind thesis.
The internal factors are less about intentionally hiding things and more about not committing any resources to being open. A lot of folks within Cycorp would like for the project to be more open, but it wasn't prioritized within the company when I was there. The impression that I got was that veterans there sort of feel like the broader AI community turned their back on symbolic reasoning in the 80s (fair) and they're generally not very impressed by the current trends within the AI community, particularly w.r.t. advances in ML (perhaps unfairly so), so they're going to just keep doing their thing until they can't be ignored anymore. "Their thing" is basically paying the bills in the short term while slowly building up the knowledge base with as many people as they can effectively manage and building platforms to make knowledge entry and ontological engineering smoother in the future. Doug Lenat is weirdly unimpressed by open-source models, and doesn't really see the point of committing resources to getting anyone involved who isn't a potential investor. They periodically do some publicity (there was a big piece in Wired some time ago) but people trying to investigate further don't get very far, and efforts within the company to open things up or revive OpenCyc tend to fall by the wayside when there's project work to do.
2. I don't know that much about this subject, but it's a point of common discussion within the company. Historically, a lot of the semantic web stuff grew out of efforts made by either former employees of Cycorp or people within a pretty close-knit intellectual community with common interests. OWL/RDF is definitely too simple to practically encode the kind of higher order logic that Cyc makes use of. IIRC the inference lead Keith Goolsbey was working on a kind of minimal extension to OWL/RDF that would make it suitable for more powerful knowledge representation, but I don't know if that ever got published.
Applications in litigation support/e-discovery?
If we build AI, we could only know whether it's conscious if we know what consciousness is, and that is something we do not know, and perhaps will never know. It could be fundamentally beyond our comprehension.
And I don't think we have a completely firm grasp on what is possible computationally with a given amount of physical resources, given the development of quantum computing.
That jumps out at me, because I do a lot of "unconscious thinking" to solve problems and I feel like I've read where other people describe similar experiences.
Besides the cliche of solving problems in your sleep, I sometimes have an experience where consciously focusing on solving a problem leads to a blind alley, and distracting my conscious mind with something else somehow lets a background task run to "defrag" or something. But on the other hand there is "bad" distraction too - I'm not sure offhand what the difference is.
It's possible that I'm far from typical, but I also suspect people of different types and intellects might process things in very different ways too.
But to me, I definitely have a strong sense much of the time that my conscious mind engages in the receipt of information about something complex and then the actual analysis is happening somewhere invisible to me in my brain. I'm frequently conscious that I'm figuring something out and yet unaware of the process.
It particularly seems weird to me that other people often seem to be convinced they are conscious of their thought processes, because surely the type of person who is not a knowledge worker isn't? I'm not sure if my way of thinking is the "smart way", the "dumb way", or just weird, but I'm sure that there is significant diversity among people in general.
Sometimes I wonder if the model of AI is the typical mind of a very small subset of humanity that's unlike the rest, kind of like the way psychological experiments have been biased towards college students since that's who they could easily get.
That's not true either.
There are plenty of materialists who think the universe is not computable, thus it's totally possible to believe that the mind is not computable despite being entirely physical.
> Interestingly, some of Doug Lenat's early work was maybe more relevant to that problem than Cyc is.
Yeah, Eurisko was really impressive, I often wondered why people don't work on that kind of stuff anymore.
Which is the most promising AGI project, according to you?
> it doesn't represent the full potential of something that could be built using the lessons learned along the way.

What are those lessons? I would like to benefit from them instead of reproducing your past mistakes.
I was visiting MCC during the startup phase and Bobby Inman spent a little time with me. He had just hired Doug Lenat, but Lenat was not there yet. Inman was very excited to be having Lenat on board. (Inman was on my board of directors and furnished me with much of my IR&D funding for several years.)
From an outsider’s perspective, I thought that the business strategy of Open Cyc made sense, because many of us outside the company had the opportunity to experiment with it. I still have backups of the last released version, plus the RDF/OWL releases.
Personally, I think we are far from achieving AGI. We need some theoretical breakthroughs (I would bet on hybrid symbolic, deep learning, and probabilistic graph models). We have far to go, but as the Buddha said, enjoy the journey.
I wonder what would have happened with Cyc if twenty years ago a funding manager at DARPA had provided incentives to have Cyc entirely open. This might have led to major code refactoring, many more contributions, etc. even understanding that adding common sense knowledge to Cyc requires special skills and education.
So, if a macro phenomenon, i.e. the human mind, is uncomputable, then it is not emergent from a low-level computable physical substrate.
The hypothesis that the mind is computable but is using heuristics, of various levels of sophistication, explains the data better and is more parsimonious than your hypothesis, because we already have reason to believe that the mind uses heuristics extensively.
Where you see uncomputable oracular insights, others see computable combinations of heuristics. If you introspect deeply enough while problem-solving, you may be able to sense the heuristics working prior to the flash of intuition.
I'm a ML researcher working on Deep Learning for robotics. I'm skeptical of the symbolic approach by which 1) ontologists manually enter symbolic assertions and 2) the system deduces further things from its existing ontology. My skepticism comes from a position of Slavic pessimism: we don't actually know how to formally define any object, much less ontological relationships between objects. If we let a machine use our garbage ontologies as axioms with which to prove further ontological relationships, the resulting ontology may be completely disjoint from the reality we live in. There must be a forcing function with which reality tells the system that its ontology is incorrect, and a mechanism for unwinding wrong ontologies.
I'm reminded of a quote from the movie Alien: Covenant.

Walter: When one note is off, it eventually destroys the whole symphony, David.
I think I agree that my problem solving is connected with conscious thought, but the heavy lifting is mostly (or at least frequently) done by something that "I" am not aware of in detail.
When someone is explaining something complicated, pretty often, maybe not always, my (conscious) mind is pretty blank. I can say "yeah, I'm following you", but I feel like I'm not. Then when I start working on it, I feel like I am fumbling around for the keys to unlock some background processing that was happening in the meantime.
Also, when I am in a state where I am consciously writing something elaborate, and I feel connected to the complex concepts behind it, sometimes I get stuck in a blind alley. My context seems too narrow, and often I can get unstuck by just doing something unrelated to distract my conscious mind, like browsing news on my phone and then it's like a stuck process was terminated and I realize what I need to change on a higher level of abstraction.
It's possible I have some sort of inherent disability that I am compensating for by using a different part of my brain than normal, I suppose.
I don't think your argument will seem compelling to anyone who doesn't already have a strong prior belief that the mind is non-physical.
I'm not familiar with the boolean circuit problem, but I wonder if it's an instance where the NP-hardness comes from specific edge cases, and whether your experiment tested said edge cases. Compare with the fact that the C++ compiler is Turing complete: its Turing completeness arises from compiling extremely contrived bizarro code that would never come up in practice. So for everyday code, humans can answer the question, "Will the C++ compiler enter an infinite loop when it tries to compile this code?", quite easily, just by answering "No." every time. That doesn't mean humans can solve the halting problem, though.
But the bigger point is: why are others not doing this kind of research? It does not seem out of the realm of conceptual possibility, since someone like myself came up with a test. And the question is prior to all the big AI projects we currently have going on.
If I use a mechanical grabber aid to reach something, then it isn't figuring out how to do anything. But if I ask Wolfram Alpha the answer to a math problem, it isn't me doing it.
One assigns a prior to a class of hypotheses, and the cardinality of that set does not change the total probability you assign to the entire hypothesis class.
If one instead assigns a constant non-zero prior to each individual hypothesis of an infinite class, a grievous error has been committed and inconsistent and paradoxical beliefs can be the only result.
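A quick way to see the problem: suppose the class contains countably many hypotheses h_1, h_2, ... and each is given the same fixed prior ε > 0. Then the total probability assigned is ε + ε + ..., which diverges, so it isn't a probability distribution at all. A consistent alternative is to give the whole class some mass c ≤ 1 and spread it within the class, e.g. P(h_i) = c · 2^(-i), which sums to exactly c no matter how many hypotheses there are.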
> in the ML case those patterns were hammered out one tiny change at a time until they matched reality, whereas in the human there's a moment where they did something very creative and had an idea and went through a kind of phase transition where they started thinking about the game in different terms.
Phase transition, or the "aha" moment, where things start to logically make sense. Humans have that moment. Knowledge gets crystallized in the same sense water starting to form a crystal structure. The regularity in the structure offers the ability to extrapolate, which is what current ML is known to be poor at.
Agreed.
However, when you write:
> the evidence makes the uncomputable partial Oracle the most likely hypothesis, since the space of uncomputable partial oracles is much much larger
you seem to argue that a hypothesis is more likely because it represents a larger (indeed infinite) space of sub-hypotheses. Reasoning from the cardinality of a set of hypotheses to a degree of belief in the set would in general seem to be unsound.