zlacker

1. catpol+Zz 2019-12-13 18:04:58
>>mdszy+(OP)
I worked for Cycorp for a few years recently. AMA, I guess? I obviously won't give away any secrets (e.g. business partners, finer-grained details of how the inference engine works), but I can talk about the company culture, some high-level technical things, and the interpretation of the project that different people at the company have that makes it seem more viable than you might guess from the outside.

There were some big positives. Everyone there is very smart, and, depending on your tastes, it can be pretty fun to be in meetings where you try to explain Davidsonian ontology to perplexed business people. I suspect a decent fraction of the technical staff are reading this comment thread. There are also some genuine technical advances (which I wish were shared more publicly) in inference-engine architecture, generally stemming from treating symbolic reasoning as a practical engineering project and giving up on properties like completeness in favor of being able to get an answer most of the time.

There were also some big negatives, mostly structural ones. Within Cycorp, different people have very different pictures of what the ultimate goals of the project are, what true AI is, and how (and whether) Cyc is going to make strides along the path to true AI. The company has been around for a long time, and these disagreements never really resolve - they just sort of hang around and affect how different segments of the company work. There's also a very flat organizational structure, which makes for a very anarchic and shifting map of who is responsible or accountable for what. And there's a huge disconnect between what the higher-ups understand the company and technology to be doing, the projects they actually work on, and the low-level day-to-day work done by programmers and ontologists there.

I was initially pretty skeptical of the continued feasibility of symbolic AI when I went in to interview, but Doug Lenat gave me a pitch that essentially assured me the project had found a way around many of my concerns. In particular, they were doing deep reasoning from common-sense principles using heuristics, not just the thing Prolog projects often devolved into, where you end up writing a logical system that emulates a procedural algorithm to solve problems.

It turns out there's a kind of reality distortion field around the management there, despite their best intentions - partially maintained by the management's own steadfast belief that what Cyc does is what it ought to be doing, but partially maintained by a layer of people who actively isolate the management from the dirty work that goes into actually making projects work, or appear to. So while a certain amount of "common sense" knowledge factors into the reasoning processes, a great amount of Cyc's output at the project level really comes from hand-crafted algorithms implemented either in the inference engine or in the ontology.

Also the codebase is the biggest mess I have ever seen by an order of magnitude. I spent some entire days just scrolling through different versions of entire systems that duplicate massive chunks of functionality, written 20 years apart, with no indication of which (if any) still worked or were the preferred way to do things.

2. m0zg+4R 2019-12-13 20:04:46
>>catpol+Zz
How do you deal with the fact that human knowledge is probabilistic? I.e., it's actually mostly "belief" rather than "fact", and the "correct" answer depends heavily on the context in a somewhat Bayesian way. Best I can tell, we don't yet have the math to model this in any kind of comprehensive way.
3. catpol+2U 2019-12-13 20:24:53
>>m0zg+4R
Cyc has a few ways of dealing with fallibility.

Cyc doesn't do anything Bayesian like assigning specific probabilities to individual beliefs. IIRC they tried something like that, and it ran into two problems: nobody felt very confident attaching any particular precise number to priors, and the inference chains can be so long and involve so many assertions that anything less than probability 1 for most assertions would result in conclusions with very low confidence levels.
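
To give a rough sense of that second problem, here's a toy calculation (the per-assertion number is made up, purely to illustrate the decay):

    # Toy illustration: even high per-assertion confidence decays fast
    # over long inference chains (0.95 is an invented figure).
    for chain_length in (5, 20, 50, 100):
        confidence = 0.95 ** chain_length
        print(f"{chain_length:>3} steps: {confidence:.3f}")
    # Output:
    #   5 steps: 0.774
    #  20 steps: 0.358
    #  50 steps: 0.077
    # 100 steps: 0.006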

As to what they actually do, there are a few approaches.

I know that, for one thing, there are coarse-grained epistemic levels of belief built into the representation system - some predicates have "HighLikelihoodOf___" or "LowLikelihoodOf___" versions that enable very rough probabilistic reasoning that (it's argued - I take no position on this) is closer to the kind of folk-probabilistic thinking humans actually do.
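
A minimal sketch of what qualitative levels like that buy you (my own toy Python reconstruction - the "HighLikelihoodOf"/"LowLikelihoodOf" naming pattern is the only part taken from the real system):

    # Toy sketch of coarse-grained likelihood levels. The combination
    # rule (weakest premise wins) is my assumption, not Cyc's.
    from enum import IntEnum

    class Likelihood(IntEnum):
        LOW = 1
        MEDIUM = 2
        HIGH = 3
        CERTAIN = 4

    def combine(*premises: Likelihood) -> Likelihood:
        # A conclusion is only as strong as its weakest premise.
        return min(premises)

    # e.g. chaining a HighLikelihoodOf-style assertion with a certain fact:
    print(combine(Likelihood.HIGH, Likelihood.CERTAIN).name)  # HIGH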

Also, Cyc can use non-monotonic logic, which I think is relatively rare among commercial inference engines. I'm not going to give the best explanation here, but effectively, Cyc can assume that some assertions are "generally" true but may have certain exceptions, which makes it easy to express a lot of facts in a way that's similar to human reasoning. In general, mammals don't lay eggs. So you can assert that mammals don't lay eggs. But you can also assert that the statement is non-monotonic and has exceptions (e.g. platypuses).
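
A crude sketch of how default-plus-exception reasoning works (my own toy version in Python - it has nothing to do with CycL syntax, and real non-monotonic logic is much subtler):

    # Toy default reasoning: a default inherited from a parent kind
    # holds unless a more specific exception overrides it.
    defaults = {("Mammal", "lays_eggs"): False}
    exceptions = {("Platypus", "lays_eggs"): True}
    parent_kind = {"Dog": "Mammal", "Platypus": "Mammal"}

    def lays_eggs(kind: str) -> bool:
        if (kind, "lays_eggs") in exceptions:      # exception wins
            return exceptions[(kind, "lays_eggs")]
        parent = parent_kind.get(kind)
        return defaults.get((parent, "lays_eggs"), False)

    print(lays_eggs("Dog"))       # False - the mammal default
    print(lays_eggs("Platypus"))  # True  - the exception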

Finally - and this isn't strictly about probabilistic reasoning, but it helps represent other kinds of non-absolute reasoning - knowledge in Cyc is always contextualized. The knowledge base is divided up into "microtheories": contexts in which assertions are taken to hold as both true and relevant - very little is assumed to be true always and across the board. This lets them represent a lot of different topics, conflicting theories, or even fictional worlds - there are various microtheories used for reasoning about events in popular media franchises, where the same laws of physics might not apply.
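
Roughly, you can picture it like this (again, my own toy model - names like "CartoonPhysicsMt" are invented, and Cyc's microtheory machinery is far richer):

    # Toy model: assertions are only true relative to a microtheory,
    # and microtheories can inherit from more general ones.
    assertions = {
        "BaseKB": {"gravity-pulls-things-down": True},
        "CartoonPhysicsMt": {"gravity-pulls-things-down": False},
    }
    inherits_from = {"CartoonPhysicsMt": "BaseKB"}

    def holds(assertion: str, mt: str):
        # Check the microtheory itself, then walk up through its parents.
        while mt is not None:
            if assertion in assertions.get(mt, {}):
                return assertions[mt][assertion]
            mt = inherits_from.get(mt)
        return None  # not known in this context

    print(holds("gravity-pulls-things-down", "BaseKB"))            # True
    print(holds("gravity-pulls-things-down", "CartoonPhysicsMt"))  # False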

4. guicho+X9j 2019-12-22 04:47:46
>>catpol+2U
What is your opinion on the popular trend of making everything probabilistic, especially in ML, as opposed to using default logic? For example, does it make sense to say something like "mammals don't lay eggs" holds with 98% probability because of the platypus exception?