zlacker

1. catpol+Zz 2019-12-13 18:04:58
>>mdszy+(OP)
I worked for Cycorp for a few years recently. AMA, I guess? I obviously won't give away any secrets (e.g. business partners, finer-grained details of how the inference engine works), but I can talk about the company culture, some high-level technical things, and the interpretations of the project that different people at the company hold, which make it seem more viable than you might guess from the outside.

There were some big positives. Everyone there is very smart, and depending on your tastes, it can be pretty fun to be in meetings where you try to explain Davidsonian ontology to perplexed business people. I suspect a decent fraction of the technical staff are reading this comment thread. There are also some genuine technical advances (which I wish were shared more publicly) in inference engine architecture, or more generally from treating symbolic reasoning as a practical engineering project and giving up on things like completeness in favor of being able to get an answer most of the time.
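To make that completeness trade-off concrete, here's a toy sketch in Python (entirely my own illustration with a made-up rule base, nothing like Cyc's actual engine): a backward chainer that caps search depth and wall-clock time, so it answers "don't know" instead of searching forever.

    import time

    # Hypothetical toy KB: conclusion -> list of alternative premise lists.
    RULES = {
        "mortal(socrates)": [["man(socrates)"]],
        "man(socrates)": [[]],  # a bare fact: no premises
    }

    def prove(goal, depth=8, deadline=None):
        if deadline is not None and time.monotonic() > deadline:
            return None  # out of time: give up (incomplete by design)
        if depth == 0:
            return None  # depth cutoff: also incomplete by design
        for premises in RULES.get(goal, []):
            if all(prove(p, depth - 1, deadline) for p in premises):
                return True
        return None  # "don't know", not "false"

    # A 50 ms budget; None means the engine gave up, not that the goal is false.
    print(prove("mortal(socrates)", deadline=time.monotonic() + 0.05))

The point is that returning None is an acceptable outcome: you lose the guarantee of finding every provable answer, but you always get *some* response in bounded time.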

There were also some big negatives, mostly structural ones. Within Cycorp, different people have very different pictures of what the ultimate goals of the project are, what true AI is, and how (and whether) Cyc is going to make strides along the path to true AI. The company has been around for a long time and these disagreements never really resolve - they just sort of hang around and affect how different segments of the company work. There's also a very flat organizational structure, which makes for an anarchic and shifting map of who is responsible or accountable for what. And there's a huge disconnect between what the higher-ups understand the company and technology to be doing, the projects they actually work on, and the low-level day-to-day work done by the programmers and ontologists there.

I was initially pretty skeptical of the continued feasibility of symbolic AI when I went in to interview, but Doug Lenat gave me a pitch that essentially assured me the project had found a way around many of my concerns. In particular, they were doing deep reasoning from common-sense principles using heuristics, not just the thing Prolog projects often devolved into, where you end up writing a logical system that emulates a procedural algorithm to solve problems.
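For a rough sense of what "using heuristics" means here (again my own toy Python with a made-up rule base, a sketch of the general idea rather than how Cyc actually does it): instead of Prolog's fixed top-to-bottom clause order, which tempts you to encode a procedure in the ordering itself, the prover can score candidate rules and try the most promising first.

    # Hypothetical toy KB: (conclusion, premises, estimated cost).
    RULES = [
        ("can_fly(tweety)", ["bird(tweety)", "not_penguin(tweety)"], 2),
        ("bird(tweety)", [], 0),
        ("not_penguin(tweety)", [], 0),
    ]

    def prove(goal, depth=8):
        if depth == 0:
            return False
        # Heuristic: try the cheapest-looking rules first, not source order.
        candidates = sorted((r for r in RULES if r[0] == goal),
                            key=lambda r: r[2])
        return any(all(prove(p, depth - 1) for p in premises)
                   for _, premises, _ in candidates)

    print(prove("can_fly(tweety)"))  # True

The knowledge stays declarative; only the search strategy carries the procedural smarts.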

It turns out there's a kind of reality distortion field around the management there, despite their best intentions - partially maintained by the management's own steadfast belief that what Cyc does is what it ought to be doing, and partially maintained by a layer of people who actively insulate the management from the dirty work that goes into actually making projects work (or appear to work). So while a certain amount of "common sense" knowledge factors into the reasoning processes, a great deal of Cyc's output at the project level really comes from hand-crafted algorithms implemented either in the inference engine or in the ontology.

Also, the codebase is the biggest mess I have ever seen, by an order of magnitude. I spent entire days just scrolling through different versions of entire systems that duplicate massive chunks of functionality, written 20 years apart, with no indication of which (if any) still worked or were the preferred way to do things.

2. drongo+Ks1 2019-12-14 01:42:24
>>catpol+Zz
Two questions.

It's difficult to evaluate Cyc given Cycorp's secrecy. Domingos called it something like a colossal failure, which is hard to believe but also hard to argue with, given how little information gets out of Cycorp. So does Cycorp have any interest in changing this? Do they care about improving their reputation in the AI community?

How would you compare Cyc (both the knowledge base and the inference engine) with current efforts in the semantic web? Is OWL/RDF just too simple to encode the kind of logic you think common sense needs?

3. catpol+tE1 2019-12-14 05:14:22
>>drongo+Ks1
1. There are internal and external factors when it comes to Cycorp's secrecy. The external factors come from the clients they work with, who often demand confidentiality. Some of their most successful projects are extremely closely guarded industry secrets. I think people at Cycorp would love to publicly talk a lot more about their projects if they could, but the clients don't want the competition getting wind of the technology.

The internal factors are less about intentionally hiding things and more about not committing any resources to being open. A lot of folks within Cycorp would like the project to be more open, but it wasn't prioritized within the company when I was there. The impression I got was that veterans there feel like the broader AI community turned its back on symbolic reasoning in the 80s (fair), and they're generally not very impressed by current trends in the AI community, particularly w.r.t. advances in ML (perhaps unfairly so), so they're going to just keep doing their thing until they can't be ignored anymore.

"Their thing" is basically paying the bills in the short term while slowly building up the knowledge base with as many people as they can effectively manage, and building platforms to make knowledge entry and ontological engineering smoother in the future. Doug Lenat is weirdly unimpressed by open-source models and doesn't really see the point of committing resources to getting anyone involved who isn't a potential investor. They periodically do some publicity (there was a big piece in Wired some time ago), but people trying to investigate further don't get very far, and efforts within the company to open things up or revive OpenCyc tend to fall by the wayside when there's project work to do.

2. I don't know that much about this subject, but it's a common topic of discussion within the company. Historically, a lot of the semantic web work grew out of efforts by either former employees of Cycorp or people within a pretty close-knit intellectual community with common interests. OWL/RDF is definitely too simple to practically encode the kind of higher-order logic that Cyc makes use of. IIRC the inference lead Keith Goolsbey was working on a kind of minimal extension to OWL/RDF that would make it suitable for more powerful knowledge representation, but I don't know if that ever got published.
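To give one concrete flavor of "too simple" (a toy Python sketch of my own, with made-up facts; this isn't CycL syntax or Cyc's engine): RDF fixes the predicate in every triple, and OWL only offers a handful of canned meta-properties (transitivity, symmetry, property chains). The higher-order style lets you write one rule with a variable in predicate position that covers a whole class of relations at once.

    FACTS = {
        ("livesIn", "Alice", "Austin"),
        ("bornIn", "Bob", "Houston"),
        ("partOf", "Austin", "Texas"),
        ("partOf", "Houston", "Texas"),
    }
    # Meta-level fact: these relations propagate up the partOf hierarchy.
    PROPAGATES = {"livesIn", "bornIn"}

    def closure(facts):
        # One rule, quantified over the predicate ?REL:
        # (propagates ?REL) & (?REL ?X ?P) & (partOf ?P ?W) => (?REL ?X ?W)
        facts = set(facts)
        while True:
            new = {(r, x, w)
                   for (r, x, p) in facts if r in PROPAGATES
                   for (q, p2, w) in facts if q == "partOf" and p2 == p}
            if new <= facts:
                return facts
            facts |= new

    print(("livesIn", "Alice", "Texas") in closure(FACTS))  # True

In OWL you'd have to assert a separate axiom (e.g. a property chain) for each such relation individually; there's no way to quantify over the relations themselves.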
