There were some big positives. Everyone there is very smart, and depending on your tastes, it can be pretty fun to be in meetings where you try to explain Davidsonian ontology to perplexed business people. I suspect a decent fraction of the technical staff are reading this comment thread. There are also some genuine technical advances (which I wish were more publicly shared) in inference engine architecture, generally stemming from treating symbolic reasoning as a practical engineering project and giving up on things like completeness in favor of being able to get an answer most of the time.
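To make the "give up completeness" point concrete, here's a toy Python sketch - purely illustrative, nothing like Cyc's actual engine or dialects - of a backward chainer that stops at a fixed depth budget and reports failure instead of searching forever:

    # Toy rules: conclusion -> list of alternative premise lists.
    RULES = {
        "mortal(socrates)": [["human(socrates)"]],
        "human(socrates)": [[]],  # a fact: no premises needed
    }

    def prove(goal, depth=0, max_depth=5):
        """Depth-limited backward chaining: sound, deliberately incomplete."""
        if depth > max_depth:
            return False  # give up rather than search forever
        for premises in RULES.get(goal, []):
            if all(prove(p, depth + 1, max_depth) for p in premises):
                return True
        return False

    print(prove("mortal(socrates)"))  # True
    print(prove("unknown(fact)"))     # False: no proof found within budget

A real engine layers heuristics on top of this (which rules to try first, when to bail), but the tradeoff is the same: you sometimes miss provable answers in exchange for always getting a response in bounded time.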
There were also some big negatives, mostly structural ones. Within Cycorp, different people have very different pictures of what the ultimate goals of the project are, what true AI is, and how (and whether) Cyc is going to make strides along the path to true AI. The company has been around for a long time and these disagreements never really resolve - they just sort of hang around and affect how different segments of the company work. There's also a very flat organizational structure, which makes for a very anarchic and shifting map of who is responsible or accountable for what. And there's a huge disconnect between what the higher-ups understand the company and technology to be doing, the projects they actually work on, and the low-level day-to-day work done by the programmers and ontologists there.
I was initially pretty skeptical of the continued feasibility of symbolic AI when I went in to interview, but Doug Lenat gave me a pitch that essentially assured me the project had found a way around many of my concerns. In particular, they were doing deep reasoning from common sense principles using heuristics, not just doing the thing Prolog often devolved into, where you end up basically writing a logical system to emulate a procedural algorithm to solve problems.
It turns out there's a kind of reality distortion field around the management there, despite their best intentions - partly maintained by the management's own steadfast belief that what Cyc does is what it ought to be doing, and partly by a layer of people who actively insulate management from the dirty work that goes into actually making projects work (or appear to). So while a certain amount of "common sense" knowledge does factor into the reasoning processes, a great deal of Cyc's output at the project level really comes from hand-crafted algorithms implemented either in the inference engine or in the ontology.
Also, the codebase is the biggest mess I have ever seen, by an order of magnitude. I spent entire days just scrolling through different versions of entire systems that duplicated massive chunks of functionality, written 20 years apart, with no indication of which (if any) still worked or were the preferred way to do things.
1) How did they manage to make money for so long to keep things afloat? I'm guessing through some self-sustaining projects, like the few business relationships listed on the wiki?
2) What's the tech stack like? (Language, deployment, etc.)
The categories of projects I was familiar with were basically proof-of-concept work for companies or government R&D contracts. There are lots of big companies that will throw a few million at a long-shot AI project just to see if it pays off, even if they don't have a very clear idea of what they ultimately want or a concrete plan to build a product around it. Sometimes these would pay off, sometimes they wouldn't, but we'd get by on the initial investment for the proof-of-concept work. Similarly, organizations like DARPA will fund multiple speculative projects around a similar goal (e.g. education - that's where "Mathcraft" came from, IIRC) to evaluate which direction is most promising.
There have been a few big hits in the company's history, most of which I can't talk about. The hits have basically been in very circumscribed knowledge domains where there's a lot of data and a lot of opportunity for simple common sense inferences (e.g. if Alice worked for the ABC team of company A at the same time Bob worked for the XYZ team of company B, and companies A and B were collaborating on a project involving the ABC and XYZ teams at that time, then Alice and Bob have probably met), and where you have reason to follow all those connections looking for patterns, but there's just too much data for a human to make a map of. Cyc can answer questions about probable business or knowledge relationships between individuals in large sets of people in a few seconds - work that would take weeks of human research - and certain institutions pay a high premium for that kind of thing.
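To show the shape of that inference, here's a Python sketch with made-up toy data (the real system does this with logical rules over an enormous KB, not nested loops):

    # (person, team, company, years employed)
    employment = [
        ("Alice", "ABC", "A", range(2010, 2014)),
        ("Bob",   "XYZ", "B", range(2012, 2016)),
    ]
    # (company1, team1, company2, team2, years of collaboration)
    collaborations = [
        ("A", "ABC", "B", "XYZ", range(2012, 2014)),
    ]

    def probably_met(p1, p2):
        """True if p1 and p2 were on collaborating teams in overlapping years."""
        for (ca, ta, cb, tb, cyrs) in collaborations:
            for (pa, team_a, comp_a, yrs_a) in employment:
                for (pb, team_b, comp_b, yrs_b) in employment:
                    if ((pa, pb) == (p1, p2)
                            and (team_a, comp_a, team_b, comp_b) == (ta, ca, tb, cb)
                            and set(cyrs) & set(yrs_a) & set(yrs_b)):
                        return True
        return False

    print(probably_met("Alice", "Bob"))  # True

The hard part isn't any single rule like this - it's running thousands of them over millions of entities and still answering in seconds.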
2) Oh god. Get ready. Here's a 10,000-foot overview of a crazy thing. All of this is apparent if you use OpenCyc, so I feel pretty safe talking about it. Cyc is divided into the inference engine and the knowledge base, each expressed in its own custom LISPy dialect. The knowledge base language is like a layer on top of the inference engine language.
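As a cartoon of what "a layer on top" means here (Python standing in for the actual dialects, so take it as an analogy only):

    # Engine-level primitive: record a tuple in the store.
    def engine_assert(store, predicate, *args):
        store.add((predicate,) + args)

    # KB-level vocabulary, defined in terms of engine primitives.
    def kb_isa(store, instance, collection):
        engine_assert(store, "isa", instance, collection)

    kb = set()
    kb_isa(kb, "Dog", "BiologicalSpecies")
    print(kb)  # {('isa', 'Dog', 'BiologicalSpecies')}

The knowledge base language gives ontologists higher-level declarative forms; the inference engine language is where those forms actually get executed.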
The inference engine language has LISPy syntax but is crucially very un-LISPy in certain ways (way more procedural, no lambdas; reading it makes me want to die). To build the inference engine, you run a process that translates the inference code into Java and compiles that. Read that closely: it doesn't compile to JVM bytecode, it transpiles to Java source files, which are then compiled. This process was created before languages other than Java targeting the JVM were really a thing. There was a push to transition to Clojure or something similar for the next version of Cyc, but with 30 years of technical debt in the way, I don't know how far it ever got off the ground.
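If "translates into Java source, then compiles that" sounds odd, here's the build shape in miniature - a toy Python driver, not Cyc's translator, and it assumes you have a JDK with javac on your PATH:

    import subprocess

    def translate(class_name, message):
        # Stand-in for the real translator, which walks the LISPy source.
        return (f"public class {class_name} {{\n"
                f"    public static void main(String[] args) {{\n"
                f"        System.out.println(\"{message}\");\n"
                f"    }}\n"
                f"}}\n")

    with open("Hello.java", "w") as f:
        f.write(translate("Hello", "translated from a LISPy dialect"))

    subprocess.run(["javac", "Hello.java"], check=True)  # compile generated source
    subprocess.run(["java", "Hello"], check=True)        # run it

The upside is that you get readable, debuggable Java source out of the build; the downside is a slow, awkward pipeline compared to targeting the JVM directly.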
The knowledge base itself is basically a set of images running on servers that periodically serialize their state so they can be restarted - individual ontologists can boot up their own images, make changes, and transmit those changes to the central images. This model predates tools like version control, and things can get hairy when different images drift too far out of sync. Again, there was an effort to build a kind of git-equivalent to ease those pains, which I think was mostly finished but never widely adopted.
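My mental model of it, sketched in Python (a caricature, but it captures why syncing hurt: merges are just set unions with no history or conflict detection):

    import pickle

    class KBImage:
        def __init__(self, assertions=None):
            self.assertions = set(assertions or [])

        def snapshot(self, path):
            # Periodic serialization so the image can be restarted later.
            with open(path, "wb") as f:
                pickle.dump(self.assertions, f)

        @classmethod
        def restore(cls, path):
            with open(path, "rb") as f:
                return cls(pickle.load(f))

        def merge_from(self, other):
            # Naive union: no history, no blame, no conflict detection;
            # roughly the pain a git-equivalent would have addressed.
            self.assertions |= other.assertions

    central = KBImage({"(isa Dog BiologicalSpecies)"})
    local = KBImage(central.assertions)
    local.assertions.add("(isa Dog DomesticatedAnimal)")
    central.merge_from(local)  # "transmit changes to the central image"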
There are project-specific knowledge base branches that get deployed in their own images to customers, and specific knowledge base subsets used for different things.
Applications in litigation support/e-discovery?