The underlying idea has been done before, though. We already know LLMs do better if, instead of solving a problem on their own, they write a program that solves it (look up PAL, Program-Aided Language models).
This one. It looks like they're using GPT-3 to translate the natural-language problem context and goal into a format called PDDL (Planning Domain Definition Language), then feeding the result into a separate program that generates a plan from that context and goal.
With that in mind, the thing they're really testing here is how well GPT-3 can translate the natural-language prompt into PDDL, evaluated by whether the generated PDDL can actually solve the problem and how long the resulting solution takes.
Naturally, I could be wrong but that's at least what it looks like.
The paper introduces LLM+P, a framework that combines the strengths of classical planners with large language models (LLMs) to solve long-horizon planning problems. LLM+P takes in a natural language description of a planning problem, converts it into a PDDL file, leverages classical planners to find a solution, and then translates the solution back into natural language. The authors provide a set of benchmark problems and find that LLM+P is able to provide optimal solutions for most problems, while LLMs fail to provide even feasible plans for most problems. The paper suggests that LLM+P can be used as a natural language interface for giving tasks to robot systems. The authors also propose that classical planners can be another useful external module for improving the performance of downstream tasks of LLMs. The paper highlights the importance of providing context (i.e., an example problem and its corresponding problem PDDL) for in-context learning, and suggests future research directions to further extend the LLM+P framework.
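In pseudocode, the pipeline described above looks roughly like this. Note this is a sketch: the three helper functions are stand-in stubs I made up to show the data flow, not real APIs from the paper.

```python
# Sketch of the LLM+P pipeline. The three helpers are stand-in stubs,
# NOT real APIs: in the actual system the first and third would be LLM
# calls and the second a classical planner binary.
def translate_to_pddl(problem_nl, domain_pddl, context_example):
    return "(define (problem p1) ...)"    # placeholder for the LLM's output

def classical_planner(domain_pddl, problem_pddl):
    return ["(pickup a)", "(stack a b)"]  # placeholder for the planner's plan

def plan_to_nl(plan):
    return "First " + ", then ".join(plan)  # placeholder for LLM paraphrase

def llmp(problem_nl, domain_pddl, context_example):
    # 1. LLM translates the natural-language problem into problem PDDL.
    problem_pddl = translate_to_pddl(problem_nl, domain_pddl, context_example)
    # 2. A classical planner solves the (domain, problem) PDDL pair.
    plan = classical_planner(domain_pddl, problem_pddl)
    # 3. LLM translates the symbolic plan back into natural language.
    return plan_to_nl(plan)
```

The point of the structure is that only steps 1 and 3 touch the LLM; the actual search for a plan happens in the solver.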
PDDL: https://en.wikipedia.org/wiki/Planning_Domain_Definition_Lan...
Planning, yes, but that’s a verb that casts a very wide net.
When might one write PDDL? Be it specific tasks, or industries it is used in - the examples I’ve found online all have a robotic theme, yet the idea seems much more general.
What do they do with it once they’ve written it?
What does it solve (as opposed to just having a file that outlines objects, predicates, actions, etc.)?
PDDL is designed to be machine-readable, but also human-readable and writable. I would say you would write PDDL when you want to provide a description of the rules of a domain to an algorithm that does automated planning and acting. This could be an autonomous agent of any sort; it doesn't necessarily have to be embodied/robotic in nature.
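To make that concrete, here is a minimal hand-written PDDL fragment (a toy blocks-world-style example of my own, not from the paper): a domain file declares predicates and actions with preconditions and effects, and a problem file declares objects, an initial state, and a goal.

```pddl
(define (domain toy-blocks)
  (:predicates (on-table ?x) (holding ?x) (clear ?x) (arm-empty))
  (:action pickup
    :parameters (?x)
    :precondition (and (on-table ?x) (clear ?x) (arm-empty))
    :effect (and (holding ?x) (not (on-table ?x)) (not (arm-empty)))))

(define (problem toy-problem)
  (:domain toy-blocks)
  (:objects a)
  (:init (on-table a) (clear a) (arm-empty))
  (:goal (holding a)))
```

A planner takes both files and searches for a sequence of actions (here, just `(pickup a)`) that transforms the initial state into one satisfying the goal.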
The article uses "planning" to mean "classical planning", which is a very specific thing, although it's such a fundamental concept in AI research that it is very difficult to find a simple definition (there's a lot of useless stuff on the internet about it, like tutorials that don't explain what it is they're tutorial-ing, or slides that don't give much context).
Even the Wikipedia article is not very well written. I followed this link to one of its references though and there's an entire textbook, available as a free pdf:
https://projects.laas.fr/planning/
In general, classical planning is one of those domains where GOFAI approaches continue to dominate over newer, statistical machine-learning approaches. You'll have to take my word for that, though, because that's what I know from experience, and I don't have any references to back it up. On the other hand, if it wasn't the case, you wouldn't see papers like the one linked above, I suppose.
To clarify, the paper above makes it clear that LLMs themselves are useless for planning, but at least they can translate between natural language and PDDL, so a planning problem can be handed off to a classical planning engine that can actually do the job. How useful that is, I don't know. A human expert would probably do a better job of writing the PDDL from scratch, but that's never explored in the linked article.
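To give a feel for what a classical planning engine actually does with the PDDL once it has it, here's a minimal STRIPS-style planner in Python: a toy sketch of my own, nothing like the optimized solvers real systems use. A state is a set of ground facts; an action is (name, preconditions, add-effects, delete-effects); planning is a search for a fact-set satisfying the goal.

```python
from collections import deque

# Toy STRIPS-style encoding: this is roughly what a PDDL domain/problem
# pair expresses, minus the Lisp-like syntax. Facts are plain strings.
ACTIONS = [
    ("pickup-a", {"on-table-a", "arm-empty"},
     {"holding-a"}, {"on-table-a", "arm-empty"}),
    ("stack-a-on-b", {"holding-a", "clear-b"},
     {"a-on-b", "arm-empty"}, {"holding-a", "clear-b"}),
]

def plan(init, goal):
    """Breadth-first search from init to any state satisfying goal."""
    start = frozenset(init)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:              # all goal facts hold in this state
            return steps
        for name, pre, add, delete in ACTIONS:
            if pre <= state:           # action is applicable
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [name]))
    return None                        # goal unreachable

print(plan({"on-table-a", "clear-b", "arm-empty"}, {"a-on-b"}))
# -> ['pickup-a', 'stack-a-on-b']
```

Real planners (Fast Downward and the like) do the same thing in principle, with grounding, heuristics, and far better search, which is why they scale where brute-force BFS doesn't.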
Story planners start with the premise that the story generation process is a goal-driven process and apply some form of symbolic planner to the problem of generating a fabula. The plan is the story.
https://thegradient.pub/an-introduction-to-ai-story-generati...
As an aside, it is obvious from The Gradient article linked above that story generation was doing just fine until LLMs came along and claimed to do it right for the first time ever. I can see that the earlier approaches took some careful hand-engineering, but they also seemed to generate coherent stories more reliably (though maybe without very rich themes and development). But then, that's the trade-off between classical approaches and big machine learning: either you roll up your sleeves and apply some elbow grease, or you label giant reams of data and pay the giant compute cost of training on them. In a sense, the claimed advance of deep learning is that domain experts can be replaced by cheaply paid inexpert labellers, plus some very big GPU clusters.
To summarise, they assume a human expert can provide a domain description specifying all actions that can be taken in each situation, and their effects. Then it looks like they include that domain description in the prompt, along with an example of the kind of planning task they want solved, and get the LLM to generate PDDL in the context of the prompt.
GPT-4 can use its ability to encode problems in PDDL and in-context learning to infer the problem PDDL file corresponding to a given problem (P). This can be done by providing the model with a minimal example that demonstrates what a correct problem PDDL looks like for a simple problem in the domain, as well as a problem description in natural language and its corresponding problem PDDL. This allows the model to leverage its ability to perform unseen downstream tasks without having to finetune its parameters.
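Concretely, that in-context setup amounts to prompt assembly along these lines. This is a hedged sketch: the wording of the template is my illustration of the structure described above, not the paper's exact prompt.

```python
def build_llmp_prompt(example_nl, example_pddl, new_problem_nl):
    """Assemble a few-shot prompt: one (description, problem-PDDL) pair
    as context, then the new problem. Illustrative wording, not the
    paper's exact template."""
    return (
        "An example planning problem:\n"
        f"{example_nl}\n"
        "The problem PDDL for this problem:\n"
        f"{example_pddl}\n"
        "Now a new planning problem:\n"
        f"{new_problem_nl}\n"
        "Provide the problem PDDL that describes it:\n"
    )

prompt = build_llmp_prompt(
    "Block A is on the table; stack it on block B.",
    "(define (problem ex) ...)",   # the example's problem PDDL
    "Block C is on the table; stack it on block A.",
)
# The prompt goes to the LLM; the returned problem PDDL is then handed
# to a classical planner together with the fixed domain PDDL.
```

Because the model only has to imitate the example's mapping from prose to PDDL, no fine-tuning is needed, which matches the in-context-learning framing above.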