zlacker

[return to "LLM with Planning"]
1. PaulHo+81[view] [source] 2023-04-27 22:44:00
>>mercat+(OP)
Sometimes I wonder whether text generation could be formulated as a planning/optimization problem, and whether that machinery could solve planning problems embedded in the text as a byproduct.
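Concretely, the framing I'm imagining, as a toy sketch (logprob() and task_reward() here are made-up placeholders, not any real model API):

    # Toy sketch: treat decoding as picking the sequence that maximizes an
    # explicit objective. logprob() and task_reward() are made-up stand-ins
    # for a language model scorer and a domain reward, not a real API.

    def logprob(text: str) -> float:
        # placeholder for the LM's log-probability of the full text
        return -0.1 * len(text)

    def task_reward(text: str) -> float:
        # placeholder for "does this text solve the embedded planning problem?"
        return 1.0 if "step" in text else 0.0

    def objective(text: str, lam: float = 1.0) -> float:
        # the planning view: fluency plus task reward, maximized jointly
        return logprob(text) + lam * task_reward(text)

    candidates = [
        "First do A, then do B.",
        "step 1: do A, step 2: do B.",
    ]
    print(max(candidates, key=objective))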
2. qumpis+o7[view] [source] 2023-04-27 23:32:45
>>PaulHo+81
RL in ChatGPT is used for exactly that: to generate text that maximizes a reward. And if you have other domains with their own reward functions, you could plan over them the same way.
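As a toy sketch, the simplest inference-time version of that would be best-of-n selection against a pluggable domain reward; sample_candidate() and domain_reward() below are hypothetical placeholders, not any real library:

    import random

    # Toy sketch: best-of-n selection against an arbitrary domain reward.
    # sample_candidate() and domain_reward() are hypothetical placeholders.

    def sample_candidate(prompt: str, rng: random.Random) -> str:
        # placeholder for sampling one completion from a language model
        return rng.choice(["move left then up", "move up then left", "wait"])

    def domain_reward(plan: str) -> float:
        # placeholder: e.g. simulate the plan in the target domain and score it
        return 1.0 if "up" in plan else 0.0

    def best_of_n(prompt: str, n: int = 8, seed: int = 0) -> str:
        rng = random.Random(seed)
        return max((sample_candidate(prompt, rng) for _ in range(n)),
                   key=domain_reward)

    print(best_of_n("plan a route to the goal"))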
3. PaulHo+Xg[view] [source] 2023-04-28 01:09:21
>>qumpis+o7
My impression is that the complex optimization happens during training, while actual inference uses some fairly greedy algorithm like beam search. If the inference algorithm were using simulated annealing or something like that, that would be different.
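Toy sketch of that contrast, with a made-up score() standing in for log-probability plus a task reward (the vocabulary and scorer are placeholders, not a real decoder):

    import math
    import random

    # Toy sketch: greedy left-to-right decoding vs. simulated annealing over
    # the whole sequence. score() is a made-up stand-in for log p(seq) + reward.

    VOCAB = ["go", "left", "right", "up", "stop"]

    def score(seq: list[str]) -> float:
        # credit for each distinct direction used, bonus for ending on "stop"
        s = float(len({"left", "right", "up"} & set(seq)))
        if seq and seq[-1] == "stop":
            s += 2.0
        return s - 0.1 * len(seq)

    def greedy(length: int = 5) -> list[str]:
        # extend one token at a time, always taking the locally best choice
        seq: list[str] = []
        for _ in range(length):
            seq.append(max(VOCAB, key=lambda tok: score(seq + [tok])))
        return seq

    def anneal(length: int = 5, steps: int = 2000, seed: int = 0) -> list[str]:
        # mutate the whole sequence; accept worse moves with probability
        # exp(delta / temperature), so it can escape greedy's local optimum
        rng = random.Random(seed)
        seq = [rng.choice(VOCAB) for _ in range(length)]
        for step in range(steps):
            temp = max(1e-3, 1.0 - step / steps)
            cand = list(seq)
            cand[rng.randrange(length)] = rng.choice(VOCAB)
            delta = score(cand) - score(seq)
            if delta > 0 or rng.random() < math.exp(delta / temp):
                seq = cand
        return seq

    print("greedy:  ", greedy(), score(greedy()))
    print("annealed:", anneal(), score(anneal()))

The point being that the annealer optimizes the whole sequence jointly instead of committing to one token at a time.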