>>mercat+(OP)
Sometimes I wonder if text generation could be formulated as a planning/optimization problem, and if that facility could then solve embedded planning problems as a byproduct.
>>PaulHo+81
RL in ChatGPT is used for that: to generate text that maximizes a reward. But if you have other domains with their own reward functions, then you could plan over those too.
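
To make the "generate the sequence that maximizes reward" framing concrete, here is a toy sketch. It is not how ChatGPT is trained or decoded; it only shows a token sequence being chosen by search against a reward function. The grid world, `MOVES`, `TARGET`, `reward`, and `plan_text` are all made-up names for illustration, and beam search stands in for whatever optimizer you would really use.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical "vocabulary": each token is a move on a 2D grid.
MOVES: Dict[str, Tuple[int, int]] = {
    "N": (0, 1),
    "E": (1, 0),
    "S": (0, -1),
    "W": (-1, 0),
}
TARGET = (2, 1)  # assumed goal position for this toy domain


def reward(tokens: Sequence[str]) -> int:
    """Domain reward: negative Manhattan distance to TARGET after the moves."""
    x, y = 0, 0
    for tok in tokens:
        dx, dy = MOVES[tok]
        x, y = x + dx, y + dy
    return -(abs(TARGET[0] - x) + abs(TARGET[1] - y))


def plan_text(
    vocab: Sequence[str],
    reward_fn: Callable[[Sequence[str]], int],
    length: int,
    beam_width: int = 3,
) -> Tuple[str, ...]:
    """Beam search: extend partial sequences, keep the highest-reward ones."""
    beams: List[Tuple[str, ...]] = [()]
    for _ in range(length):
        candidates = [beam + (tok,) for beam in beams for tok in vocab]
        candidates.sort(key=reward_fn, reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


if __name__ == "__main__":
    plan = plan_text(list(MOVES), reward, length=3)
    print(" ".join(plan), "reward:", reward(plan))
```

Swap in a different `reward` and the same search "plans" in a different domain, which is the point of the comment above. A real system would score sequences with a learned reward model rather than a hand-written function.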