My hunch is that they summarize the conversation periodically and inject that summary back into the system prompt as additional constraints.
That was a common hack for the LLM context-length problem, but now that context length is more or less "solved", the same trick might be more useful for keeping the output aligned with earlier instructions.
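Something like this, if I had to guess at the mechanics — a minimal sketch where `RollingChat`, `complete`, and the `SUMMARIZE_EVERY` threshold are all made-up names, and `complete` stands in for whatever chat-completion call the product actually uses:

```python
SUMMARIZE_EVERY = 8  # hypothetical: fold old turns into the summary after this many messages

class RollingChat:
    """Keeps a running summary of older turns and re-injects it as system-prompt constraints."""

    def __init__(self, complete, base_system_prompt):
        self.complete = complete            # callable: list[dict] -> str (any chat backend)
        self.base_system_prompt = base_system_prompt
        self.summary = ""                   # compressed memory of older turns
        self.turns = []                     # recent, not-yet-summarized messages

    def _system_prompt(self):
        if not self.summary:
            return self.base_system_prompt
        # The running summary rides along as extra constraints in the system prompt.
        return (self.base_system_prompt
                + "\n\nConversation so far; keep respecting any constraints it implies:\n"
                + self.summary)

    def send(self, user_message):
        self.turns.append({"role": "user", "content": user_message})
        messages = [{"role": "system", "content": self._system_prompt()}] + self.turns
        reply = self.complete(messages)
        self.turns.append({"role": "assistant", "content": reply})
        if len(self.turns) >= SUMMARIZE_EVERY:
            self._compress()
        return reply

    def _compress(self):
        # Ask the model to merge the prior summary and the recent turns into a short
        # constraint-style summary, then drop the raw turns to free up context.
        transcript = "\n".join(f"{t['role']}: {t['content']}" for t in self.turns)
        self.summary = self.complete([
            {"role": "system", "content": "Condense this conversation into a short "
             "list of facts and constraints the assistant must keep following."},
            {"role": "user", "content": (self.summary + "\n" + transcript).strip()},
        ])
        self.turns = []
```

Pure guesswork on the prompt wording and cadence, obviously, but the shape (summarize, drop raw turns, promote the summary into the system prompt) is the standard version of the trick.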