A lot of the issues I'd have when 'pretending' to have a conversation mostly disappear when I either keep things to a single Q/A pairing or, at the very least, heavily edit/prune the conversation history. Based on my understanding of LLMs, this seems to make sense even for models trained for conversational interfaces.
So, for example, a multi-message exchange that ends with me asking the LLM to double-check the conversation and correct 'hallucinations' works worse than asking for a thorough summary at the end and feeding that into a new prompt/conversation. Repeating those falsities, or 'building' on them with subsequent messages, gives them a stronger 'presence' in the context, which probably weakens the correction. Roughly, the pattern looks like the sketch below.
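A minimal sketch of what I mean, assuming the OpenAI Python client; the model name, prompt wording, and helper name are just placeholders, not a recommendation:

```python
# Sketch of the summarize-then-restart pattern (assumptions: OpenAI
# Python client, placeholder model name and prompts).
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder; any chat model works here

def summarize_and_restart(history: list[dict], new_question: str) -> str:
    # Step 1: ask for a thorough, self-contained summary of the old
    # conversation, dropping the back-and-forth (and, ideally, the mistakes).
    summary = client.chat.completions.create(
        model=MODEL,
        messages=history + [{
            "role": "user",
            "content": "Write a thorough, self-contained summary of this "
                       "conversation. State only the conclusions.",
        }],
    ).choices[0].message.content

    # Step 2: start a fresh conversation seeded with the summary only,
    # so earlier falsities aren't sitting in the context to be repeated.
    fresh = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "user", "content": f"Context (summary of prior work):\n{summary}"},
            {"role": "user", "content": new_question},
        ],
    )
    return fresh.choices[0].message.content
```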
I haven't tested any of this thoroughly, but at least with code I've definitely noticed how a wrong piece of code can 'infect' the conversation.
'Don't use regex for this task' is a common addition to the new chat. Why does AI love regex for simple string operations?