But, this is all very new stuff, so certainly worth experimenting with all sorts of different approaches.
As far as small models getting confused, I’ve only really tested this with Mixtral, but it’s entirely possible that regular Mistral or other small models would get confused… more things I would like to get around to testing.
This is obviously not efficient because the model has to process many more tokens at each interaction, and its context window gets full quicker as well. I wonder if others have found better solutions.