In JSON Schema, you can do a “oneOf” between two types. You can easily convert a JSON Schema into the grammar that llama.cpp expects. One of the types would be the success response, the other type would be an error response, such as a JSON object containing only the field “ErrorResponse”, which is required to be a string, which you explain to the model that this is used to provide an explanation for why it cannot complete the request. It will literally fill in an explanation when it runs into troublesome data, at least in my experience.
Then the model can “choose” which type to respond with, and the grammar will allow either.
If everything makes sense, the model should provide the successful response you’re requesting, otherwise it can let you know something weird is going on by responding with an error.
Ah I see. So you give the entire "monadic" grammar to the LLM, both as a `grammar` argument and as part of the prompt so it knows the "can't do that" option exists.
I'm aware of the "OR" statements in grammar (my original comment uses that). In my experience though, small models quickly get confused when you add extra layers to the JSON schema.
But, this is all very new stuff, so certainly worth experimenting with all sorts of different approaches.
As far as small models getting confused, I’ve only really tested this with Mixtral, but it’s entirely possible that regular Mistral or other small models would get confused… more things I would like to get around to testing.
This is obviously not efficient because the model has to process many more tokens at each interaction, and its context window gets full quicker as well. I wonder if others have found better solutions.