zlacker

1. seanhu+(OP)[view] [source] 2023-11-21 06:11:14
The obvious way to do this would be with adversarial networks, as in GANs for image generation. Keep the existing LLM as the generator, trained exactly as at present but with an additional penalty for being caught committing an error, and train another network at the same time as a validator, whose fitness function is finding errors in the generator's output.
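The shape of that setup can be sketched with toy stand-ins (entirely my own construction, not from the comment: the "generator" emits arithmetic claims and has one learnable parameter, its error rate, while the "validator" here is a perfect oracle for the toy task, standing in for a network that would really be trained alongside it):

```python
import random

random.seed(0)

def generate(error_rate):
    """Toy generator: emits a claim 'a + b = c', occasionally wrong."""
    a, b = random.randint(-10, 10), random.randint(-10, 10)
    c = a + b
    if random.random() < error_rate:
        c += random.choice([-1, 1])  # slip in a small arithmetic error
    return a, b, c

def validator(a, b, c):
    """Toy validator: flags claims whose arithmetic doesn't check out."""
    return a + b != c

error_rate = 0.5
for step in range(2000):
    a, b, c = generate(error_rate)
    if validator(a, b, c):
        # the adversarial penalty: being caught pushes the
        # generator's error rate down (0.01 is an arbitrary step size)
        error_rate = max(0.0, error_rate - 0.01)

print(round(error_rate, 2))  # driven toward 0 by the penalty
```

In the real version both sides would be networks updated by gradient descent, and the hard part is the validator actually having to learn what counts as an error rather than checking it exactly.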

People must be doing this already; it probably just takes a while for the research to bear fruit.

Some of these errors are so obvious I can’t imagine this would be too hard. For example, try asking an LLM “generate me a system of two equations in two unknowns. Both the coefficients and the solutions must be integers between -10 and 10”. In my experience it will generate a valid system, and some of the time the coefficients will be in the range specified. Perhaps a third to half of the time the solution it gives will be wrong, and when you ask for an explanation of the solution it will make some basic arithmetic error (e.g. flipping a sign). Then when you point out the error it will correct itself.
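The check demanded by that prompt is entirely mechanical, which is what makes the failure so striking: it can be scripted in a few lines. A minimal sketch using Cramer's rule over exact rationals (`check_system` and its bounds parameters are my own names, just for illustration):

```python
from fractions import Fraction

def check_system(a1, b1, c1, a2, b2, c2, lo=-10, hi=10):
    """Check a system a1*x + b1*y = c1, a2*x + b2*y = c2 against the
    prompt's constraints: integer coefficients and an integer solution,
    all within [lo, hi]. Returns (x, y) if valid, else None."""
    coeffs = (a1, b1, c1, a2, b2, c2)
    if not all(isinstance(v, int) and lo <= v <= hi for v in coeffs):
        return None  # coefficients out of spec
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None  # no unique solution
    # Cramer's rule, in exact arithmetic so nothing silently rounds
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    if x.denominator != 1 or y.denominator != 1:
        return None  # solution is not integral
    x, y = int(x), int(y)
    if not (lo <= x <= hi and lo <= y <= hi):
        return None  # solution integral but out of range
    return (x, y)

print(check_system(1, 1, 3, 1, -1, 1))   # x + y = 3, x - y = 1 -> (2, 1)
print(check_system(1, 2, 3, 2, 4, 6))    # singular system -> None
```

A validator for this narrow task wouldn't even need to be learned; a deterministic checker like this could supply the error signal directly.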
