zlacker

During fine tuning the model does not produce reasoning traces, it consumes them. And the researchers presented it with traces deliberately constructed to be wrong except for the answer at the end. That's easy enough to do.