https://arxiv.org/abs/2502.18600
Similar level of results in a fraction of the tokens resulting in similar quality for less cost for longer runs.
But also when interacting and needing to read the token responses I can read shorter responses way faster so my own speed is faster.