Not what I would have expected from a 'one-shot'. Maybe self-supervised would be a more suitable term?
It doesn't do it in one shot on the GPU either. It feeds its outputs back in as inputs, over and over. By the time you see tokens as an end user, the clanker has already run a bunch of iterations.
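To make the feedback loop concrete, here's a rough sketch of what that kind of generation looks like. Everything in it is made up for illustration: `toy_next_token` is a hypothetical stand-in for a real model's forward pass (it just returns the last token plus one), and `generate` and `stop_token` are names I picked, not any library's API.

```python
def toy_next_token(context):
    """Hypothetical stand-in for a model forward pass.

    A real model would score every vocabulary entry given the context;
    this toy just returns the last token plus one.
    """
    return context[-1] + 1


def generate(prompt, max_new_tokens, stop_token=None):
    """Produce tokens one at a time, feeding each output back in as input."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        nxt = toy_next_token(tokens)  # one pass over everything produced so far
        tokens.append(nxt)            # the output becomes part of the next input
        if nxt == stop_token:
            break
    return tokens


print(generate([1, 2, 3], max_new_tokens=4))  # prints [1, 2, 3, 4, 5, 6, 7]
```

Four new tokens means four trips around the loop, each one looking at everything generated before it.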