>>famous+OX1
Please read the paper. The authors are using
more precise and specific metrics that qualitatively measure the same thing. Instead of having exact string match being 1 if 100% correct, 0 if there is any failure, they use per-token error. The crux of their argument is that per-token error is a better choice of metric anyway, and the fact that "emergent abilities" do not occur when using this metric is a strong argument that those abilities don't really exist.
However thermal energy does not more precisely or specifically measure a phase transition. They are only indirectly linked - nobody would say that thermal energy is a better measure of state-of-matter than solid/liquid/gas. Your argument makes absolutely zero sense. Frankly it seems intentionally ignorant.