gwern (OP) | 2022-05-24 01:16:58
What you're missing is that performance on a pretext task like ImageNet top-1 transfers outside ImageNet, and as you go further into the high-score regime, a small % improvement can often yield qualitatively better results, because the underlying NN has to solve harder and harder problems, eliciting true solutions rather than a patchwork of heuristics.

Nothing in a Transformer's perplexity on next-token prediction tells you that at some point it suddenly becomes able to write flawless literary style parodies. This is why the computer-art people become virtuosos of CLIP variants and get excited by new ones: each one attacks concepts in slightly different ways, and a 'small' benchmark increase may unlock some awesome new visual flourish that the model couldn't manage before.
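For concreteness, the core thing those art people are tuning is an image-text similarity score from some CLIP checkpoint, which a generation loop then tries to maximize. A minimal sketch, assuming the Hugging Face `transformers` CLIP API; the checkpoint name and the `clip_score` helper are illustrative, not anyone's specific pipeline:

```python
# Sketch: scoring an image against a prompt with one CLIP variant.
# Swapping in a different checkpoint changes which concepts the score "sees".
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image: Image.Image, prompt: str) -> float:
    """Cosine similarity between image and prompt embeddings.
    CLIP-guided generation nudges the image to increase this number."""
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return (img * txt).sum(dim=-1).item()
```

The point of the argument is that two checkpoints with nearly identical benchmark numbers can still rank images differently under this score, which is why a nominally small improvement can unlock visibly new outputs.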
