
Disagree. We know it _can_ learn out-of-distribution capabilities based on similarities to other distributions. Examples: the TikZ unicorn[1] (which was not in the training data anywhere) or my own code (which has variable names and methods/ideas probably not seen 1:1 in training).

IMO this out-of-distribution learning is all we need to scale to AGI. Sure, there are still issues: it doesn't always know which distribution to pick from. Neither do we, hence car crashes.

[1]: https://arxiv.org/pdf/2303.12712 or on YouTube: https://www.youtube.com/watch?v=qbIk7-JPB2c