From the interviews with him that I have seen, Sutskever thinks that language modeling is a sufficient pretraining task because next-token prediction involves a great deal of reasoning. The example he used was to imagine feeding a murder mystery novel to a language model and then prompting it with the phrase "The person who committed the murder was: ". The model would unquestionably need to reason to arrive at the right answer, yet at the same time it is just predicting the next token.
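To make "just predicting the next token" concrete, here is a minimal sketch of what the model is actually asked to do at that prompt. It assumes the Hugging Face transformers library and GPT-2 purely as a stand-in (my choice of model and library, not Sutskever's); the full novel is replaced by a stub string since it is only a thought experiment.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# In the thought experiment, the prompt would be the entire novel followed
# by the reveal sentence; a stub stands in for the novel here.
prompt = "<the full murder mystery novel> The person who committed the murder was:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# The only output is a probability distribution over the next token.
# Whatever "reasoning" identifies the culprit has to surface through it.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = torch.topk(next_token_probs, k=5)
for p, tok_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(tok_id.item()):>12s}  {p.item():.3f}")
```

The point of the sketch is that nothing in the interface changes for a "hard" prompt: the model emits the same kind of next-token distribution whether it is completing small talk or naming the murderer, which is exactly why Sutskever treats the prediction task itself as the source of pressure to reason.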