Imagen, a text-to-image diffusion model

>>kevema+(OP)
I thought I was doing well after not being overly surprised by DALL-E 2 or Gato. How am I still not calibrated on this stuff? I know I am meant to be the one who constantly argues that language models already have sophisticated semantic understanding, and that you don't need visual senses to learn grounded world knowledge of this sort, but come on, you don't get to just throw T5 in a multimodal model as-is and have it work better than multimodal transformers! VLM[1] at least added fine-tuned internal components.

Good lord, we are screwed. And yet somehow I bet even this isn't going to kill off the "they're just statistical interpolators" meme.

[1] https://www.deepmind.com/blog/tackling-multiple-tasks-with-a...

>>Veedra+7w
I firmly believe that ~20-40% of the machine learning community will say that all ML models are dumb statistical interpolators all the way until a few years after we achieve AGI. Roughly the same groups will also claim that human intelligence is special magic that cannot be recreated using current technology.

I think it’s to everyone’s benefit if we start planning for a world where a significant portion of the experts are stubbornly wrong about AGI. As a technology, generally intelligent ML has the potential to change so many aspects of our world. The dangers of dismissing the possibility of AGI emerging in the next 5-10 years are huge.

>>axg11+8z
You should be much more concerned about the prospect of nuclear war right now than the sudden emergence of an AGI.
