zlacker

[return to "Imagen, a text-to-image diffusion model"]
1. geonic+bn1[view] [source] 2022-05-24 09:30:40
>>kevema+(OP)
Can anybody give me a short, high-level explanation of how the model achieves these results? I'm especially interested in the image synthesis, not the language parsing.

For example, what kind of source images are used for the snake made of corn[0]? It's baffling to me how the corn is mapped to the snake body.

[0] https://gweb-research-imagen.appspot.com/main_gallery_images...

2. DougBT+kz1[view] [source] 2022-05-24 11:21:47
>>geonic+bn1
In the paper they say about half the training data came from an internal dataset, and the other half from: https://laion.ai/laion-400-open-dataset/