Imagen conditions on text embeddings, while the OpenAI model (DALL-E 2) conditions on image embeddings instead; that's the reason. There are other models that can render text in images: latent diffusion trained on LAION-400M, GLIDE, and DALL-E (1).
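Rough sketch of the difference (toy shapes and stand-in modules, not the real architectures): an Imagen-style model sees one embedding per caption token, while an unCLIP-style decoder sees a single pooled vector for the whole image, which has already compressed away exact spelling.

    import torch

    caption_tokens = torch.randint(0, 32000, (1, 16))  # 16 token ids for a caption
    text_encoder = torch.nn.Embedding(32000, 512)      # stand-in for a real text encoder (e.g. T5)
    text_seq = text_encoder(caption_tokens)            # (1, 16, 512): one vector PER TOKEN

    image_embedding = torch.randn(1, 512)              # (1, 512): ONE vector for the whole image

    print(text_seq.shape)         # torch.Size([1, 16, 512]) - spelling info survives per token
    print(image_embedding.shape)  # torch.Size([1, 512])     - spelling pooled away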
>>GaggiX+(OP)
My understanding of the terms text and image embeddings is that they are ways of representing text or images as vectors. But I don't understand how that would help with the process of actually drawing the symbols for those letters.
>>ALittl+6b
If the model takes text embeddings/tokens as input, it can learn a connection between the caption and the text that appears in the image (in the training data the two are often nearly identical). An image embedding, by contrast, has already thrown away the exact spelling, so there is nothing for the decoder to copy the letters from.
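Toy illustration (simplified, assumed shapes) of why the per-token sequence matters: cross-attention lets each spatial position of the image being generated attend to individual caption tokens, including the ones spelling the word it has to draw. A single pooled vector offers no such sequence to attend over.

    import torch
    import torch.nn.functional as F

    d = 512
    image_latents = torch.randn(1, 64, d)  # 64 spatial positions being denoised (queries)
    text_seq = torch.randn(1, 16, d)       # 16 caption-token embeddings (keys/values)

    # Standard attention: each spatial position "looks at" individual caption tokens.
    weights = F.softmax(image_latents @ text_seq.transpose(1, 2) / d**0.5, dim=-1)  # (1, 64, 16)
    out = weights @ text_seq                                                        # (1, 64, 512)
    print(out.shape)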