Google Imagen 2

>>geox+(OP)
I think the competition for text-to-image services is over, and open source, namely Stable Diffusion, won. It doesn't matter how detailed (or whatever counts as "better") corporate text-to-image products get; Stable Diffusion is good enough, which really is good enough. Unlike the corporate offerings, open source txt2img doesn't have random restrictions (no, it's not just porn at this point) and actually allows for additional scripts/tooling/models. If you're attempting to do anything on a professional level or produce an image with specific details via txt2img, you likely have a workflow with txt2img being only step one.

Why bother using a product from a company that is notorious for failing to commit to most of its services, when you can run something that produces output that is pretty close (and maybe better) and is free to run, change, and train?

>>boh+VY
Why has Stable Diffusion won? DALL-E 3 and this are miles ahead in understanding the scene and putting the correct text in the right place.

This makes the image much more usable without editing.

>>karmas+C81
> DALL-E 3 and this are miles ahead in understanding the scene and putting the correct text in the right place.

I guess that turns out to be not as important for end users as you'd think.

Anyway, DeepFloyd/IF has great comprehension. It is straightforward to improve that for Stable Diffusion; I cannot tell you exactly why they haven't tried this.
