zlacker

> Dalle3 and this is miles ahead in understanding scene and put correct text at the right place.

I guess that turns out to be not as important for end users as you'd think.

Anyway, DeepFloyd/IF has great comprehension. It is straightforward to improve that for Stable Diffusion, I cannot tell you exactly why they haven't tried this.

replies(1): >>astran+sW

>>doctor+(OP)
Deepfloyd is slower and needs a lot more memory since it's pixel diffusion.

Also not sure if it can be extended with LORAs or by turning it into a video/3D model the same way an LDM can.