zlacker

[return to "Google Imagen 2"]
1. apsec1+H8 2023-12-13 15:39:28
>>geox+(OP)
This would have been an epic release two years ago, but there are now many well-established models in this area (DALL-E, Midjourney, Stable Diffusion). It would be great to see some comparisons or benchmarks to show Imagen 2 is a better alternative. As it stands, it's hard for me to tell if this is worth switching to.
◧◩
2. Mashim+qb 2023-12-13 15:49:52
>>apsec1+H8
> it's hard for me to tell

I can only compare it to Stable Diffusion, but Imagen 2 seems significantly more advanced.

Try doing anything with text in SDXL. It's not easy and it often messes up. I don't think you can get a clean logo with multiple text areas out of SDXL.

Look at the prompt and image of the robin. That is mighty impressive.

◧◩◪
3. Ologn+Bc 2023-12-13 15:54:41
>>Mashim+qb
Stability AI's SDXL has gaps with text, but they seem to do a better job with DeepFloyd IF (https://github.com/deep-floyd/IF). I have done a lot of interesting text things with DeepFloyd.
◧◩◪◨
4. Mashim+ge 2023-12-13 15:59:57
>>Ologn+Bc
Looks good, but 24 GB of VRAM is quite a lot for 1024x1024.
◧◩◪◨⬒
5. orbita+wg 2023-12-13 16:09:23
>>Mashim+ge
This is a pixel diffusion model that doesn't use latent-space encoding, hence the memory requirements. Also, good prompt understanding requires a large transformer for text encoding, usually far larger than the image-generation part. DeepFloyd IF uses T5.
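To put rough numbers on the pixel-space vs. latent-space point, here is a back-of-envelope sketch. It assumes an SDXL-style VAE that shrinks each side by 8 into 4 latent channels, and a hypothetical 5-billion-parameter text encoder stored in fp16. None of these figures come from the thread, and the sketch ignores that IF actually works as a cascade (a small base image followed by upscaling stages), so it only illustrates scale, not IF's real memory profile.

```python
# Back-of-envelope comparison of tensor sizes and weight memory.
# Assumptions (not from the thread): pixel-space models denoise the full
# RGB image; an SDXL-style latent model denoises a 4-channel latent that
# its VAE downsamples 8x per side. Real VRAM use also depends on network
# activations, batch size, precision and how many cascade stages run.


def tensor_elements(height: int, width: int, channels: int) -> int:
    """Number of values in one image or latent tensor."""
    return height * width * channels


def pixel_space_elements(size: int) -> int:
    # Square RGB image denoised directly in pixel space.
    return tensor_elements(size, size, 3)


def latent_space_elements(size: int, downsample: int = 8, channels: int = 4) -> int:
    # Square latent produced by a VAE that shrinks each side by `downsample`.
    side = size // downsample
    return tensor_elements(side, side, channels)


def weights_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights (fp16 = 2 bytes per parameter)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    pixels = pixel_space_elements(1024)
    latents = latent_space_elements(1024)
    print(f"pixel space : {pixels:,} values")
    print(f"latent space: {latents:,} values")
    print(f"ratio       : {pixels // latents}x")
    # Hypothetical encoder size, used only to illustrate the order of magnitude.
    print(f"5B-param encoder in fp16: {weights_gib(5e9):.1f} GiB")
```

Under these assumptions it prints 3,145,728 pixel-space values against 65,536 latent values (a 48x difference), and about 9.3 GiB for the hypothetical encoder's weights alone, before any activations.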

You can use Harrlogos XL to produce text with SDXL, although it's mostly limited to short captions and logos. The other way (ControlNets) is more involved (and actually useful).
