zlacker

1. Mashim+(OP)[view] [source] 2023-12-13 15:49:52
> it's hard for me to tell

I can only compare it to Stable Diffusion. But Imagen2 seems significantly more advanced.

Try doing anything with text in SDXL. It's not easy and it often messes up. I don't think you can get a clean logo with multiple text areas out of SDXL.

Look at the prompt and image of the robin. That is mighty impressive.

replies(3): >>Ologn+b1 >>averev+I1 >>nabaki+A8
2. Ologn+b1[view] [source] 2023-12-13 15:54:41
>>Mashim+(OP)
Stability AI has gaps in SDXL for text, but they seem to do a better job with DeepFloyd IF ( https://github.com/deep-floyd/IF ). I have done a lot of interesting text things with DeepFloyd IF.
replies(1): >>Mashim+Q2
3. averev+I1[view] [source] 2023-12-13 15:56:48
>>Mashim+(OP)
Yeah, Stable Diffusion has a very limited understanding of composition instructions. You can reliably get things drawn, but it's super hard to get a specific thing in a specific place (e.g. "a man with blond hair near a girl with black hair" is going to assign hair color more or less randomly, and there's no guarantee of how many people will be in the picture). Regional prompting and ControlNet help somewhat, but regional prompting is very unreliable and ControlNet is, well, not text-to-image.

DALL-E 3 gets things right most of the time.

4. Mashim+Q2[view] [source] [discussion] 2023-12-13 15:59:57
>>Ologn+b1
Looks good. But 24GB of VRAM is quite a lot for 1024x1024.
replies(1): >>orbita+65
5. orbita+65[view] [source] [discussion] 2023-12-13 16:09:23
>>Mashim+Q2
This is a pixel diffusion model that doesn't use latent space encoding, hence the memory requirements. Besides, good prompt understanding requires large transformers for text encoding, usually far larger than the image generation part. DeepFloyd IF uses T5.
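
To put a rough number on the pixel vs. latent point, here is a back-of-the-envelope sketch. It only counts the values in the tensor being denoised at 1024x1024, not real VRAM usage, and the latent layout (8x spatial downsampling, 4 channels) is an assumption taken from the usual Stable Diffusion VAE setup. The helper name `tensor_elements` is made up for illustration, and since IF runs as a cascade of stages, treat this as the general idea rather than a measurement of IF itself.

```python
def tensor_elements(height, width, channels):
    """Number of values in one image-shaped tensor."""
    return height * width * channels

# Pixel-space diffusion denoises the full-resolution RGB image directly.
pixel = tensor_elements(1024, 1024, 3)

# Latent diffusion (assumption: 8x spatial downsampling, 4 latent channels,
# the layout commonly used by Stable Diffusion's VAE) denoises a much
# smaller tensor and decodes it to pixels only at the end.
latent = tensor_elements(1024 // 8, 1024 // 8, 4)

ratio = pixel / latent
print(f"pixel-space values:  {pixel:,}")
print(f"latent-space values: {latent:,}")
print(f"ratio: {ratio:.0f}x")
```

Activations inside the denoising network scale with that spatial size, so the gap in the working set is what shows up as memory pressure. The text encoder weights come on top of that.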

You can use Harrlogos XL to produce text with SDXL, although it's mostly limited to short captions and logos. The other way (ControlNets) is more involved (and is actually useful).

6. nabaki+A8[view] [source] 2023-12-13 16:25:25
>>Mashim+(OP)
> I can only compare it to Stable Diffusion. But Imagen2 seems significantly more advanced.

I wouldn't say this until we are able to try it for ourselves. As we know, Google is prone to severe cherry-picking and deceptive marketing.

replies(1): >>quitit+MJ
7. quitit+MJ[view] [source] [discussion] 2023-12-13 18:33:27
>>nabaki+A8
Google has a habit of releasing concept videos but presenting them as product demos.

Overselling is not a winning strategy, especially when others are shipping genuinely good products.

Every time Google shows off something new, the first thing people now ask is which part Google faked (or heavily cherry-picked).
