zlacker

[return to "Imagen, a text-to-image diffusion model"]
1. ALittl+Rl 2022-05-23 23:01:34
>>kevema+(OP)
Interesting to me that this one can draw legible text. DALLE models seem to generate weird glyphs that only look like text. The examples they show here have perfectly legible characters and correct spelling. The difference between this and DALLE makes me suspicious / curious. I wish I could play with this model.
2. Tehdas+Hs 2022-05-23 23:57:13
>>ALittl+Rl
It still has trouble with mechanical objects. In their demo, check out the wheels on the skateboards: they're all over the place.
3. sdento+7O 2022-05-24 03:27:14
>>Tehdas+Hs
For comparison, most humans can't draw a bicycle:

https://www.wired.com/2016/04/can-draw-bikes-memory-definite...

4. dclowd+P31 2022-05-24 06:21:02
>>sdento+7O
I blame it on the surprising structural cleverness of a bicycle. Opposing triangles probably aren't the first thing most people think of when they think of a bicycle (vs. two wheels and some handlebars).
5. gwern+iB3 2022-05-24 22:41:58
>>dclowd+P31
They also can't draw pennies, the letter 'g' with the loop, and so on (https://www.gwern.net/docs/psychology/illusion-of-depth/inde...). Bicycles may be clever, but the shallowness of mental representation is real.