zlacker

[return to "Imagen, a text-to-image diffusion model"]
1. throwa+oe 2022-05-23 22:10:51
>>kevema+(OP)
https://github.com/lucidrains/imagen-pytorch
2. Cobras+Rm 2022-05-23 23:09:27
>>throwa+oe
Is this a joke?
3. throwa+cs 2022-05-23 23:53:11
>>Cobras+Rm
No
4. w1nk+Zy1 2022-05-24 11:19:10
>>throwa+cs
To expand a bit for the grandparent: if you check out this author's other repos, you'll notice they have a thing for implementing these papers (multiple DALL-E 2 implementations, for instance). I'd expect to see a working implementation there pretty quickly.
5. xtreme+E22 2022-05-24 14:16:20
>>w1nk+Zy1
Not to diminish their contribution, but implementing the model is only one third of the battle. The rest is building the training dataset and then training the model on a big computer.
6. w1nk+hT2 2022-05-24 18:20:15
>>xtreme+E22
You're not wrong that the dataset and compute are important, but if you browse the author's previous work, you'll see there are datasets available. The reproduction of DALL-E 2 required a dataset of similar size to the one Imagen was trained on (see: https://arxiv.org/abs/2111.02114).

The harder part here will be getting access to the compute required, but again, the folks involved in this project have access to lots of resources (they've already trained models of this size). We'll likely see trained checkpoints released as soon as the models finish converging.
