zlacker

[return to "Imagen, a text-to-image diffusion model"]
1. y04nn+D7 2022-05-23 21:33:33
>>kevema+(OP)
Really impressive. If we are able to generate such detailed images, is there anything similar for text to music? I would have thought that it would be simpler to achieve than text to image.
2. tomato+Ra 2022-05-23 21:52:17
>>y04nn+D7
Why stop at audio? The pinnacle of this would be text-to-video, equally indistinguishable from the real thing.
3. burles+8e 2022-05-23 22:09:59
>>tomato+Ra
The way things look when still is much easier to fake than the way things move.

I would expect AI development to follow a similar path to digital media generally, since it's following the increasing difficulty and space requirements of representing each medium digitally: text < basic sounds < images < advanced audio < video.

What’s more impressive to me is how far ahead text-to-speech is, but I think the explanation is straightforward (the accessibility value has motivated us to work on that for a lot longer).
