Imagen, a text-to-image diffusion model

>>kevema+(OP)
Really impressive. If we are able to generate such detailed images, is there anything similar for text to music? I would have thought that it would be simpler to achieve than text to image.

>>y04nn+D7
why stop at audio? the pinnacle of this would be text-to-video, equally indistinguishable from the real thing.

>>tomato+Ra
The way things look when still is much easier to fake than the way things move.

I would expect AI development to follow a similar path to digital media generally, as it's following the increasing difficulty and space requirements of digitally representing said media: text < basic sounds < images < advanced audio < video.

What’s more impressive to me is how far ahead text-to-speech is, but I think the explanation is straightforward (the accessibility value has motivated us to work on that for a lot longer).
