zlacker

1. y04nn+(OP)[view] [source] 2022-05-23 21:33:33
Really impressive. If we are able to generate such detailed images, is there anything similar for text-to-music? I would have thought that it would be simpler to achieve than text-to-image.
replies(4): >>nomel+u1 >>redox9+A2 >>tomato+e3 >>tourin+ea
2. nomel+u1[view] [source] 2022-05-23 21:41:36
>>y04nn+(OP)
Compare the size of a raw image file to that of a raw music file to get an idea of the difference in complexity.
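
A rough back-of-the-envelope version of that comparison, as a sketch only. The specific numbers are assumptions picked for illustration and are not from the thread: a 1024x1024 image with 8-bit RGB, and three minutes of CD-quality stereo audio (44.1 kHz, 16-bit). The helper names are hypothetical.

```python
# Uncompressed sizes in bytes. All parameters below are illustrative assumptions.

def raw_image_bytes(width, height, channels=3, bytes_per_channel=1):
    """Size of an uncompressed image: one value per channel per pixel."""
    return width * height * channels * bytes_per_channel


def raw_audio_bytes(seconds, sample_rate=44_100, channels=2, bytes_per_sample=2):
    """Size of uncompressed PCM audio: one sample per channel per tick."""
    return seconds * sample_rate * channels * bytes_per_sample


image = raw_image_bytes(1024, 1024)  # assumed 1024x1024, 8-bit RGB
audio = raw_audio_bytes(180)         # assumed 3 minutes, CD-quality stereo

print(image, audio, round(audio / image, 1))
# prints: 3145728 31752000 10.1
```

Under those assumptions the raw audio is about ten times larger than the raw image, though raw byte count is only a crude proxy for how hard something is to generate.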
replies(1): >>penney+R2
3. redox9+A2[view] [source] 2022-05-23 21:48:25
>>y04nn+(OP)
Our language is much more effective at describing images than music.
◧◩
4. penney+R2[view] [source] [discussion] 2022-05-23 21:49:47
>>nomel+u1
Think sheet music, not an MP3.
replies(1): >>nomel+XL5
5. tomato+e3[view] [source] 2022-05-23 21:52:17
>>y04nn+(OP)
Why stop at audio? The pinnacle of this would be text-to-video, indistinguishable from the real thing.
replies(1): >>burles+v6
◧◩
6. burles+v6[view] [source] [discussion] 2022-05-23 22:09:59
>>tomato+e3
The way things look when still is much easier to fake than the way things move.

I would expect AI development to follow a path similar to that of digital media generally, since it tracks the increasing difficulty and space requirements of digitally representing each medium: text < basic sounds < images < advanced audio < video.

What’s more impressive to me is how far ahead text-to-speech is, but I think the explanation is straightforward (the accessibility value has motivated us to work on that for a lot longer).

7. tourin+ea[view] [source] 2022-05-23 22:32:11
>>y04nn+(OP)
SymphonyNet: https://youtu.be/m4tT5fx_ih8
◧◩◪
8. nomel+XL5[view] [source] [discussion] 2022-05-25 16:58:42
>>penney+R2
Fair enough, but that's a little dissimilar to what's being done with these images, which are constructed pixel by pixel.