zlacker

1. jonahb+(OP) 2022-05-23 22:14:04
I know that some monstrous majority of cognitive processing is visual, hence the attention these visually creative models are rightfully getting, but personally I am much more interested in auditory information and would love to see a promptable model for music. Was just listening to "Land Down Under" by Men at Work. Would love to be able to prompt for another artist I have liked: "Tricky playing Land Down Under." I know of various generative music projects, going back decades, and would appreciate pointers, but as far as I am aware we are still some ways from Imagen/Dalle for music?
2. addand+W 2022-05-23 22:19:19
>>jonahb+(OP)
I agree. How cool would it be to get an 8 min version of your favorite song? Or an instant DnB remix? Or 10 more songs in the style of your favorite album?
3. astran+S3 2022-05-23 22:38:54
>>jonahb+(OP)
I believe we’re lacking someone training up a large music model here, but GPT-style transformers can produce music.

gwern can maybe comment here.

An actually scary thing is that AIs are getting okay at reproducing people’s voices.

4. jonahb+64 2022-05-23 22:40:06
>>addand+W
Yeah. I particularly love covers and often can hear in my head X playing Y's song. Would love tools to experiment with that for real.

In practice, my guess is that even though Dall-e level performance in music generation would be stunning and incredible, it would also be tiresome and predictable to consume on any extended basis. I mean, that's my reaction to Dall-e: I find the images astonishing and magical but can only look at them for limited periods of time. At these early stages in this new world, the outputs of real individual brains are still more interesting.

But having tools like this to facilitate creation and inspiration by those brains would be so, so cool.

5. gwern+vm 2022-05-24 01:12:14
>>astran+S3
Voice synthesis has been progressing steadily. Lots of commercial and hobbyist interest: you can use 15.ai for crackerjack SaaS voice synthesis in a slick free UI; and if you want to run the models yourselves, Tortoise just released a FLOSS stack of remarkable quality.

Music, I'm afraid, appears stuck in the doldrums of small one-offs doing stuff like MIDI. Nothing with the breadth & quality of Jukebox has come out since it, even though it's super-obvious that there is a big overhang there and applying diffusion & other new methods would give you something much like DALL-E 2 / Imagen for general music.

6. exac+JA 2022-05-24 03:48:24
>>addand+W
You can sort of do that with https://fairuseify.ml
7. thorum+4B 2022-05-24 03:51:48
>>gwern+vm
The developer behind Tortoise is experimenting with using diffusion for music generation:

https://nonint.com/2022/05/04/friends-dont-let-friends-train...

8. aemble+Fa1 2022-05-24 09:50:10
>>exac+JA
I tried that site and the music sounds the same. I wonder if you can use this to bypass YouTube's Content ID check.
9. jrh206+1j1 2022-05-24 11:09:54
>>exac+JA
I believe that this tech is possible, but this site doesn't provide it. Look at the source of the page: it's just a bunch of sleeps and then you 'download' the same file you provided.
10. waqf+Hp1 2022-05-24 12:05:22
>>jrh206+1j1
The tech may be possible, but it won't solve anyone's copyright problems. The result would be a "derived work" of the original, irrespective of whether it sounded similar or not.