zlacker

[return to "Imagen, a text-to-image diffusion model"]
1. jonahb+Ve[view] [source] 2022-05-23 22:14:04
>>kevema+(OP)
I know that some monstrous majority of cognitive processing is visual, hence the attention these visually creative models are rightfully getting, but personally I am much more interested in auditory information and would love to see a promptable model for music. Was just listening to "Land Down Under" by Men At Work. Would love to be able to prompt for another artist I have liked: "Tricky playing Land Down Under." I know of various generative music projects, going back decades, and would appreciate pointers, but as far as I am aware we are still some ways from an Imagen/DALL-E for music?
◧◩
2. astran+Ni[view] [source] 2022-05-23 22:38:54
>>jonahb+Ve
I believe what we're lacking is someone training up a large music model, but GPT-style transformers can produce music.
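The basic idea is to serialize note events into discrete tokens and model them autoregressively, the same way a GPT models text. Here's a toy sketch of that framing: the `NOTE_<pitch>_<duration>` token format is an illustrative assumption (real systems use richer event vocabularies), and a trivial bigram frequency table stands in for the transformer.

```python
import random
from collections import defaultdict

def events_to_tokens(events):
    """Serialize MIDI-like events (pitch, duration-in-sixteenths) into tokens."""
    return [f"NOTE_{pitch}_{dur}" for pitch, dur in events]

def train_bigram(token_seq):
    """Count next-token frequencies; a stand-in for a learned autoregressive model."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(token_seq, token_seq[1:]):
        counts[prev][nxt] += 1
    return counts

def sample(counts, start, length, rng):
    """Autoregressively sample a token sequence from the bigram table."""
    out = [start]
    for _ in range(length - 1):
        nexts = counts.get(out[-1])
        if not nexts:
            break
        toks, weights = zip(*nexts.items())
        out.append(rng.choices(toks, weights=weights)[0])
    return out

# A tiny "melody": C-D-E-C (MIDI pitches 60/62/64), repeated four times.
melody = [(60, 4), (62, 4), (64, 8), (60, 4)] * 4
tokens = events_to_tokens(melody)
model = train_bigram(tokens)
generated = sample(model, "NOTE_60_4", 8, random.Random(0))
print(generated)
```

Swap the bigram table for a transformer trained on millions of such sequences and you have the gist of symbolic music generation; the hard part, as discussed below, is going beyond MIDI to raw audio.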

gwern can maybe comment here.

An actually scary thing is that AIs are getting okay at reproducing people’s voices.

◧◩◪
3. gwern+qB[view] [source] 2022-05-24 01:12:14
>>astran+Ni
Voice synthesis has been progressing steadily. Lots of commercial and hobbyist interest: you can use 15.ai for crackerjack SaaS voice synthesis in a slick free UI; and if you want to run the models yourself, Tortoise just released a FLOSS stack of remarkable quality.

Music, I'm afraid, appears stuck in the doldrums of small one-offs doing stuff like MIDI. Nothing like the breadth & quality of Jukebox has come out since it, even though it's super-obvious that there is a big overhang there and applying diffusion & other new methods would give you something much like DALL-E 2 / Imagen for general music.

◧◩◪◨
4. thorum+ZP[view] [source] 2022-05-24 03:51:48
>>gwern+qB
The developer behind Tortoise is experimenting with using diffusion for music generation:

https://nonint.com/2022/05/04/friends-dont-let-friends-train...
