zlacker

[return to "Imagen, a text-to-image diffusion model"]
1. jonahb+Ve[view] [source] 2022-05-23 22:14:04
>>kevema+(OP)
I know that some monstrous majority of cognitive processing is visual, hence the attention these visually creative models are rightfully getting, but personally I am much more interested in auditory information and would love to see a promptable model for music. Was just listening to "Land Down Under" from Men At Work. Would love to be able to prompt for another artist I have liked: "Tricky playing Land Down Under." I know of various generative music projects, going back decades, and would appreciate pointers, but as far as I am aware we are still some ways from Imagen/Dalle for music?
2. addand+Rf[view] [source] 2022-05-23 22:19:19
>>jonahb+Ve
I agree. How cool would it be to get an 8 min version of your favorite song? Or an instant DnB remix? Or 10 more songs in the style of your favorite album?
3. jonahb+1j[view] [source] 2022-05-23 22:40:06
>>addand+Rf
Yeah. I particularly love covers and often can hear in my head X playing Y's song. Would love tools to experiment with that for real.

In practice, my guess is that even though Dall-E-level performance in music generation would be stunning and incredible, it would also be tiresome and predictable to consume on any extended basis. That's my reaction to Dall-E: I find the images astonishing and magical but can only look at them for limited periods of time. At these early stages in this new world, the outputs of real individual brains are still more interesting.

But having tools like this to facilitate creation and inspiration by those brains would be so, so cool.
