gwern can maybe comment here.
An actually scary thing is that AIs are getting okay at reproducing people’s voices.
In practice, my guess is that even though DALL-E-level performance in music generation would be stunning and incredible, it would also be tiresome and predictable to consume on any extended basis. I mean, that's my reaction to DALL-E: I find the images astonishing and magical but can only look at them for limited periods of time. At these early stages in this new world, the outputs of real individual brains are still more interesting.
But having tools like this to facilitate creation and inspiration by those brains would be so, so cool.
Music, I'm afraid, appears stuck in the doldrums of small one-offs doing stuff like MIDI. Nothing like the breadth & quality of Jukebox has come out since it, even though it's super-obvious that there is a big overhang there and applying diffusion & other new methods would give you something much like DALL-E 2 / Imagen for general music.
https://nonint.com/2022/05/04/friends-dont-let-friends-train...