I find this distinction between media and text/code so interesting. To me it sounds like they think "text and code" are free from the controversy surrounding AI-generated media.
But judging from how AI companies grabbed all the art, images, videos, and audio they could get their hands on to train their models, it's naive to think they didn't do the same with text and code.
I've written a fair amount of open source code. On anything like a per-capita basis, I'm way above median in terms of what I've contributed (without consent) to the training of these tools. I'm also specifically "in the crosshairs" in terms of work loss from automation of software development.
I don't find it hard to convince myself that I have moral authority to think about the usage of gen AI for writing code.
The same is not true for digital art.
There, the contribution-without-consent, aka theft (I could frame it differently when I was the victim, but here I can't), is entirely from people other than me. The current and future damages won't be borne by me.
I've written _a lot_ of open source MIT licensed code, and I'm on the fence about that being part of the training data. I've published it as much for other people to use for learning purposes as I did for fun.
I also build and sell closed-source commercial JavaScript packages, and more than likely those have ended up in the training data as well, obviously without consent. This is why I feel strongly about this separation between code and media: from my perspective it all has the same problem.