It is no wonder that OpenAI had to pay Shutterstock to train on its data, that Getty is suing Stability AI for training on its watermarked images and using them commercially without permission, and that actors and actresses are filing lawsuits against commercial voice cloners. Those lawsuits cost the plaintiffs close to nothing, since the companies either take down the cloned voice offering or shut down.
These weak arguments from these AI folks sound like excuses justifying a newly found grift.
AI outputs should be regulated, of course. Obviously impersonation and copyright law already applies to AI systems. But a discussion on training inputs is entirely novel to man and our laws, and it's a very nuanced and important topic. And as AI advances, it becomes increasingly difficult because of the diminishing distinction between "organic" learning and "artificial" learning. It gets harder still when stopping AI from, as an example, learning from research papers means we miss out on life-saving medication. Where do property rights conflict with human rights?
They're important conversations to have, but you've destroyed the opportunity to have them from the starting gun.
I think we have had plenty of time for such a discussion. Enough has happened around training on copyrighted works (images, songs, deepfakes) for the lawsuits and licensing deals to materialize, and it is all converging on paying for the data; hence OpenAI and many others doing so because of the risk of such lawsuits.
> AI outputs should be regulated, of course. Obviously impersonation and copyright law already applies to AI systems. But a discussion on training inputs is entirely novel to man and our laws, and it's a very nuanced and important topic. And as AI advances, it becomes increasingly difficult because of the diminishing distinction between "organic" learning and "artificial" learning.
Copyright law does not care about that distinction, nor is the underlying problem the use of a generative AI system for non-commercial purposes such as education or private use. The line is drawn as soon as the system is commercialized, at which point the fair use excuses fall apart. Even as AI advances, so do the traceability methods and the questions about the dataset being used. [0]
It costs musicians close to nothing to target and file lawsuits against commercial voice cloners. Training on copyrighted songs was not even an option for tools like DanceDiffusion [1] because of that same risk, which is why training on public domain audio was the safer alternative to risking lawsuits and questions about the training set from tons of musicians.
[0] https://c2pa.org
[1] https://techcrunch.com/2023/09/13/stability-ai-gunning-for-a...