You know, if I've noticed anything in the past couple years, it's that even if you self-host your own site, it's still going to get hoovered up and used/exploited by things like AI training bots. I think between everyone's code getting trained on, even if it's AGPLv3 or something similarly restrictive, and generally everything public on the internet getting "trained" and "transformed" to basically launder it via "AI", I can absolutely see why someone rational would want to share a whole lot less, anywhere, in an open fashion, regardless of where it's hosted.
I'd honestly rather see and think more about how to segment communities locally, and go back to the "fragmented" way things once were. It's easier to want to share with other real people than to inadvertently work for free to enrich companies.
In contrast, if I uploaded something to a social media site like Instagram, and then Meta "sublicensed" my image to someone else, I wouldn't have much to say there.
Would love someone with actual legal knowledge to chime in here.
But you agreed to this when you agreed to the TOS.
> I post a picture on my site that is then used by a large publisher for ads, I would (at least in theory) have some recourse
for which you didn't sign any contract, and therefore it is a violation of copyright.
But the new AI training methods are currently, at least imho, not a violation of copyright - not any more than a human eye viewing it (which you've implicitly given permission to do so, by putting it up on the internet). On the other hand, if you put it behind a gate (no matter how trivial), then you could've at least legally protected yourself.
Interesting comparison - if a human viewed something, memorized it, and reproduced it in a way recognisable as pretty much the same, wouldn't that still breach copyright?
ie in the human case it doesn't matter that it went through an intermediate neural encoding - what matters is whether the output is sufficiently similar to be deemed a copy.
Surely the same is the case of AI?
If on the other hand you ask an image generator for a Rembrandt, you'll get several usable images, and good odds a few of them will be outright copies, and decent odds a few of them will end up as an etsy or ebay product image despite you not asking for that. And the better the generator is, the better it's going to do at making really good Rembrandt-style paintings, which ironically increases the odds of it just copying a real one that appeared many times in its training data.
People try and excuse this with explanations about how it doesn't store the images in its model, which is true, it doesn't. However, if you have a famous painting by any artist, or any work really, it's going to show up in the training data many, many times, and the more popular the artist, the more times it's going to be averaged in. So if the same piece appears in lots and lots of places, it creates a "rut" in the data, if you will, where the algorithm is likely going to strike repeatedly. This is why it's possible to get fully copied artworks out of image generators with the right prompts.
that is doing a lot of heavy lifting. Just because you could "get the full copies" with the right prompts doesn't mean the weights and the training are copyright infringement.
I could also get a full copy of any work out of the digits of pi.
The point I would like to emphasize is that using data to train the model is not copyright infringement in and of itself. If you use the resulting model to output a copy of an existing work, then that act constitutes copyright infringement - in the exact same way that using photoshop to reproduce some work does.
What a lot of anti-AI arguments are trying to achieve is to make the act of training and model-making the infringing act, and the claim is that the data is being copied while training is happening.