Which of course then arrives at the problem: the original data plainly isn't stored in a byte-exact form, and you can only recover it by providing an astoundingly specific input string (the 512-bit latent space vector). But that's not data which is contained within Stable Diffusion. It's equivalent to trying to sue a compression codec because a specific archive contains a copyrighted image.
This is the most salient point in this whole HN thread!
You can’t sue Stable Diffusion or the creators of it! That just seems silly.
But (I don’t know, I’m not a lawyer) there might be an argument to sue an instance of Stable Diffusion and the creators of it.
I haven’t picked a side of this debate yet, but it has already become a fun debate to watch.
You can’t sue Canon for helping a user take better infringing copies of a painting, nor can you sue Apple or Nikon or Sony or Samsung… you can sue the user making an infringing image, not the tools they used to make the infringing image… the tools have no mens rea.
That's plainly untrue, as Stable Diffusion is not just the algorithm, but the trained model—trained on millions of copyrighted images.
SD might know how to violate copyright, but is that enough to sue it? Or can you only sue over the violations it helps create?
That said, it does raise the question, “should this precedent be extended to humans?”
I.e., can humans be taught something based on copyrighted materials in the training set/curriculum?
To address (b) first: Fair Use has long held that educational purposes are a valid reason for using copyrighted materials without express permission—for instance, showing a whole class a VHS or DVD, which would technically require a separate release otherwise.
For (a): I don't know anything about your background in ML, so pardon if this is all obvious, but at least current neural nets and other ML programs are not "AI" in anything like the kind of sense where "teaching" is an apt word to describe the process of creating the model. Certainly the reasoning behind the Fair Use exception for educating humans does not apply—there is no mind there to better; no person to improve the life, understanding, or skills of.
It's like the compression that occurs when I say "Mona Lisa" and you read it and can recall many aspects of that painting.
So while it would be possible to create a "Public Diffusion" that took the Stable Diffusion refinements of the ML techniques and created a model built solely out of public-domain art, as it stands, "Stable Diffusion" includes by definition the model that is built from the copyrighted works in question.