zlacker

[parent] [thread] 3 comments
1. dtn+(OP)[view] [source] 2022-12-15 22:40:08
> But I have very little sympathy for those perpetuating this tiresome moral panic (a small amount of actual artists, whatever the word "artist" means)

> A small amount of actual artists

It's extremely funny that you say this, because taking a look at the Trending on Artstation page tells a different story.

https://www.artstation.com/?sort_by=trending

replies(1): >>orbita+A7
2. orbita+A7[view] [source] 2022-12-15 23:25:37
>>dtn+(OP)
That's what the b) was about, yes.

And ironically, the overwhelming majority of the knowledge these models use to produce pictures that superficially look like their work (usually not at all) does not come from artworks at all. It's as simple as that. They are mostly trained on photos, which constitute the bulk of the models' knowledge about the real world. Photos are the main source of coherency. Artist names and keywords like "trending on artstation" are just easily discoverable, very rough handles for pieces of the models' memory.

replies(1): >>dtn+Oe
3. dtn+Oe[view] [source] [discussion] 2022-12-16 00:15:47
>>orbita+A7
I don't think the fact that photos are making up the vast majority of the training set is of any particular significance.

Can SD create artistic renderings without actual art being incorporated? Just from photos alone? I don't believe so, unless someone shows me evidence to the contrary.

Hence, SD necessitates having artwork in its training corpus in order to emulate style, no matter how little of it is represented in the training data.

replies(1): >>orbita+Kj
4. orbita+Kj[view] [source] [discussion] 2022-12-16 00:52:25
>>dtn+Oe
SD has several separate parts. In the most simplistic sense (not entirely accurate to how it functions), one translates English into a semantic address inside the "main memory", and another extracts the contents of the memory that the address refers to. If you prevent the first one (CLIP) from understanding artists' names by removing the correspondence between names and addresses, the data will still be there and can be addressed in other ways, for example via custom-trained embeddings. Even if you remove artworks from the dataset entirely, you can easily finetune it on anything you want using various techniques, because the bulk of the training ($$$!) has already been done for you, and the coherency (the knowledge of how things look in general: shapes, lighting, poses, etc.) is already there. You only need to skew it a bit towards your desired style.
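
A minimal toy sketch of that two-part split (all names here are made up; this is not actual Stable Diffusion code): the "encoder" is just a lookup from tokens to latent vectors, and the generator only ever sees vectors, never names, so a trained embedding can address the same region of latent space even after the name row is deleted.

```python
# Toy model of "text -> semantic address": a token-to-vector table.
vocab = {
    "country": (0.2, 0.9),   # learned during the expensive base training
    "house":   (0.7, 0.1),
    # suppose the artist's name -> vector row was removed from here
}

def encode(prompt, extra_embeddings=None):
    """Map each known token to its latent vector. A custom embedding
    (e.g. a textual-inversion-style token like <mystyle>) plugs into
    the same table and works exactly like a built-in word."""
    table = dict(vocab)
    if extra_embeddings:
        table.update(extra_embeddings)
    return [table[tok] for tok in prompt.split() if tok in table]

# Removing the name only removes one handle on the memory; a trained
# embedding still points at the same latent coordinates directly:
mystyle = {"<mystyle>": (0.45, 0.55)}   # hypothetical trained vector
print(encode("country house <mystyle>", extra_embeddings=mystyle))
# -> [(0.2, 0.9), (0.7, 0.1), (0.45, 0.55)]
```

The point of the sketch: the generator downstream never checks *how* a vector was produced, which is why filtering the vocabulary changes very little.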

Style transfer combined with the overall coherency of pre-trained models is the real power of these tools. "Country house in the style of Picasso" is generally not how you use this at full power, because "Picasso" is a poor descriptor for particular memory coordinates. You type "Country house" (a generic descriptor it knows very well) and provide your own embedding, or any kind of finetuned addon, to precisely lean the result towards the desired style, whether constructed by you or by anyone else.

So, if anyone believes that this thing would drive artists out of their jobs, then removing their works from the training set will change very little: it will still be able to generate anything given a few examples, on a consumer GPU. And that's only the current generation of such models and tools (which, admittedly, doesn't yet pass the quality/controllability threshold required for serious work).
