zlacker

1. orbita+(OP) 2022-12-16 00:52:25
SD has several separate parts. In the most simplified sense (not entirely accurate to how it functions), one part translates English into a semantic address inside the "main memory", and another extracts the contents of the memory that the address refers to. If you prevent the first one (CLIP) from understanding artists' names by removing the correspondence between names and addresses, the data will still be there and can be addressed in other ways, for example with custom-trained embeddings. Even if you remove artworks from the dataset entirely, you can easily finetune the model on anything you want using various techniques, because the bulk of the training ($$$!) has already been done for you, and the coherence and general knowledge of how things look (shapes, lighting, poses, etc.) are already there. You only need to skew it towards your desired style a bit.
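To make the address/memory analogy concrete, here is a toy sketch in Python. It is not how SD or CLIP actually work: the 2-D coordinates, the `MEMORY` and `TEXT_ENCODER` tables, and the `<my-style>` token are all made up for illustration. It only shows that deleting the name-to-address mapping leaves the stored content reachable through a custom embedding.

```python
import math

# Toy "main memory": content stored at points in a 2-D latent space.
# Coordinates and labels are invented for this illustration.
MEMORY = {
    (0.9, 0.1): "cubist style",
    (0.1, 0.9): "photographic style",
}

# Toy "text encoder" (a stand-in for CLIP): maps words to addresses.
TEXT_ENCODER = {
    "picasso": (0.88, 0.12),
    "photo": (0.12, 0.90),
}


def read_memory(address):
    """Return the content stored nearest to the given address."""
    nearest = min(MEMORY, key=lambda point: math.dist(point, address))
    return MEMORY[nearest]


def generate(word, custom_embeddings=None):
    """Resolve a word to an address, then read the memory at that address."""
    table = {**TEXT_ENCODER, **(custom_embeddings or {})}
    if word not in table:
        return None  # this toy encoder has no address for the word
    return read_memory(table[word])


# Remove the name-to-address correspondence; MEMORY itself is untouched.
del TEXT_ENCODER["picasso"]

blocked = generate("picasso")
# A user-supplied embedding (hypothetical token) points into the same region.
recovered = generate("<my-style>", {"<my-style>": (0.85, 0.15)})
```

In this toy, `blocked` is `None` while `recovered` still reads the cubist region, which is the point of the paragraph above: the encoder is only one way of producing an address.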

Style transfer combined with the overall coherence of pre-trained models is the real power of these models. "Country house in the style of Picasso" is generally not how you use this at full power, because "Picasso" is a poor descriptor for particular memory coordinates. Instead, you type "Country house" (a generic descriptor it knows very well) and provide your own embedding or any kind of finetuned add-on to lean the result precisely towards the desired style, whether that add-on was constructed by you or by anyone else.
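A rough sketch of the "provide your own embedding" idea, in the spirit of textual inversion but heavily simplified: the model stays frozen and only a small embedding vector is optimized until the output matches a few style examples. Everything here is an assumption for illustration: the "model" is a fixed 2x2 linear map `W`, the style examples are invented feature vectors, and plain gradient descent stands in for the real training procedure.

```python
# Frozen "pre-trained model": a fixed linear map from a 2-D embedding to
# 2-D image features. The weights are arbitrary and are never updated.
W = [[1.0, 0.5],
     [0.0, 1.0]]


def model(e):
    """Apply the frozen model to an embedding."""
    return [W[0][0] * e[0] + W[0][1] * e[1],
            W[1][0] * e[0] + W[1][1] * e[1]]


def loss(e, target):
    """Squared error between the model output and the target features."""
    return sum((o - t) ** 2 for o, t in zip(model(e), target))


def fit_embedding(target, steps=500, lr=0.1):
    """Learn only the embedding by gradient descent; W stays fixed."""
    e = [0.0, 0.0]
    for _ in range(steps):
        out = model(e)
        r = [out[0] - target[0], out[1] - target[1]]
        # Gradient of the squared error with respect to e is 2 * W^T r.
        g = [2 * (W[0][0] * r[0] + W[1][0] * r[1]),
             2 * (W[0][1] * r[0] + W[1][1] * r[1])]
        e = [e[0] - lr * g[0], e[1] - lr * g[1]]
    return e


# "A few examples" of the desired style, as invented feature vectors.
style_examples = [[0.9, 2.1], [1.1, 1.9], [1.0, 2.0]]
target = [sum(col) / len(style_examples) for col in zip(*style_examples)]

style_embedding = fit_embedding(target)
final_loss = loss(style_embedding, target)
```

The learned `style_embedding` is tiny compared with the model, which is why this kind of add-on is cheap to train and easy to share.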

So, if anyone believes that this thing will drive artists out of their jobs, removing their works from the training set will change very little, as it will still be able to generate anything given a few examples, on a consumer GPU. And that's only the current generation of such models and tools (which admittedly doesn't pass the quality/controllability threshold required for serious work just yet).
