Some of the reasoning:
>Preliminary assessment also suggests Imagen encodes several social biases and stereotypes, including an overall bias towards generating images of people with lighter skin tones and a tendency for images portraying different professions to align with Western gender stereotypes. Finally, even when we focus generations away from people, our preliminary analysis indicates Imagen encodes a range of social and cultural biases when generating images of activities, events, and objects. We aim to make progress on several of these open challenges and limitations in future work.
Really sad that breakthrough technologies are going to be withheld due to our inability to cope with the results.
We certainly don't want to perpetuate harmful stereotypes. But is it a flaw that the model encodes the world as it really is, statistically, rather than as we would like it to be? By this I mean that there are more light-skinned people in the West than dark-skinned people, and there are more female nurses than male nurses, which is reflected in the model's training data. If the model only generates images of female nurses, is that a problem to fix, or a correct assessment of the data?
If some particular demographic shows up in 51% of the data but 100% of the model's output shows that one demographic, that does seem like a sampling problem the model could correct by sometimes picking less likely "next token" predictions instead of always taking the most probable one.
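As a rough illustration of that point (a hypothetical toy example, not how Imagen actually samples): if a model always takes the argmax of its predicted distribution, a 51/49 split in the data collapses to 100/0 in the output, whereas sampling in proportion to the predicted probabilities roughly preserves the split.

```python
import random

# Toy example: suppose the model predicts P(A) = 0.51, P(B) = 0.49
# for some attribute of a generated person.
probs = {"A": 0.51, "B": 0.49}

def argmax_pick(probs):
    # Mode-seeking: always returns the single most likely option,
    # so the 51% class shows up in 100% of outputs.
    return max(probs, key=probs.get)

def proportional_pick(probs):
    # Sample in proportion to predicted probability,
    # so outputs roughly match the 51/49 split in the data.
    return random.choices(list(probs), weights=probs.values(), k=1)[0]

n = 10_000
greedy = sum(argmax_pick(probs) == "A" for _ in range(n)) / n
sampled = sum(proportional_pick(probs) == "A" for _ in range(n)) / n
print(f"argmax picks A {greedy:.0%} of the time")       # 100%
print(f"proportional sampling picks A {sampled:.0%}")   # ~51%
```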
Also, is it wrong to have localized models? For example, should a model for use in Japan conform to the demographics of Japan, or to that of the world?
So even if we managed to create a perfect model of representation and inclusion, people could still use it to generate extremely offensive images with little effort. I think people see that as profoundly dangerous. Restricting the ability to be creative seems to be a new frontier of censorship.
Do they see it as dangerous? Or just offensive?
I can understand why people wouldn’t want a tool they have created to be used to generate disturbing, offensive or disgusting imagery. But I don’t really see how doing that would be dangerous.
In fact, I wonder if this sort of technology could reduce the harm caused by people with an interest in disgusting images, because no one needs to be harmed for a realistic image to be created. I am creeping myself out with this line of thinking, but it seems like one potentially beneficial, albeit disturbing, outcome.
> Restricting the ability to be creative seems to be a new frontier of censorship.
I agree this is a new frontier, but it’s not censorship to withhold your own work. I also don’t really think this involves much creativity. I suppose coming up with prompts involves a modicum of creativity, but the real creator here is the model, it seems to me.
I won't speak to whether something is "offensive", but I think that having underlying biases in image classification or generation has very worrying secondary effects, especially given that organizations like law enforcement want to do things like facial recognition. It's not a perfect analogue, but I could easily see some company pitching a sketch-artist-replacement service that generates images based on someone's description. The potential for inherent bias in that kind of service is worrying, especially since the people in charge of buying it are unlikely to care about, or even notice, the caveats.
It does feel like a little bit of a stretch, but at the same time we've also seen such things happen with image classification systems.