Some of the reasoning:
>Preliminary assessment also suggests Imagen encodes several social biases and stereotypes, including an overall bias towards generating images of people with lighter skin tones and a tendency for images portraying different professions to align with Western gender stereotypes. Finally, even when we focus generations away from people, our preliminary analysis indicates Imagen encodes a range of social and cultural biases when generating images of activities, events, and objects. We aim to make progress on several of these open challenges and limitations in future work.
Really sad that breakthrough technologies are going to be withheld due to our inability to cope with the results.
We certainly don't want to perpetuate harmful stereotypes. But is it a flaw that the model encodes the world as it really is, statistically, rather than as we would like it to be? By this I mean that there are more light-skinned people than dark-skinned people in the West, and there are more female nurses than male nurses, which is reflected in the model's training data. If the model only generates images of female nurses, is that a problem to fix, or a correct assessment of the data?
If some particular demographic shows up in 51% of the data but 100% of the model's output shows that one demographic, that does seem like a statistics problem that the model could correct by just picking less likely "next token" predictions.
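A toy sketch of that point, offered as an analogy only: Imagen is a diffusion model and doesn't literally decode "next tokens", and the two labels and the 51/49 split below are just the hypothetical from the comment above. It compares always picking the single most likely option (greedy/argmax) with sampling in proportion to the learned probabilities.

```python
import random
from collections import Counter

# Hypothetical learned distribution: demographic "A" appears in 51% of the data.
DIST = {"A": 0.51, "B": 0.49}

def greedy_pick(dist):
    """Always return the single most likely option (argmax decoding)."""
    return max(dist, key=dist.get)

def sample_pick(dist, rng):
    """Draw one option with probability proportional to its weight."""
    options = list(dist)
    weights = [dist[o] for o in options]
    return rng.choices(options, weights=weights, k=1)[0]

def output_shares(pick, n=10_000):
    """Generate n outputs with `pick` and return each option's share of them."""
    counts = Counter(pick() for _ in range(n))
    return {k: counts[k] / n for k in DIST}

if __name__ == "__main__":
    rng = random.Random(0)
    print("greedy: ", output_shares(lambda: greedy_pick(DIST)))  # {'A': 1.0, 'B': 0.0}
    print("sampled:", output_shares(lambda: sample_pick(DIST, rng)))  # roughly 51/49
```

Greedy selection turns a 51% majority in the data into 100% of the output, while sampling roughly preserves the 51/49 split. That gap is the "statistics problem" described above.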
Also, is it wrong to have localized models? For example, should a model for use in Japan conform to the demographics of Japan, or to those of the world?
Also, getting a random sample of any demographic would be really hard, so no machine learning project is going to do that. Instead you've got a random sample of some arbitrary dataset that's not directly relevant to any particular purpose.
This is, in essence, a design or artistic problem: the Google researchers have some idea of what they want the statistical properties of their image generator to look like. What it currently does isn't that. So, artistically, the result doesn't meet their standards, and they're going to fix it.
There is no objective, universal, scientifically correct answer about which fictional images to generate. That doesn't mean all art is equally good, or that you should just ship anything without looking at quality along various axes.