I have a feeling that we need to be real with ourselves and solve problems rather than paper over them. I feel like people generally expect search engines to tell them what's really there, not what people wish were there. And when the engines do that, people can get agitated!
I'd almost say that hurt feelings are a prerequisite for real change, hard though that may be.
These are all really interesting questions raised by this technology; thanks for your thoughts. Disclaimer: I'm a fucking idiot with no idea what I'm talking about.
Which real world? The population you sample from is going to make a big difference. Do you expect it to reflect your day-to-day life in your own city? Your own country? The entire world? Results will vary significantly.
If I ask for pictures of Japanese people, I'm not shocked when all the results are of Japanese people. If I ask for "criminals in the United States" and all the results are black people, that should concern me, not because the data set is biased but because the real world is biased and we should do something about that. The difference is that I know which set I'm asking for a sample from, and I can react accordingly.
Your logic seems to rest on this assumption, which I don't think is justified. "Skewing search results" is not the same as "hiding the biases of the real world". Showing the most statistically likely result is not the same as showing the world how it truly is.
A generic nurse is statistically going to be female most of the time. However, a model that returns every nurse as female is not showing the real world as it is. It is exaggerating and reinforcing the bias of the real world. Actually representing the real world inherently requires a more advanced model. I think it is reasonable for the creators to avoid sharing models known not to be smart enough to avoid exaggerating real-world biases.
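To make the "most likely result vs. the real world" distinction concrete, here's a toy Python sketch. The 90/10 split is a made-up number for illustration, not real data. A generator that always outputs the single most likely answer turns a 90% tendency into 100%, while one that samples from the distribution keeps roughly the real proportion.

```python
import random
from collections import Counter

# Hypothetical distribution, for illustration only; NOT real statistics.
NURSE_GENDER = {"female": 0.9, "male": 0.1}

def most_likely(dist):
    # Always return the single most probable outcome (the mode).
    return max(dist, key=dist.get)

def sample(dist, rng):
    # Draw one outcome in proportion to its probability.
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1000
mode_results = Counter(most_likely(NURSE_GENDER) for _ in range(n))
sampled_results = Counter(sample(NURSE_GENDER, rng) for _ in range(n))

print("always pick the mode:", dict(mode_results))
print("sample the distribution:", dict(sampled_results))
```

Real image models obviously don't pick from a two-entry table; this only illustrates why "always return the most likely answer" exaggerates a skew instead of reflecting it.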
Well the results would unquestionably be biased. All results being black people wouldn't reflect reality at all, and hurting feelings to enact change seems like a poor justification for incorrect results.
> I'd say it doesn't actually matter, as long as the population sampled is made clear to the user.
OK, and let's say I ask for "criminals in Cheyenne, Wyoming" and it doesn't know the answer. Should it just do its best to answer? That seems risky if people are going to get fired up about it and act on it to get "real change".
That seems like a good parallel to what we're talking about here, since it's very unlikely that crime statistics were fed into this image generating model.
Curiously, this search actually only returns white people for me on GIS.
Every model will have some random biases. Some of those random biases will undesirably exaggerate the real world. Every model will undesirably exaggerate something. Therefore no model should be shared.
Your goal is nice, but impractical?
I said "It is reasonable... to avoid sharing models". That is an acknowledgment that the creators are acting reasonably. It does not imply anything as extreme as "no model should be shared". The only way to get from A to B there is for you to assume that I think there is only one reasonable response and every other possible reaction is unreasonable. Doesn't that seem like a silly assumption?
‘When I use a word,’ Humpty Dumpty said in rather a scornful tone, ‘it means just what I choose it to mean — neither more nor less.’
‘The question is,’ said Alice, ‘whether you can make words mean so many different things.’
‘The question is,’ said Humpty Dumpty, ‘which is to be master — that’s all.’

If the only way to do AI is to encode racism, etc., then we shouldn't be doing AI at all.
We may not like what it shows us, but blindfolding ourselves is not the solution to that problem.