zlacker

The whole point of this article is that good data science involves detecting and avoiding these biased feedback loops, because software can amplify them.

It literally says: “It’s not that data can be biased. Data is biased.” Know how your data was generated and what biases it may contain. We are encoding and even amplifying societal biases in the algorithms we create.
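A toy way to see the difference between software *reinforcing* a bias and *amplifying* it. This is a hypothetical simulation, not anything from the article. It assumes two groups of equal underlying quality, a hypothetical recommender that splits its exposure across groups in proportion to each group's historical record count raised to some `exponent`, and that every exposure produces exactly one new record. With `exponent=1.0` the loop just reproduces the historical skew forever; with a superlinear exponent (a stand-in for "promote whatever already looks best") the skew grows every round.

```python
def simulate_feedback_loop(counts, rounds=10, exposures_per_round=100, exponent=2.0):
    """Hypothetical toy model of a biased feedback loop, for illustration only.

    counts: dict mapping group name -> number of historical records.
    Each round, exposure is split in proportion to count ** exponent, and
    every exposure adds one new record to that group. Groups are assumed
    equally good, so any change in share is produced by the loop itself.
    """
    counts = dict(counts)  # don't mutate the caller's data
    for _ in range(rounds):
        # Weights are computed from the counts at the start of the round.
        weights = {g: c ** exponent for g, c in counts.items()}
        total = sum(weights.values())
        for g in counts:
            counts[g] += exposures_per_round * weights[g] / total
    return counts


def share(counts, group):
    """Fraction of all records that belong to `group`."""
    return counts[group] / sum(counts.values())


start = {"A": 55, "B": 45}  # hypothetical slightly skewed historical data
print(share(simulate_feedback_loop(start, exponent=1.0), "A"))  # proportional: share stays at the starting 0.55
print(share(simulate_feedback_loop(start, exponent=2.0), "A"))  # superlinear: share of A drifts upward
```

In this toy model the exponent is the whole story: any allocation rule that rewards a group more than proportionally to its past record amplifies whatever skew the starting data had, even though nothing about the groups themselves differs.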


>>nl+(OP)
??? I just read the article... not sure what your point is. My point is that the author "encoded and even amplified" bias that the data did not have. Most of the article concerns being aware of inherent bias in data; my point concerns bias that the author has ascribed to the data.


>>gt_+O1
Gender bias in Word2Vec data is a well-known problem, and the article provides references. Word2Vec is an unsupervised algorithm, and the Google pretrained vectors are trained on news coverage, so this isn't really a researcher issue (in the sense of someone selecting an unusual dataset or something).
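For anyone wondering how that kind of bias is usually quantified: one common approach is to take a "gender direction" such as the vector for "he" minus the vector for "she" and measure how strongly other words point along it. The sketch below assumes that approach; the 3-d vectors are made-up numbers chosen only to show the computation, not real Word2Vec values (the Google News vectors have 300 dimensions and would be loaded from a file instead). The names `TOY_VECTORS` and `gender_projection` are hypothetical.

```python
import math

# Hypothetical 3-d toy vectors, NOT real Word2Vec output. The first axis is
# deliberately the "gender" axis so the arithmetic is easy to follow.
TOY_VECTORS = {
    "he":         [0.9, 0.1, 0.3],
    "she":        [-0.9, 0.1, 0.3],
    "programmer": [0.4, 0.8, 0.1],
    "homemaker":  [-0.5, 0.7, 0.2],
    "doctor":     [0.1, 0.9, 0.4],
}


def subtract(u, v):
    """Element-wise u - v."""
    return [a - b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def gender_projection(word, vectors):
    """Cosine between a word vector and the he-minus-she direction.

    Positive leans toward "he", negative toward "she", near 0 is neutral.
    """
    direction = subtract(vectors["he"], vectors["she"])
    return cosine(vectors[word], direction)


for w in ["programmer", "homemaker", "doctor"]:
    print(w, round(gender_projection(w, TOY_VECTORS), 3))
```

With real pretrained vectors the same projection would be run over occupation words; a neutral term ideally sits near 0, and words that land far from 0 are the ones carrying the bias into any system built on top of them.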

Edit: to clarify, your claim is that Word2Vec data isn't biased even though there is a link right there showing how it is? Why do you think that?

If you use that data in a system then you reinforce that bias.