Over the years, I’ve made a point of helping students see the ways in which library systems make biased decisions when it comes to organizing information. The Library of Congress classification system treats women as a category subordinate to families. The debate over whether to change the Library of Congress subject heading “Illegal aliens” to “undocumented immigrants” is a recent illustration of a struggle that has gone on for longer than I’ve been a librarian: making subject headings fit contemporary usage and avoid offensive terminology. Students are quick to pick up on this issue. The trick is to know that these failures exist so that you can work around them.
Recently, I’ve had a harder time convincing students that algorithms can be similarly screwy. As one student put it, algorithms are sets of instructions being executed by a machine, and once constructed aren’t touched by human hands. So all the biases one might find in, say, Google search results are merely reflective of larger society. You can’t blame the search engine if it places ads for criminal background checks next to the names of black people; blame the people who clicked on those ads and trained the machine to assume that people searching for information about black folks must be interested in their criminal history. You can’t fault Google if men are shown ads for higher-paying jobs than women are. That’s the breaks! It’s just too bad that young black girls are likely to have to wade through pornography to find websites about their lives.
(Incidentally, Matt Reidsma recently published an important analysis of how library discovery systems have similar problems, often returning bizarrely wrong results. It’s a must-read for librarians.)
Bias is not just in search, of course. The information we generate across platforms is used behind the scenes and by third parties. Companies can use location or other personal data to tweak their prices – for example, charging more for products when there is no nearby competition, which tends to disadvantage the already disadvantaged. Students discovered that the Princeton Review charges different prices for its online tutoring program depending on broad regional categories, with the odd effect that Asians were being charged significantly more than whites even when their income was modest. Police are using social media combined with other data sources to identify people who the machine says are criminally inclined. A White House report cautioned that data might be used in ways that result in digital redlining. Big data can magnify big social problems.
One of the problems with recognizing algorithmic bias is that, while we can look at the Library of Congress classification scheme and search its authority file to see whether a subject heading is used or not (and if not, what alternatives there are), we can’t see how an algorithm is engineered. These are black boxes, unavailable for examination, because they are valuable trade secrets, and each of us sees something different when we search, making it hard to generalize. Besides, if the various levers and pulleys within an algorithm are known, people will try to game it. Google has to constantly tweak its search algorithm to thwart companies that are paid to push search results higher. The secrecy and the constant tweaking make it harder to predict and work around problems.
All that said, algorithmic bias is not inevitable and bias is not insurmountable. Algorithms are made by humans making human decisions. With effort, humans can work toward systems that are less biased. In the meantime it’s good to be aware that algorithms aren’t entirely even-handed and dispassionate sorters of information, any more than library organizational schemes are.