Our technology industry has a diversity problem. This in itself is not a new issue. But the subset of our industry working on artificial intelligence (AI) has a particularly acute diversity problem, and it is having a negative impact on the lives of millions of people, all around the world.
Since 2014, Information is Beautiful has maintained a visualization of the published diversity statistics for some of the world’s largest technology companies. Despite the 2017 US population being 51 percent female, at that time Nvidia employed only 17 percent female staff, Intel and Microsoft 26 percent, Dell 28 percent, and Google, Salesforce and YouTube 31 percent. This reporting also didn’t account for those who identify as non-binary or transgender, nor for the fact that the diversity gap widens at the most senior levels of companies: a 2018 report found that only 10 percent of tech executives are female.
The diversity problem goes beyond gender. Racial diversity in technology is poor, and even less is being done about it. Consider “colorless diversity”: a phenomenon in which the industry does not invest equally in addressing the under-representation of people of color. In 2015, Erica Joy Baker highlighted that “whether by design or by inertia, the favor [of pro-diversity work] seems to land on white women in particular.” That year, Baker attended a Salesforce leadership conference in San Francisco and was in the audience for the “Building an Inclusive Workplace” panel. During the panel, Salesforce co-founder and Chief Technology Officer Parker Harris stated that:
“Well, right now I’m focused on women, you know, and it’s back to Marc [Benioff]’s focus on priorities. I have employees, that are, you know, other types of diversity coming to me and saying well why aren’t we focused on these other areas as well, and I said yes we should focus on them but, you know, the phrase we use internally is ‘If everything is important, then nothing is important.’”
This may be a lesson in prioritization for shipping software and meeting sales targets, but it is a single-minded approach that discriminates against the under-represented groups that need the most help.
Fast forward four years, and a new report has shed light on the current state of diversity in undoubtedly the hottest area of our industry: AI. The paper Discriminating Systems: Gender, Race and Power in AI was published by the AI Now Institute of New York University in April 2019. It highlights the scale – and most shockingly, the global impact – of the diversity problem in the institutions doing work at the cutting edge of this field.
The authors of the report state that “recent studies found only 18% of authors at leading AI conferences are women and more than 80% of AI professors are men”. Similar statistics in AI are observed outside of academia. The paper reports that “women comprise only 15% of AI research staff at Facebook and 10% at Google.” Extending the diversity statistics to include race, the paper also notes that “only 2.5% of Google’s workforce is black, while Facebook and Microsoft are each at 4%”.
We have seen how a lack of diversity can stifle innovation, decrease employee retention and, in the worst case, allow racism and harassment to go unpunished. However, the lack of diversity in AI can have a harmful effect on the whole of society.
Diversity issues deployed
Consider the following video from 2009, which shows how the face-tracking functionality of a Hewlett-Packard webcam was unable to recognize black faces.
Six years later it seemed that the latest Google AI classifiers were still making errors, albeit in an even more racist manner.
So how does this happen? To understand, we have to look at the process followed to create image classification software. Typically, AI engineers start by finding as many labelled examples of images as they possibly can. By labelled, we mean that a human has looked at a picture of a cat and marked it with the word “cat”. A large collection of these labelled images is called a corpus.
There are many corpora that are publicly accessible, such as ImageNet, which is maintained by the Vision Lab at Stanford University. If you type the word “dog” into ImageNet, you’ll see labelled images of dogs come back in your search. There are similar corpora of image data available online for AI researchers and engineers to use depending on their desired application, such as MNIST for images of handwritten digits and the more general Open Images dataset.
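To make the idea of a corpus concrete, here is a minimal sketch of pulling a labelled dataset down for experimentation, assuming TensorFlow and Keras are installed. MNIST ships with Keras and downloads on first use; larger corpora such as ImageNet and Open Images require separate downloads and far more storage.

```python
# A minimal sketch, assuming TensorFlow/Keras is installed: MNIST is a small
# labelled corpus of handwritten digits bundled with Keras.
from tensorflow.keras.datasets import mnist

# Each image comes paired with a human-assigned label (the digit 0 to 9).
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

print(train_images.shape)  # (60000, 28, 28): 60,000 labelled examples
print(train_labels[:10])   # the labels for the first ten images
```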
With access to large amounts of accurately labelled data, AI engineers can create classifiers by training models on these examples until they can recognize similar images. Sometimes existing pre-trained models are extended to cover new inputs using a process called transfer learning. It follows that with enough example images of dogs, a classifier could be created that can take a previously unseen picture of a dog and label it correctly. The question, then, is how black faces can go unrecognized by webcam software, or how a picture of two black people can be classified as gorillas. Are racist labels being assigned to input data, or is there something more subtle at play?
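Before answering that, it helps to see what the training step looks like in practice. The following is a hedged sketch of transfer learning with Keras, not anyone’s production pipeline: a model pre-trained on ImageNet is frozen and a new classification head is trained on a smaller, task-specific labelled dataset.

```python
# A sketch of transfer learning, assuming TensorFlow/Keras. The class count
# and the commented-out training data are placeholders for your own corpus.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # keep the ImageNet-learned features fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g. "dog" vs "not dog"
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(new_images, new_labels, epochs=5)  # train only the new head
```

Whatever statistical patterns and gaps exist in the pre-training corpus and in the new labelled examples are carried straight into the resulting classifier.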
The answer is that the racial bias of these classifiers typically goes unchallenged. The effect compounds in two places. Firstly, the datasets used to train the classifiers are typically not representative of real-world diversity, and so encode bias. Secondly, the predominantly racially homogeneous staff do not thoroughly and sensitively test their work on images of people more diverse than themselves.
The AI Now paper highlights that a commonly used public dataset of faces called Labeled Faces in the Wild, maintained by the University of Massachusetts, Amherst, has only 7 percent black faces and 22.5 percent female faces, thus making a classifier trained on these images less able to identify women and people of color. How did this dataset end up being so biased? To find out, we must look at how the images were collected.
Data collection was performed automatically from images featured on news websites in the early 2000s. Thus, the corpus “can be understood as a reflection of early 2000s social hierarchy, as reproduced through visual media”. For example, it contains over 500 pictures of the then US President George W. Bush. Although sampling seemingly random images looks like an excellent way to build a “real world” facial analysis classifier, classifiers trained on this corpus end up reflecting the lack of diversity in the media of that era.
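One way to catch this kind of skew before any training happens is to audit the corpus itself. Below is a small sketch of such an audit; it assumes each image has been annotated with demographic attributes, which the LFW distribution does not provide out of the box, so the `attributes` list here is purely illustrative.

```python
# A sketch of a representation audit over a labelled face corpus. The
# per-image demographic annotations below are hypothetical; LFW does not
# ship with them, so in practice they must be collected separately.
from collections import Counter

attributes = [
    {"gender": "female", "skin_tone": "darker"},
    {"gender": "male",   "skin_tone": "lighter"},
    # ... one entry per image in the corpus
]

def representation(records, key):
    """Return the share of the corpus falling into each value of `key`."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

print(representation(attributes, "gender"))
print(representation(attributes, "skin_tone"))
# Run over Labeled Faces in the Wild, this kind of audit is what surfaces
# figures like 22.5 percent female faces and 7 percent black faces.
```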
This phenomenon is not limited to classifiers built from the Labeled Faces in the Wild dataset. A 2018 paper showed that a number of facial analysis systems misclassify darker-skinned females up to 34.7 percent of the time, compared with an error rate of only 0.8 percent for lighter-skinned males. Clearly, something is wrong here. What can we do?
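One concrete first step is to stop reporting a single aggregate accuracy figure and instead, as the 2018 paper did, measure error rates disaggregated by subgroup. A minimal sketch, using hypothetical test records:

```python
# A sketch of disaggregated evaluation: compute an error rate per demographic
# subgroup rather than one overall accuracy. The records are hypothetical
# placeholders for an annotated test set and a classifier's predictions.
from collections import defaultdict

results = [
    # (subgroup, true_label, predicted_label)
    ("darker_female",  "female", "male"),
    ("lighter_male",   "male",   "male"),
    ("darker_male",    "male",   "male"),
    ("lighter_female", "female", "female"),
    # ... one entry per test image
]

tallies = defaultdict(lambda: [0, 0])  # subgroup -> [mistakes, total]
for subgroup, truth, prediction in results:
    tallies[subgroup][0] += truth != prediction
    tallies[subgroup][1] += 1

for subgroup, (mistakes, total) in sorted(tallies.items()):
    print(f"{subgroup}: {mistakes / total:.1%} error rate")
```

Large gaps between rows are exactly the signal that a system is not ready to be used on real people.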
The full picture of bias
In her keynote talk at NIPS 2017, Kate Crawford, co-founder and co-director of the AI Now Institute, explores how bias in the AI systems we create today propagates the historical and present discriminatory behavior we see in society itself. Given that AI systems are becoming prevalent in ways that truly affect the outcomes of people’s lives, fixing bias is an important issue for the industry to address. Crawford defines bias as a skew that causes a type of harm, and classifies its effects into two broad areas: harms of allocation and harms of representation.
Harms of allocation occur when an AI system allocates or withholds opportunities or resources for certain groups. For example, AI may be used to make automated decisions on loans and mortgages. It may automatically screen job applicants for their suitability or their criminal background. It may diagnose illness and thus decide on treatment. It may even inform the police as to the neighborhoods in which they should spend their time performing “stop and search” operations.
Development of these systems begins with a labelled dataset on which to train a model. We have seen that these datasets can encode biases that already exist within society. If it just so happens that, historically, people under thirty or African American women were often turned down for mortgages, then AI trained on this data may unfairly encode this bias and thus cause a harm of allocation to future applicants based on their age, race or sex. Mathematician and author Cathy O’Neil has written about how personality tests, which 70 percent of people in the US have taken when applying for a job, can discriminate unfairly. Kyle Behm, a college student, failed a personality test when applying for a job at a Kroger store. He recognized some of the questions in the test from a mental health assessment he had taken whilst undergoing treatment for bipolar disorder, suggesting that people with mental health issues were being unfairly discriminated against in the hiring process.
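For allocation decisions like these, even a crude check of how a model’s outcomes break down by group can surface problems before deployment. Here is a sketch using made-up decisions; the 0.8 factor mirrors the “four-fifths rule” from US hiring guidance and is only one possible threshold.

```python
# A sketch of a basic allocation check: compare a model's approval rates
# across groups and flag large gaps. Decisions and group labels are made up;
# the 0.8 factor mirrors the "four-fifths rule" used in US hiring guidance.
from collections import defaultdict

decisions = [
    # (group, approved_by_model)
    ("under_30", False),
    ("under_30", True),
    ("over_30",  True),
    ("over_30",  True),
    # ... one entry per applicant the model has scored
]

tallies = defaultdict(lambda: [0, 0])  # group -> [approvals, total]
for group, approved in decisions:
    tallies[group][0] += approved
    tallies[group][1] += 1

rates = {group: approvals / total
         for group, (approvals, total) in tallies.items()}
highest = max(rates.values())
for group, rate in sorted(rates.items()):
    flag = "REVIEW" if rate < 0.8 * highest else "ok"
    print(f"{group}: approval rate {rate:.0%} [{flag}]")
```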
Harms of representation occur when a system misrepresents a group in a damaging way. Google Photos labelling black faces as gorillas is a harm of representation. Latanya Sweeney, a Harvard professor, published a paper showing that searching for racially associated names in Google yields personalized ads that could be interpreted as discriminatory. For example, searching for popular African American male names such as DeShawn, Darnell and Jermaine generated ads suggesting that the person may have been arrested, thus reinforcing the stereotype of black criminality. More recently, full-body scanners operated by the TSA have proven prone to false alarms on hairstyles common among people of color.
A succinct example of harmful gender misrepresentation can be seen in Google Translate. Typing in the two sentences “he is a nurse” and “she is a doctor”, translating them into a gender-neutral language such as Turkish, and then translating them back again, swaps the genders of the two roles. Given that systems such as Google Translate are trained on large corpora of existing literature, the historical gender bias in the sampled texts for those two roles has been encoded into the system, which then propagates that bias. Similarly, searching for “CEO” or “politician” on Google Images returns a page of results consisting almost entirely of white men. Crawford’s talk also notes how Nikon camera software would label Asian faces as “blinking” in photos.
Hence, given these harms of allocation and representation, we need to be extremely careful when using AI systems to classify people, since the consequences in the real world can be catastrophic. A controversial paper described the development of an AI classifier that could predict the likelihood of an individual being homosexual from an image of their face. Given that homosexuality is illegal and punishable by death in a number of countries with sharia-based law, the work drew criticism for being highly unethical. What if this technology were to get into the wrong hands? Arguably, similar technology already has. AI that can recognize the ethnicity of faces has been deployed in China by a company called SenseNets to monitor Uighur Muslims as part of the state’s repressive crackdown on this minority group.
Fixing AI bias
So what do we do about bias in AI? Given that the impact of the industry’s innovation on the world will only ever increase, we need to take a stand now to prevent the existing discrimination in our society from spreading virally.
The AI Now Institute’s paper argues that in order to fix bias in AI, we need to begin by fixing the diversity problem within the AI industry itself. That begins with recognizing that there is a serious problem, and that without radical action it will only get worse. Unlike the lack of diversity in technology generally, where those primarily harmed are current and future employees, the bias encoded into AI systems could have a devastating effect on the entire world.
We need to raise the profile of the diversity problem in technology companies so that non-action becomes unacceptable. This goes far beyond identifying the issue as just a “pipeline problem”, where companies claim that because diverse candidates are a minority in the job market, they are harder to find. Simply blaming the hiring pool puts the onus on the candidates rather than on the workplaces themselves. Instead, tech companies need to work hard on transparency: publishing compensation levels broken down by race and gender, publishing harassment and discrimination transparency reports, recruiting more widely than elite US universities, and creating more opportunities for under-represented groups by opening new pathways for contractors, temporary staff and vendors to become full-time employees.
We also need to change the way in which AI systems are built so that discrimination and bias are addressed. The AI Now paper suggests tracking and publicizing where AI systems are used, performing rigorous testing, trials and auditing in sensitive domains, and expanding the field of bias research so that it also encompasses the social issues caused by the deployment of AI. It also notes that assessments should be carried out on whether certain systems should be built at all. Who greenlit the development of a homosexuality classifier? What should the ramifications be for doing so?
At a time in history when significant advances in the speed and availability of compute resources have made wide-scale AI available to the masses, our understanding of the impact it is having on society lags sorely behind. We have a long way to go to fix the diversity problem in AI and in technology in general. However, through the work published by the AI Now Institute, we can see that if we don’t fix it soon, the systems we create could deepen the divides in an already heavily divided world.