As software testers, we understand the importance of functional testing, performance testing, accessibility testing, and so on, but as we comfortably settle ourselves into 2020, I think it’s time we start talking about diversity testing. It’s not only testers who need to explore how to incorporate diversity testing; the entire development team should be doing the same.
The diversity issues that can arise when development teams overlook use cases for people of differing backgrounds limit the quality of our software for certain users, and can even impact their lives in detrimental ways. Considering diversity in both our development teams and in our development cycle is crucial to building the technology and the future that we want.
Who is developing the technology of the future?
In an interview with The Guardian1, Ghanaian-American computer scientist and digital activist Joy Buolamwini pointed to her time working on social robotics as a prime example of how human biases can impact software. The robot her team built used open source computer vision software to detect the face of the human with whom it was interacting. The software consistently detected her teammates with lighter skin, but struggled to detect her. Several years later, when she returned to the lab to pursue her graduate studies, she was disappointed to find she ran into the same problem. As she put it, “I found wearing a white mask worked better than using my actual face.”2
When an all-white team of developers creates facial recognition software, testing it only on themselves, it is next to impossible to catch the issues that people with darker skin might experience when they attempt to use the same software. As Joy Buolamwini made clear, “We have a very narrow vision of what technology can enable right now because we have very low participation. I’m excited to see what people create when it’s no longer just the domain of the tech elite, what happens when we open this up.”3 The more diversity we have involved in the creation of our software, the better and more versatile our software will be.
When development teams are made up of people of the same race, socio-economic status, gender, etc., the issues that exist outside of their experiences are likely to be overlooked throughout the development cycle. Buolamwini’s robot is a clear example of a product that a more diverse development team could only have made better.
What is the cost of ignoring diversity?
You may recall the COMPAS scandal from Florida in 2016, another well-known example of ingrained bias.4 COMPAS was a piece of software used by judges to help determine whether individuals charged with crimes should be released from jail prior to their trials. It did so by assigning each person a risk score that estimated how likely they were to commit another crime within two years of being released.
During an investigation of thousands of defendants’ scores, ProPublica found that there was a much higher number of false positives in risk scores for people of colour than for their white counterparts.5 In other words, the investigation showed that “they were classified by COMPAS as high risk but subsequently not charged with another crime.”6
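For testers who want to check for this kind of disparity themselves, the comparison comes down to computing a false positive rate separately for each group. Below is a minimal Python sketch of that arithmetic. The function name, the record layout, and the sample records are all made up for illustration; they are not COMPAS data or ProPublica’s actual method.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Return {group: false positive rate}.

    Each record is a hypothetical (group, predicted_high_risk, reoffended)
    tuple. A false positive is someone flagged as high risk who did not
    go on to reoffend.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)  # people who did not reoffend
    for group, predicted_high_risk, reoffended in records:
        if not reoffended:
            negatives[group] += 1
            if predicted_high_risk:
                false_positives[group] += 1
    return {group: false_positives[group] / count
            for group, count in negatives.items()}


# Made-up records for illustration only (not real data).
records = [
    ("group_a", True, False),
    ("group_a", True, False),
    ("group_a", False, False),
    ("group_a", True, True),
    ("group_b", True, False),
    ("group_b", False, False),
    ("group_b", False, False),
    ("group_b", False, False),
    ("group_b", True, True),
]

rates = false_positive_rates(records)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f}")
```

A large gap between the per-group rates is a signal worth raising with the team, even when the overall accuracy of the software looks acceptable.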
Further investigation into the COMPAS software found a few reasons for this discrepancy. For one, the training data used by the artificial intelligence algorithm was historical data. By using historical data, the algorithm likely found that certain groups of people were arrested at a higher rate than others, but was of course unable to take into account the unjust historical and ongoing practices of discrimination against minorities and racial groups in the justice system that cause their over-representation.7
The COMPAS scandal leads us to wonder if we are asking the right questions of our artificial intelligence. It can be used in many different ways to help us streamline processes and encourage more consistent results, so long as we understand its limitations. Because we are still seeking to fully understand the power of artificial intelligence and machine learning, we must be cautious with how we include them in our project solutions.
By creating an open relationship between diverse teams of experts and software development teams, we can build products that make the most of what artificial intelligence offers, avoid reinforcing discrimination, and better serve everyone.
We as software testers can help to eliminate the biases in technology by asking questions and thinking about who our end users might be. Ask yourself whether there are groups of people who will be influenced by, disadvantaged by, or benefit from this software who are not represented on our team. Is the data we are using the right data? Is the algorithm answering the right questions?
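One way to start answering the data question is to check whether every group we expect among our end users actually shows up in our test data. The sketch below is only an illustration: the function name, the group labels, and the 10% threshold are assumptions chosen for the example, not an established standard.

```python
from collections import Counter


def underrepresented_groups(samples, expected_groups, min_share=0.1):
    """Return the expected groups whose share of the samples is below min_share.

    `samples` is a list of dicts with a hypothetical "group" key.
    """
    counts = Counter(sample["group"] for sample in samples)
    total = len(samples)
    return sorted(
        group for group in expected_groups
        if total == 0 or counts[group] / total < min_share
    )


# Made-up test set for illustration: 19 samples from one group, 1 from another.
test_faces = [{"group": "lighter_skin"}] * 19 + [{"group": "darker_skin"}]

missing = underrepresented_groups(test_faces, ["darker_skin", "lighter_skin"])
print(missing)
```

A check like this will not tell us whether the software works for a group, but it does tell us when we have too little data to know.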
If the software we build reinforces existing prejudices, then our software is inadequate. If the software we are creating works well for one group of people, and poorly for everyone else, then we are creating weak software. If we aren’t testing for diversity, then how can we say we are testing for quality?
If you're looking to learn more or dive deeper into the impact of diversity on the development cycle and in software testing, we recommend checking out Algorithms of Oppression by Safiya Umoja Noble and this piece by Heidi Ledford on racial bias in health care algorithms.
1Ian Tucker, “‘A white mask worked better’: why algorithms are not colour blind,” Guardian, May 28, 2017 (https://www.theguardian.com/technology/2017/may/28/joy-buolamwini-when-algorithms-are-racist-facial-recognition-bias). Accessed: February 20, 2020
2Guardian, May 28, 2017
3Guardian, May 28, 2017
4Rachel Courtland, “Bias detectives: the researchers striving to make algorithms fair,” Nature, June 20, 2018 (https://www.nature.com/articles/d41586-018-05469-3). Accessed: February 20, 2020
5Nature, June 20, 2018
6Nature, June 20, 2018
7Nature, June 20, 2018
About Danielle (Dani) Gulliver
Danielle Gulliver is a software tester at PQA Testing Ltd. who completed her Bachelor of Computer Science from Carleton University in May of 2017. Although her specialization was in Game Design, her technological interests are broad. In her short career, she has had experience with performance testing, automated testing, content QA, writing blog posts and now, podcasting.