
Designing Against Bias in Machine Learning and AI

1 September 2023
David Corliss is the principal data scientist at Grafham Analytics. He also serves on the steering committee for the Conference on Statistical Practice and is the founder of Peace-Work.

Bias in machine learning algorithms is one of the most important ethical and operational issues in statistical practice today. Shifting the focus to testing for and mitigating bias at the design stage instead of after code is released can help prevent many of the problems seen in ML and artificial intelligence.

Confirmation bias is common, resulting from the human tendency to acquire information, analyze it, and develop explanations that confirm preexisting beliefs. It can lead to the selective inclusion of data sources that produce the results expected. While this may seem obvious, it can be insidious in actual practice. Algorithm developers can easily dismiss data sources as untrustworthy after seeing the results, making it important to have independent review of the data included in a training set.

A training set that is unrepresentative of the population of interest can result from convenience sampling, in which an algorithm is developed using whoever happens to be available or whoever chose to answer a survey. This has been seen in voice recognition systems, which are much more likely to misunderstand female voices.
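One simple check against convenience sampling is to compare the group make-up of a training set with known population shares before any modeling begins. The sketch below assumes the population shares come from an outside source such as a census; the function name, group labels, and counts are hypothetical and chosen only for illustration.

```python
from collections import Counter


def representation_gaps(sample_groups, population_shares):
    """Return each group's share of the sample minus its share of the population.

    Negative values mean the group is under-represented in the sample.
    """
    counts = Counter(sample_groups)  # missing groups count as 0
    n = len(sample_groups)
    return {g: counts[g] / n - share for g, share in population_shares.items()}


# Hypothetical voice-recording training set: 80 male and 20 female speakers,
# compared against an assumed 50/50 population split.
speakers = ["male"] * 80 + ["female"] * 20
gaps = representation_gaps(speakers, {"male": 0.5, "female": 0.5})
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

A large negative gap is a signal to collect more data from that group before training, not something to correct silently after the fact.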

Failing to oversample smaller subgroups of the population can result in poor model performance for those groups, which are often made up of minorities. Alex Najibi at Harvard reported that several facial recognition programs from leading companies had lower accuracy for women and persons of color.
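Oversampling can happen at data collection (deliberately recruiting more people from small groups) or at training time. The sketch below shows only the simplest training-time variant, random oversampling with replacement, so that every group is as large as the largest one. The function and record layout are hypothetical.

```python
import random
from collections import Counter


def oversample(records, group_of, seed=0):
    """Randomly duplicate records from smaller groups (sampling with replacement)
    until every group matches the size of the largest group."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    by_group = {}
    for r in records:
        by_group.setdefault(group_of(r), []).append(r)
    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        # Draw the shortfall from the group's own records.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


records = [("majority", i) for i in range(8)] + [("minority", i) for i in range(2)]
balanced = oversample(records, group_of=lambda r: r[0])
print(Counter(group for group, _ in balanced))
```

Duplicating records adds no new information about the small group, so collecting more real data remains preferable where possible. Oversampling should also be applied only to the training split, so that evaluation still reflects the true population.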

Prejudice bias results from training an algorithm on labels taken from previous human decisions, which teaches the algorithm to replicate the very human bias that many algorithms are created to avoid. Although it is known as prejudice bias in the literature, the source of the problem is unscreened prior human decisions in the training set. It can also occur where no prejudice is involved, such as in an AI system built to improve the identification of defective parts in a manufacturing plant, which will learn whatever mistakes past human inspectors made.

Another type of bias can occur when there is a very large number of candidate predictors, as in text analytics. For example, an AI system that screens résumés might deduct points when the text includes the name of a women's college, simply because graduates of that school were seldom hired in the past. If candidate predictors enter the model without each one being screened for bias, biased predictors can end up in the final model.
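One simple way to screen candidate predictors one at a time is to check whether a feature is a proxy for group membership: for each binary text feature (for example, whether a résumé mentions a given term), compare how often it appears in each group and flag large differences. This is a minimal sketch under stated assumptions; the feature names, the 0/1 group coding, and the 2× threshold are hypothetical choices for illustration, and a flagged feature is a candidate for review rather than automatic removal.

```python
import math


def screen_predictors(rows, protected, threshold=2.0):
    """Flag binary predictors whose prevalence differs sharply between two groups.

    rows: one dict of {feature: 0 or 1} per record (missing features count as 0).
    protected: parallel list of 0/1 group labels (both groups must be non-empty).
    Returns {feature: (rate in group 1, rate in group 0)} for features whose
    higher rate is at least `threshold` times the lower rate.
    """
    n1 = sum(protected)
    n0 = len(protected) - n1
    features = sorted({f for row in rows for f in row})
    flagged = {}
    for f in features:
        p1 = sum(row.get(f, 0) for row, g in zip(rows, protected) if g) / n1
        p0 = sum(row.get(f, 0) for row, g in zip(rows, protected) if not g) / n0
        high, low = max(p1, p0), min(p1, p0)
        # A feature seen in only one group is treated as an infinite ratio.
        ratio = math.inf if low == 0 else high / low
        if high > 0 and ratio >= threshold:
            flagged[f] = (p1, p0)
    return flagged


# Hypothetical résumé features; protected = 1 for female applicants.
rows = [
    {"womens_college": 1, "python": 1},
    {"womens_college": 1, "sql": 1},
    {"python": 1, "sql": 1},
    {"python": 1},
    {"sql": 1},
    {"python": 1, "sql": 1},
]
protected = [1, 1, 1, 0, 0, 0]
flagged = screen_predictors(rows, protected)
print(flagged)
```

Here the skill terms appear equally often in both groups and pass, while the school-name feature appears in only one group and is flagged as a likely proxy.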

Lack of model transparency is another problem related to algorithm bias. While not a source of bias in and of itself, it can greatly complicate testing for potential bias, confirming it, and mitigating its effects. When the features included in an algorithm are withheld from the people who use it and from the people whose lives are significantly affected by it, bias becomes more difficult to identify and mitigate.

Bias can be measured as disparate impact on, or disparate benefit to, marginalized minorities. An excellent example is the 2016 ProPublica study of the COMPAS recidivism algorithm by Jeff Larson, Surya Mattu, Lauren Kirchner, and Julia Angwin. The study found that the algorithm misidentified Black defendants as high risk more often than white defendants, and that white defendants were more likely than Black defendants to be misidentified as low risk. A key feature of the analysis was its examination of the off-diagonal elements of the confusion matrix (the false positives and false negatives). The producers of COMPAS pointed to the model's accuracy, the percentage of correct predictions. However, the bias was found in the records the algorithm got wrong: being a person of color was an important factor in overestimating the risk of recidivism.
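To see how overall accuracy can hide this pattern, the sketch below tallies the confusion matrix separately for each group and reports accuracy, false positive rate, and false negative rate. The labels and predictions are invented for illustration (1 means actual or predicted high risk); they are not COMPAS data, and the function name is hypothetical.

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: (accuracy, false positive rate, false negative rate)}.

    Assumes every group has at least one actual positive and one actual negative.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        cell = ("tp" if t else "fp") if p else ("fn" if t else "tn")
        counts[g][cell] += 1
    rates = {}
    for g, c in counts.items():
        total = sum(c.values())
        accuracy = (c["tp"] + c["tn"]) / total
        fpr = c["fp"] / (c["fp"] + c["tn"])  # actual negatives wrongly flagged high risk
        fnr = c["fn"] / (c["fn"] + c["tp"])  # actual positives wrongly labeled low risk
        rates[g] = (accuracy, fpr, fnr)
    return rates


y_true = [0, 0, 0, 1, 1, 1] * 2
y_pred = [1, 1, 0, 1, 1, 0,   0, 0, 1, 0, 0, 1]
groups = ["A"] * 6 + ["B"] * 6
rates = error_rates_by_group(y_true, y_pred, groups)
for g, (acc, fpr, fnr) in rates.items():
    print(f"{g}: accuracy={acc:.2f} FPR={fpr:.2f} FNR={fnr:.2f}")
```

With these made-up numbers both groups have the same accuracy, yet group A's errors are mostly false "high risk" flags and group B's are mostly false "low risk" labels, the same shape of disparity the off-diagonal analysis is designed to expose.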

Many metrics can be used to quantify bias in scientific studies. In my own experience, odds ratios are easier than most alternatives to explain to a nonscientific audience, including most legislators, agency managers, and the public.
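As an illustration, here is a minimal sketch that computes an odds ratio from a 2×2 table, with an approximate 95% confidence interval on the log scale (the standard Woolf, or Wald, interval). The counts are hypothetical: among people who did not reoffend, how many in each group were flagged as high risk.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio for a 2x2 table with an approximate confidence interval.

                 flagged   not flagged
    group 1         a           b
    group 2         c           d

    Returns (odds ratio, lower bound, upper bound). Zero cells are not
    handled here (they would cause a division by zero).
    """
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


# Hypothetical counts among non-reoffenders: 40 of 100 in group 1 flagged
# high risk, versus 20 of 100 in group 2.
ratio, lower, upper = odds_ratio(40, 60, 20, 80)
print(f"OR = {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The result (about 2.7) translates directly into a plain-language sentence: among people who did not reoffend, the odds of being wrongly flagged as high risk were about 2.7 times higher for group 1 than for group 2.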

In recent years, several statistical packages have been developed to measure algorithm bias. One of the best is the Fairlearn package, developed by a Microsoft team led by Miroslav Dudik and released for free public use in 2020. Fairlearn calculates many of the statistics needed to quantify algorithm bias, including per-group confusion-matrix rates and ratio measures of disparate impact such as the demographic parity ratio and the equalized odds ratio. It also provides visualizations that help users understand the amount of bias in different model variants and weigh it against predictive strength.

Understanding and avoiding the common sources of bias allows mitigation at the design stage—before an algorithm is released and subsequently found to have problems. New tools such as Fairlearn can be used to quantify and mitigate bias. The establishment and practice of standardized bias testing during design and development will result in AI that is more accurate, widely applicable, and fair to everyone.
