JSM Session Touches on Equity

1 November 2023
The Justice, Equity, Diversity, and Inclusion (JEDI) Outreach Group Corner is a regular component of Amstat News in which statisticians write about and educate our community about JEDI-related matters. If you have an idea or article for the column, email the JEDI Corner manager at jedicorner@datascijedi.org.

A white man with a brown beard, glasses, and a bald head smiles slightly

Brian Tarran is head of the data science platform at the Royal Statistical Society and editor of Real World Data Science. He was previously the editor of Significance magazine.

Filter bubbles. Echo chambers. Groupthink. All things we are told to watch out for, steer clear of, or break out from. Nowadays, though, we hear less about the dangers of ‘vicious circles’—yet the dangers have not receded.

Asian woman with short, dark, curly hair and frameless glasses smiles

Sunghee Lee, University of Michigan

In a session at the 2023 Joint Statistical Meetings, Sunghee Lee of the University of Michigan showed how incomplete data on Asian-American populations risks fueling a vicious circle of inaction and growing inequality.

Lee’s work was presented in the session “Statistically Significant: Equity Concerns in Algorithmic Bias, Privacy, and Survey Representation.” The research, done by Lee and presented by her University of Michigan colleague Raphael Nishimura, compared the socio-demographics of Asian-American respondents reported by four large-scale sample surveys against the same characteristics collected by the US Census Bureau’s American Community Survey (ACS).

What Lee found was that the surveys often differed from the ACS in important respects. For example, Asian Americans accounted for seven percent of adults aged 18 and over in the ACS, but in the General Social Survey (GSS) and Behavioral Risk Factor Surveillance System (BRFSS), they accounted for only four percent and two percent, respectively. And while 27 percent of Asian-American respondents to the ACS were educated to high-school level or below, the equivalent grouping in the BRFSS accounted for 18 percent.
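To put the population shares quoted above on a common scale, one option is to divide each survey’s share by the ACS share. The short sketch below does this arithmetic; the name `coverage_ratio` and the choice of a simple ratio are illustrative assumptions, not a measure attributed to Lee.

```python
# Percent of adults aged 18+ who are Asian American, as quoted in the article.
acs_share = 7.0
survey_shares = {"GSS": 4.0, "BRFSS": 2.0}


def coverage_ratio(survey_share, benchmark_share):
    """Survey share relative to the benchmark share (hypothetical helper).

    A value of 1.0 would mean the survey matches the benchmark;
    values below 1.0 indicate underrepresentation.
    """
    return survey_share / benchmark_share


ratios = {
    name: coverage_ratio(share, acs_share)
    for name, share in survey_shares.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} of the ACS share")
```

A ratio well below 1.0 only flags a gap in overall share. It says nothing about which subgroups are missing, which is the concern raised in the rest of Lee’s analysis.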

More concerning was that none of the surveys, except the ACS, collected data on Asian Americans’ proficiency with spoken English. According to the ACS, 31 percent of Asian-American adults have “limited English proficiency,” yet Lee found that none of her four selected surveys (GSS and BRFSS, plus the Current Population Survey and National Health Interview Survey) offered questionnaires in Asian languages.

In summary, Lee found that the geographic and ethnic heterogeneity of the Asian-American population is not reflected in the data-collection efforts she focused on, and that current data likely over-represents Asian Americans who were born in the US and/or who have high levels of English proficiency. It is this underrepresentation of certain Asian-American subgroups that gives rise to a potential vicious circle: data is not collected, so issues affecting certain population groups are not identified, meaning no action appears to be needed, so no data is collected … and so on.

Data Equity in Data Privacy

Asian woman with blue streaks in her short dark hair smiles widely

Claire McKay Bowen, Urban Institute

Following up on Lee’s work was a talk by Claire McKay Bowen of the Urban Institute, offering an overview of Do No Harm Guide: Applying Equity Awareness in Data Privacy Methods. Bowen, who co-authored the guide with Joshua Snoke, walked delegates through key concepts distilled from conversations with experts about privacy-preserving methods and data sharing.

As Bowen explained, there exists a natural tension between privacy and utility. A data set kept private and never released to the public has low (or no) utility, for example, whereas any data set released to the public inevitably sacrifices some privacy for the sake of utility. The question is, where to strike the balance between privacy loss and utility? To explore this, privacy researchers use what are called privacy-utility curves to visualize the trade-off for different data sets and subsets of the data. Inevitably, different groups can have different privacy loss and utility curves, said Bowen. What this means in practice, as explained in the report, is that “Some groups may need to sacrifice relatively higher levels of privacy loss for the same increase in statistical utility, which means that those groups may obtain higher utility and lower privacy loss relative to other groups.”
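A rough way to see why groups can sit on different curves is to pick one concrete privacy mechanism and trace its trade-off by group size. The sketch below assumes the Laplace mechanism from differential privacy applied to a simple group count; that mechanism, the parameter `epsilon` as the privacy-loss measure, relative error as the utility measure, and the group sizes are all assumptions made for illustration, not methods or figures taken from Bowen’s talk or the guide.

```python
def expected_relative_error(group_size, epsilon, sensitivity=1.0):
    """Expected relative error of a noisy count under the Laplace mechanism.

    Laplace noise with scale b = sensitivity / epsilon has expected
    absolute value b, so the expected relative error is b / group_size.
    """
    return (sensitivity / epsilon) / group_size


def min_epsilon_for(group_size, target_relative_error, sensitivity=1.0):
    """Smallest epsilon (privacy loss) that meets a target relative error."""
    return sensitivity / (target_relative_error * group_size)


# Hypothetical group sizes, chosen only to contrast a large and a small group.
groups = {"large_group": 10_000, "small_group": 200}
epsilons = [0.1, 0.5, 1.0, 2.0]

# One privacy-utility curve per group: (epsilon, expected relative error).
curves = {
    name: [(eps, expected_relative_error(size, eps)) for eps in epsilons]
    for name, size in groups.items()
}

for name, points in curves.items():
    print(name)
    for eps, err in points:
        print(f"  epsilon={eps:<4} expected relative error={err:.4f}")

# Privacy loss each group must accept to reach the same 1% relative error.
for name, size in groups.items():
    print(name, "needs epsilon >=", round(min_epsilon_for(size, 0.01), 4))
```

Under these assumptions, the smaller group must accept a larger `epsilon` to reach the same accuracy as the larger group, which is one simple way a group-level gap in the trade-off can arise.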

A flow chart shows the steps for equitable SDP. Step 1: Identify groups in the data; Step 2: Identify appropriate representatives for the groups; Step 3: Make determinations with representatives and decision-makers; 3a: Define statistical utility and privacy loss; 3b: Choose group-level preferences for statistical utility loss thresholds; Step 4: Communicate constraints to representatives and decision-makers; 4a: Use metrics and visuals for group-level tradeoff curves; 4b: Detail technical shortcomings (group size constraints); Step 5: Choose SDP implementation that best satisfies definitions, preferences, and constraints; Step 6: Document and publish each step of the process

Claire McKay Bowen presented an “aspirational workflow” for creating equitable statistical data privacy during the session “Statistically Significant: Equity Concerns in Algorithmic Bias, Privacy, and Survey Representation” at JSM 2023 in Toronto, Canada.

Bowen also explored issues around defining groups in the data and identifying those who might best represent the interests of said groups, questions of resource allocation (how to provide access to restricted data and how to train people to properly use data), and the importance of not treating equity and privacy as separate studies.

Particularly useful was Bowen’s presentation of an ‘aspirational workflow’ for creating equitable statistical data privacy. Discussant Susan Gregurick of the National Institutes of Health praised the workflow for its focus on co-design and giving people a voice in the work being done to make sure they are benefiting from it.

Valid Decisions?

A white woman with long dark brown hair smiles widely

Amanda Coston, Microsoft Research

Rounding out the presentations was Amanda Coston, postdoc researcher at Microsoft Research, who set out to examine “the validity and fairness of societally high-stakes decision-making algorithms.” The algorithms Coston focused on were predictive algorithms used to inform decisions around consumer lending, health care, criminal justice, or child welfare. She set out to define what validity means in these contexts and the threats to validity posed by selection bias and missing data.

Coston began by defining validity, in the scientific or statistical sense, as meaning that the measures we use actually measure what we intend to measure. For decision-making algorithms, she said, validity means the model predicts what it is supposed to predict. But, of course, models don’t always do so.

Image shows a slide that reads Common Threats to Construct Validity. Target Misalignment: Target outcome does not equal proxy outcome; Example 1: Clinical Healthcare* Target: Address medical needs; Proxy: Patient's healthcare costs; Risks to construct validity. At right is a dot graph with trend lines

Amanda Coston concludes her talk with discussion of statistical methods that can be used to counteract threats to validity. Photo courtesy of Brian Tarran

She then gave examples of where validity can be undermined. For instance, in a health care context, an algorithm may be used to inform decisions on patient care. The aim may be to predict health care needs and select the most in-need patients for a treatment program based on an algorithmic risk score. However, what if the designers of the algorithm use health care costs as a proxy for health care needs? This creates a risk to “construct validity,” said Coston. Drawing on work by Ziad Obermeyer and colleagues in their Science article titled “Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations,” she showed how, at the same risk score, the disease burden on Black patients was more severe than that on white patients, leading to what Obermeyer describes as “substantial disparities in program screening.”
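The mechanism behind this kind of target misalignment can be shown with a small simulation. The sketch below uses purely synthetic data: two hypothetical groups with identical distributions of true need, where group B generates lower costs for the same need (the `access_b` factor). None of the numbers, distributions, or names come from Obermeyer’s study or Coston’s talk; they are assumptions chosen only to show how ranking on a cost proxy can under-select a group.

```python
import random


def simulate(n=10_000, access_b=0.7, top_fraction=0.1, seed=0):
    """Compare program selection by cost (proxy) with selection by true need.

    Returns the share of group B among patients selected by each criterion.
    All inputs are synthetic and hypothetical.
    """
    rng = random.Random(seed)
    patients = []
    for i in range(n):
        group = "A" if i % 2 == 0 else "B"  # two equally sized groups
        need = rng.gammavariate(2.0, 1.0)   # true need, same distribution for both
        access = 1.0 if group == "A" else access_b
        # Observed cost tracks need, scaled by access and a little noise.
        cost = need * access * rng.uniform(0.9, 1.1)
        patients.append((group, need, cost))

    k = int(n * top_fraction)
    top_by_cost = sorted(patients, key=lambda p: p[2], reverse=True)[:k]
    top_by_need = sorted(patients, key=lambda p: p[1], reverse=True)[:k]

    share_b_by_cost = sum(p[0] == "B" for p in top_by_cost) / k
    share_b_by_need = sum(p[0] == "B" for p in top_by_need) / k
    return share_b_by_cost, share_b_by_need


share_b_by_cost, share_b_by_need = simulate()
print(f"Group B share when selecting on cost: {share_b_by_cost:.2f}")
print(f"Group B share when selecting on need: {share_b_by_need:.2f}")
```

In this toy setup the two groups are equally in need by construction, so any gap between the two printed shares comes entirely from the choice of proxy.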
