WNAR Session Explores Race, Ethnicity, Ancestry in Statistics

2 October 2023
The Justice, Equity, Diversity, and Inclusion (JEDI) Outreach Group Corner is a regular component of Amstat News in which statisticians write and educate our community about JEDI-related matters. If you have an idea or article for the column, email the JEDI Corner manager at jedicorner@datascijedi.org.

Yates Coley is the communications chair for the ASA Justice, Equity, Diversity, and Inclusion Outreach Group and editor of JEDI Corner.

Among statisticians, there is increased interest in applying our methodological expertise to research related to racism and race-based inequities. However, many in the field lack training or experience in conducting this research in a way that advances racial equity and justice. A recent session at the International Biometric Society Western North American Region meeting, organized and chaired by Maricela Cruz and Audrey Hendricks, addressed this topic. Here, I provide an overview of the presentations and takeaways from the invited session, “Considerations and Best Practices for Using Race, Ethnicity, Ancestry in Different Areas of Statistics and Data Science Research.”

Mariah Tso is a Diné cartographer and GIS specialist for UCLA’s Ralph J. Bunche Center for African American Studies. She discussed the use of race in research through the lenses of decolonization and data feminism. Her remarks invited the audience to reflect on how standard methods of data collection, analysis, and interpretation can reinforce systems of oppression, rather than challenge them. Among the many insights shared, her presentation suggested the following:

  • Relinquish the ideals of “objectivity” and “scientific truth,” concepts that presume some detached perspective, and acknowledge that we each have a position of power and privilege with respect to our data and research.
  • Question what evidence your research is seeking to assess and for whom. Too frequently, research on racial disparities seeks to provide a white audience with proof of systematic discrimination or deprivation, systems that minoritized people already know exist from their lived experiences.
  • Consider how choices made in the data collection and analysis process reflect values (and make choices that reflect your values). For example, an emphasis on “tidy” data values cleanliness and control over messiness and complexity. Yet, forcing people’s identities into neat, simplifying categories can be a form of violence. Broad, externally defined categories such as Latinx and American Indian flatten the diverse cultures, languages, and histories of Indigenous peoples. Moreover, excluding less prevalent races or cultures from analyses, perhaps due to small sample size or concerns about power, is data genocide (literally erasing a people) and reifies that community’s lack of power in knowledge production.
  • Recognize the limitations of operating within the existing (hierarchical) scientific framework to truly advance liberation. Tso cited an example of facilitating self-identification of race by people who are incarcerated. This change will improve evaluation of disparities in this setting but ultimately doesn’t undermine the racist framework of policing and incarceration.

Miguel Marino is an associate professor of biostatistics in the department of family medicine at Oregon Health & Science University. He is also co-director of the Primary Care Latino Equity Research Center. He discussed the opportunities and challenges of using disaggregation of race and ethnicity data to design and implement culturally appropriate interventions. The research program he presented is motivated by the “Latino Paradox”: relative to non-Hispanic whites, Latinos living in the United States have lower socioeconomic status but also lower all-cause mortality. Its overarching goal is to link disaggregated data with health outcomes available in clinical records data to identify protective factors among the Latinx diaspora. To this end, Marino’s work has explored the feasibility of disaggregating data on people with Hispanic or Latino ethnicity by country of origin and preferred language, and he has conducted analyses demonstrating the value of this approach.

Two example studies by Marino and his colleagues have examined differences in health insurance coverage and vaccination rates between English- and Spanish-preferring Latinos. In their 2017 Journal of Racial and Ethnic Health Disparities article, “In Low-Income Latino Patients, Post-Affordable Care Act Insurance Disparities May Be Reduced Even More Than Broader National Estimates: Evidence from Oregon,” John Heintzman and colleagues reported that disparities in insurance coverage in 23 community health centers in Oregon were eliminated following Medicaid expansion under the Affordable Care Act. Drawing from qualitative research in this population, they suggested that increased coverage among patients who prefer Spanish could have been due to local efforts to increase coverage among Latinos and to the work of promotoras, or community health workers, in community health clinics.

In their 2022 Journal of the American Geriatrics Society article, “Influenza and Pneumococcal Vaccination Delivery in Older Hispanic Populations in the United States,” Heintzman and colleagues identified lower rates of vaccination against influenza and pneumococcal disease for English-preferring Hispanic older adults compared to those who preferred Spanish. This finding is important to inform outreach to increase vaccination rates and would have been missed in an analysis that did not use disaggregation.
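To see how an aggregated analysis can hide such a difference, consider the minimal Python sketch below. The counts are invented purely for illustration and are not taken from the study; the `rate_by` helper and the field layout are likewise hypothetical.

```python
# Hypothetical counts, invented for illustration only (NOT study data).
# Each record: (ethnicity, preferred_language, n_patients, n_vaccinated)
records = [
    ("Hispanic", "English", 400, 200),
    ("Hispanic", "Spanish", 600, 420),
    ("Non-Hispanic white", "English", 1000, 620),
]


def rate_by(records, key):
    """Pool counts within each group named by key(record); return rates."""
    totals = {}
    for rec in records:
        group = key(rec)
        n, vaccinated = totals.get(group, (0, 0))
        totals[group] = (n + rec[2], vaccinated + rec[3])
    return {group: v / n for group, (n, v) in totals.items()}


# Aggregated: one rate per ethnicity, ignoring preferred language.
aggregated = rate_by(records, key=lambda r: r[0])

# Disaggregated: one rate per (ethnicity, preferred language) pair.
disaggregated = rate_by(records, key=lambda r: (r[0], r[1]))

for group, rate in aggregated.items():
    print("aggregated", group, round(rate, 2))
for group, rate in disaggregated.items():
    print("disaggregated", group, round(rate, 2))
```

With these made-up counts, the two ethnicity-level rates are identical, while the language-level rates within the Hispanic group differ. Only the disaggregated view would point outreach toward the English-preferring subgroup.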

Marino and his colleagues’ research has also assessed the feasibility of analyses disaggregated by country of birth. Because country of birth is less frequently collected in clinical records, they have developed methods to impute missing nativity data using last name and information on neighborhood composition. Using this approach, they identified variability in cardiovascular risk factors by country of birth in the Health Services Research article, “Disaggregating Latino Nativity in Equity Research Using Electronic Health Records.” Marino hopes demonstrating the potential impact of research with such disaggregated data will encourage policymakers to adopt data collection standards that better support these analyses.

Betzaida Maldonado is a PhD student in human medical genetics and genomics at the University of Colorado Anschutz Medical Campus. Her talk focused on challenges and considerations for using race, ethnicity, and ancestry data in the context of genomic studies. She began with an overview of the historic use of race and ethnicity categories, pointing out that the categories currently used in biomedical research are not scientifically based; they were developed federally to facilitate civil rights monitoring. Race and ethnicity, as outlined by the federal Office of Management and Budget’s Directive Number 15, are distinct. Race refers to a geographic and temporal social construct that classifies individuals based on shared physical characteristics, while ethnicity describes a group’s shared history, language, and culture. Furthermore, the use of the term “ancestry” is rapidly growing in genomics research. Genetic ancestry refers to paths through which regions in our genome have been inherited from our ancestors and is not synonymous with race or ethnicity.

While race, ethnicity, and genetic ancestry capture different information, they are all too often used interchangeably in genomic studies. Standards recently published by the National Academies recommend that investigators use race-, ethnicity-, and ancestry-related terminology more mindfully (as well as increase rigor and reduce harm of research examining health disparities). Of course, even with careful consideration of terms, preferred race, ethnicity, and ancestry labels may change over time. Race is, after all, a social construct without biological basis. These inconsistencies in practice compound the problem of a lack of diversity in genomic studies; without consistent definitions of terms, combining data across different cohort studies is challenging.

The Population Architecture Using Genomics study seeks to “characterize genetic architecture of complex traits in underrepresented populations through large-scale genetics and epidemiological research.” Maldonado and her mentor, Chris Gignoux, are members of the study’s Race, Ethnicity, and Ancestry working group, which generates quantitative evidence on the impacts of decisions about the use of race, ethnicity, and ancestry in genomic research, including recruitment, data quality control, association analyses, and follow-up studies. As one example, the working group is examining the potential effect of diversity (or lack thereof) and reference population labels in fine-mapping studies, an extension of genome-wide association studies used to identify putative causal genetic variants driving differences in phenotypes. Overall, this work underlines the need for improved diversity and recruitment for genomic studies and recommends researchers be mindful of the use of race, ethnicity, and ancestry labels that could stigmatize certain populations.

The session concluded with a panel discussion in which three themes emerged. First, all speakers emphasized the importance of fostering relationships with people affected by a research study. Community engagement should change how we do research, including the questions we ask, the data we collect, and how we frame research. We should also remember that expertise isn’t limited to trained scientists; community members can provide valuable insight.

Second, speakers pointed to identifying new data sources (or counterdata) to better serve minoritized communities. Statisticians and data scientists should embrace messier data, as well as qualitative data, to support storytelling and community in their research.

Finally, all speakers agreed the context of a particular research project should be considered when making decisions about data collection, analysis, and dissemination. There is no one-size-fits-all approach to research using race, ethnicity, and ancestry data. We must be mindful of the potential consequences of a project (however unintended) and act ethically to minimize harm and advance justice.
