
Using Our Superpowers to Contribute to the Public Good

1 May 2021

Rob Santos

I like to think of us statisticians as Jedi masters of uncertainty. We find ways to harness “The Force”—mathematical statistics—to subdue the dark side of uncertainty in data and achieve the common good: to gain insight and knowledge. But we are not omnipotent. Our superpowers are only as good as our underlying assumptions, assumptions that are all too often embraced with aplomb, yet cannot be proven.

Some election polling in the past two presidential elections has succumbed to the inadequacies of underlying assumptions (specifically, likely voter models), in my humble opinion. Some COVID-19 models of infections and deaths performed well early on, only to falter when population behaviors shifted in real time (e.g., in response to lockdowns, travel bans, and mask wearing) and undermined the models’ underlying assumptions.

When we are in our comfort zone and (implicit and explicit) assumptions hold, statisticians can appear invincible in their promises of cogent statistical inference. And while the sage words of our statistical Yoda, George Box, hold true today that “all models are wrong, but some are useful,” we statisticians can face a real challenge knowing when some models are useful and when they are not. Let’s explore this with an example.

Consider a class of statistical investigations in the policy arena: randomized controlled trials (RCTs). The rage over the past couple of decades has been their adoption to demonstrate the effectiveness of a given program in addressing a particular social problem. Pick any program in housing, food insecurity, employment, education, recidivism, policing, etc. The usual sponsor-preferred approach is to conduct a demonstration and then follow that with an RCT. Simple and rigorous, right?

Well …, I always advocate respecting the life course of a program. Demonstration projects are necessarily formative in nature. Especially for a first-time, nascent program, necessary adjustments are made on the fly because you cannot possibly get everything “right” until you actually implement it and see what is working, what is not, and what needs tweaking. Think about the COVID-19 vaccine rollout as a program if you want a real-world example.

Note that a formative evaluation (almost always involving qualitative research) represents the rigorous version of this tweaking activity. In an ideal world, one would want the new program to mature and be running on all cylinders by the time of a quantitative evaluation involving an RCT or quasi-experimental design. But that does not always happen—not by a long shot. It raises the question of what is actually being tested. Formally testing a fledgling program may not represent a fair assessment of its true impact or potency and may lead to a missed opportunity to do good.

Then, there is the RCT itself. In the social and health program evaluation world, RCTs are pretty standard: enrollees are randomly assigned either to the new treatment program or to a control condition, which is nothing at all or “usual care” if some nominal form of services is being offered. We, as statisticians, are typically charged with developing power calculations to detect some (often unspecified) level of impact that would be tantamount to a “successful” outcome. And then we are sent on our way until it’s time to analyze the results and presumably declare victory of some sort.

The problem is that an RCT is defined by assigning folks to a treatment or control group, where the control is supposed to be the counterfactual. Alas, there never really exists a true counterfactual—the absence of treatment and nothing else. The reason is that something else almost always exists. If I am unemployed and fail to get into a program to help me get employed, you can bet I will go find some other resources to help me secure a job. If my family experiences food insecurity and does not get into a program to help provide food, I will not stop my efforts until I find some way to get food on the table, and that may well be through some alternative program.

The point is that, for social program evaluations, there is almost never a true counterfactual; instead, RCTs typically measure the impact of a new program against some other unknown single treatment or combination of alternative treatments. Specifically, what is actually being measured is a treatment program’s marginal impact against unspecified other treatments.
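
To make that concrete, here is a minimal sketch in Python with entirely made-up employment rates (not drawn from any real evaluation): when most control-group members find alternative services on their own, the treatment-versus-control contrast recovers only the program’s marginal effect over those alternatives, not its effect against a true “no help at all” counterfactual.

```python
# Illustrative sketch only; all rates and shares below are hypothetical.
import numpy as np

rng = np.random.default_rng(2021)
n = 5_000                      # hypothetical enrollees per arm

# Hypothetical 6-month employment rates under three conditions
p_no_help = 0.30               # true counterfactual: no services at all
p_alt_services = 0.42          # alternative services controls find on their own
p_program = 0.50               # the new program being evaluated

# In the trial, many "controls" are not untreated; they seek out alternatives
share_finding_alternatives = 0.70
p_control_arm = (share_finding_alternatives * p_alt_services
                 + (1 - share_finding_alternatives) * p_no_help)

treated = rng.binomial(1, p_program, n)
controls = rng.binomial(1, p_control_arm, n)

print(f"Effect vs. true counterfactual:    {p_program - p_no_help:.2f}")            # 0.20
print(f"Effect the RCT actually estimates: {treated.mean() - controls.mean():.2f}")  # ~0.12
```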

Such program evaluations typically do not measure efficacy against a true counterfactual. That can wreak havoc on power calculations because notions of how large a measurable impact really constitutes “success” need to be rethought and often lowered in magnitude, which inevitably means a much larger trial and more time and expense.
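
To illustrate the arithmetic, here is a rough back-of-the-envelope power sketch using the standard two-sided, two-proportion normal approximation; the effect sizes are hypothetical and chosen only to show how quickly the required sample grows once the detectable “success” threshold shrinks toward the marginal effect.

```python
# Rough sketch: required sample size per arm at 80% power, alpha = 0.05,
# using the usual pooled two-proportion approximation. Effect sizes are hypothetical.
from scipy.stats import norm

def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate sample size per arm for comparing two proportions."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    p_bar = (p_control + p_treatment) / 2
    effect = abs(p_treatment - p_control)
    return (z_alpha + z_beta) ** 2 * 2 * p_bar * (1 - p_bar) / effect ** 2

# Powered against a true counterfactual (0.30 -> 0.50, a 20-point effect) ...
print(round(n_per_arm(0.30, 0.50)))   # about 94 per arm
# ... versus powered for the marginal effect actually on offer (0.38 -> 0.50)
print(round(n_per_arm(0.38, 0.50)))   # about 269 per arm
```

In this made-up example, shrinking the detectable difference from 20 to 12 percentage points nearly triples the enrollees needed in each arm, which is exactly the “more time and expense” problem noted above.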

And then there is the reality of the environment. RCTs can suggest efficacy via statistical inference from a rigorous design, implementation, and analysis. But many social programs to help people in need are implemented at the local level by community-based organizations (CBOs)—which, by the way, could benefit greatly from our volunteered services. Local CBOs are not always as robust as we wish they were. It is not uncommon for professional staff—including senior staff such as program directors and CEOs/presidents—to have limited tenures, leaving after a couple of years. The loss of a “program champion” due to normal staff turnover can be devastating, even to the most effective program.

I have personally seen this time and again throughout my career. Thus, basic assumptions such as the stability of the staff infrastructure supporting a program, the nature of the counterfactual in an RCT, and the magnitude of a program’s measured impact relative to unknown alternative treatments all complicate the job of a statistician. We need to care enough to scratch below the surface, going beyond developing power calculations and analyzing results to explore the underlying influences on impacts.

My illustration used RCTs for program evaluation to discuss underlying assumptions, but the lessons learned apply to most social science–related endeavors, including data science and big data/AI projects. How do we know we are measuring the right thing? And is it being measured accurately? What assumptions implicitly undergird the validity of results and associated inferences?

We, as statisticians, are in an awesome position to help researchers think through these issues and understand the limitations and strengths of the statistical inferences that flow from rigorous research studies. I often use such opportunities to apply an equity lens to assess the cultural relevance and appropriateness of all aspects of the design, from the underlying logic model to data-collection modes, measures of efficacy, and intended analyses (and interpretations). The most stimulating discussions stem from visioning questions at the design stage when I ask, “If the program works as intended, what would be happening with the program staff, with the program participants, and in the community?” The bottom line is that we can and should bring our own critical thinking to our statistical work, whether we are a team member on a project or the project’s “episodic” consultant. We best serve our profession and our communities when we are thoughtful and humble.

Yes, I do believe in the power of The Force. And I do believe statisticians can be superheroes. But, like everyone else, we are not infallible. Let’s use our superpowers with grace, honor, and integrity to achieve the public good.
