The Story Behind the ASA Statement on P-Values
Almost 30 years ago (1988), I published a paper in the Journal of Parapsychology titled “Successful Replication versus Statistical Significance,” in which I argued against the use of the standard “p ≤ .05” as the criterion for judging the success of an experiment. I pointed out problems with p-values that statisticians were well aware of even then, but that many scientists (and journal editors) are only now beginning to understand, such as the role of sample size in determining statistical significance. The paper generated substantial discussion, and at the Parapsychological Association annual conference that year, someone distributed T-shirts to support my point of view.
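The sample-size point can be made concrete with a small sketch (not from the original paper; the effect size d = 0.02 and the one-sample z-test with known unit variance are hypothetical choices for illustration): a fixed, practically negligible effect crosses the p ≤ .05 threshold purely because n grows.

```python
# Illustrative sketch: how sample size alone drives a fixed, negligible
# effect across the p <= .05 threshold. Assumes a two-sided one-sample
# z-test with known unit variance; d = 0.02 is a hypothetical effect.
import math

def two_sided_p(d, n):
    """Two-sided p-value for a standardized effect d at sample size n."""
    z = d * math.sqrt(n)
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))

for n in (100, 10_000, 1_000_000):
    p = two_sided_p(0.02, n)
    verdict = "significant" if p <= 0.05 else "not significant"
    print(f"n = {n:>9,}  p = {p:.3g}  ({verdict})")
```

With n = 100 the same effect gives p ≈ 0.84; by n = 10,000 it is already "significant" (p ≈ 0.046), and at n = 1,000,000 the p-value is vanishingly small, even though the effect itself never changed.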
For the past year, the ASA has had a committee working on elucidating principles that should accompany the use of p-values. I asked Ron Wasserstein, ASA’s executive director, to answer some questions about how this came about.
The ASA recently released a statement on p-values, and you were involved throughout the process. How did the ASA decide to get involved?
Former ASA Vice President George Cobb suggested the ASA take some action. Many statisticians were already quite concerned about statistical issues in the “reproducibility crisis” in science. At its spring 2014 meeting, the board discussed whether to take this on. Board members observed that this was new ground for the ASA, but agreed it was important for us to do.
In what way is this new ground?
Certainly in recent years, the ASA has addressed matters through policy statements—matters of importance such as the role of statistics in data science, value-added models in educational assessment, risk-limiting election audits, qualifications for introductory statistics instructors, and so on. None of these, however, speaks to such a fundamental practice of statistics as does this statement.
Once the ASA decided to get involved, how did the process get started?
The board gave me the responsibility of assembling a panel of experts on the subject. It wasn’t hard to find a great group of statisticians for this task. We looked at the literature and identified people who were actively writing about these matters. We asked those people who else we should be talking to and reached out from there. It was important to the board that a rich variety of perspectives be included. In the end, we approached more than two dozen people, almost all of whom said they would be willing to be involved. There was skepticism on the part of many as to whether a statement could be agreed upon, but the skeptics wanted to be involved nonetheless.
What happened next?
We went to work. We decided on an outline for the statement and broke the work of creating the pieces of the statement into three parts. We formed subgroups to address each part. Each subgroup had a leader, and the leaders began email discussions about the relevant topics. Things moved along, but after a while, we began to hit some snags.
Let’s talk about snags. It took a long time from start to finish on this. Colleagues have expressed surprise at that, saying “I could have written a statement on p-values in an afternoon.” Why did it take so long?
An afternoon won’t get it done, but I admit I had no idea at the outset it would take as long as it did. Like so many things, the issues here are much more complicated than they first appear. They are both foundational and practical, theoretical and methodological. Issues that have been debated at least since Fisher and Neyman and Pearson are still in play, and Bayesian philosophies and methods add to the mix. Then there is a whole other dimension. Statisticians are happy engaging in these debates, but our intended audiences need us to get real, to sort things out and explain what needs to change in the way they practice statistics. There are lots of opinions about this. In the words of George Cobb parodying the birthday problem, “How many statisticians does it take to ensure at least a 50% chance of disagreement about p-values?” George says the answer is either 1 or 2!
In the end, the statement was not much like the original outline. The long effort and thoughtful deliberation paid off, however, in a statement that should accomplish its purposes.
What do you hope this statement will accomplish?
We have big dreams for this statement. We’d love to see the practice of science with respect to its use of statistical inference undergo a cultural shift. We envision a “post p < 0.05 era,” one in which scientific argumentation is not based on whether a p-value is small enough. In this era, attention would be paid to effect sizes and confidence intervals. P-values, when used, would be reported as values, rather than inequalities (p = .0168, rather than p < 0.05). Indeed, we envision better recognition that the measurement of the strength of evidence really is continuous, rather than discrete. In this era, all the assumptions that contribute information to inference would be examined, including the choices made regarding which data are analyzed and how. In the post p < 0.05 era, sound statistical analysis will still be important, but no single numerical value will substitute for thoughtful statistical and scientific reasoning.
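As a minimal sketch of that reporting style (the data here are invented, and the normal-approximation confidence interval is one simple choice among many), the idea is to report the effect estimate, an interval, and the exact p-value rather than the bare inequality p < 0.05:

```python
# Sketch of "post p < 0.05" reporting with hypothetical data: report
# the effect estimate, a 95% confidence interval, and the exact
# two-sided p-value (normal approximation throughout).
import math

def report(xs, mu0=0.0):
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    se = sd / math.sqrt(n)
    z = (mean - mu0) / se
    p = math.erfc(abs(z) / math.sqrt(2))       # exact two-sided p-value
    ci = (mean - 1.96 * se, mean + 1.96 * se)  # 95% CI, normal approx.
    return mean, ci, p

est, (lo, hi), p = report([2.1, 1.4, 2.9, 0.7, 1.8, 2.4, 1.1, 2.6])
print(f"estimate = {est:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], p = {p:.4f}")
```

The printed line carries everything a reader needs to judge the evidence on a continuum; stating only "p < 0.05" would discard the effect size, the precision, and the actual strength of evidence.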
The journals are the gatekeepers that can usher in this era. If the statement succeeds in its purpose, journals will stop using statistical significance to determine whether to accept an article. Instead, journals will accept papers based on clear and detailed description of the study design, execution, and analysis. The papers will have conclusions based on valid statistical interpretations and scientific arguments, and they will be reported transparently and thoroughly enough to be rigorously scrutinized by others. We won’t be left scratching our heads trying to sort out researcher degrees of freedom. The file-drawer effect will be reduced.
We know there are areas of science and related journals following many of these practices already, but we hope the statement will drive others in that direction. We also know this isn’t change that happens overnight.
How can ASA members help?
The statement can’t achieve its purposes if only statisticians know about it. Members can help enormously by sharing the statement through their networks, both social media networks and personal networks. For the really ambitious, giving a seminar on p-values and statistical inference for nonstatistical colleagues in the workplace would be a great way to engage. Many of our members are asked to referee papers for journals in other disciplines, and may even serve as associate editors. They could inform those journal editors of our statement and base their own reviews on its principles.
Will the ASA be following a similar process on other topics? If so, what are some possibilities?
I think we should be, and I think the board agrees. We would love to get ideas from members about appropriate topics. Perhaps we could drill further down into aspects of the ASA’s recent statement on the role of statistics in data science. Are there areas of research practice where there are Bayesian methodologies that should be employed but aren’t? There is a lot of concern about multiple testing and confirmatory vs. exploratory research. If there are areas in which we have things to say that could make a difference, let’s do it!