ASA President’s Task Force Statement on Statistical Significance and Replicability

1 August 2021
The value of hypothesis testing and p-values as cornerstones of statistical methodology, and their frequent misinterpretation, continue to be debated. In 2019, the president of the American Statistical Association, Karen Kafadar, convened a task force to consider the use of statistical methods in scientific studies, specifically hypothesis tests and p-values, and their connection to replicability. The document written by the task force is reprinted below.

Over the past decade, the sciences have experienced elevated concerns about the replicability of study results. An important aspect of replicability is the use of statistical methods for framing conclusions. In 2019, the president of the American Statistical Association established a task force to address concerns that a 2019 editorial in The American Statistician (an ASA journal) might be mistakenly interpreted as official ASA policy. (The editorial recommended eliminating the use of “p < 0.05” and “statistically significant” in statistical analysis.) This document is the statement of the task force, and the ASA invited us to publicize it. Its purpose is two-fold: to clarify that p-values and significance testing, properly applied and interpreted, are important tools that should not be abandoned, and to briefly set out some principles of sound statistical inference that may be useful to the scientific community.

The most reckless and treacherous of all theorists is he who professes to let facts and figures speak for themselves, who keeps in the background the part he has played, perhaps unconsciously, in selecting and grouping them.
(Alfred Marshall, 1885)

P-values are valid statistical measures that provide convenient conventions for communicating the uncertainty inherent in quantitative results. Indeed, p-values and significance tests are among the most studied and best understood statistical procedures in the statistics literature. They are important tools that have advanced science through their proper application.

Much of the controversy surrounding statistical significance can be dispelled through a better appreciation of uncertainty, variability, multiplicity, and replicability. The following general principles underlie the appropriate use of p-values and the reporting of statistical significance and apply more broadly to good statistical practice.

Capturing the uncertainty associated with statistical summaries is critical. Different measures of uncertainty can complement one another; no single measure serves all purposes. The sources of variation the summaries address should be described in scientific articles and reports. Where possible, those sources of variation that have not been addressed should also be identified.

Dealing with replicability and uncertainty lies at the heart of statistical science. Study results are replicable if they can be verified in further studies with new data. Setting aside the possibility of fraud, important sources of replicability problems include poor study design and conduct, insufficient data, lack of attention to model choice without a full appreciation of the implications of that choice, inadequate description of the analytical and computational procedures, and selection of results to report. Selective reporting, even the highlighting of a few persuasive results among those reported, may lead to a distorted view of the evidence. In some settings, this problem may be mitigated by adjusting for multiplicity. Controlling and accounting for uncertainty begins with the design of the study and measurement process and continues through each phase of the analysis to the reporting of results. Even in well-designed, carefully executed studies, inherent uncertainty remains, and the statistical analysis should account properly for this uncertainty.
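As an illustration (not part of the statement itself), one widely used multiplicity adjustment of the kind mentioned above is the Benjamini-Hochberg false discovery rate procedure. The sketch below uses hypothetical p-values:

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected at FDR level q
    (Benjamini-Hochberg step-up procedure)."""
    m = len(p_values)
    # Sort p-values ascending, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k/m) * q.
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k = rank
    # Reject the hypotheses with the k smallest p-values.
    return sorted(order[:k])

# Ten hypothetical p-values from ten simultaneous tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, q=0.05))  # only the two smallest survive
```

Note that several p-values here fall below 0.05 individually, yet only two rejections survive the adjustment, which is exactly the distortion from selective reporting that the statement warns about.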

The theoretical basis of statistical science offers several general strategies for dealing with uncertainty. P-values, confidence intervals, and prediction intervals are typically associated with the frequentist approach. Bayes factors, posterior probability distributions, and credible intervals are commonly used in the Bayesian approach. These are some among many statistical methods useful for reflecting uncertainty.
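As a concrete example (not drawn from the statement), a frequentist p-value and confidence interval for a single proportion can be computed with nothing more than the normal approximation; the data below are hypothetical:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def proportion_test(successes, n, p0=0.5):
    """Two-sided z-test p-value and 95% Wald confidence interval
    for a proportion, using the normal approximation."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)       # standard error under the null
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - normal_cdf(abs(z)))
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error at the estimate
    ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)
    return p_value, ci

# Hypothetical data: 60 successes in 100 trials, null value p0 = 0.5.
p_value, ci = proportion_test(60, 100)
print(f"p-value = {p_value:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The two outputs complement each other as the statement suggests: the p-value summarizes evidence against the null value, while the interval conveys the range of plausible values for the proportion.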

Thresholds are helpful when actions are required. Comparing p-values to a significance level can be useful, though p-values themselves provide valuable information. P-values and statistical significance should be understood as assessments of observations or effects relative to sampling variation, and not necessarily as measures of practical significance. If thresholds are deemed necessary as part of decision-making, they should be explicitly defined based on study goals, considering the consequences of incorrect decisions. Conventions vary by discipline and purpose of analyses.
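One way to make the consequences of incorrect decisions explicit when choosing a threshold, as suggested above, is to compute the error trade-off at the design stage. The sketch below (hypothetical planning values, one-sided z-test) shows how tightening the significance level raises the Type II error rate:

```python
from statistics import NormalDist

def type_ii_error(alpha, n, delta, sigma=1.0):
    """Type II error (missed-detection probability) of a one-sided
    z-test at level alpha, for a true effect of size delta, standard
    deviation sigma, and n observations. All inputs are hypothetical
    planning values."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha)   # rejection cutoff under the null
    shift = delta * n ** 0.5 / sigma    # standardized effect at sample size n
    return norm.cdf(z_alpha - shift)

# A stricter threshold means fewer false positives but more misses:
for alpha in (0.05, 0.01, 0.001):
    beta = type_ii_error(alpha, n=25, delta=0.5)
    print(f"alpha = {alpha}: Type II error = {beta:.3f}")
```

Making this trade-off visible is one way to tie the choice of threshold to study goals rather than to convention alone.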

In summary, p-values and significance tests, when properly applied and interpreted, increase the rigor of the conclusions drawn from data. Analyzing data and summarizing results are often more complex than is sometimes popularly conveyed. Although all scientific methods have limitations, the proper application of statistical methods is essential for interpreting the results of data analyses and enhancing the replicability of scientific results.

Editor’s Note: Reprinted with permission from the Institute of Mathematical Statistics.

Statement Authors

    Yoav Benjamini is emeritus professor of applied statistics in the department of statistics and operations research at Tel Aviv University and a member of the Sagol School of Neuroscience and the Edmond Safra Bioinformatics Center. He has been visiting professor at the University of Pennsylvania; University of California, Berkeley; Stanford University; and Columbia University.

    Benjamini is a co-developer of the widely used false discovery rate concept and methodology. His other research topics are replicability and reproducibility in science and data mining, with applications in biostatistics, bioinformatics, animal behavior, geography, meteorology, brain imaging, and health informatics.

    A member of the Israel Academy of Sciences and Humanities and the US National Academy of Sciences, Benjamini has also received the Israel Prize in Statistics and Economics and the Founders of Statistics Prize of the International Statistical Institute.

    Richard D. De Veaux is C. Carlisle and Margaret Tippit Professor of Statistics in the department of mathematics and statistics at Williams College. He has won both the Wilcoxon and Shewell awards (twice) from the American Society for Quality, is a fellow of the American Statistical Association, and is an elected member of the International Statistical Institute.

    De Veaux’s research interests are in statistical learning and its application to problems in science and industry. He is the co-author of six highly successful textbooks in statistics at the high-school and college levels and currently serves as the senior vice president of the ASA.

    Bradley Efron is Max H. Stein Professor of Humanities and Sciences and professor of statistics and biostatistics in the departments of statistics and biostatistics at Stanford University and co-director of the mathematical and computational sciences program.

    For his contributions to theoretical and applied statistics, especially the bootstrap, Efron is the recipient of the MacArthur Prize (1983), the American Statistical Association's Wilks Memorial Award (1991), the National Medal of Science (2005), the Royal Statistical Society Guy Medal in Gold (2014), and the 2019 International Prize in Statistics.

    Efron is a former editor of the Journal of the American Statistical Association and founding editor of The Annals of Applied Statistics. He also served as president of the Institute of Mathematical Statistics (1987–1988) and ASA (2005).

    Scott Evans is professor and founding chair of the department of biostatistics and bioinformatics and the director of the biostatistics center at The George Washington University Milken Institute School of Public Health. He is the director of the statistical and data management center for the Antibacterial Resistance Leadership Group and co-chair of the Benefit-Risk Balance for Medicinal Products Committee for the Council for International Organizations of Medical Sciences.

    Evans is a recipient of the Mosteller Statistician of the Year Award and Robert Zackin Distinguished Collaborative Statistician Award. He is also an elected member of the International Statistical Institute and a fellow of the American Statistical Association, Society for Clinical Trials, and Infectious Disease Society of America.

    Mark Glickman is senior lecturer on statistics in the Harvard University Department of Statistics and senior statistician at the Center for Healthcare Organization and Implementation Research, a Veterans Administration Center of Innovation.

    Glickman’s research interests include statistical models for rating competitors in games and sports and statistical methods for problems in health services research. He is a fellow of the American Statistical Association and serves on the ASA Board of Directors as Council of Sections Governing Board representative.

    Barry I. Graubard is senior investigator in the biostatistics branch at the National Cancer Institute. He develops design and model-based statistical methods for conducting epidemiologic and public health analyses of studies with complex sample designs, including national surveys, population-based case-control studies, and nested cohort studies.

    Graubard is a fellow of the American Statistical Association and American Association for the Advancement of Science. He is also past chair of the ASA Biometrics Section and a recipient of the Committee of Presidents of Statistical Societies George W. Snedecor Award (1990) and ASA Mentoring Award (2020).

    Xuming He (co-chair) is H.C. Carver Collegiate Professor of Statistics at the University of Michigan. His research interests include theory and methodology in robust statistics, semiparametric regression, dimension reduction, and subgroup analysis.

    He is former co-editor of the Journal of the American Statistical Association and a fellow of the American Statistical Association, Institute of Mathematical Statistics, and American Association for the Advancement of Science. He is also a past president of the International Chinese Statistical Association and president-elect (2021–2023) of the International Statistical Institute.

    Karen Kafadar (ex-officio) is Commonwealth Professor and Chair of Statistics at the University of Virginia. She develops robust statistical methods for applications in the physical, biological, and medical sciences.

    Kafadar is a fellow of the American Statistical Association and American Association for the Advancement of Science and an elected member of International Statistical Institute (ISI). She has also served on several committees for the National Academy of Sciences and has been editor of Technometrics and The Annals of Applied Statistics.

    Among her honors, Kafadar has received the William G. Hunter Award (American Society for Quality, 2001) and the ASA’s Outstanding Statistical Application Award (1995). She was also president of ISI’s International Association of Statistical Computing and served as ASA president in 2019.

    Xiao-Li Meng is Whipple V.N. Jones Professor in the department of statistics at Harvard University and the founding editor-in-chief of Harvard Data Science Review. His research interests are wide-ranging, from statistical foundations for data science to astrostatistics and Monte Carlo methods.

    Meng was named the best statistician under the age of 40 by the Committee of Presidents of Statistical Societies in 2001 and was elected to American Academy of Arts and Sciences in 2020. Previously, he served as president of the Institute of Mathematical Statistics, chair of the department of statistics at Harvard, and dean of the Graduate School of Arts and Sciences at Harvard.

    Nancy Reid is university professor of statistics at the University of Toronto. Her research interests include statistical theory, likelihood inference, design of studies, and statistical science in public policy.

    Reid is a past president of the Statistical Society of Canada, past vice president of the International Statistical Institute, and former scientific director of the Canadian Statistical Sciences Institute. She is a foreign associate of the National Academy of Sciences and a fellow of the American Statistical Association, American Association for the Advancement of Science, Royal Society, Royal Society of Canada, and Royal Society of Edinburgh.

    Stephen M. Stigler is Ernest DeWitt Burton Distinguished Service Professor Emeritus in the department of statistics at The University of Chicago. He has written several books and numerous articles about the history of statistics and statistical theory and methods in the natural and social sciences.

    A member of the American Philosophical Society and membre associé of the Académie Royale de Belgique (Classe des Sciences), as well as fellow of the American Academy of Arts and Sciences, Stigler has served as president of the International Statistical Institute and Institute of Mathematical Statistics and as editor of the Journal of the American Statistical Association (Theory and Methods).

    Stephen B. Vardeman is a university professor in the departments of statistics and industrial and manufacturing systems engineering at Iowa State University. He is a former editor of Technometrics, a fellow of the American Statistical Association, and an elected member of the International Statistical Institute.

    Vardeman’s professional interests include statistical machine learning, business and engineering analytics, engineering and natural science applications of statistics, statistics and metrology, directional data analysis, industrial applications, statistical education, and the development of statistical theory and methods.

    Christopher K. Wikle is Curators’ Distinguished Professor and chair of statistics at the University of Missouri, with additional appointments in soil, environmental, and atmospheric sciences and the Truman School of Public Affairs. His research interests are in spatiotemporal statistics applied to environmental, ecological, geophysical, agricultural, and federal survey applications, with particular interest in dynamics.

    Wikle is a fellow of the American Statistical Association and Institute of Mathematical Statistics and elected fellow of the International Statistical Institute. He is also associate editor of the Journal of the American Statistical Association (Theory and Methods), Environmetrics, Spatial Statistics, and Weather and Forecasting and is one of six inaugural members of the Statistics Board of Reviewing Editors for Science.

    Tommy Wright is research mathematical statistician and chief of the Center for Statistical Research and Methodology at the US Census Bureau. His current collaborative research interests include probability sampling; uncertainty in overall rankings based on sample surveys; empirical assessment of uncertainty in disclosure avoidance methods; and development of results that bring together his interests in optimal sample allocation, apportionment of the US House of Representatives, and Lagrange’s identity.

    Wright teaches part time at Georgetown University and is an elected member of the International Statistical Institute and fellow of the American Statistical Association.

    Linda J. Young (co-chair) is chief statistician and director of research and development at the National Agricultural Statistics Service. Her research interests include developing statistical methods for analyzing integrated data from disparate sources and integrating web-scraped information into survey processes.

    Young is former editor of the Journal of Agricultural, Biological, and Environmental Statistics and a fellow of the American Statistical Association, International Statistical Institute, and American Association for the Advancement of Science. She is also a past president of the Eastern North American Region of the International Biometric Society and past chair of the Committee of Presidents of Statistical Societies.


    Comments

    • Hrishikesh Vinod said:

      The earlier policy against p-values was a case of "throwing the baby out with the bathwater," the idiomatic expression for when something good (p-values) is eliminated while trying to get rid of something bad (misinterpretations). I am glad the ASA fixed the problem with this task force statement.

    • John Xie said:

      The issue of the use and/or misuse of p-values was largely settled by the official statement from the ASA in 2016 (The ASA's Statement on p-Values: Context, Process, and Purpose). On the other hand, the 2019 editorial (Moving to a World Beyond "p < 0.05") is a fair summary of the 43 articles published in the TAS special issue focusing on the topic of statistical significance. The key message of the 2019 editorial is simple and clear: "Statistically significant: don't say it and don't use it," whether you claim statistical significance by referring to a p-value, a confidence interval, a Bayes factor, or a credible interval. It is true that the 2019 editorial's view should not be interpreted as ASA's official policy, but neither should this ASA president's task force statement.

      The sentence "Comparing p-values to a significance level can be useful, though p-values themselves provide valuable information" is at least an overstatement about the usefulness of p-values according to the ASA's official 2016 p-value statement. Furthermore, a more appropriate version of this sentence may be: "Depending on the situation, comparing p-values to a significance level can be useful or nonsensical, and often misleading for the purpose of scientific inference."
      My final remark: It is an embarrassment for the whole statistics community that statisticians so far cannot even agree on what the definition of "probability" is in a practical sense. I therefore found it ironic to read the sentence "Indeed, p-values and significance tests are among the most studied and best understood statistical procedures in the statistics literature."

    • John Xie said:

      The core message of this ASA President's Task Force Statement is: "In 2019, the president of the American Statistical Association established a task force to address concerns that a 2019 editorial in The American Statistician (an ASA journal) might be mistakenly interpreted as official ASA policy. (The editorial recommended eliminating the use of 'p < 0.05' and 'statistically significant' in statistical analysis.)"
      Since the 2019 editorial (Moving to a World Beyond "p < 0.05") aims only to be a fair summary of the 43 articles published in the TAS special issue on statistical significance (Volume 73, 2019, Issue sup1: Statistical Inference in the 21st Century: A World Beyond p < 0.05), it would be more constructive for the 14 authors of this ASA President's Task Force Statement to write another summary editorial showing why the 2019 editorial might not be considered a fair summary report, rather than simply issuing the short statement presented on this webpage.

    • John Xie said:

      My dearest lady 'Statistical Significance': a "song of praise" dedicated to those researchers and scholars who have learned, taught, and/or applied 'statistical significance' in their research data analyses, and who love and miss her so much.
      Please do not ask me the question "what is the theoretical foundation that underpins 'statistical significance'?" What I can tell you is that 'the charm of statistical significance is irresistible' and 'her power is unmeasurable,' as hundreds of statistics textbooks say. In order to claim that my research finding is conclusive, hence publishable, it is worth searching for 'statistical significance' with all my heart and strength.
      I don't care what Fisher said in his 'test of significance,' nor do I care what Neyman-Pearson's 'hypothesis test' is all about. All I need to care about is the 'Null Hypothesis Significance Test (NHST)' as taught in almost every standard statistics textbook.
      Please do not bother me with ‘sampling population’ or ‘how my sample data were selected from the population’. It is my true belief that the magic of application of NHST would expel the uncertainty from my sample data because of the ‘statistical significance’. That is why ‘statistical significance’ becomes a treasure in my analysis toolbox.
      As long as my analysis results are statistically significant, my scientific research findings would be deemed to be confirmed – conclusive answers found! What is the point then for me to bother about the disciplinary context or scientific interpretation of my statistical analysis results?!
      Don't tell me that 'scientific findings may be established through repeated experiments'; the statement "the very nature of statistical inference is exploratory" does not sound like good news to me either. My heart goes with the 'statistical analysis miracle' achieved by 'statistical significance': one set of sample data can tell me the definite answer of 'yes' or 'no,' 'true' or 'false.'
      Who cares about probability distribution theory; so what if the model assumptions cannot be met? As long as 'statistical significance' is achieved as the best stepping-stone towards publications and degrees, any ignorance or worries I had about statistics would disappear like clouds drifting away.
      Forget about 'sampling distribution' theory; the view that 'no scientific findings can be established based on an isolated study' should not be taken seriously. 'Statistical significance' is Queen in statistical analysis because she grants us conclusive results, and nobody cares how I obtained her.
      Trust not my scientific common knowledge and apply not my professional judgment for ‘statistical significance’ is the golden rule for assessing scientific research findings.
      It is a dumb thing to explore scientific findings through laborious and time-consuming repeated experiments built upon disciplinary theory. It is a waste of time to try to make connections between the subject context and statistical models. The best way to fast-track your research findings is to search for the 'statistical significance' that is a standard product of those wonderful statistical software packages.
      Furthermore, employ as many different 'hypothesis tests' as possible to increase the 'validity' of your data analysis results; make my statistical models as complex as possible so that the analysis outcomes are not testable. By doing so, I can show how professional and smart I am at performing statistical analyses.
      Finally, whether it is p-values or confidence intervals in a frequentist approach, or Bayes factors or posterior credible intervals in a Bayesian approach, with all my heart and all my strength I shall search for my dearest 'statistical significance' until I get hold of you.
      Statistical Significance, Statistical Significance, My dear Statistical Significance! I shall hold you until I fall asleep into my sweet dream.
      Statistical Significance, Statistical Significance, My dear Statistical Significance! How could I survive my academic/professional life without you!