
ASA Releases ‘Statement on Statistical Significance and P-Values’

7 March 2016

The ASA “Statement on Statistical Significance and P-Values” includes six principles underlying the proper use and interpretation of the p-value and is intended to improve the conduct and interpretation of quantitative science and inform the growing emphasis on reproducibility of science research. The statement is published in The American Statistician, along with more than a dozen discussion papers to provide further perspective on this broad and complex topic. If you’re an ASA member, please visit ASA Connect to comment on the statement and add your thoughts to the discussion. Public comments may be posted below.



  • Stuart Hurlbert said:

    It is an “interesting” comment on the discipline and practice of statistics that it takes a special commission to restate and reaffirm six principles the validity of which has been understood for more than half a century and which students should understand after any good 1-semester introductory statistics course. A strong testament to the rarity of good courses – and how little most statisticians know of the historical literature!

    There are many sources of the massive disarray in statistical understanding and practice, and I and a few colleagues have been writing about these for decades, as have others. For one, most statistics texts, whether written by statisticians (Hurlbert 2013a), biologists (Hurlbert 2013b), or other scientists, contain fair amounts of bad advice and error. For another, editors and reviewers of journals, including statistical journals, often fail to detect even gross errors in manuscripts and offer bad advice or instructions on statistical matters. The only time I submitted a manuscript to The American Statistician I ran into an editor and a reviewer who thought I misunderstood the classic definitions of ‘experimental unit’ and ‘blocking.’ So I found a journal more like the ones Fisher used to publish in (Hurlbert 2013a)! Lazy scholarship on the part of authors criticizing statistical practice is a big problem. Much of the time it is evident that they have read very little of the historical literature on the point (or error) they are making, even though that point has been made and corrected there over and over again. Especially in the last few years, off-the-cuff, ‘drama queen’ authors have been getting a free pass. This is particularly true of the literature critical of p-values and null hypothesis testing. But my colleague, Celia Lombardi, and I noticed this in the literature for every topic we’ve done review articles on (e.g., Hurlbert & Lombardi 2009a, b, 2012, 2016).

    But back to the ASA statement. This is a pretty good statement considering the cats that had to be herded. No actual errors that I can detect, but perhaps a few weaknesses:

    1. Saying “a p-value near 0.05 taken by itself offers only weak evidence against the null hypothesis” is meaningless, because “weak” can only be defined by comparison with something else. It is not “weak” evidence relative to a p-value of 0.30, for sure! Presumably this was a sop to Bayesians who would, if candid, say it should be compared to an ‘objective’ Bayesian posterior of 0.05. Two of our papers (Hurlbert & Lombardi 2009a, b) exposed many cases where Bayesians have used fallacious logic and word games to discredit p-values. But now the ASA has officially declared p = 0.05 to be “weak evidence.” What a diabolical tool to put into the hands of rigid, curmudgeonly editors!! Load those Bayesians into the tumbril!

    2. Principle 3 almost gets to the neoFisherian position (Hurlbert & Lombardi 2009a) that alphas should not be specified and the term “statistically significant” never used, a position advocated for decades by many top statisticians and other scientists. In the contexts of basic and applied research, “binary decisions” are never needed. Quality control contexts, fine. Providing cover to FDA bureaucrats, fine. But in the conduct and presentation of research, never. This principle needs to be clarified by removing the vague waffling and going whole hog neoFisherian.

    3. After all the trouble to be correct and clear in the official statement of principles, the commission perhaps erred in putting ASA’s imprimatur on a rather eclectic list of references. A disclaimer should be added: “Several of these works are considered controversial and some scientists claim they collectively contain many misstatements of fact and illogical arguments. Caveat emptor. They will provide, however, a good entrée to the literature.” No need to get personal!

    I make this suggestion having read at least two-thirds of the works cited – and having pointed out many of the specific problems in them in our 2009 papers in particular.
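    The Bayesian comparison alluded to in point 1 can be made concrete. A minimal sketch, assuming the Sellke-Bayarri-Berger calibration (the lower bound -e·p·ln(p) on the Bayes factor in favor of the null, the usual quantitative basis for calling p = 0.05 “weak evidence”); the function name here is illustrative, not from any of the works cited:

```python
import math

def min_bayes_factor(p):
    """Lower bound on the Bayes factor in favor of the null
    hypothesis, -e * p * ln(p), valid for 0 < p < 1/e (Sellke,
    Bayarri & Berger 2001, The American Statistician)."""
    if not 0 < p < 1 / math.e:
        raise ValueError("bound requires 0 < p < 1/e")
    return -math.e * p * math.log(p)

# p = 0.05: bound is about 0.41, i.e. the data are at best ~2.5 times
# more likely under the alternative -- the usual basis for "weak evidence".
print(round(min_bayes_factor(0.05), 3))  # 0.407
# p = 0.30: bound is about 0.98, i.e. essentially no evidence either way,
# which is why "weak" is only meaningful relative to such a comparison.
print(round(min_bayes_factor(0.30), 3))  # 0.982
```

    Whether this calibration is the right yardstick is, of course, exactly what the comment above disputes; the sketch only shows where the “weak evidence” number comes from.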

    Now for a commission on “multiplicity paranoia”!

    Hurlbert, S.H. and C.M. Lombardi. 2009a. Final collapse of the Neyman-Pearson decision theoretic framework and rise of the neoFisherian. Annales Zoologici Fennici 46:311-349.
    Lombardi, C.M. and S.H. Hurlbert. 2009b. Misprescription and misuse of one-tailed tests. Austral Ecology 34:447-468.
    Hurlbert, S.H. and C.M. Lombardi. 2012. Lopsided reasoning on lopsided tests and multiple comparisons. Australian and New Zealand Journal of Statistics 54:23-42.
    Hurlbert, S.H. 2013a. Affirmation of the classical terminology for experimental design via a critique of Casella’s Statistical Design. Agronomy Journal 105:412-418 + suppl. inform.
    Hurlbert, S.H. 2013b. [Review of Biometry, 4th edn, by R.R. Sokal & F.J. Rohlf]. Limnology and Oceanography Bulletin 22(2):62-65.
    Hurlbert, S.H. and C.M. Lombardi. 2016. Pseudoreplication, one-tailed tests, neoFisherianism, multiple comparisons, and pseudofactorialism. Integrated Environmental Assessment and Management 12:195-197.

  • Statsols said:

    P-values are simply representative of a wide numbers of issues which pertain to real-world statistical analysis since alternative statistical approaches will never be completely robust to misinterpretation even if better. Thus these need to be paired with a real and concerted effort to improve education on issues such as study design and inference. We also need to create better incentive structures for researchers to ensure that these same mistakes are not repeated again.