The Future of Statistical Publications
The ASA will celebrate its 175th anniversary in 2014. In preparation, column “175”—written by members of the ASA’s 175th Anniversary Steering Committee and other ASA members—will chronicle the theme chosen for the celebration, status of preparations, activities to take place, and, best yet, how you can get involved in propelling the ASA toward its bicentennial.
David Banks is a professor of the practice of statistics at Duke University. He was coordinating editor of the Journal of the American Statistical Association; editor of the Journal of Transportation Statistics; and associate editor for Statistical Methodology, the American Mathematical Monthly, STAT, The Electronic Journal of Statistics, and Environmetrics. He co-founded Statistics, Politics, and Policy; moderates the online repository arXiv; and referees about a dozen papers each year.
An anniversary that ends in ‘0’ is an occasion for celebrating the past. When one ends in ‘5,’ it is an opportunity to plan the future. In that spirit, and responding to an invitation from the 175th Anniversary Committee of the American Statistical Association, I urge that we re-evaluate our publication processes. Electronic media are transforming access to information; it is time for the ASA to decide how to manage this change.
I fear our current approach to publishing does not serve us well. It takes too long, so our best scientists are driven to other journals in faster disciplines. Refereeing is noisy and often achieves only minor gains. And the median quality of reviews is deteriorating due to journal proliferation, pressure on junior faculty to amass lengthy publication lists, and the slow burnout of conscientious reviewers.
Our present paradigm has other structural problems. Published articles are static; once in print, correction and improvement are impossible. Published research often fails to replicate, and there is no good mechanism for flagging this. The code behind an article is hard to find, and the data are nearly impossible to obtain. Correct work that is not sufficiently novel is excluded and lost. And there is a large gray literature that cannot be easily accessed or assessed (e.g., federal reports, weighting schemes for official surveys, lecture notes, classroom exams, code documentation, data/metadata, PhD theses).
I am far from the first person to raise these concerns. Karl Rohe has a parable that illustrates many of these issues on Page 16. Larry Wasserman, Jim Pitman, Nick Jewell, Nick Fisher, and Roger Peng, among others, have grappled with various features of the problem. Since 2005, the ASA has formed three committees to study the matter; most recently, Len Stefanski is chairing one, which will make recommendations to the ASA Board in November. Nearly all young statisticians perceive the inefficiencies of traditional publication and share a common sense of the improvements that are possible.
Change will happen. If we fail to plan ahead, the ASA will be forced to adopt whatever system Wiley or Springer or the American Mathematical Society establishes as the standard, but their interests and needs align imperfectly with ours.
Today’s publication process was essentially invented by Henry Oldenburg, the first corresponding secretary of the Royal Society. He received letters from members describing their research, copied them out in summary form, and mailed those summaries to other members. But his hand grew weary, and he began to send brief notes declining to copy out submissions in full.
Given his technology, Oldenburg’s stringencies were essential. Printing and distribution costs were the limiting factors; pauca sed matura (“few, but ripe”) had to be the standard. An entire economic ecosystem grew up around those constraints: Publishers set type and sold volumes; societies created editorships and referees; and libraries emerged. Authors and editors created content for free and publishers made fortunes; this was the best solution possible. Until the Internet.
Now, we have fresh choices. Electronic articles can be living documents, as on arXiv; better versions layer on top of the old. Articles may use color and dynamic graphics and be as long as necessary, with detailed proofs and worked-out examples (while reader feedback enforces concision). Article quality can be signaled in multiple ways, either by conventional review or by ungameable rating systems, similar to page-ranking algorithms. Readers can use personalized recommender systems to discover papers. And data, code, and gray literature become easy to access. Space limitations prevent a full catalog of the possible features (and bugs), but I expect Stefanski’s report will be more comprehensive.
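To make the "rating systems, similar to page-ranking algorithms" idea concrete, here is one minimal sketch of how article quality might be scored: an endorsement counts for more when it comes from an article that is itself highly rated, computed by plain power iteration in the spirit of PageRank. The article names and the endorsement graph are hypothetical illustrations, not any deployed system.

```python
# Toy PageRank-style scoring: each article's weight flows to the
# articles it endorses, so endorsements from well-regarded work
# count for more than endorsements from obscure work.

def rank(endorsements, damping=0.85, iters=50):
    """endorsements: dict mapping each article to the articles it endorses."""
    nodes = set(endorsements) | {a for outs in endorsements.values() for a in outs}
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        new = {node: (1 - damping) / n for node in nodes}
        for src, outs in endorsements.items():
            if outs:
                share = damping * score[src] / len(outs)
                for dst in outs:
                    new[dst] += share
            else:
                # Dangling article (endorses nothing): spread weight uniformly.
                for node in nodes:
                    new[node] += damping * score[src] / n
        score = new
    return score

# Hypothetical graph: A endorses B and C, B endorses C, C endorses A.
scores = rank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
# C, endorsed by both A and B, ends up with the highest score.
```

The "ungameable" part is the recursive weighting: a ring of low-scoring papers endorsing each other gains little, because the weight they pass around is itself small.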
I invite readers to comment on this topic. Just post your thoughts below.