A Statistical Commons

1 April 2014
Nathaniel Schenker

Welcome to the second in my series of interviews with the workgroup leaders for my 2014 presidential initiatives. My guest this month is David Banks of Duke University. In mid-2013, I appointed David as chair of a workgroup charged with developing a prototype for the long-discussed statistics portal, an enhanced repository for important statistical information that would not ordinarily reside in a journal. The goal was to explore possibilities for the portal and test its feasibility. David and I recruited a terrific group of people to work on this task with him.


David Banks

A Historical Preface
I want to embarrass Nat a little with my gratitude. In 2008, I urged the ASA Board to consider implementing a new service to the members by creating an electronic community in which people could share documents and discuss ideas. I explained the advantages of such a service cogently, articulately, and concisely, and the board listened politely. Tony Lachenbruch was the president that year, and he had been my boss at the FDA a few years previously, so he was overfamiliar with my spiels. He listened with the Christian patience of a suffering saint, and when I was done, the room was stone-cold silent and Tony started upon the next agenda item. At that point, Nat spoke up. Like Hector before the gates of Troy, Nat is a defender of lost causes, and he said, “Maybe we should discuss this a little.” And so the board did.

I think the discussion opened a few minds to new possibilities. There was no groundswell of support, but at least the ground had been prepared. Several critical issues were identified: oversight, costs, and the possibility that an electromagnetic pulse event would erase all statistical knowledge. In the years following 2008, the concept of this sort of community, the Statistical Commons, was raised with the board on one or two occasions (think Cato the Elder declaiming “Carthago delenda est”). And each time, a little progress was made, some of the issues got sorted out, and the ASA Board became more comfortable with the idea that electrons are not a wild-eyed usurpation of professional discourse.

Last year, as incoming ASA president, Nat appointed workgroups. One of them was charged with creating a small proof-of-concept commons. I am honored to chair it, and the other workgroup members are Will Guthrie (NIST), Richard Levine (San Diego State), Victoria Stodden (Columbia), Ken Van Haren (Square), and Larry Wasserman (Carnegie Mellon). We are making progress, albeit slowly, and this column lays out some of the possibilities and challenges.


NS: David, I appreciate your having taken on this project. What’s a portal?

DB: It’s what we used to call the Statistical Commons. Extensive market testing showed that “portal” conjured negative associations with the movie “Poltergeist.” Statistical Commons is more positive, evoking a sense of sharing.

NS: OK, then what’s the Statistical Commons?

DB: I hope it will grow into a space where members can share material and provide feedback. Potentially, it could be a place where educational materials, gray literature from the federal government, open source books, ASA discussion threads, software, and data could be deposited or linked.

NS: Please elaborate.

DB: Sure! One of the many dull jobs I do is generate test and quiz questions for introductory statistics classes. Each semester, thousands of 101 teachers attempt to breathe fresh phrasing and cool examples into t-tests and probability problems. If I had access to a high-quality question bank, with worked solution keys, it would probably save me about six hours a week. Similarly, it would be nice to have a repository for lecture notes, educational data sets, and so forth.

Regarding federal gray literature, I have a parable. Senior statisticians in the government work force are a dwindling species. Each year, many retire, and few advance to replace them. Somewhere in the bowels of the Department of Transportation, there is an elderly, pallid statistician preparing for her farewell party, and, on her desk, is a binder containing the only available copy of the weighting scheme for the 1980 Commodity Flow Survey. It would be in everyone’s best interest if that document, and others like it, could be posted online so future surveys could be cross-walked against earlier results.

Collaborative publication and freeware books are important social experiments and I hope the ASA will step up and participate. I would love to break the stranglehold that paper publishers have maintained on intellectual property, and the commons may provide a vehicle to achieve that.

The ASA has recently provided the capacity for online discussions among section members, and in doing so, they have created a monster. I like the concept, but it has become intrusive. Recently, someone posted a question to the ASA Section on Statistical Consulting thread about whether SAS code would persist into the future. This spawned a spasm of spam that engulfed all the section members (some people get their discussion bundled in daily, weekly, or monthly bursts, but, even so, there was quite a lot of it). The commons could automatically shunt such discussion to a private room. Members would know about the topic from the first post and then choose to follow up or not.

The value of shared code and data is obvious.

NS: Hasn’t much of this been done before?

DB: Yes, certainly. StatLib provides some curated data sets, but very little code. Michael Lavine has a free online book titled Introduction to Statistical Thought (PDF download); Jay Kadane has made Principles of Uncertainty freely available; and the new COPSS book, Past, Present, and Future of Statistical Science (PDF download), is also freely available. But it would be nice to have these, and others, indexed in one place, with short reviews to guide ASA members about their content and level. I don’t know of any central archive for federal statistics documents, but the ASA Section on Teaching Statistics in the Health Sciences has already created a small, specialized portal for educational material.

The advantage of the Statistical Commons is that it puts all of this kind of material in one place, making it easy to curate, update, and propagate.

NS: You mentioned some potential difficulties, such as cost and oversight …

DB: Certainly. Any enterprise of this kind will require volunteer labor to generate content and volunteer editors to moderate that content. This isn’t new—the ASA has been orchestrating such services for many other projects, especially publications. We know how to do this. Whether it will catch on is a key question, but I’ve talked to a lot of statisticians who are younger, smarter, and cooler than us, and they are enthusiastic.

The cost should be minimal. The ASA already has IT experts on staff, and if this is built properly, it should not put a large footprint on their time.

NS: Thanks, David. This is an exciting experiment, and I hope it pans out.

DB: Thanks, Nat. I really appreciate your help and friendship in this effort. Frankly, I’ve always regarded us as the Harold and Kumar of the ASA—you, of course, are Harold.

