
Letter to the Editor

1 January 2010

Dear Editor,

In my ongoing search to find real data that I can use in my teaching and textbook writing, I have spent literally thousands of hours over the last few years perusing professional journals. Most articles that include the results of a statistical analysis report only summary quantities. Since virtually all journals now include the email address for a corresponding author, I have recently been writing to those authors to request selected data. The results have been pretty discouraging. In the month-long period from October 20 to November 20, I sent about 25 messages. As of the time I composed this letter, I had heard back from only a handful of people, and no one has yet provided data. Here are [some] of the responses I received:

“I’m sorry, but I will need to hear an appropriate justification for your request. I will have to review the data files to look for what you want. It will take some work so I think that it is not unreasonable for me to request that if I’m going to have to do a fair amount of work, I would like to know why you think I need to do this.” (N.B.: I did explain in my initial communication why I wanted the data.)

“I have gone back and looked to see if we are able to release raw data. Unfortunately, the industry-sponsors for the program are really hesitant to release raw data on part reliability.”

“I’m at the end of my dissertation and I am working nonstop. Don’t have time.”

“I am flattered that you would like to use our data. However, I no longer have the actual data in my possession so, unfortunately, I can’t send it along.” (N.B.: The paper appeared in 2008.)

“The ethics committee protocols do not allow me to forward any raw data to anyone, even if it is anon[ymous]. This is a growing trend in the UK at least, anyway.”

“Unfortunately, I am currently unable to access the data due to the version of SPSS I am currently using (versus the version the data is originally stored on).”

I’m not sure whether my difficulties in obtaining requested data can be attributed to my not being in the same discipline as the contact people, or whether they are worried that I might question their analyses (indeed, I have found some significant errors in analyses once the data were available), or whether there is some other explanation. But it seems entirely reasonable to me that the authors of an article to be published in a nonproprietary journal have an obligation to make their data available to anyone requesting it. I wonder if the American Statistical Association might undertake a campaign to make this a reality. Perhaps journals could be strongly encouraged not to publish an article until the authors have provided all the data, which could then be placed in some sort of data repository, accessible to a wider audience. Perhaps other ASA members have ideas about this issue and will share them with the statistical community.

Jay Devore
Professor, Department of Statistics
Cal Poly State University, San Luis Obispo


One Comment

  • Stan said:

    If you would like the data, ask for it. If you are upset when you do not get it, write to the journal and the funding agency. Authors should provide data as a matter of professional courtesy, but they have little or no incentive to do so. There is a small, comprehensive National Academy book on data sharing: Board on Life Sciences, Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences (National Academies Press, Washington, DC, 2003) (see http://www.nap.edu/books/0309088593/html/). The book is free as a PDF.

    Again, the targets for improvement are funding agencies and journal editors.