## Statistics in Science

## No shortcuts when collaborating

###### This column is written for statisticians with master’s degrees and highlights areas of employment that will benefit statisticians at the master’s level. Comments and suggestions should be sent to Keith Crank, the ASA’s research and graduate education manager, at *keith@amstat.org*.

Contributing Editor

Shari Messinger is an associate professor of biostatistics and director of the Biostatistics Collaboration and Consulting Core at the University of Miami Miller School of Medicine. She formerly served as director of biostatistics for the University of Miami GCRC and biostatistics director for the Diabetes Research Institute.

As a collaborating statistician, I am often asked by researchers to “run their data” so they can get the answers they seek from a particular investigation. What they are usually requesting is that I analyze their data to address their research questions, some of which are quite vague. To do that, I first need to determine an appropriate analytic approach based on the nature of the investigation, the study design, the distributional properties of the data, and the particular research objectives. Although I know this is what they are requesting, I am not sure they are aware of what is involved.

I currently direct an academic biostatistical consulting core that charges an hourly rate. The following is an edited excerpt from an email I received regarding a bill sent following an analysis of two data sets:

This illustrates a common misunderstanding of what statistical support is. It expresses the belief that statistical support is merely the “labor” of plugging data into a computer and pushing a button. Too many researchers regard statistics as a useful tool, but they think there is a single, straightforward way to address any particular question and that no sound judgment or creativity is required to achieve excellence. They picture it like needing a Phillips screwdriver of a specific head size and going to the toolbox to pick the right one. Statistical analysis does not work that way. It is the application of scientific methods, specifically statistical methods, to the research objectives at hand. We statisticians must continually educate our subject-matter collaborators about what statistical science really is.

Statistical science is much more than data analysis. It involves incorporating statistical methodology at all stages of research, which requires scientific expertise in the field of statistics. Appropriate use of statistical methodology in data analysis means the data should be analyzed in a way that is both scientifically and statistically sound. Statisticians are themselves scientists collaborating in research, using their statistical expertise to determine and apply the methodology that rigorously addresses important research questions. The time invested often includes the following:

- Review of the research for basic understanding of the science
- Review of the data to understand the distributional properties of the variables collected
- Determination of the appropriate methodology to apply in analysis corresponding to the hypothesis and design of the investigation
- Programming of the analysis using appropriate statistical software (specific to the particular data set)
- Review of the analytic results
- Reporting of the results

A particular data analysis can take hours, or it can take months. It depends on the research questions, the study design, the properties of the data gathered, and the target audience that will need to understand the results.

As collaborating statisticians, it is up to us to educate the research community and make them aware that our contributions involve incorporating statistical science into their research. We can do this by communicating with them and explaining what goes into providing high-quality collaborative statistical science that supports outstanding research programs. We must interact with the research team as collaborators, discussing the statistical issues in all aspects of the investigation. Everyone wins when the whole team understands how to fully incorporate statistical science into the entire research process.

I replied to Dr. Doe’s email by making these points, almost verbatim. His misunderstanding is common, especially among newer investigators. Thankfully, most of them take kindly to learning about statistical science, and this improves their research. What we do is important; there are no shortcuts.

chalachew said: Dear sir/madam,

This is Chalachew from Ethiopia. I graduated with a BSc in statistics from a recognized Ethiopian university in July 2010 with an interesting CGPA, and I am currently employed at the National Bank of Ethiopia as a junior research officer. I would like to continue on to an MSc in biostatistics, econometrics, or another statistics-related field. I am seeking a full scholarship covering all expenses, including travel fees. If you give me this chance, I will be effective.

Thank you. Regards,

Steven J. Pierce said: I help run a statistical consulting service for a large university. I also occasionally encounter clients who seem to think we should be able to turn around results rapidly with minimal expenditure of time (and therefore cost). Educating them about what really goes into doing high-quality statistical work that answers their research questions appropriately is an ongoing struggle. We generally try to negotiate fees in advance (including the expected number of hours of effort), so that the client can decide whether the deliverable he or she will receive is worth the expected cost. When calculating the expected hours of effort, I include time for things like client meetings, literature review, exploratory analysis, modeling, preparing tables and figures, and contributing to manuscripts or presentations. I find that creating a detailed task breakdown with estimated effort for each task helps us avoid underestimating the amount of work involved. It also helps the client understand why their request to “please run the analysis” is really not a two-hour job.

Wayne G. Fischer, PhD said: Huh. I find it very interesting that neither Ms. Messinger nor Mr. Pierce talked (explicitly) about the most critical aspect of statistical collaboration: working with clients to *design* their research studies in order to answer the proposed research questions in the most efficient, effective way.

And a key part of that is not only being familiar with the underlying science of the research, but also (very importantly) understanding the *context* of the processes from which the data are to be drawn.

CQ Deng, PhD said: I used to work at a medical center, providing biostatistical consulting services to faculty and research fellows. In many cases, statistical consultation or statistical analyses were sought after the data had already been collected. Statisticians were asked to fix issues that really stemmed from the study design or data collection. There was often no pre-planned statistical analysis plan. Post hoc and exploratory analyses (or, to use a bad word, fishing expeditions) were not uncommon.

Shari Messinger, Ph.D. said: I completely agree with Dr. Fischer’s comment concerning the importance of statistical collaboration in the design of research investigations. The article states that “Statistical science is much more than data analysis, and involves the incorporation of statistical methodology at all stages of research” and that “we must interact with the research team as collaborators, discussing the statistical issues in all aspects of the investigation.” Unfortunately, many of the investigators who contact statisticians for support do so after their data are already collected. We do try to communicate the importance of statistical considerations and collaboration in all phases of research through lectures we offer to educate the research community at our institution. It’s a matter of educating researchers about the importance of statistical science in all stages of research.

Tom Spradlin said: And of course, another important task of the consulting statistician is the elicitation of the investigator’s prior probability distribution on the parameter of interest. This allows the statistician (1) to do a good Bayesian analysis of the data and (2) to be sure the investigator is involved in the data interpretation.

Megan said: Yes, these are valid points for the “higher-ups” of statistical consulting. However, there are several other issues to consider when running an analysis. The format of the data, the “cleanliness” of the data, adequate computing resources (think GWAS), and taking proper care when preparing the data for analysis are all concerns as well. And you cannot do these things without knowledge of statistics (i.e., this is not just a push-button kind of thing). It is said that 90% of the “analysis” work is doing these things and only 10% is the actual analysis.