Type IV Errors: How Collaboration Can Lead to Simpler Analyses
This column is written for statisticians with master’s degrees and highlights areas of employment that will benefit statisticians at the master’s level. Comments and suggestions should be sent to Megan Murphy, Amstat News managing editor, at firstname.lastname@example.org.
Jonathan Stallings is a fifth-year PhD candidate in the Department of Statistics at Virginia Tech (VT). From 2011 to 2013, he was a Lead Statistical Collaborator at VT’s Laboratory for Interdisciplinary Statistical Analysis, where he collaborated on more than 80 projects with researchers needing statistical guidance.
I started my graduate career in the Department of Statistics at Virginia Tech having had years of mathematical training and undergraduate research under my belt. Like many graduate students, I didn’t like being wrong or uncertain about an answer. This attitude helped me succeed in the classroom, where questions have right and wrong answers, but I was inexperienced when it came to applying what I learned to real problems and felt very uncomfortable doing so.
In my third year in the graduate program, I was asked to become a lead statistical collaborator at Virginia Tech’s Laboratory for Interdisciplinary Statistical Analysis (LISA). I saw this as an opportunity to confront my insecurities and learn how statistics are used in practice. Naturally, I was apprehensive going into my first few meetings with clients I had never met before. I wasn’t sure what they would expect from me or what I should expect from them, and I was terrified to be asked to do an analysis I didn’t know how to do. Looking back, I now realize I was going into these meetings as if they were an exam I couldn’t study for and had little chance of passing.
At LISA, we aim to answer a client’s research question using statistics and refer to ourselves as collaborators, not consultants, to reflect our level of involvement in their project. LISA collaborators seek first to understand the client’s overall goals outside a statistical framework and appreciate the effect of their research on their respective field. We then relate these goals to their collected data or advise them about how to design their data collection to best meet these goals. It is crucial for both the client and statistical collaborator to understand what the data will show if the client’s hypotheses are correct. It is not until this stage is reached that we discuss potential statistical methodologies.
Focusing on the client’s needs and wants outside of a statistical framework is the best way to prevent type III errors, which A. W. Kimball introduced in his 1957 paper, “Errors of the Third Kind in Statistical Consulting.” A type III error occurs when the statistician offers the correct statistical advice for the wrong research question. Avoiding it was a difficult challenge for me; I had to get out of my comfort zone and fight the urge to talk about statistics. Once I became better at it, I realized that expressing interest in the client’s research not only fostered a more comfortable, collaborative relationship, but also gave me greater flexibility in choosing an appropriate statistical analysis.
What makes a statistical analysis “appropriate”? There are many criteria for comparing methodologies, such as type I error rates, power, and the validity of assumptions about the data, like normality and constant variance. These criteria are meaningless to most clients, especially if they have limited statistical training. The clients I have interacted with are looking for techniques that they can understand and that give confident, accurate conclusions about their hypotheses. Maybe a latent growth curve model could be used to answer the research question, but if I could answer their research questions using a straightforward ANOVA, why wouldn’t I just do that?
One of my first clients wanted to investigate the potential differences in tumor regression between immunocompetent (having a functioning immune system) and immunodeficient (having a poor immune system) mice after applying either a placebo or a technique known as irreversible electroporation. I made scatterplots for each group to see how the tumors grew across time and saw a clear trend that supported their hypotheses. Focusing on the data in front of me, I thought a repeated measures model that accounted for missing data was appropriate and spent a lot of time researching how those models worked. Eventually, I realized the client was not interested in modeling the growth curves; they just wanted to see whether differences existed. Ultimately, we chose to compare individual means at specific days using simple nonparametric tests and successfully answered their research question.
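To show how little machinery a comparison at a single day can require, here is a minimal sketch of one simple nonparametric test, the exact Wilcoxon-Mann-Whitney rank-sum test, written with only the Python standard library. The choice of this particular test, the function names, and the tumor volumes are all assumptions made for illustration; the column does not say which tests or data were actually used.

```python
from itertools import combinations


def midranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def exact_rank_sum_test(x, y):
    """Two-sided exact rank-sum test by enumerating every group assignment.

    Only practical for small samples, since the number of assignments
    grows combinatorially.
    """
    ranks = midranks(list(x) + list(y))
    n = len(x)
    observed = sum(ranks[:n])
    expected = n * (len(ranks) + 1) / 2  # mean rank sum of group x under H0
    observed_dev = abs(observed - expected)
    extreme = total = 0
    for idx in combinations(range(len(ranks)), n):
        total += 1
        dev = abs(sum(ranks[i] for i in idx) - expected)
        if dev >= observed_dev - 1e-12:
            extreme += 1
    return observed, extreme / total


# Hypothetical tumor volumes on one day (invented numbers, not the client's data).
treated = [12.1, 9.8, 15.3, 8.4, 11.0]
placebo = [25.6, 31.2, 19.9, 28.4, 22.7]

rank_sum, p_value = exact_rank_sum_test(treated, placebo)
print(f"rank sum = {rank_sum}, exact two-sided p = {p_value:.4f}")
```

With these invented numbers every treated value falls below every placebo value, so only 2 of the 252 possible group assignments are as extreme as the observed one. That is the kind of explanation a client can follow without any modeling background.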
At this point, I would like to introduce what I call type IV errors: a statistician performs a correct analysis that answers the right research question, but a simpler analysis would have sufficed. Why is a type IV error something to worry about? If the statistics are correct, isn’t our job done? The issue is that when statisticians commit type IV errors, we are potentially alienating the client from the collaborative relationship and giving them results they cannot use. We also are giving ourselves too much work to do, spending days on something that could take hours or even minutes. At the end of the day, we have wasted everyone’s time if the client doesn’t understand what we did.
A common misconception that increases the likelihood of a type IV error is that more complicated analyses lead to more accurate conclusions and a better chance at getting statistically significant results. Clients have asked me about doing zero-inflated Poisson models, latent growth curves, and structural equation modeling, giving me the impression they think these techniques, which they have only heard about, are better because they are more complicated. Their data might fit the mold of such analyses, but the statistician’s job isn’t fitting models; it’s helping answer research questions.
Simplifying an analysis is sometimes easier said than done; it is an art form that takes a lot of practice. Something that has helped me is not being afraid to get creative with the data. I make as many plots as I can that would support or refute the client’s hypotheses, using these as visual confirmation and motivation for the chosen analyses; histograms and scatterplots are two of my very best friends. I see the raw data as a starting point and allow myself to think outside of what is presented, focusing instead on how I can use the data to achieve the client’s goals.
Building your statistical toolbox with different analyses should be your goal in your graduate school coursework. This includes not only knowing how to do them, but also understanding their inner workings so they can be explained to nonstatisticians. Some techniques I commonly implement are bootstrapping, transformations, permutation tests, and simple nonparametric tests like the Wilcoxon-Mann-Whitney rank-sum test and the Kruskal-Wallis ANOVA. Find whatever techniques you are most comfortable with, but be prepared to explain why they are useful and applicable to the client’s project.
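One reason a technique like bootstrapping is easy to explain to nonstatisticians is that its inner workings fit in a few lines. The sketch below computes a percentile bootstrap confidence interval for a difference in medians using only the Python standard library. The function name, the default settings, and the data are hypothetical choices for illustration, not a recommended default for any particular project.

```python
import random
import statistics


def bootstrap_median_diff_ci(a, b, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for median(b) - median(a).

    Each group is resampled with replacement, independently of the other.
    The seed is fixed only so that the sketch is reproducible.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        a_star = rng.choices(a, k=len(a))  # resample group a with replacement
        b_star = rng.choices(b, k=len(b))  # resample group b with replacement
        diffs.append(statistics.median(b_star) - statistics.median(a_star))
    diffs.sort()
    alpha = 1 - level
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return diffs[lo_idx], diffs[hi_idx]


# Hypothetical measurements for two groups (invented numbers).
group_a = [12.1, 9.8, 15.3, 8.4, 11.0]
group_b = [25.6, 31.2, 19.9, 28.4, 22.7]

lower, upper = bootstrap_median_diff_ci(group_a, group_b)
print(f"95% bootstrap interval for the difference in medians: ({lower:.1f}, {upper:.1f})")
```

The explanation for a client is short: we repeatedly redraw each group from its own data, recompute the difference each time, and report the range that holds the middle 95 percent of those differences. With samples this small the interval is rough, which is itself worth telling the client.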
Type IV errors should not be seen as a deterrent from using advanced statistical methods to answer research questions. They are a reminder that whatever analysis you choose should answer the client’s research questions and be understood by the client. Focusing first on the client’s research goals has guided me to simpler statistical methodologies and helped me explain the results to the client. Do not be afraid to tell a client you are unfamiliar with an advanced statistical method, and do not be swayed just because they suggest that method is the most appropriate. If you can find a simpler way to answer their questions, they will be thrilled to hear it!