## A True Revolution in Statistics

*Hunter Glanz*

The field of statistics has had to adapt throughout its history: to different types of data, to tiny or huge data sets, to new technology, and more. The types of problems and methods considered computationally intractable continue to change as modern computers evolve. Being able to “play in everyone else’s backyard” means we in the field of statistics must advance and keep pace with progress in other fields. While the mathematical development of new theory and tools is important, collaboration and interdisciplinary work are the lifeblood of our field. Data will always exist and traditional statistical tools will always be useful, but the atmosphere of collaboration and the perception of statistics are undergoing noticeable changes.

The terms “Big Data” and “data science” describe recent developments involving our field, but the current revolution in statistics transcends the scope of either. Signs of this revolution have been popping up everywhere, from Nate Silver’s widely publicized election forecasts to the designation of 2013 as the International Year of Statistics. A growing number of data modeling competitions, including the 2009 Netflix Prize and those hosted by kaggle.com, attract more than just people with backgrounds in statistics. Computer scientists, econometricians, statisticians, and others vie for interviews, money, and other prizes while tackling problems that often resemble current research questions. Academic programs in statistics continue to grow as it becomes clearer that essentially every job requires quantitative skills and even statistics-related knowledge. For some time, statisticians have filled valuable positions in research and consulting, and it has always been a mission of our field to raise the perceived importance of what we can contribute to projects involving data.

Rather than being considered merely an achievement, the declaration of this year as the International Year of Statistics should also be recognized as a herald of the publicity our field has been dreaming of. Research and work in statistics is an exercise in metacognition. We view and interact with the world via data received through our senses. Learning from those data, and living based on what we have learned, is statistics: *the science of analyzing data*, or *the study of the collection, organization, interpretation, and presentation of data*. We have merely given the act of living a rigorous vocabulary. As the developments described earlier show, this perspective has become widespread: other fields, and people in general, now take for granted that any project will involve statistics. This revolution is unfolding on multiple fronts, which calls for careful but efficient thinking and potentially new behavior going forward.

Recently, the field of statistics seems to be sharing the market for data analysis with an increasing number of people and disciplines. Data science and machine learning, for example, appear to maintain strong ties to statistics while adding their own flavors. The first area the revolution will affect is the teaching of statistics and the preparation of future statisticians. As statistics educators, we will seemingly become more exposed to the *depth versus breadth* trade-off as statistics students begin to choose between, say, more courses in statistics and courses in computer science that keep them technologically savvy. To some, this might look like yielding ground, given that the types of positions and problems statistics programs prepare students for have not changed much. On the contrary, the field of statistics should use this opportunity to lead students and professionals into this new age of data analysis by establishing a novel synergy with computer science, machine learning, and related fields based on newly emerging problems.

By definition, statistics is *the* data science. As such, the take-home message of any introductory statistics class or statistics program should be, “Learning statistics will make you a significantly better learner in general.” With this in mind, we can rest a little easier as long as we equip statistics students with the foundation they need to approach other, possibly more advanced, problems. Messages like this one transcend any specific course curriculum, but they are at least as valuable. Of course, faculty and current industry statisticians must also adapt in other ways to this transformation in the realm of data analysis.

Whether or not they teach, statisticians tackle current problems in research and industry every day. While incorporating new methods and data analysis tools has always been a responsibility of statistics professionals, care should be taken as the number of sources of such methods grows. In some people’s quest for the almighty low *p*-value or minimum error, important details can unnecessarily take a backseat. More and more research solutions report algorithm running times and compare their performance against alternative solutions. To be supportive and stay at the forefront of the revolution, we should continue to emphasize these types of comparative studies. Certain methods may be able to answer or accommodate a slightly different set of questions than others can, and interpretability can vary widely across techniques and their results. Collaboration remains important, and being proactive can include novel statistical research, applying existing statistical methods in novel ways to new problems or fields, and translating the language surrounding data analysis in multiple fields into a common vocabulary.

Statistics faces a new challenge in this broadening of awareness and applications. To maintain the momentum and bounty of this revolution, statisticians should embrace the atmosphere of technological advancement that has produced an explosion of new data and problems. The modern statistician’s toolbox must grow to include new computational and theoretical tools. Let the shift in perception and the increased demand for a more advanced quantitative skill set fuel the evolution of our ever more beloved science of data: statistics.

**Hunter Glanz** earned his bachelor’s in statistics and mathematics from California Polytechnic State University in 2009 and his master’s in mathematics from Boston University in 2012. He is now working toward his PhD and expects to graduate in May. Glanz’s work has been in Bayesian and computational statistics with applications in geography (remote sensing).

Vincent Granville said: The concept of p-values is almost never used in data science. I guess this is not because of ignorance, but because we use different wording and a different metric, though both serve a similar purpose. In data science contexts, there is often no underlying model; in other words, what is done is model-free inference and confidence intervals. You can check the “Analyticbridge First Theorem” as an illustration. Rather than p-values, I frequently use “predictive power”, a synthetic metric that I created, which is a bit similar to the natural metric called entropy.

Randy Bartlett said: I agree, and I will write it plainly: applied statisticians are already data scientists. In the field, we have been adapting to these realities for decades. Corporate statisticians have lacked the support of a professional organization; they tend to caucus with their industry rather than with ASA. Now we have PSTAT and INFORMS CAP, as well as CSP and the INFORMS BA conference. It is a start, yet we need to go further if we want ASA to remain relevant.

The big problems holding back statisticians in the field concern leadership, organization, planning, and decision making. We face more competition from nonstatisticians than ever before. Training from ASA in complementary topics, rather than more statistics, might make a difference.