Trilobites and Us
I recently attended the IYS 2013 workshop, “The Future of Statistical Science.” The themes were familiar: what an excellent job statisticians are doing in many areas; what great potential we have for continuing and expanding this good work; and, finally, we need to change our ways, and if we don’t, others will get to this new good work before we do and we’ll be consigned to the dust bin of history.
The difference in 2013 is that the others are not computer scientists, machine learners, or data miners, but data scientists. Another difference is that some members of our profession think the threat is real this time and that perhaps the absorption of statistics into data science is the way we must go.
First, the excellent job we’re doing. That was illustrated in the fields of genomics, cancer biology, the study of diet, the environment and climate, in risk and regulation, neuroimaging, confidentiality and privacy, and autism research. The talks on these topics were given by duos: a subject-matter specialist and a statistician.
It was an impressive testament to the power and value of our subject. If I were 50 years younger and beginning my career, I’d have been inspired by it all. However, many important application areas were not illustrated at the workshop, including social, agricultural, government, business, and industrial statistics. I took these omissions as a necessary consequence of the limited time available. But also, it seemed to me that inclusion in the program was a sign of being a non-traditional application area, one more likely to capture the imagination of the media representatives present and the funding agencies that will read the report about the workshop. After all, I think the workshop was for them, not us.
The break-out sessions explored our challenges. We learned that the community of statisticians has been guilty of the following:
- A. Poor marketing, for not everyone who should know what we have to offer does know
- B. Missed opportunities, with automatic translation, handwriting recognition, document analysis, and much of image analysis being representative examples
- C. Failing to emphasize that applied statistics is not performed in a vacuum and that our students should become immersed in at least one substantive subject-matter area
- D. Insufficient emphasis on computing, in particular for dealing with the very large data sets becoming increasingly common
- E. Conservatism and rigidity, especially in relation to drug development
- F. Generally poor teaching, particularly to large classes of non-specialists
- G. Not delivering what Silicon Valley wants, which perhaps involves adopting a more engineering approach to our work
- H. Failing to articulate our core to the world at large
And yet, the number of students wanting to major in statistics is shooting up all around the world, and the demand for statisticians everywhere far outstrips the supply. Why the disconnect? Are we doing such a bad job that we need to rename ourselves data scientists to capture the imagination of future students, collaborators, or clients? Are we so lacking in confidence in ourselves, our colleagues, and our core discipline that we shiver in our shoes the moment a potential usurper appears on the scene? Or, has there really been a fundamental shift around us, so that our old clumsy ways of adapting and evolving are no longer adequate?
I see items A through H above, and the many others that could be added, as being like my school reports, which invariably said, “Can do better.” Perhaps we might add “Must do better.” Of course, we can and should do better. We need to adapt, to evolve, as we have been doing and will continue to do. Look at our history. We have to steer a path between the Scylla of complacency and sclerosis, of resting on our laurels, of reluctance to change, and the Charybdis of frantic change, of forgetting where we’ve come from and where we are going, of always trying to wear the latest fashion.
I see no evidence that the view of data science being promoted by its enthusiasts has any prospect of replacing our discipline in the diverse areas in which it has become central, many of which were named above. Google, Microsoft, Apple, Amazon, Walmart, the National Security Agency, the UK Government Communications Headquarters, and other organizations like them will surely present new and great challenges of a statistical nature, but that will never be more than a small part of what we can do. Furthermore, it is rather obvious that, as a profession, we are unlikely to provide enough qualified people to meet their needs, so others can and should move in to help them do what is needed.
I think we have a great tradition and a great future, both far longer than the concentration span of funding agencies, university faculties, and foundations—people who play zero-sum funding games across disciplines. We might miss out on the millions being lavished on data science right now, but that’s no reason for us to stop trying to do the best we can at what we do best, something that is far wider and deeper than data science. As with mathematics more generally, we are in this business for the long term. Let’s not lose our nerve.
In the year 2013, we celebrated the 300th anniversary of Bernoulli’s Ars conjectandi, the 250th anniversary of Bayes’ Essay, the 200th anniversary of Laplace’s Essai philosophique, the 150th anniversary of Galton’s mapping the weather, the 101st anniversary of Fisher’s clarion call for maximum likelihood, and the 51st anniversary of Tukey’s The Future of Data Analysis. Where are the Bernoullis, Bayes, Laplaces, Galtons, Fishers, and Tukeys of data science? Of course the answer is that their B, B, L, G, F, and T are ours. Must we give away the farm, or the family jewels?
Let’s wait for 10 years and see who is still talking about Big Data and data science. The former can only be said once, and now it has been said. As for the latter, can it really be true that respected members of the statistics profession have entertained the idea of renaming their academic home a department of data science? “What’s in a name?” asked Shakespeare, to which I add, “Did any species ever avoid extinction by adopting a new name?” No, they adapted, they evolved, and so must we.
Of course, I might be wrong. Perhaps the last trilobite thought to herself at the end of the Palaeozoic age, “I tried to evolve, but things were changing too fast for me.” But trilobites lasted for 300 million years and were on every continent on earth. I’d be happy with that for statistics.