Trilobites and Us

1 January 2014
Terry Speed

I recently attended the IYS 2013 workshop, “The Future of Statistical Science.” The themes were familiar: what an excellent job statisticians are doing in many areas; what great potential we have for continuing and expanding this good work; and, finally, that we need to change our ways, and if we don’t, others will get to this new good work before we do and we’ll be consigned to the dustbin of history.

The difference in 2013 is that the others are not computer scientists, machine learners, or data miners, but data scientists. Another difference is that some members of our profession think the threat is real this time and that perhaps the absorption of statistics into data science is the way we must go.

First, the excellent job we’re doing. That was illustrated in the fields of genomics, cancer biology, the study of diet, the environment and climate, in risk and regulation, neuroimaging, confidentiality and privacy, and autism research. The talks on these topics were given by duos: a subject-matter specialist and a statistician.

It was an impressive testament to the power and value of our subject. If I were 50 years younger and beginning my career, I’d have been inspired by it all. However, many important application areas were not illustrated at the workshop, including social, agricultural, government, business, and industrial statistics. I took these omissions as a necessary consequence of the limited time available. But also, it seemed to me that inclusion in the program was a sign of being a non-traditional application area, one more likely to capture the imagination of the media representatives present and the funding agencies that will read the report about the workshop. After all, I think the workshop was for them, not us.

The break-out sessions explored our challenges. We learned that the community of statisticians has been guilty of the following:

  • A. Poor marketing, for not everyone who should know what we have to offer does know
  • B. Missed opportunities, with automatic translation, handwriting recognition, document analysis, and much of image analysis being representative examples
  • C. Failing to emphasize that applied statistics is not performed in a vacuum and that our students should become immersed in at least one substantive subject-matter area
  • D. Insufficient emphasis on computing, in particular for dealing with the very large data sets becoming increasingly common
  • E. Conservatism and rigidity, especially in relation to drug development
  • F. Generally poor teaching, particularly to large classes of non-specialists
  • G. Not delivering what Silicon Valley wants, which perhaps involves adopting a more engineering approach to our work
  • H. Failing to articulate our core to the world at large

Challenges indeed.

And yet, the number of students wanting to major in statistics is shooting up all around the world, and the demand for statisticians everywhere far outstrips the supply. Why the disconnect? Are we doing such a bad job that we need to rename ourselves data scientists to capture the imagination of future students, collaborators, or clients? Are we so lacking in confidence in ourselves, our colleagues, and our core discipline that we shiver in our shoes the moment a potential usurper appears on the scene? Or, has there really been a fundamental shift around us, so that our old clumsy ways of adapting and evolving are no longer adequate?

I see items A through H above, and the many others that could be added, as being like my school reports, which invariably said, “Can do better.” Perhaps we might add “Must do better.” Of course, we can and should do better. We need to adapt, to evolve, as we have been and will continue to do. Look at our history. We have to steer a path between the Scylla of complacency and sclerosis, of resting on our laurels, of reluctance to change, and the Charybdis of frantic change, of forgetting where we’ve come from and where we are going, of always trying to wear the latest fashion.

I see no evidence that the view of data science being promoted by its enthusiasts has any prospect of replacing our discipline in the diverse areas in which it has become central, many of which were named above. Google, Microsoft, Apple, Amazon, Walmart, the National Security Agency, the UK Government Communications Headquarters, and other organizations like them will surely present new and great challenges of a statistical nature, but that will never be more than a small part of what we can do. Furthermore, it is rather obvious that, as a profession, we are unlikely to provide enough qualified people to meet their needs, so others can and should move in to help them do what is needed.

I think we have a great tradition and a great future, both far longer than the concentration span of funding agencies, university faculties, and foundations—people who play zero-sum funding games across disciplines. We might miss out on the millions being lavished on data science right now, but that’s no reason for us to stop trying to do the best we can at what we do best, something that is far wider and deeper than data science. As with mathematics more generally, we are in this business for the long term. Let’s not lose our nerve.

In the year 2013, we celebrated the 300th anniversary of Bernoulli’s Ars conjectandi, the 250th anniversary of Bayes’ Essay, the 200th anniversary of Laplace’s Essai philosophique, the 150th anniversary of Galton’s mapping the weather, the 101st anniversary of Fisher’s clarion call for maximum likelihood, and the 51st anniversary of Tukey’s The Future of Data Analysis. Where are the Bernoullis, Bayes, Laplaces, Galtons, Fishers, and Tukeys of data science? Of course the answer is that their B, B, L, G, F, and T are ours. Must we give away the farm, or the family jewels?

Let’s wait for 10 years and see who is still talking about Big Data and data science. The former can only be said once, and now it has been said. As for the latter, can it really be true that respected members of the statistics profession have entertained the idea of renaming their academic home a department of data science? “What’s in a name?” asked Shakespeare, to which I add, “Did any species ever avoid extinction by adopting a new name?” No, they adapted, they evolved, and so must we.

Of course, I might be wrong. Perhaps the last trilobite thought to herself at the end of the Palaeozoic era, “I tried to evolve, but things were changing too fast for me.” But trilobites lasted for 300 million years and were on every continent on Earth. I’d be happy with that for statistics.

3 Comments

  • Randy Bartlett said:

    Agreed, we must not lose our nerve. The feedback is that our profession needs to adapt. This does not require feeling bad (guilty) or changing our name. By the way, I would replace Item G with ‘Not listening to feedback.’

    Those who covet our roles in the corporation are assisted by a mobilized and more powerful marketing constituency. They want to define Big Data, Data Mining, Data Science, Machine Learning, et al. as involving data analysis and not statisticians. It would be helpful if more ASA statisticians would ignore and not embrace their anti-statistician marketing and restrictive definitions of statistics.

    Out in the field, we cannot afford to discount feedback and to insulate ourselves with denials. If we could keep data analysis out of Big Data, Data Mining, Data Science, Machine Learning, Six Sigma, et al., it would be a different story. Instead, we need to be involved. We need to split all new ‘marketing’ coming our way. Data Mining = Statistical Data Mining + Non-Statistical Data Mining. Data Science = Statistical Data Science + Non-Statistical Data Science.

    Statistics is a great profession and we want to keep it whole.

  • Michael said:

    The question is, are those data scientists doing something statisticians can’t do at the moment? If so, let us incorporate that into our courses. Then we will not lag behind! I don’t think it is difficult to teach a statistician about data mining; it just needs exposure to big data and intensive courses in computing to manipulate that data.

  • Joe Verducci said:

    Just as Probability persists as its own discipline, but became widespread through Statistics, so should Statistics become an essential component of any Data Science program that may be drawing more interest. Indeed the training program that Terry calls for (Statistics, Computer Science, and another substantive area) could be called Data Science or Statistics (what’s in a name?). The substance matters, and the _content_ of our courses needs to become more substantive, moving beyond iid sampling or stationary processes into wider notions of generalizability.