We Are Data Science
“I keep saying that the sexy job in the next 10 years will be statisticians.”
Chief Economist at Google
“I believe statistics has many cultures.”
Distinguished Professor, Texas A&M University
I wish to advance the point that statisticians are already involved in, and must expand their involvement in, business analytics, Big Data, data mining, data science, machine learning, and predictive modeling (analytics). I also wish to clarify that any topic involving data analysis necessitates statistical thinking, statistical techniques, and statistical assumptions.
The current bizarre and restrictive pronouncements about what we do aim to limit us to a rigid set of ‘small data’ tools. This distasteful propaganda, in the form of straw-man characterizations of the established fields, is intended to differentiate some new vision of how to work with data without knowing statistics.
Experiencing similar discouraging statements from fellow ASA members has its own special flavor. I recently attended JSM 2013 and read articles in Amstat News (“Aren’t We Data Science?” and “The ASA and Big Data”) and in Significance (“Big Data and Big Business: Should Statisticians Join In?”). As a result, I have a new list of things I am not doing and cannot do: primarily data science, Big Data, and data mining.
These comments from ASA members limit our credibility in the eyes of employers and recruiters—providing aid and comfort to those who covet our role in the corporation. This comes at a time when ASA’s left hand (Ron Wasserstein, et al.) is offering the long-awaited PStat® and Conference on Statistical Practice, and when statistics departments are offering an MS in analytics. Of course, many ASA members do not think in self-limiting ways or confine themselves to classical techniques. As a group, we statisticians are not homogeneous. The unstated issue is, however, whether we can remain one profession.
The Business Analytics Role
Corporations house two data functions. Roughly put, the IT silo manages data and business quants analyze the data. I will use the less-tainted term, business quants, to denote those econometricians, industrial engineers, operations researchers, statisticians, etc., who apply the tools of complete and incomplete information. Our jobs are to help run a business. This involves making and supporting decisions, and this requires mastering the business and extracting information by any means possible.
We must augment our publication-centric education to master all three toolboxes: mathematics, statistics, and algorithms (logic, heuristics, optimization). Otherwise, we will be crowded out by more strident professionals. The mathematics we use consists of numerical tools for making deductions from complete information, as in E = mc².
Statistics consists of making inferences based upon incomplete information arising from incomplete or poorly measured data. The pride of our most powerful and indispensable statistical assumptions is the error term, as in E = mc² + ε.
No statistics means no error term, no inference, and no corresponding statistical assumptions for incomplete information.
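The contrast between the two toolboxes can be sketched in a few lines of Python. This is only an illustration under assumed values: the linear relationship y = b·x, the true slope of 2.0, the noise level, and the helper name `fit_slope_through_origin` are all hypothetical and are not drawn from any real business data. With complete information the slope is recovered essentially exactly and there is nothing to infer; once an error term is present, the estimate carries a standard error that has to be reported and interpreted.

```python
import math
import random


def fit_slope_through_origin(x, y):
    """Least-squares slope for y = b*x + error, plus its standard error."""
    sxx = sum(xi * xi for xi in x)
    b_hat = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    residuals = [yi - b_hat * xi for xi, yi in zip(x, y)]
    # Residual variance with n - 1 degrees of freedom (one parameter estimated).
    sigma2 = sum(r * r for r in residuals) / (len(x) - 1)
    return b_hat, math.sqrt(sigma2 / sxx)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
true_b = 2.0            # assumed "law" linking x and y (illustrative only)
x = [rng.uniform(1.0, 10.0) for _ in range(200)]

# Complete information: y is an exact function of x, so no error term.
y_exact = [true_b * xi for xi in x]

# Incomplete information: the same law plus an error term epsilon.
y_noisy = [true_b * xi + rng.gauss(0.0, 3.0) for xi in x]

b_exact, se_exact = fit_slope_through_origin(x, y_exact)
b_noisy, se_noisy = fit_slope_through_origin(x, y_noisy)

print(f"no error term:   slope = {b_exact:.4f}, std. error = {se_exact:.2e}")
print(f"with error term: slope = {b_noisy:.4f}, std. error = {se_noisy:.4f}")
```

The first fit is a deduction; the second is an inference, and only the second needs statistical assumptions about ε to say how far the estimate can be trusted.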
Data Science and Big Data Require Statistics
We reside in a global community possessing a low statistical literacy. As Deming said, “The nonstatistician cannot always recognize a statistical problem when he sees one.” We should expect depictions of data science and Big Data void of an understanding of statistics.
The business press and Big Data vendors are portraying Big Data as complete information. Instead, it is often excessive incomplete information enabling a paradigm shift in approach and methodology for certain applications, but not in statistical thinking or statistical assumptions. Non-quants are unfamiliar with our three old friends from the statistics toolbox: missing values, missing Xs (the wrong data), and measurement error.
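One of those three old friends, measurement error, can be illustrated with a small simulation. The sketch below is a hypothetical setup, not an analysis of any real data set: the true slope of 1.0, the noise levels, and the function names `ols_slope` and `simulate` are assumptions chosen for illustration. It regresses an outcome on a predictor recorded with error and shows that the estimated slope is pulled toward zero by roughly the same amount whether there are a thousand rows or fifty thousand.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (model with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate(n, rng):
    """Return (slope using the true X, slope using the mismeasured X)."""
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assumed relationship: y = 1.0 * x_true + noise.
    y = [xi + rng.gauss(0.0, 0.5) for xi in x_true]
    # The recorded X carries measurement error with the same variance as X.
    x_obs = [xi + rng.gauss(0.0, 1.0) for xi in x_true]
    return ols_slope(x_true, y), ols_slope(x_obs, y)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
results = {n: simulate(n, rng) for n in (1000, 50000)}

for n, (slope_true_x, slope_obs_x) in results.items():
    print(f"n = {n:>6}: slope with true X = {slope_true_x:.3f}, "
          f"slope with mismeasured X = {slope_obs_x:.3f}")
```

Under these assumptions the slope on the mismeasured predictor settles near 0.5 rather than 1.0, because the error variance equals the variance of the true predictor. Adding rows tightens the estimate around the wrong value; it does not remove the bias.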
Also, we do not want unrefined Big Data! We want information, and this often requires us to reduce Big Data. eBay’s approach to Big Data is typical: keep buying more hardware storage. This allows for searching, reporting, counting/summarizing, and, at a slightly higher conceptual level, segmentation. However, this light analysis is merely descriptive in character; it will take the quants to deliver the promises of Big Data.
Next, we need statistical diagnostics to measure the accuracy and reliability of results.
I second Marie Davidian’s call to arms and the recommendations in her aforementioned Amstat News articles. ASA members, like everyone else, must embrace change. In private industry, government, and all other organizational settings in which we work, statisticians and other quants must be data science generalists and practice every type of data analysis, whether in business analytics, Big Data, data mining, data science, machine learning, or predictive modeling (analytics). To differentiate our value proposition, we must be involved.
Furthermore, an understanding of statistics is necessary to properly lead and organize resources, which can address our concerns about involving the most appropriate professionals. I discuss in greater detail the needed changes in A Practitioner’s Guide to Business Analytics.
Randy Bartlett, PhD, PStat®
Author of A Practitioner’s Guide to Business Analytics