“Though many statisticians have first-rate computing skills, stat should reach out to CS for collaboration in advanced areas, such as the R project is doing with CS compiler experts.”

I would like to offer some insight complementary to Norman’s excellent comments.

One venue (there are many others now) is the Conference on the Interface of Computing Science and Statistics. Three keynote speakers at the Interface, Brad Efron, Jerry Friedman, and Leo Breiman, all warned more than ten years ago that CS would overtake statistics on a large number of problems, particularly the newer, far larger ones.

If you read the papers in conferences such as Uncertainty in Artificial Intelligence, Neural Information Processing Systems, and the International Conference on Machine Learning, and in a number of machine learning journals, you will notice that, if you set aside 20% or less of the papers, the rest are basically statistics. The difference is that those papers apply computational methods considerably beyond what many statisticians are capable of. When I first read the papers 12+ years ago, I immediately thought “Why should CS folks (particularly ML ones) even bother with statisticians?” and “Will mainstream statisticians ever be able to do more than a small fraction of what the ML folks are doing?”

On a more specific level, Michael Jordan showed how to do the theory associated with a hierarchical mixture of ‘experts’ (i.e., a mixture of statistical mixture models). Jordan also showed how to do the computation using variational methods. Thomas Minka and John Lafferty extended both the theory and the speed of applications using expectation propagation (related to EM-type techniques). We expect breakthroughs on theory. We should also expect breakthroughs on computational algorithms that are properly tied to the theory, as in the work above.
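For readers who have not met the EM-type machinery mentioned above, here is a minimal sketch of EM for a two-component one-dimensional Gaussian mixture. This is a toy illustration of the underlying idea only, not Jordan’s hierarchical mixture of experts and not expectation propagation; the function name and initialization are my own.

```python
import math

def em_gaussian_mixture(data, iters=50):
    """Toy EM for a two-component 1-D Gaussian mixture."""
    mu = [min(data), max(data)]   # crude initial means
    var = [1.0, 1.0]              # initial variances
    pi = [0.5, 0.5]               # initial mixing weights
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in data:
            dens = [pi[k] / math.sqrt(2 * math.pi * var[k])
                    * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                    for k in range(2)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: re-estimate weights, means, and variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, data)) / nk, 1e-6)
    return pi, mu, var
```

On two well-separated clusters of points, the estimated means settle near the cluster centers; the real research cited above is about making this kind of iteration fast and theoretically sound for far richer models.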

One place where most CS folks are comparatively weak is where statisticians are even weaker: working with exceptionally large files using conventional or unconventional statistical models. In some situations, we might want to clean up and analyze a set of files using the Fellegi-Holt Model of Statistical Data Editing (JASA 1976) or the Fellegi-Sunter Model of Record Linkage (JASA 1969). Unfortunately, the integer programming methods behind the FH edit model, and the search/retrieval/comparison and approximate string comparison methods behind the FS model, have almost entirely been taken over by the CS folks. The issue is that many of the computational algorithms needed for the FH and FS models are nearly the same as the algorithms needed for conventional statistical models.
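To make the Fellegi-Sunter flavor concrete, here is a hedged sketch: plain edit distance stands in for the more sophisticated approximate string comparators (e.g., Jaro-Winkler) used in practice, and the composite weight is the FS sum of log-likelihood ratios over fields. The field names and the m/u agreement probabilities in the usage below are invented for illustration.

```python
import math

def levenshtein(a, b):
    """Edit distance: a simple stand-in for the approximate
    string comparators used in record linkage."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def fs_match_weight(rec1, rec2, m, u, max_dist=1):
    """Fellegi-Sunter composite weight: sum over fields of the
    log-likelihood ratio, where m[f] and u[f] are the assumed
    probabilities that field f agrees for true matches and for
    non-matches, and agreement is judged by edit distance."""
    w = 0.0
    for f in m:
        agree = levenshtein(rec1[f], rec2[f]) <= max_dist
        w += math.log(m[f] / u[f]) if agree else \
             math.log((1 - m[f]) / (1 - u[f]))
    return w
```

A positive weight favors a match, a negative one a non-match; the production-scale problem is doing this over billions of candidate pairs, which is exactly where the shared computational machinery comes in.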

By the way, pass me the Statistical QDR (Qualification, Diagnostics, & Review).

The title made it impossible for me to resist offering two future ‘lies’ ahead:

1. The loudest in the software industry will claim that their magic software replaces statisticians. Just put your data in and press the green button. You do not even need to know what problem you are solving; it’s that good.

2. Statistics Denial: The loudest in IT will continue to claim that they can analyze data without statistics.

Two old lies that may have died:

3. There are these Big Data techniques that do not rely on statistics.

4. Big Data will replace the scientific method.

Statistics Losing Ground to Computer Science | Amstat News


As a former engineer working in a linguistics environment, I cannot believe the lack of enthusiasm, the unwillingness to experiment, the conservatism. As a result, I believe we should examine what kind of people we are accepting into traditional fields, and what is happening back in the classrooms of schools, such that whole fields of people end up being considered conservative (although, I might add, not all members of those fields are conservative).
