
Comment on ‘Statistics as a Science, not an Art: The Way to Survive in Data Science’

1 July 2015
Murray Aitkin, University of Melbourne

Murray Aitkin has been a professor of statistics at the universities of Lancaster UK, Tel Aviv, and Newcastle UK and a director of statistical consulting centers at these universities. He also has been a senior statistical consultant to the National Center for Education Statistics.

The interesting article by Mark van der Laan deals with two major issues: the role of consulting statisticians and the role of statistical theory in the kinds of analysis statisticians use.

On the first, I agree entirely with his views, also endorsed by Joseph Bauer (April 2015). As consultants, we need to know as much as is relevant about the substantive research area on which we are being consulted. For senior consultants in general university statistical consulting centers, this can be a formidable responsibility, sometimes requiring re-skilling in new areas of applied research. (My most recent areas were environmental toxicology and, before that, social networks.) Sometimes the area requires such intensive and extensive training or retraining that the consultant remains in the area and becomes an expert (if he or she is sufficiently able and tactful). Terry Speed is a great example.

On van der Laan’s second, technical issue, the irrelevance of finite-dimensional parametric inference, I have a different view. Edwin Pitman, in his 1979 book, wrote (p. 1):

All actual sample spaces are discrete, and all observable random variables have discrete distributions. The continuous distribution is a mathematical construction, suitable for mathematical treatment, but not practically observable.

Since every real variable is measured or recorded to a finite precision, no matter how small, its values are on a finite grid, equally spaced if the measurement precision is consistent. For any defined finite population of such variables (and all real populations are finite), the multinomial distribution provides an exact true model, without any assumptions. With a defined sample design, the empirical likelihood of Owen (1988, 2001) provides the sample information, and in the frequentist framework, maximum empirical likelihood provides the information about the finite population parameters of interest. In the Bayesian framework, the conjugate Dirichlet prior provides the corresponding posterior information.
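As a small illustration of this multinomial/Dirichlet machinery (not part of the original comment), the Python sketch below applies Rubin’s (1981) Bayesian bootstrap to the mean of a variable recorded on a finite grid: the posterior places flat Dirichlet weights on the observed values, and each weight draw yields one posterior draw of the mean. The data are invented for illustration.

import numpy as np

rng = np.random.default_rng(1)

# Invented sample: a variable recorded to finite precision (a grid of values).
y = np.round(rng.gamma(shape=2.0, scale=3.0, size=200), 1)

# Bayesian bootstrap (Rubin 1981): the posterior over the distribution of y
# puts Dirichlet(1, ..., 1) weights on the observed values; each weight draw
# gives one posterior draw of any functional of interest, here the mean.
B = 5000
weights = rng.dirichlet(np.ones(len(y)), size=B)   # B x n weight matrix
posterior_means = weights @ y                       # B posterior draws of the mean

print("posterior mean:", posterior_means.mean())
print("95% credible interval:", np.percentile(posterior_means, [2.5, 97.5]))

The frequentist analogue replaces the Dirichlet draws by maximum empirical likelihood, with profile empirical likelihood intervals for the same population functional.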

The evolution of statistical theory has been a very slow process. The Fisherian likelihood and its analysis were a revolutionary paradigm change in theory, and the computational MCMC developments have greatly strengthened the contribution of the Bayesian paradigm. In the latter paradigm, the multinomial/Dirichlet approach through the Bayesian bootstrap of Rubin (1981) occupies a tiny corner, despite repeated calls by noted Bayesians for its general adoption (Gutiérrez-Pena and Walker 2005; Walker and Gutiérrez-Pena 2007).

The problem in the past has been how to generalize this one-population model to multiple populations and regressions. Aitkin (2008) gave the Bayesian extension to clustering, stratification, and regression and Huang (2014) gave the maximum empirical likelihood extension to generalized linear model regressions. These generalizations provide a powerful tool for analysis without strong parametric response distributional assumptions: They require only a GLM specification (an “assisting model” in the survey sampling language of Särndal, Swensson, and Wretman 1992). We can expect that, over time, these approaches will be incorporated into standard statistical packages, making them available for our substantive research clients and colleagues.
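One hedged sketch of how the regression extension might look in practice: re-fit the assisting linear model under Dirichlet weights to obtain posterior draws of its coefficients, with no parametric assumption on the response distribution. This follows the spirit of Aitkin (2008) rather than reproducing its exact algorithm; the data and model below are invented.

import numpy as np

rng = np.random.default_rng(2)

# Invented data: a simple linear "assisting model" y ~ x with a skewed,
# non-normal error distribution (no parametric response assumption is used).
n = 300
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.5 * x + rng.gamma(shape=2.0, scale=1.0, size=n) - 2.0
X = np.column_stack([np.ones(n), x])

def weighted_ls(X, y, w):
    """Weighted least-squares fit of the assisting linear model."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Posterior draws of the coefficients: one Dirichlet weight vector per draw.
B = 2000
draws = np.array([weighted_ls(X, y, rng.dirichlet(np.ones(n))) for _ in range(B)])

print("posterior means (intercept, slope):", draws.mean(axis=0))
print("95% interval for slope:", np.percentile(draws[:, 1], [2.5, 97.5]))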

 
 
 
 
 

References

Aitkin, M. 2008. Applications of the Bayesian bootstrap in finite population inference. Journal of Official Statistics 24:21–51.

Gutiérrez-Pena, E., and S.G. Walker. 2005. Statistical decision problems and Bayesian nonparametric methods. International Statistical Review 73:309–330.

Huang, A. 2014. Joint estimation of the mean and error distribution in generalized linear models. Journal of the American Statistical Association 109:186–196.

Owen, A.B. 1988. Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75:237–249.

Owen, A.B. 2001. Empirical likelihood. Boca Raton: Chapman and Hall/CRC Press.

Pitman, E. 1979. Some basic theory for statistical inference. London: Chapman and Hall.

Rubin, D.B. 1981. The Bayesian bootstrap. Annals of Statistics 9:130–134.

Särndal, C.-E., B. Swensson, and J. Wretman. 1992. Model-assisted survey sampling. New York: Springer.

Walker, S.G., and E. Gutiérrez-Pena. 2007. Bayesian parametric inference in a nonparametric framework. Test 16:188–197.


Mark van der Laan’s Response to Murray Aitkin

I very much appreciate these responses.

Yes, when I referred to parametric models, I meant the unrealistic parametric models loaded with assumptions that are known to be false. Indeed, if the observed data are discrete-valued, then a nonparametric statistical model is still described by a finite number of parameters and can thus also be viewed as a parametric model.

Regarding the move toward more nonparametric statistics in our field, it is all about how one deals with the curse of dimensionality due to the dimension of the data and the size of the statistical model for the data probability distribution. The conventional approaches in maximum likelihood and Bayesian learning are nontargeted in the sense that they focus on the estimation of the whole density/likelihood of the data. Beyond that, they often rely on unrealistic statistical models from the outset, thereby starting out with ill-defined target parameters and bias from which the analysis cannot recover.

In contrast, targeted learning involves first carefully defining the target parameter of the data-generating distribution in a realistic statistical model. This target parameter typically depends only on part of the data distribution, so that the data-adaptive estimation can focus on that relevant part of the likelihood. Second, the learning of this relevant part of the data distribution should be targeted toward the target parameter so that the resulting substitution estimator is minimally biased and asymptotically efficient.

The targeted maximum likelihood estimation (TMLE) method, or, more generally, targeted minimum loss-based estimation, achieves this with a two-step procedure. First, it uses ensemble (Super) learning, fully utilizing the theoretically optimal cross-validation method for selecting among candidate estimators of the relevant part of the likelihood, resulting in an initial estimator of the relevant part of the likelihood. Subsequently, TMLE carries out a targeted bias reduction step that can be achieved with parametric maximum likelihood estimation using a so-called least favorable parametric model through the initial estimator. Both components, Super Learning and the TMLE step, are necessary to obtain an asymptotically linear estimator of the target parameter that allows for influence curve-based statistical inference in terms of confidence intervals and testing, without relying on unrealistic assumptions. In this manner, one can construct a priori specified targeted machine learning algorithms that allow for formal statistical inference based on a normal limit distribution.
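To make the two steps concrete, here is a minimal numerical sketch for one simple target parameter, the treatment-specific mean E[Y(1)] with a binary outcome. The initial fits below are plain logistic regressions standing in for Super Learning, the targeting step is a one-parameter logistic fluctuation with clever covariate A/g(W), and the data are simulated, so this illustrates the recipe rather than reproducing the authors’ software.

import numpy as np
from scipy.special import expit, logit
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Simulated data: baseline covariate W, binary treatment A, binary outcome Y.
n = 2000
W = rng.normal(size=(n, 1))
A = rng.binomial(1, expit(0.4 * W[:, 0]))
Y = rng.binomial(1, expit(-0.5 + A + 0.8 * W[:, 0]))

# Step 1: initial estimators of the relevant parts of the likelihood.
# (In targeted learning these would come from Super Learning; plain logistic
# regressions are used here only to keep the sketch short.)
Q_fit = LogisticRegression().fit(np.column_stack([A, W[:, 0]]), Y)
g_fit = LogisticRegression().fit(W, A)

Q1 = Q_fit.predict_proba(np.column_stack([np.ones(n), W[:, 0]]))[:, 1]  # E[Y | A=1, W]
QA = Q_fit.predict_proba(np.column_stack([A, W[:, 0]]))[:, 1]           # E[Y | A, W]
g1 = g_fit.predict_proba(W)[:, 1]                                       # P(A=1 | W)

# Step 2: targeting step. Fluctuate the initial outcome regression along a
# one-parameter logistic submodel with clever covariate H = A / g1, fitting
# the fluctuation parameter epsilon by maximum likelihood (Newton iterations).
H = A / g1
eps = 0.0
for _ in range(20):
    p = expit(logit(QA) + eps * H)
    score = np.sum(H * (Y - p))
    info = np.sum(H ** 2 * p * (1 - p))
    eps += score / info

# Updated (targeted) prediction under A=1 and the substitution estimator.
Q1_star = expit(logit(Q1) + eps / g1)
psi = Q1_star.mean()

# Influence-curve-based standard error and 95% confidence interval.
IC = H * (Y - expit(logit(QA) + eps * H)) + Q1_star - psi
se = IC.std(ddof=1) / np.sqrt(n)
print("TMLE of E[Y(1)]: %.3f  (95%% CI %.3f, %.3f)" % (psi, psi - 1.96 * se, psi + 1.96 * se))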
