March Issue Features Call to Action for Statisticians
JASA Book Reviews in the March Issue
- Bayesian Evaluation of Informative Hypotheses—Herbert Hoijtink, Irene Klugkist, and Paul A. Boelen (eds.)
- Bayesian Methods for Measures of Agreement—Lyle D. Broemeling
- Computational Intelligence and Feature Selection: Rough and Fuzzy Approaches—Richard Jensen and Qiang Shen
- Design of Comparative Experiments—R. A. Bailey
- Handbook of Multilevel Analysis—Jan de Leeuw and Erik Meijer (eds.)
- Introduction to Nonparametric Estimation—Alexandre B. Tsybakov
- Lagrangian Probability Distributions—Prem C. Consul and Felix Famoye
- Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis—Michael J. Daniels and Joseph W. Hogan
- Negative Binomial Regression—Joseph M. Hilbe
- Simulation and Inference for Stochastic Differential Equations: With R Examples—Stefano M. Iacus
- Statistical Advances in the Biomedical Sciences: Clinical Trials, Epidemiology, Survival Analysis, and Bioinformatics—Atanu Biswas, Sujay Datta, Jason P. Fine, and Mark R. Segal (eds.)
- Survival and Event History Analysis: A Process Point of View—Odd O. Aalen, Ornulf Borgan, and Haakon K. Gjessing
- Design and Analysis of Bioavailability and Bioequivalence Studies (3rd ed.)—Shein-Chung Chow and Jen-pei Liu
- An Introduction to Copulas (2nd ed.)—Roger B. Nelsen
- Statistics: A Guide to the Unknown (4th ed.)—Roxy Peck, George Casella, George W. Cobb, Roger Hoerl, Deborah Nolan, Robert Starbuck, and Hal Stern
The March 2010 issue of the Journal of the American Statistical Association features an article based on the remarks delivered by 2009 ASA President Sally Morton at the 2009 Joint Statistical Meetings in Washington, DC. Titled “Statistics: From Evidence to Policy,” it contains Morton’s call for individual statisticians and the association to apply their expertise in the service of improved public policy. Morton quotes Peter Orszag, director of the Office of Management and Budget, as saying that “Policy decisions should be driven by evidence.” She recommends that statisticians, both individually and collectively, engage with important issues through visits to Congress and public statements. As an example of a relevant public policy domain, Morton focuses on health care and the role of quantitative methods in the push toward evidence-based medicine.
Applications and Case Studies
The lead article in this section is “A Moving Average Approach for Spatial Statistical Models of Stream Networks,” by Jay Ver Hoef and Erin Peterson. Water is a critical natural resource, and monitoring the network of U.S. waterways is an essential activity. Statistical models for the data obtained from such monitoring need to respect the correlation among measurement points. Spatial covariance models based on Euclidean distance may not be valid because two points can be close in terms of Euclidean distance but far apart in terms of stream topology. The authors build a stream topology, a way of describing the links between streams, that starts from the most downstream point in the network and computes distance upstream from that point. These distances are used to build moving average models that incorporate spatial dependence in a way that is analogous to how moving average models are used in time-series analyses.
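To make the contrast between Euclidean and stream distance concrete, here is a minimal Python sketch. The four-site network, its coordinates, and every function name are hypothetical illustrations, and segment lengths are taken as straight lines between sites. It shows only the distance calculation, not the authors' moving average models.

```python
import math

# Hypothetical toy network: site -> (x, y, next site downstream).
# "outlet" is the most downstream point, so it has no downstream site.
SITES = {
    "outlet": (0.0, 0.0, None),
    "fork": (0.0, 1.0, "outlet"),
    "west": (-0.5, 4.0, "fork"),
    "east": (0.5, 4.0, "fork"),
}


def euclidean(a, b):
    """Straight-line distance between two sites."""
    ax, ay, _ = SITES[a]
    bx, by, _ = SITES[b]
    return math.hypot(ax - bx, ay - by)


def path_to_outlet(site):
    """Sites visited when following the flow from `site` to the outlet."""
    path = [site]
    while SITES[path[-1]][2] is not None:
        path.append(SITES[path[-1]][2])
    return path


def upstream_distance(site):
    """Distance along the stream from the outlet up to `site`."""
    path = path_to_outlet(site)
    return sum(euclidean(u, v) for u, v in zip(path, path[1:]))


def stream_distance(a, b):
    """Distance between two sites measured along the network."""
    on_a_path = set(path_to_outlet(a))
    # First site on b's downstream path that a's path also passes through.
    junction = next(s for s in path_to_outlet(b) if s in on_a_path)
    return (upstream_distance(a) + upstream_distance(b)
            - 2 * upstream_distance(junction))


print(euclidean("west", "east"))        # short: the two sites sit side by side
print(stream_distance("west", "east"))  # long: water must route through "fork"
```

In this toy layout the two headwater sites are one unit apart as the crow flies but more than six units apart along the stream, which is the situation in which a Euclidean covariance model can mislead.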
Two discussion pieces accompany the article, one by Noel Cressie and David O’Donnell and a second by Sujit Sahu. Cressie and O’Donnell provide insightful remarks about the tail-up and tail-down model specifications and also speculate about approaches for developing nonstationary models. Sahu suggests extensions of the moving average models to incorporate a temporal (as well as a spatial) component and to accommodate nonlinear measurements.
An interesting article by Tyler McCormick, Matthew Salganik, and Tian Zheng asks “How Many People Do You Know?” and addresses statistical methods for estimating the size of personal networks. The focus is on estimating both the size of an individual’s network and the distribution of network sizes across the population. The latter is crucial to understanding the spread of diseases and exploring the evolution of group behavior. Standard methods for estimating network size ask questions like “How many pregnant women do you know?” and then use information about the frequency of such people in the population to infer network size.
Unfortunately, such approaches are prone to bias from heterogeneity in the population (some people are more likely to know pregnant women than others) and from other sources. McCormick and coauthors develop a latent, nonrandom mixing model that addresses several important biases and provides improved inference. Interestingly, their model results also provide advice about how to design the survey questions so that simpler methods are likely to be effective.
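The standard approach described above can be sketched as a simple scale-up calculation: if a respondent knows a certain share of the people in a few subpopulations of known size, the same share is assumed to apply to the population as a whole. The Python sketch below uses that basic estimator, not the latent nonrandom mixing model of McCormick and coauthors. The function name and all of the numbers are hypothetical.

```python
def scale_up_network_size(known_counts, subpopulation_sizes, total_population):
    """Basic scale-up estimate of one respondent's personal network size.

    known_counts[k]        -- how many people in subpopulation k the respondent knows
    subpopulation_sizes[k] -- how many people in the population belong to subpopulation k
    total_population       -- size of the whole population
    """
    if len(known_counts) != len(subpopulation_sizes):
        raise ValueError("need one known count per subpopulation")
    # Share of the subpopulations that the respondent knows, scaled up
    # to the whole population.
    return total_population * sum(known_counts) / sum(subpopulation_sizes)


# Hypothetical survey answers for three subpopulations of known size.
estimate = scale_up_network_size(
    known_counts=[2, 1, 3],
    subpopulation_sizes=[10_000, 5_000, 15_000],
    total_population=1_000_000,
)
print(estimate)  # 200.0
```

The estimate treats every member of the population as equally likely to be known by the respondent, which is exactly the assumption that breaks down when some respondents are more likely than others to know members of a given group.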
Theory and Methods
The usual broad mix of topics appears in this section. Hemant Ishwaran and coauthors Udaya Kogalur, Eiran Gorodeski, Andy Minn, and Michael Lauer address “High-Dimensional Variable Selection for Survival Data.” Modern biotechnology has created many instances in which a very large number of predictors (e.g., gene expression values) are available for statistical modeling on a relatively small number of units. The large p, small n problem, as it is known, seems to arise with increasing frequency. Here the goal is to relate the high-dimensional data to a survival time outcome while still dealing with the usual survival-analysis issues such as right censoring.
The authors use random survival forests (RSF), an extension of Breiman’s random forests used in regression and classification settings. The minimal depth of a variable in a tree, roughly how close to the root the tree first splits on that variable, is a form of order statistic that can be used to measure the predictiveness of the variable in a survival tree. The authors derive the distribution of the minimal depth and use it for variable selection. A series of methodological advances leads to a new regularized algorithm called “RSF-variable hunting,” which implements the approach. Several examples are presented, including gene selection using microarray data.
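As a rough illustration of the minimal depth idea, the Python sketch below finds the depth of the shallowest node that splits on a given variable in a single toy tree, counting the root as depth 0. The nested-dictionary tree format, the variable names, and the function are hypothetical. The sketch does not grow a survival forest, and it does not reproduce the distribution theory or the RSF-variable hunting algorithm from the article.

```python
from collections import deque


def minimal_depth(tree, variable):
    """Depth of the shallowest node that splits on `variable`.

    The root has depth 0. Returns None if the tree never splits on it.
    """
    queue = deque([(tree, 0)])
    while queue:
        node, depth = queue.popleft()
        if node is None:  # terminal node: nothing to inspect
            continue
        if node["var"] == variable:
            # Breadth-first order guarantees this is the shallowest match.
            return depth
        queue.append((node["left"], depth + 1))
        queue.append((node["right"], depth + 1))
    return None


# Hypothetical tree: each internal node records its splitting variable;
# None marks a terminal node.
toy_tree = {
    "var": "age",
    "left": {"var": "gene_7", "left": None, "right": None},
    "right": {
        "var": "age",
        "left": {"var": "gene_7", "left": None, "right": None},
        "right": None,
    },
}

print(minimal_depth(toy_tree, "age"))     # 0
print(minimal_depth(toy_tree, "gene_7"))  # 1
print(minimal_depth(toy_tree, "gene_9"))  # None
```

A variable that tends to split near the root across many trees has a small minimal depth, which is the sense in which the statistic reflects predictiveness.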
The arrival of a census year once again brings attention to the thorny issue of collecting critical data while preserving the privacy of survey respondents. In “A Statistical Framework for Differential Privacy,” Larry Wasserman and Shuheng Zhou introduce the JASA audience to the concept of differential privacy that has emerged in the computer science literature. The setting is one in which data collectors want to prepare a data release that provides as much information as possible to the public while preserving the privacy rights of respondents. The data release here is viewed as a random mechanism that produces a release product given a true data set. The notion of a random mechanism is consistent with many of the strategies that are used in practice, including data swapping.
Differential privacy is a particular privacy criterion requiring that the random mechanism be insensitive to a change in any single data point. More formally, the ratio of the probabilities of a particular data release under two data sets that differ in only a single data point should be near 1. The closer the ratio is to 1, the greater the privacy protection. Wasserman and Zhou study several different data-release mechanisms that satisfy the requirement and compare them by computing the rate of convergence of distributions constructed from the released data to the true target distribution in the population.
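The ratio requirement can be checked numerically on a small example. The Python sketch below uses the Laplace mechanism, a standard example from the differential privacy literature that releases a count plus Laplace noise. It is chosen here purely for illustration and is not necessarily one of the mechanisms the article compares. The function names, the counts, and the privacy parameter `epsilon` are assumptions.

```python
import math
import random


def laplace_density(x, center, scale):
    """Density of a Laplace distribution at x."""
    return math.exp(-abs(x - center) / scale) / (2 * scale)


def release_count(true_count, epsilon, rng):
    """Release a count with Laplace noise of scale 1/epsilon.

    Changing one data point moves a count by at most 1, so this noise
    scale keeps the density ratio within exp(epsilon).
    """
    scale = 1.0 / epsilon
    # The difference of two independent exponentials is Laplace distributed.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


def worst_case_ratio(count_a, count_b, epsilon, outputs):
    """Largest density ratio between two data sets over the given outputs."""
    scale = 1.0 / epsilon
    worst = 1.0
    for z in outputs:
        p = laplace_density(z, count_a, scale)
        q = laplace_density(z, count_b, scale)
        worst = max(worst, p / q, q / p)
    return worst


epsilon = 0.1  # hypothetical privacy parameter; smaller means more privacy
grid = [30 + 0.25 * i for i in range(101)]  # candidate release values, 30 to 55

# Two hypothetical data sets whose counts differ by a single respondent.
ratio = worst_case_ratio(41, 42, epsilon, grid)
print(ratio <= math.exp(epsilon) + 1e-9)  # True

print(release_count(42, epsilon, random.Random(0)))  # one noisy release
```

With a small `epsilon` the bound exp(epsilon) is close to 1, so no released value makes one data set much more plausible than its neighbor. The price is noisier output, the trade-off between privacy and accuracy that the rate-of-convergence comparison in the article quantifies.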