September Issue Provides Perspectives on Reconstructing Paleoclimate
Hal Stern, JASA Editor
The observed climate record (dating back to only about 1850) is relatively short for assessing the importance of recent warming—especially the role of human greenhouse gas emissions. An important tool in the study of global warming is the use of climate proxies (e.g., pollen deposits and tree rings) to infer the historical climate record back 1,000 years or so.
The September issue’s featured Applications and Case Studies (ACS) article, “The Value of Multiproxy Reconstruction of Past Climate,” which was presented at the 2010 Joint Statistical Meetings, carries out a simulation study to assess the potential of multiple climate proxies to improve historical climate reconstructions. Authors Bo Li, Douglas Nychka, and Caspar Ammann generate synthetic proxy data of three types:
- Tree-ring width (useful for capturing short-term variation in temperatures)
- Borehole depth temperature (sensitive to multi-decadal or longer time scales)
- Pollen records (reflect intermediate time-span variability)
A Bayesian hierarchical model is developed to relate the proxies to the underlying temperature and various model parameters. The temperature series itself is related to external forcing factors, such as solar irradiance and volcanic activity, with the historical temperatures treated as missing data. The authors carry out several analyses of the simulated data to determine the key factors in creating accurate climate reconstructions. They find that external forcing factors play a critical role, as does a proxy such as pollen, which provides key information at the multi-decadal time scale.
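The proxies' differing time-scale sensitivities are easy to mimic in a toy simulation. The sketch below is in the spirit of the study, not the authors' actual model; the forcing series, window lengths, and noise levels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years = 1000

def smooth(x, window):
    """Moving average: keeps only variability slower than the window."""
    return np.convolve(x, np.ones(window) / window, mode="same")

# Toy external forcings: a slow solar cycle plus sporadic volcanic cooling.
years = np.arange(n_years)
solar = 0.3 * np.sin(2 * np.pi * years / 200)
volcanic = -np.abs(rng.normal(0, 1, n_years)) * (rng.random(n_years) < 0.02)
temperature = solar + volcanic + rng.normal(0, 0.2, n_years)

# Each proxy is a noisy view of temperature at its characteristic time scale.
tree_rings = temperature + rng.normal(0, 0.3, n_years)             # annual
pollen = smooth(temperature, 30) + rng.normal(0, 0.2, n_years)     # multi-decadal
borehole = smooth(temperature, 100) + rng.normal(0, 0.1, n_years)  # centennial
```

A reconstruction method is then judged on how well it recovers the `temperature` series from the three proxy records and the forcings.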
Book Reviews
- Applying Quantitative Bias Analysis to Epidemiologic Data, by Timothy L. Lash, Matthew P. Fox, and Aliza K. Fink
- Biomeasurement: A Student’s Guide to Biological Statistics (2nd ed.)
- Continuous Bivariate Distributions (2nd ed.), by N. Balakrishnan and Chin-Diew Lai
- Economic Modeling and Inference, by Bent Jesper Christensen and Nicholas M. Kiefer
- Gene Expression Studies Using Affymetrix Microarrays, by Hinrich Göhlmann and Willem Talloen
- Introduction to Spatial Econometrics, by James LeSage and Robert Kelley Pace
- Scientific Data Mining: A Practical Perspective
- Statistical Analysis and Modelling of Spatial Point Patterns, by Janine Illian, Antti Penttinen, Helga Stoyan, and Dietrich Stoyan
- Statistical Detection and Surveillance of Geographic Clusters, by Peter Rogerson and Ikuho Yamada
- Statistical Modeling for Biomedical Researchers: A Simple Introduction to the Analysis of Complex Data (2nd ed.), by William D. Dupont
- The Statistics of Gene Mapping, by David Siegmund and Benjamin Yakir
- Theory of Decision Under Uncertainty
Three insightful discussions by Noel Cressie and Martin Tingley; Richard Smith; and Eugene Wahl, Christian Schoelzel, John Williams, and Seyitriza Tigrek provide a range of perspectives. They offer suggestions for improving the study and enhancing the way pollen is used in climate reconstructions.
Modern technological advances have affected all sciences, but none more so than biology. The development of DNA microarray technology enabled researchers to measure gene expression in biological samples. One limitation of the microarray approach is that expression can be measured only for known genetic sequences printed on the slides. Recent technologies, such as serial analysis of gene expression and massively parallel signature sequencing (MPSS), allow researchers to explore the relative frequency of sequence segments without requiring the sequence to be known.
The paper “Bayesian Modeling of MPSS Data: Gene Expression Analysis of Bovine Salmonella Infection,” by Soma Dhavala, Sujay Datta, Bani Mallick, Raymond Carroll, Sangeeta Khare, Sara Lawhon, and L. Garry Adams, introduces an analysis of variance style model for MPSS sequence count data. The model uses a zero-inflated Poisson distribution, rather than the continuous distributions assumed by existing methods. The model is applied to compare gene expression in three groups of cattle: those infected by a “wild type” Salmonella bacterium, those infected by a mutant strain, and those not infected. Several genes that are differentially expressed across the three groups are identified and their biological significance assessed.
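At the heart of the model is the zero-inflated Poisson distribution, which places extra probability at zero to accommodate sequences observed in no samples. A minimal sketch of that distribution (the function name and parameterization here are illustrative, not the authors' notation):

```python
import math

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: a mixture putting extra mass pi at zero
    on top of an ordinary Poisson(lam) count distribution."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson
```

An unexpressed sequence contributes the inflated zero, while expressed sequences contribute ordinary Poisson counts, so `zip_pmf(0, lam, pi)` exceeds the plain Poisson zero probability whenever `pi > 0`.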
Theory and Methods
The T&M editor’s invited paper session at JSM 2010 featured Bradley Efron speaking on one of the more vexing problems of contemporary statistics. In “Correlated z-Values and the Accuracy of Large-Scale Statistical Estimates,” Efron considers large-scale studies that produce hundreds or thousands of correlated test statistics, each typically represented by a normal variate or a z-value.
A canonical example is a microarray experiment comparing healthy and diseased subjects’ gene expression levels for thousands of genes.
The paper develops methods for assessing the accuracy of summary statistics based on large sets of correlated normal variates, such as their empirical cdf or a false discovery rate statistic. Efron sidesteps the seemingly necessary step of estimating an N × N correlation matrix by making use of what he describes as a “really classical” result known as Mehler’s identity, enabling an accurate approximation based on the root mean square correlation over all N(N − 1)/2 pairs, a quantity often easily estimated.
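The root mean square correlation itself is cheap to estimate. A hypothetical sketch (this samples row pairs directly as a plain Monte Carlo estimate; it is not Efron's accuracy approximation, which is the paper's contribution):

```python
import numpy as np

def rms_correlation(X, n_pairs=5000, seed=0):
    """Estimate the root mean square correlation across the rows of X
    by sampling random row pairs, never forming the N x N matrix."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Center and scale each row once; a pair's correlation is then a dot product.
    Z = X - X.mean(axis=1, keepdims=True)
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)
    i = rng.integers(0, N, n_pairs)
    j = rng.integers(0, N, n_pairs)
    keep = i != j
    corrs = np.einsum("ij,ij->i", Z[i[keep]], Z[j[keep]])
    return float(np.sqrt(np.mean(corrs ** 2)))
```

Note that even fully independent rows of length n yield a nonzero estimate, since sample correlations fluctuate on the order of n^(-1/2); that built-in noise is one reason assessing accuracy in this setting is delicate.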
Discussions by Tony Cai, Ruth Heller, Armin Schwartzman, and Peter H. Westfall provide context for understanding the paper’s main contributions to the broader literature on large-scale testing.
According to Leon Trotsky, “Everything is relative in this world …” That seems true except, of course, when it comes to the error models in much of statistical theory and modeling, where additive (rather than relative) error models dominate the landscape. Kani Chen, Shaojun Guo, Yuanyuan Lin, and Zhiliang Ying argue that, for many applications, minimizing relative error makes more sense. They provide an alternative to additive-error modeling in “Least Absolute Relative Error Estimation.”
The authors study regression modeling based on a relative-error criterion that is symmetric in the response and the regression function. Estimators derived from the symmetric relative-error objective function are shown to be generally consistent and asymptotically normal, as well as amenable to inference via a random weighting approach. In addition, the estimators are shown to be efficient for certain multiplicative-error models, in which the error distribution has the inverse-transformation-invariant density f(x) = c exp(−|1 − x| − |1 − 1/x| − log x) · I(x > 0). The new approach to modeling is illustrated using stock return data from the Hong Kong stock exchange.
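The criterion is easy to state: for a multiplicative model, sum the absolute error measured relative to the observed response and relative to the fitted value. A toy one-parameter sketch (the grid search, data-generating settings, and names are illustrative stand-ins, not the authors' procedure):

```python
import numpy as np

def lare_loss(b, x, y):
    """Least absolute relative error criterion for the multiplicative
    model y = exp(b * x) * error, symmetric in y and its fitted value."""
    fit = np.exp(b * x)
    return np.sum(np.abs((y - fit) / y) + np.abs((y - fit) / fit))

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 300)
y = np.exp(0.8 * x) * rng.lognormal(0.0, 0.2, 300)  # true slope: 0.8

# Grid-search stand-in for a proper optimizer.
grid = np.linspace(-2.0, 2.0, 801)
b_hat = grid[np.argmin([lare_loss(b, x, y) for b in grid])]
```

Because the loss divides by both `y` and the fitted value, it treats over- and under-prediction of a positive response symmetrically in relative terms, which an additive criterion does not.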
Of course, the above articles are just a sample. There are many other interesting articles in the T&M and ACS sections of the September issue of JASA, not to mention the usual array of informative book reviews. Visit the JASA website for the full list of articles and books under review.