## September Issue Provides Perspectives on Reconstructing Paleoclimate

*Hal Stern, JASA Editor*

The observed climate record (dating back to only about 1850) is relatively short for assessing the importance of recent warming—especially the role of human greenhouse gas emissions. An important tool in the study of global warming is the use of climate proxies (e.g., pollen deposits and tree rings) to infer the historical climate record back 1,000 years or so.

The September issue’s featured Applications and Case Studies (ACS) article, “The Value of Multiproxy Reconstruction of Past Climate,” which was presented at the 2010 Joint Statistical Meetings, carries out a simulation study to assess the potential of multiple climate proxies to improve historical climate reconstructions. Authors **Bo Li, Douglas Nychka,** and **Caspar Ammann** generate synthetic proxy data of three types:

- Tree-ring width (useful for capturing short-term variation in temperatures)
- Borehole depth temperature (sensitive to multi-decadal or longer time scales)
- Pollen records (reflect intermediate time-span variability)

A Bayesian hierarchical model is developed to relate the proxies to the underlying temperature and various model parameters. The temperature series itself is related to external forcing factors such as solar irradiance and volcanic activity, with the historical temperatures treated as missing data. The authors carry out several analyses of the simulated data to determine the key factors in creating accurate climate reconstructions. They find that external forcing factors play a critical role, as does a proxy such as pollen, which provides key information at the multi-decadal time scale.

Three insightful discussions by **Noel Cressie** and **Martin Tingley; Richard Smith;** and **Eugene Wahl, Christian Schoelzel, John Williams,** and **Seyitriza Tigrek** provide a range of perspectives. They offer suggestions for improving the study and enhancing the way pollen is used in climate reconstructions.

**Book Reviews**

*Applying Quantitative Bias Analysis to Epidemiologic Data*

Timothy L. Lash, Matthew P. Fox, and Aliza K. Fink

*Biomeasurement: A Student’s Guide to Biological Statistics (2nd ed.)*

Dawn Hawkins

*Continuous Bivariate Distributions (2nd ed.)*

N. Balakrishnan and Chin-Diew Lai

*Economic Modeling and Inference*

Bent Jesper Christensen and Nicholas M. Kiefer

*Gene Expression Studies Using Affymetrix Microarrays*

Hinrich Göhlmann and Willem Talloen

*Introduction to Spatial Econometrics*

James LeSage and Robert Kelley Pace

*Scientific Data Mining: A Practical Perspective*

Chandrika Kamath

*Statistical Analysis and Modelling of Spatial Point Patterns*

Janine Illian, Antti Penttinen, Helga Stoyan, and Dietrich Stoyan

*Statistical Detection and Surveillance of Geographic Clusters*

Peter Rogerson and Ikuho Yamada

*Statistical Modeling for Biomedical Researchers: A Simple Introduction to the Analysis of Complex Data (2nd ed.)*

William D. Dupont

*The Statistics of Gene Mapping*

David Siegmund and Benjamin Yakir

*Theory of Decision Under Uncertainty*

Itzhak Gilboa

Modern technological advances have affected all sciences, but none more so than biology. The development of DNA microarray technology enabled researchers to measure gene expression in biological samples. One limitation of the microarray approach is that expression can be measured only for known genetic sequences printed on the slides. Newer technologies such as serial analysis of gene expression (SAGE) and massively parallel signature sequencing (MPSS) allow researchers to explore the relative frequency of sequence segments without requiring the sequence to be known.

The paper “Bayesian Modeling of MPSS Data: Gene Expression Analysis of Bovine Salmonella Infection,” by **Soma Dhavala, Sujay Datta, Bani Mallick, Raymond Carroll, Sangeeta Khare, Sara Lawhon,** and **L. Garry Adams,** introduces an analysis of variance style model for MPSS sequence count data. The model uses a zero-inflated Poisson distribution, rather than the continuous distributions assumed by existing methods. The model is applied to compare gene expression in three groups of cattle: those infected by a “wild type” Salmonella bacterium, those infected by a mutant strain, and those not infected. Several genes that are differentially expressed across the three groups are identified and their biological significance assessed.
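The appeal of a zero-inflated Poisson for sequence counts is that it mixes a point mass at zero with an ordinary Poisson, matching data in which many tags are simply never observed. As a minimal illustration of that mixture (a hypothetical `zip_pmf` helper, not the authors' Bayesian model), the pmf can be written as:

```python
import math

def zip_pmf(k, pi, lam):
    """Probability of count k under a zero-inflated Poisson:
    with probability pi the count is a structural zero,
    otherwise it is drawn from a Poisson(lam) distribution."""
    pois = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1 - pi) * pois

# The inflation term piles extra mass onto k = 0, beyond what a
# plain Poisson(2) would give, mimicking unobserved sequence tags.
p0 = zip_pmf(0, 0.3, 2.0)  # = 0.3 + 0.7 * exp(-2)
```

A continuous-distribution model, by contrast, has no way to place a lump of probability exactly at zero, which is one motivation the paper gives for the count-based formulation.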

**Theory and Methods**

The T&M editor’s invited paper session at JSM 2010 featured **Bradley Efron** speaking on one of the more vexing problems of contemporary statistics. In “Correlated z-Values and the Accuracy of Large-Scale Statistical Estimates,” Efron considers large-scale studies that produce hundreds or thousands of correlated test statistics, each typically represented by a normal variate or a z-value.

A canonical example is a microarray experiment comparing healthy and diseased subjects’ gene expression levels for thousands of genes.

The paper develops methods for assessing the accuracy of summary statistics, such as the empirical cdf or a false discovery rate estimate, computed from large sets of correlated normal variates. Efron sidesteps the seemingly necessary step of estimating an N × N correlation matrix by making use of what he describes as a “really classical” result known as Mehler’s identity, enabling an accurate approximation based on the root mean square correlation over all N(N − 1)/2 pairs, a quantity that is often easily estimated.
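Efron's argument via Mehler's identity is analytic, but the practical point that the root mean square correlation can be estimated without forming the full matrix is easy to illustrate numerically. The sketch below (synthetic data and a hypothetical `rms_correlation` helper, not Efron's procedure) samples random column pairs instead of computing all N(N − 1)/2 correlations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: N correlated "genes" measured on n subjects,
# with a shared component inducing positive pairwise correlation.
n, N = 40, 500
shared = rng.standard_normal(n)
X = 0.5 * shared[:, None] + rng.standard_normal((n, N))

def rms_correlation(X, n_pairs=2000, rng=rng):
    """Estimate the root mean square correlation over all column
    pairs from a random sample of pairs, never building the full
    N x N correlation matrix."""
    N = X.shape[1]
    i = rng.integers(0, N, n_pairs)
    j = rng.integers(0, N, n_pairs)
    keep = i != j                       # drop self-pairs
    Xc = (X - X.mean(0)) / X.std(0)     # standardize each column
    r = (Xc[:, i[keep]] * Xc[:, j[keep]]).mean(0)  # pairwise correlations
    return np.sqrt((r ** 2).mean())

alpha_hat = rms_correlation(X)
```

For N in the thousands, sampling a few thousand pairs costs far less than the O(N²) matrix, while the Monte Carlo error in the rms estimate is typically negligible.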

Discussions by **Tony Cai, Ruth Heller, Armin Schwartzman,** and **Peter H. Westfall** provide context for understanding the paper’s main contributions to the broader literature on large-scale testing.

According to Leon Trotsky, “Everything is relative in this world …” That seems true, except, of course, when it comes to the error models in much of statistical theory and modeling, where additive (rather than relative) error models dominate the landscape. **Kani Chen, Shaojun Guo, Yuanyuan Lin,** and **Zhiliang Ying** argue that, for many applications, minimizing relative error makes more sense. They provide an alternative to additive-error modeling in “Least Absolute Relative Error Estimation.”

The authors study regression modeling based on a relative-error criterion that is symmetric in the response and the regression function. Estimators derived from the symmetric relative-error objective function are shown to be generally consistent and asymptotically normal, as well as amenable to inference via a random weighting approach. In addition, the estimators are shown to be efficient for certain multiplicative-error models, in which the error distribution has the inverse-transformation-invariant density f(x) = c exp(−|1 − x| − |1 − x⁻¹| − log x) I(x > 0). The new approach to modeling is illustrated using stock return data from the Hong Kong stock exchange.
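As a rough numerical sketch of the idea (invented data and a hypothetical `lare_loss` helper, not the authors' estimation or inference code), one can fit a multiplicative-error model by directly minimizing a symmetric relative-error loss that penalizes |y − ŷ| scaled by both y and ŷ:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Multiplicative-error model: y = exp(x @ beta) * eps with eps > 0.
n = 300
x = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])
beta_true = np.array([0.5, 1.5])
eps = np.exp(rng.normal(0.0, 0.2, n))   # positive multiplicative noise
y = np.exp(x @ beta_true) * eps

def lare_loss(beta):
    """Symmetric least-absolute-relative-error criterion:
    |y - yhat|/y + |y - yhat|/yhat, summed over observations.
    It treats over- and under-prediction relative to either
    the response or the fit symmetrically."""
    yhat = np.exp(x @ beta)
    return np.sum(np.abs(y - yhat) / y + np.abs(y - yhat) / yhat)

beta_hat = minimize(lare_loss, x0=np.zeros(2), method="Nelder-Mead").x
```

Because the loss is scale-free, multiplying all responses by a constant leaves the slope estimate essentially unchanged, which is the practical appeal of relative-error criteria for data such as stock prices.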

Of course, the above articles are just a sample. There are many other interesting articles in the T&M and ACS sections of the September issue of *JASA,* not to mention the usual array of informative book reviews. Visit the *JASA* website for the full list of articles and books under review.