## March JASA Features Novel Methods for Handling Censored Data

The March 2012 issue of the *Journal of the American Statistical Association* is the first for new Theory and Methods section editors Xuming He of the University of Michigan and Jun Liu of Harvard University. Their first issue contains the usual broad assortment of theory and methods contributions, addressing both study design and data analysis. Two Theory and Methods articles that use imputation to improve statistical inference with censored data are described below. The Applications and Case Studies section of this issue includes applications in education, environmental science, medicine, and public health. Articles about improved methods for tissue classification in brain imaging and wind power forecasting are previewed here.

#### Theory and Methods

Censored data are a form of missing data: it is known only that the true value lies in a particular interval. For example, censored patient survival times in a clinical study are often known only to exceed the amount of follow-up time for an individual.

In many settings, censored data can complicate analysis and/or lead to a loss of statistical efficiency. One such setting is in the development of survival trees for analyzing survival data. Survival trees are a popular nonparametric regression approach to understanding the relationship between the survival time of a subject and a set of predictors. The ability to obtain a good-fitting survival tree relies on having a sufficient number of observed outcomes to build up the tree; censored observations lack observed failure times, which can limit our ability to develop appropriate trees.

“Recursively Imputed Survival Trees,” by **Ruoqing Zhu** and **Michael Kosorok**, proposes imputing the unobserved failure times based on an initial survival model, updating the model based on the imputed failure times, and then repeating the process several times. The proposed method can be viewed as a type of Monte Carlo expectation-maximization (EM) algorithm, which generates extra diversity in the tree-fitting process. Simulation studies and data analyses demonstrate that the new approach makes better use of the censored data than previous tree-based methods, yielding improved model fit and reduced prediction error.
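The impute-refit recursion can be sketched in a few lines. The sketch below is a simplified analogue, not the authors' method: a single exponential survival model stands in for the paper's tree ensembles, and the sample size, censoring mechanism, and number of iterations are illustrative assumptions. The loop shape is the point: fit a model, impute the censored failure times from it, refit on the completed data, and repeat.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy right-censored data: exponential failure times, independent censoring
n = 200
true_t = rng.exponential(scale=2.0, size=n)
c = rng.exponential(scale=3.0, size=n)
time = np.minimum(true_t, c)
event = (true_t <= c).astype(int)  # 1 = observed failure, 0 = censored

def fit_rate(t, d):
    # Exponential MLE under right censoring: events / total exposure
    return d.sum() / t.sum()

# Recursive imputation loop (the paper does this with survival forests,
# not a one-parameter exponential; this only illustrates the iteration):
# impute censored times from the current model, refit on completed data.
rate = fit_rate(time, event)
for _ in range(5):
    # Memoryless property: residual life beyond the censoring time is
    # again exponential with the current fitted rate.
    residual = rng.exponential(scale=1.0 / rate, size=n)
    imputed = time + np.where(event == 0, residual, 0.0)
    rate = fit_rate(imputed, np.ones(n))

print(round(1.0 / rate, 2))  # estimated mean failure time (truth: 2.0)
```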

Another case in which censored data can cause difficulties is when important predictors in a regression are subject to censoring. This can happen, for example, when an exposure variable is subject to detection limits so that values below the detection limit are not observed. A common approach is to use likelihood-based methods that assume the “missing” covariate values are missing at random, but this is not appropriate for censored covariates.

**Huixia Wang** and **Xingdong Feng** address this issue in the context of robust regression, specifically M-regression, in their article “Multiple Imputation for M-Regression with Censored Covariates.” Rather than specifying a parametric likelihood for the imputation model, their method imputes the censored covariates by assuming that the conditional quantiles of the censored covariates are linear in the observed variables, which can be viewed as a semiparametric approach. The censored covariates are imputed multiple times; standard regression methods are then applied to the multiply imputed data sets, and the results are combined to produce an estimator with improved efficiency. The resulting estimator is shown to be consistent and asymptotically normal. The finite-sample performance of the proposed method is assessed through a simulation study and an analysis of the C-reactive protein data in the 2007–2008 National Health and Nutrition Examination Survey.
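The multiple-imputation workflow can be illustrated with a toy detection-limit example. Two loudly labeled simplifications relative to the paper: censored covariate values are drawn from a parametric truncated normal rather than the authors' semiparametric quantile-based imputation model (which also conditions on the response), and ordinary least squares stands in for M-regression. All numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: outcome y depends on a covariate x that is left-censored at a
# detection limit `dl` (values below it are unobserved).
n, dl, M = 300, -0.5, 10
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
cens = x < dl  # these x values would not be observed in practice

def draw_below(mu, sd, upper, size):
    # Rejection sampling from N(mu, sd) truncated above at `upper`
    out = np.empty(0)
    while out.size < size:
        z = rng.normal(mu, sd, size=4 * size)
        out = np.concatenate([out, z[z < upper]])
    return out[:size]

# Multiple imputation: impute the censored x's M times, fit the
# regression on each completed data set, and average the estimates.
betas = []
for _ in range(M):
    x_imp = x.copy()
    x_imp[cens] = draw_below(0.0, 1.0, dl, cens.sum())  # known marginal here
    X = np.column_stack([np.ones(n), x_imp])
    betas.append(np.linalg.lstsq(X, y, rcond=None)[0])
beta_pooled = np.mean(np.array(betas), axis=0)
print(beta_pooled.round(2))  # [intercept, slope]
```

Because this toy imputation ignores the response, the pooled slope is attenuated toward zero; the paper's imputation model avoids that by conditioning on all observed variables.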

#### Applications and Case Studies

Magnetic resonance imaging (MRI) allows scientists to assess the anatomical structure of the human brain. MRI produces measurements on a large three-dimensional array of volume elements known as voxels. One class of anatomical studies attempts to identify the different tissue types (gray matter, white matter, and cerebrospinal fluid) in a subject’s brain from a single MRI image.

This type of classification can be done manually, but it is labor intensive. Automatic methods have been developed that use a Markov random field (MRF) prior distribution to favor contiguity among voxels of the same tissue type. The MRF is combined with a model that assumes observed voxel image intensities (e.g., gray levels) are normally distributed with means and variances that depend on the tissue type. Typical methods assume that each voxel is homogeneous so that the entire voxel is characterized as belonging to a given tissue type.

**Dai Feng**, **Luke Tierney**, and **Vincent Magnotta**, in their article, “MRI Tissue Classification via Bayes Hierarchical Mixture Models,” derive an improved classification approach by introducing subvoxels within each voxel and then building an MRF model at the subvoxel level. Markov chain Monte Carlo methods are used to simulate from the posterior distribution of the model parameters and the subvoxel labels. A thorough simulation study demonstrates the value of the new approach.
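The Potts-prior Gibbs sampler at the heart of such models can be shown on a toy two-dimensional image. The sketch below is a single-resolution analogue of the model, not the authors' implementation: there are no subvoxels, the class means and variance are held fixed rather than sampled, and the grid size, interaction strength, and intensities are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 2-D "image": three tissue classes with Gaussian intensities
K, beta, H, W = 3, 1.0, 24, 24
mus, sigma = np.array([0.0, 1.0, 2.0]), 0.35
true = np.zeros((H, W), dtype=int)
true[:, W // 3: 2 * W // 3] = 1
true[:, 2 * W // 3:] = 2
img = rng.normal(mus[true], sigma)

def neighbors(i, j):
    # 4-neighborhood, clipped at the image boundary
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < H and 0 <= nj < W:
            yield ni, nj

# Gibbs sampler: each label's full conditional combines the Potts prior
# (favoring agreement with neighbors) and the Gaussian likelihood.
labels = rng.integers(0, K, size=(H, W))
for sweep in range(20):
    for i in range(H):
        for j in range(W):
            same = np.array([sum(labels[nb] == k for nb in neighbors(i, j))
                             for k in range(K)], dtype=float)
            loglik = -0.5 * ((img[i, j] - mus) / sigma) ** 2
            p = np.exp(beta * same + loglik)
            labels[i, j] = rng.choice(K, p=p / p.sum())

accuracy = (labels == true).mean()
print(round(accuracy, 2))  # fraction of pixels labeled correctly
```

The MRF term `beta * same` is what smooths isolated misclassifications that a purely intensity-based mixture model would make; the paper pushes the same idea down to the subvoxel level.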

Alternative energy is a critical research topic at present. Wind power shows great promise as a potential source of energy, but effective use of wind power requires accurate forecasts of wind power generation to allow wind energy producers to make reasonable supply commitments for future time periods. Most currently available forecast approaches obtain point estimates of wind speed and wind direction and then turn these into point estimates of wind power. **Jooyoung Jeon** and **James Taylor’s** article, “Using Conditional Kernel Density Estimation for Wind Power Density Forecasting,” develops a method for generating full probability distributions of wind power forecasts, which support better decisions than point estimates.

Jeon and Taylor use standard time series approaches (a VARMA-GARCH model) to generate wind speed and wind direction estimates. They then use a stochastic model to relate wind power to wind speed and direction. Their approach uses conditional kernel density estimation, where the conditioning is on wind speed, to obtain separate probability distributions for the wind power produced at different wind speeds. A time decay factor is built into the conditional kernel density approach to allow evolution in the wind power/wind speed relationship over time. The methodology is applied to four Greek wind farms and provides better short-term (up to 72-hour) forecasts than simpler methods that rely on deterministic wind speed/wind power curves.
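The conditional kernel density estimator with time decay is compact enough to sketch directly. In the sketch below, the bandwidths, decay factor, and toy logistic power curve are illustrative assumptions, not values from the paper, and the VARMA-GARCH wind speed forecast is replaced by a fixed query speed.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy history: wind power is a noisy logistic function of wind speed
T = 500
speed = rng.uniform(0, 25, size=T)
power = 1.0 / (1.0 + np.exp(-(speed - 10.0))) + rng.normal(0, 0.05, size=T)

def power_density(grid, s0, speed, power, h_s=1.0, h_p=0.05, decay=0.995):
    """Conditional kernel density of power given wind speed s0.

    Gaussian kernels in both speed and power; observation t gets an extra
    weight decay**age so older data count less (the time-decay idea, with
    hypothetical bandwidth and decay values).
    """
    age = np.arange(len(speed))[::-1]           # 0 = most recent
    w = decay ** age * np.exp(-0.5 * ((speed - s0) / h_s) ** 2)
    dens = (w[:, None] *
            np.exp(-0.5 * ((grid[None, :] - power[:, None]) / h_p) ** 2)
            ).sum(axis=0)
    return dens / (dens.sum() * (grid[1] - grid[0]))  # normalize to a density

grid = np.linspace(-0.2, 1.2, 281)
dens = power_density(grid, s0=12.0, speed=speed, power=power)
mode = grid[dens.argmax()]
print(round(mode, 2))  # most likely power output at 12 m/s
```

Conditioning on speed gives a separate predictive distribution at each forecast speed, so the full density (not just its mode) is available for decision making.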

There are many other informative articles in both sections of the March issue, as well as a set of book reviews. The full list of articles and a list of the books reviewed can be found at the ASA website.