Modern Methods, Complex Data Featured in May Issue
Hugh A. Chipman, Technometrics Editor
Increasingly complex data and sophisticated measurement devices are creating a need for new statistical methods. The first two papers illustrate this aptly for reliability and degradation modeling. In “Methods for Planning Repeated Measures Degradation Studies,” Brian P. Weaver, William Q. Meeker, Luis A. Escobar, and Joanne Wendelberger consider the design of studies in which there are few or no failures and degradation measurements must be collected instead. Test plans for repeated measures are developed based on anticipated statistical modeling with mixed-effects linear regression. The effects of the number of units and the number of measurements per unit are studied, and two real examples are used to illustrate the methods.
In “Field-Failure Predictions Based on Failure-Time Data with Dynamic Covariate Information,” Yili Hong and William Q. Meeker consider models for failure-time data, but with covariates that change while the units are under study. For example, more products are being produced with automatic data-collecting devices that track how, and in which environments, the products are used. The covariates are incorporated via a cumulative exposure model, allowing the prediction of field-failure returns up to a specified future time.
The next two articles consider high-throughput data from biological studies. In “Robust Analysis of High Throughput Screening (HTS) Assay Data,” Changwon Lim, Pranab K. Sen, and Shyamal D. Peddada consider toxicity assays in which thousands of compounds are evaluated by individual dose-response studies. Nonlinear regression models are the quantitative workhorse of such studies, yet heteroscedasticity in some (but not all) curves leads to a loss of efficiency. Because so many compounds are studied, automated methods must be developed to deal with this problem, as well as with outliers and influential observations. The paper uses preliminary test estimation that is robust to variance structure, enabling automated and efficient screening decisions. The proposed methodology is illustrated using a data set obtained from the National Toxicology Program.
Although Ke Zhang, Jacqueline M. Hughes-Oliver, and S. Stanley Young also consider high-throughput assays, their objective and model are different. In the scenario considered in “Analysis of High-Dimensional Structure-Activity Screening Datasets Using the Optimal Bit String Tree,” each assay result yields a categorical response (e.g., activity), and the focus is on modeling the relationship between the structure of the compound being assayed and the response. A specialized decision tree and a simulated annealing estimation algorithm are developed, and drug discovery applications are used to illustrate the technique.
Image registration, which seeks to map one image onto another of the same scene, is a fundamental task in many imaging applications. Conventional parametric approaches, which typically assume a global transformation function, cannot preserve singularities and other features of the mapping transformation. Peihua Qiu and Chen Xing suggest a more local and adaptive approach in their paper, “On Nonparametric Image Registration.” Both theoretical and numerical studies demonstrate that the method is effective in various applications.
A limitation of many nonparametric regression methods is that they rely on a single library of basis functions from which to construct a regression estimate. In “Nonparametric Regression with Basis Selection from Multiple Libraries,” Jeffrey C. Sklar, Junqing Wu, Wendy Meiring, and Yuedong Wang propose a more adaptive procedure. By using multiple libraries, the resulting regression models are sufficiently flexible to model functions that consist of both changepoints and local smooth components.
Aggregated functional data arise in situations in which several individual curves are combined and only the resulting overall curve is observed. For example, in studying consumption of electricity over time, individual usage curves are difficult and expensive to obtain, but total usage is readily available. Ronaldo Dias, Nancy L. Garcia, and Alexandra M. Schmidt develop “A Hierarchical Model for Aggregated Functional Data” by viewing each aggregated curve as a realization of a Gaussian process with mean modeled through a weighted linear combination of the disaggregated curves. A nonstationary covariance function is used, with inference via a Bayesian approach. The paper focuses on two real examples: a calibration problem for NIR spectroscopy data and an analysis of the distribution of energy among different types of consumers.
Principal component analysis has been improved in various ways, both by sparsity constraints on coefficients and by robust estimation. However, the two ideas have not previously been combined. Christophe Croux, Peter Filzmoser, and Heinrich Fritz develop such a combined approach in “Robust Sparse Principal Component Analysis,” yielding principal components that are both interpretable and stable. By using a sequential computation algorithm, principal components can be obtained for data sets with more variables than observations.
Peter Hall, Fred Lombard, and Cornelis J. Potgieter develop “A New Approach to Function-Based Hypothesis Testing in Location-Scale Families.” They consider scenarios in which two sampled distributions are simply location and scale changes of one another. The test, applicable to both paired data and two-sample data, is based on the empirical characteristic function. The method is demonstrated on two motivating applications in the mining industry.
In “Bayes Statistical Analyses for Particle Sieving Studies,” Norma Leyva, Garritt L. Page, Stephen B. Vardeman, and Joanne R. Wendelberger consider contexts in which specimens of a granular material are run through a set of progressively finer sieves. The fractions of the specimen weight captured on each sieve are measured, providing the basis for characterizing the material through its “particle size distribution.” The article proposes Bayes analyses based on parsimoniously parameterized multivariate normal approximate models for vectors of log weight fraction ratios, and extends these analyses to mixtures of materials and to hierarchical modeling in which a single process produces several lots of particles.
In computer experiments, statistical calibration enables scientists to incorporate field data. However, the practical application is hardly straightforward for data structures such as spatial-temporal fields, which are usually large or not well represented by a stationary process model. In “Fast Sequential Computer Model Calibration of Large Nonstationary Spatial-Temporal Processes,” Matthew T. Pratola, Stephan R. Sain, Derek Bingham, Michael Wiltberger, and Joshua Rigler present a computationally efficient approach to estimating the calibration parameters by measuring the discrepancy between the computer model output and field data. The simple-to-implement approach can be used for sequential design and is applicable to large and nonstationary data.
Finally, in the short note “On the Connection and Equivalence of Three Sparse Linear Discriminant Analysis Methods,” Qing Mai and Hui Zou show that the normalized solutions of three sparse methods are equal for any sequence of penalization parameters. A short demonstration is provided using a prostate cancer data set.