Technometrics Highlights: Latest Issue Covers Design, Analysis, Anomaly Detection
Volume 59, Issue 1 of Technometrics includes 11 articles covering topics ranging from design and analysis of complex, black-box computer simulations to algorithmic design approaches for customizing and enhancing key properties of physical experiments to anomaly detection in image and other high-dimensional data streams.
In the paper titled “Monotonic Metamodels for Deterministic Computer Experiments,” author Matthias Hwai Yong Tan explores the challenging goal of incorporating prior knowledge that the response is monotonic in some of the input variables in deterministic computer simulations. Although the Gaussian process (GP) models ubiquitously used for simulation response surface modeling are not monotonic, incorporating such information can substantially improve the accuracy and interpretability of the response predictions. Previous methods that project GP sample paths onto some space of monotonic functions fail to preserve important GP modeling properties such as the prediction uncertainty shrinking at locations close to the design points. This paper develops a weighted projection approach that more effectively uses information in the GP model, together with two computational implementations. The first is isotonic regression on a grid, while the second is projection onto a cone of monotone splines, which alleviates problems encountered in a grid-based approach. Simulations show the monotone B-spline metamodel gives particularly good results.
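To make the first implementation concrete, the Python sketch below shows generic one-dimensional isotonic regression by the pool-adjacent-violators algorithm. It illustrates only the basic projection of values on a grid onto nondecreasing sequences, not Tan's weighted projection of GP sample paths; the function name and the optional weights argument are illustrative assumptions.

```python
def isotonic_fit(y, w=None):
    """Weighted least-squares projection of y onto nondecreasing sequences."""
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool adjacent blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the pooled blocks back to one fitted value per grid point.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


print(isotonic_fit([1.0, 3.0, 2.0, 4.0]))  # [1.0, 2.5, 2.5, 4.0]
```

The two middle values violate monotonicity, so they are pooled and replaced by their mean.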
In “Sliced Full Factorial-Based Latin Hypercube Designs as a Framework for a Batch Sequential Design Algorithm,” Weitao Duan, Bruce E. Ankenman, Susan M. Sanchez, and Paul J. Sanchez develop a method for more efficiently fitting complex models such as finite element or discrete event simulations. To reduce experimental effort, sequential design strategies allow experimenters to collect data only until some measure of prediction precision is reached. The authors’ batch sequential experiment design method uses sliced full factorial-based Latin hypercube designs, which are extensions of sliced orthogonal array-based Latin hypercube designs. At all stages of the sequential design, their approach achieves good univariate projection properties, and the structure of their designs tends to produce uniformity in higher dimensions, which results in the excellent sampling and fitting properties the authors demonstrate with empirical and theoretical arguments.
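The univariate projection property mentioned above is easiest to see in an ordinary Latin hypercube sample, sketched here in Python. This is the basic building block only, not the authors' sliced full factorial-based construction; the function name and the seed argument are illustrative assumptions.

```python
import random


def latin_hypercube(n, d, seed=0):
    """Return n points in [0, 1)^d with one point per 1/n bin in every dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(d):
        # Assign each point a distinct bin, then jitter it within that bin.
        bins = list(range(n))
        rng.shuffle(bins)
        columns.append([(b + rng.random()) / n for b in bins])
    return [list(point) for point in zip(*columns)]


points = latin_hypercube(5, 3, seed=1)
for j in range(3):
    # Projecting onto any single input hits each of the 5 bins exactly once.
    print(sorted(int(p[j] * 5) for p in points))
```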
In “Optimization of Multi-Fidelity Computer Experiments via the EQIE Criterion,” Xu He, Rui Tuo, and C. F. Jeff Wu address the problem of Gaussian process-based optimization for multi-fidelity deterministic computer experiments having tunable levels of accuracy. They propose an optimization scheme that sequentially adds new computer runs based on two sampling criteria. Their first criterion, expected quantile improvement, scores the desirability of candidate inputs for a fixed accuracy level of the simulator. Their second, expected quantile improvement efficiency (EQIE), scores the desirability of candidate combinations of inputs and simulator accuracy level. The latter allows not only the inputs, but also the simulator accuracy level, to be strategically chosen for the next round of simulation. Their approach is shown to outperform the popular expected improvement criterion.
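For reference, the expected improvement baseline has a simple closed form when the predictive distribution at a candidate input is Gaussian. The Python sketch below computes it for a minimization problem. It shows the standard baseline only, not the authors' quantile-based criteria, and the argument names are illustrative.

```python
import math


def expected_improvement(mu, sigma, best):
    """Expected improvement over the current best (minimum) observed value.

    mu and sigma are the predictive mean and standard deviation at a
    candidate input.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


# A candidate predicted to equal the current best still has positive EI
# because of its predictive uncertainty.
print(expected_improvement(mu=0.0, sigma=1.0, best=0.0))
```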
In “Calibration of Stochastic Computer Simulators Using Likelihood Emulation,” Jeremy E. Oakley and Benjamin D. Youngman combine simulation and physical experimental data in the so-called calibration problem, which involves modeling the difference or discrepancy between physical reality and its imperfect representation embodied by the simulation. Their focus is on stochastic computer simulation models in which each run takes perhaps one or two minutes. They combine a Gaussian process emulator of the likelihood surface with importance sampling, such that changing the discrepancy specification changes only the importance weights. One major benefit of this is that it allows a range of discrepancy models to be investigated with little additional computational effort, which is important because it is difficult to know the structure of the discrepancy in advance. The approach is illustrated with a case study of a natural history model that has been used to characterize UK bowel cancer incidence.
In “Design and Analysis of Experiments on Non-Convex Regions,” Matthew T. Pratola, Ofir Harari, Derek Bingham, and Gwenn E. Flowers present a new approach for modeling a response in the commonly occurring but under-investigated situation in which the design region is non-convex, for which current tools are limited. The authors’ new method for selecting design points over non-convex regions is based on the application of multidimensional scaling to the geodesic distance. Optimal designs for prediction are described, with special emphasis on Gaussian process models, followed by a simulation study and an application in glaciology.
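The embedding step can be illustrated with classical multidimensional scaling, which turns a matrix of pairwise distances into coordinates. The Python sketch below assumes a geodesic distance matrix is already available (here a hypothetical four-node path) and uses NumPy. It is a generic version of the technique, not the authors' design procedure.

```python
import numpy as np


def classical_mds(dist, dim=2):
    """Embed points in `dim` dimensions from a matrix of pairwise distances."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    center = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * center @ d2 @ center
    vals, vecs = np.linalg.eigh(gram)
    # Keep the largest eigenvalues; clip small negatives from non-Euclidean input.
    order = np.argsort(vals)[::-1][:dim]
    scale = np.sqrt(np.clip(vals[order], 0.0, None))
    return vecs[:, order] * scale


# Hypothetical geodesic distances between four sites along a path.
geodesic = [[abs(i - j) for j in range(4)] for i in range(4)]
coords = classical_mds(geodesic, dim=1)
print(coords.ravel())
```

Distances measured within a non-convex region are generally not Euclidean, so the embedding only approximates them. Clipping negative eigenvalues is one simple way to handle that case.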
In “Nonstationary Gaussian Process Models Using Spatial Hierarchical Clustering from Finite Differences,” Matthew J. Heaton, William F. Christensen, and Maria A. Terres consider the modeling of large spatial data having nonstationarity over the spatial domain, which is frequently encountered in science and engineering problems. The computational expense of Gaussian process modeling can be prohibitive in these situations. To perform computationally feasible inference, the authors partition the spatial region into disjoint sets using hierarchical clustering of observations with finite differences in the response as a measure of dissimilarity. Intuitively, directions with large finite differences indicate directions of rapid increase or decrease and are, therefore, appropriate for partitioning the spatial region. After clustering, a nonstationary Gaussian process model is fit across the clusters in a manner that allows the computational burden of model fitting to be distributed across multiple cores and nodes. The methodology is motivated and illustrated using digital temperature data across the city of Houston.
The next three papers develop tools that advance the design and analysis of physical experiments by harnessing modern computational capabilities. In “Benefits and Fast Construction of Efficient Two-Level Foldover Designs,” Anna Errore, Bradley Jones, William Li, and Christopher J. Nachtsheim further substantiate recent arguments that small foldover designs offer advantages in two-level screening experiments. In addition, the authors develop a fast algorithm for constructing efficient two-level foldover designs and show they have superior efficiency for estimating the main effects model. Moreover, their algorithmic approach allows fast construction of designs with many more factors and/or runs. A useful feature of their compromise algorithm is that it allows a practitioner to choose among many alternative designs, balancing efficiency of the main effect estimates against correlation and confounding of the two-factor interactions.
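The basic appeal of a foldover can be checked directly: appending the sign-reversed copy of every run makes each main effect column orthogonal to every two-factor interaction column. The Python sketch below demonstrates this on a hypothetical four-run starting design. It shows the foldover operation itself, not the authors' construction algorithm, and the function names are illustrative.

```python
from itertools import combinations


def foldover(design):
    """Append the sign-reversed copy of every run of a two-level (+1/-1) design."""
    return [list(run) for run in design] + [[-x for x in run] for run in design]


def main_vs_interaction_dots(design):
    """Inner products of each main effect column with each two-factor interaction."""
    k = len(design[0])
    dots = []
    for i in range(k):
        for a, b in combinations(range(k), 2):
            dots.append(sum(run[i] * run[a] * run[b] for run in design))
    return dots


# Hypothetical 4-run half fraction in which factor C equals A*B.
base = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
print(main_vs_interaction_dots(base))            # some entries are nonzero
print(main_vs_interaction_dots(foldover(base)))  # all entries are zero
```

Each product of a main effect with a two-factor interaction involves three factor columns, so it changes sign in the mirrored runs and the contributions cancel.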
In “Two-Level Designs to Estimate All Main Effects and Two-Factor Interactions,” Pieter T. Eendebak and Eric D. Schoen investigate the related problem of designing two-level experiments large enough to estimate all main effects and two-factor interactions. The effect hierarchy principle often suggests that main effect estimation should be given more prominence than the estimation of two-factor interactions, and orthogonal arrays favor main effect estimation. However, recognizing that complete enumeration of orthogonal arrays is infeasible in many practical settings, the authors develop a partial enumeration procedure and establish upper bounds on the D-efficiency for the interaction model based on arrays that have not been generated by the partial enumeration. Their optimal design algorithm generates designs that give smaller standard errors for the main effects, at the expense of worse D-efficiencies for the interaction model, relative to D-optimal designs. Their generated designs for 7–10 factors and 32–72 runs are smaller or have a higher D-efficiency than the smallest orthogonal arrays from the literature.
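For readers who want to see what is being compared, D-efficiency for the interaction model can be computed as the p-th root of det(X'X) divided by the run size n, where X has p columns: the intercept, the main effects, and all two-factor interactions. The Python sketch below uses NumPy and a small full factorial as a stand-in. It is not one of the authors' designs or their enumeration procedure, and the function names are illustrative.

```python
from itertools import combinations, product

import numpy as np


def interaction_model_matrix(design):
    """Columns: intercept, main effects, and all two-factor interactions."""
    d = np.asarray(design, dtype=float)
    n, k = d.shape
    cols = [np.ones(n)] + [d[:, j] for j in range(k)]
    cols += [d[:, a] * d[:, b] for a, b in combinations(range(k), 2)]
    return np.column_stack(cols)


def d_efficiency(design):
    """det(X'X)^(1/p) / n for +1/-1 coded factors; 0 if X'X is singular."""
    x = interaction_model_matrix(design)
    n, p = x.shape
    sign, logdet = np.linalg.slogdet(x.T @ x)
    if sign <= 0:
        return 0.0
    return float(np.exp(logdet / p) / n)


full_factorial = list(product([-1, 1], repeat=3))
print(d_efficiency(full_factorial))                        # orthogonal design
print(d_efficiency(full_factorial + [full_factorial[0]]))  # one repeated run
```

An orthogonal design attains the maximum value of 1, while repeating a single run lowers the efficiency slightly.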
In “Joint Identification of Location and Dispersion Effects in Unreplicated Two-Level Factorials,” Andrew J. Henrey and Thomas M. Loughin relax the assumption that the location effects have been identified correctly when estimating dispersion effects in unreplicated factorial designs, violation of which degrades the performance of existing methods. The authors develop a method for joint identification of location and dispersion effects that can reliably identify active effects of both types. A normal-based model containing parameters for effects in both the mean and variance is used, and the parameters are estimated by maximum likelihood, with subsequent effect selection via a specially derived information criterion. The method successfully identifies sensible location-dispersion models missed by methods that rely on sequential estimation of location and dispersion effects.
“Anomaly Detection in Images with Smooth Background via Smooth-Sparse Decomposition,” by Hao Yan, Kamran Paynabar, and Jianjun Shi, tackles the emerging problem of how to analyze high-dimensional streams of image-based inspection data for process monitoring purposes. In manufacturing applications such as steel, composites, and textile production, anomaly detection in noisy images is of special importance. Although several methods exist for image denoising and anomaly detection, most perform denoising and detection sequentially, which affects detection accuracy and efficiency, and they are computationally prohibitive for real-time applications. The authors develop a new approach for anomaly detection in noisy images with smooth backgrounds. Termed smooth-sparse decomposition, the approach exploits regularized high-dimensional regression to decompose an image and separate anomalous regions by solving a large-scale optimization problem. Fast algorithms for solving the optimization model are also developed.
In “Estimation of Field Reliability Based on Aggregate Lifetime Data,” Piao Chen and Zhi-Sheng Ye present an approach for fitting distribution models to failure data that are aggregated (with substantial loss of information) in a particular way that is common in reliability databases for complex systems with many components. Instead of individual failure times, each aggregate data point is the sum of a series of collective failures representing the cumulative operating time of one component from system commencement to the last component replacement. This data format differs from traditional lifetime data and makes statistical inference challenging. The authors consider gamma and inverse Gaussian distribution models and develop procedures for point and interval estimation of the parameters, based on the aggregated data.
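One reason the gamma model is convenient for aggregated data is that the sum of m independent gamma lifetimes with a common shape and rate is again gamma, with shape multiplied by m. The Python sketch below writes down the resulting log-likelihood, assuming each record reports a total operating time together with the number of failures it aggregates. The function names and the example data are hypothetical, and the sketch does not reproduce the authors' point and interval estimation procedures.

```python
import math


def aggregate_gamma_loglik(shape, rate, totals, counts):
    """Log-likelihood when totals[i] is the sum of counts[i] gamma lifetimes."""
    ll = 0.0
    for t, m in zip(totals, counts):
        a = m * shape  # shape parameter of the aggregated total
        ll += a * math.log(rate) - math.lgamma(a) + (a - 1.0) * math.log(t) - rate * t
    return ll


def profile_rate(shape, totals, counts):
    """Rate that maximizes the log-likelihood for a fixed shape."""
    return shape * sum(counts) / sum(totals)


# Hypothetical aggregate records: total operating time and number of failures.
totals = [12.0, 30.5, 8.2]
counts = [3, 6, 2]
rate_hat = profile_rate(2.0, totals, counts)
print(rate_hat, aggregate_gamma_loglik(2.0, rate_hat, totals, counts))
```

With the rate profiled out in closed form, the shape can be found by a one-dimensional search over this log-likelihood.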