June JASA Addresses Topics from Malaria Modeling to Outlier Detection
The June issue of the Journal of the American Statistical Association covers application topics ranging from models of the term structure of interest rates to an analysis of the key factors in malaria epidemics. Theory and Methods contributions include novel approaches to the traditional problems of small-area estimation and outlier detection. There is also a review article on making and evaluating forecasts.
Applications and Case Studies
Malaria is a mosquito-borne infectious disease widespread in tropical and subtropical areas of the globe, with hundreds of millions of cases and more than 1 million deaths per year. The disease was eliminated in North America and Europe during the first half of the 20th century, but remains a problem in large parts of Africa and Asia. In India, there are more than 2 million cases per year. There are new efforts under way to eradicate malaria across the globe; these require an understanding of the roles of environmental factors, immunity, and disease transmission dynamics in malaria epidemics.
In “Malaria in Northwest India: Data Analysis via Partially Observed Stochastic Differential Equation Models Driven by Lévy Noise,” authors Anindya Bhadra, Edward L. Ionides, Karina Laneri, Mercedes Pascual, Menno Bouma, and Ramesh Dhiman develop a stochastic model for studying the role of different factors in malaria epidemics. The authors begin with a system of stochastic differential equations describing the inter-relationships and transitions among five classes of susceptible (fully and partially), exposed, and infected (symptomatic and mild) individuals. Then, a measurement model relates the dynamic process model to the observed data, a time series of counts of new cases. Inference for model parameters is obtained via a likelihood-based approach that relies on numerical solutions for the sample paths of the underlying process model (a so-called “plug-and-play approach”). The resulting analysis provides clear evidence of a role for rainfall variability in malaria dynamics and the usefulness of rainfall as a predictor in malaria forecasts.
A different application paper concerns the persistent methodological problem of obtaining truthful answers to sensitive survey questions. Randomized response was an early approach and has been used often. An alternative method that has attracted attention recently is the item count technique. In this approach, survey respondents are randomly assigned to treatment and control groups. Each receives a list of items and is asked only for the count of the number of items with which they agree. The treatment group includes one extra item corresponding to the sensitive information desired.
In “Statistical Inference for the Item Count Technique,” Kosuke Imai introduces new inference approaches that allow multivariate analyses (e.g., regression) of data collected with the item count technique. This lets investigators move beyond estimating the population proportion answering in a particular way to identifying the characteristics associated with different responses. The approach is applied to a question concerning racial hatred in the United States from a 1991 National Race and Politics survey. The analysis finds that southern whites are more likely than non-southern whites to be angered by a black family moving next door, even after adjusting for a number of demographic variables.
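For readers new to the item count design described above, the basic estimate of the population proportion is simply the difference in mean counts between the treatment and control groups. The following is a minimal sketch on simulated data, not Imai's regression methodology; the function names, item probabilities, and sample size are all assumptions made up for illustration.

```python
import random
from statistics import mean


def simulate_item_count(n, control_probs, sensitive_prob, seed=0):
    """Simulate an item count survey; return (control_counts, treatment_counts).

    control_probs holds hypothetical agreement probabilities for the
    non-sensitive items; sensitive_prob is the true proportion we hope
    to recover.
    """
    rng = random.Random(seed)
    control, treatment = [], []
    for _ in range(n):
        # Count of non-sensitive items this respondent agrees with.
        count = sum(rng.random() < p for p in control_probs)
        if rng.random() < 0.5:
            control.append(count)
        else:
            # The treatment list carries one extra, sensitive item.
            count += int(rng.random() < sensitive_prob)
            treatment.append(count)
    return control, treatment


def difference_in_means(control, treatment):
    """Estimate the proportion agreeing with the sensitive item."""
    return mean(treatment) - mean(control)


control, treatment = simulate_item_count(
    20000, control_probs=[0.5, 0.3, 0.7], sensitive_prob=0.2, seed=1
)
estimate = difference_in_means(control, treatment)
print(round(estimate, 3))
```

Because respondents report only a count, no individual answer to the sensitive item is revealed, yet the difference in means should land close to the simulated true proportion of 0.2.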
Theory and Methods
Small-area estimation—the practice of estimating rates or means for a large number of small geographic areas—is a big statistical problem. Jiming Jiang, Thuan Nguyen, and J. Sunil Rao tackle this problem in “Best Predictive Small-Area Estimation” via an innovative method that combines two modeling approaches.
The authors consider estimation of the fixed parameters in two popular small-area models, the Fay-Herriot model and the nested-error regression model, and derive the so-called best predictive estimator (BPE) of the fixed parameters for each model. The BPEs are used to define a new procedure called observed best prediction (OBP), which the authors argue, using theoretical derivations and empirical studies, is superior to empirical best linear unbiased prediction (EBLUP).
The performance of OBP is shown to be comparable to that of EBLUP when the model is correctly specified and the number of small areas is large. The advantage of the new approach is that OBP outperforms EBLUP when the underlying model is misspecified. The authors illustrate their approach with data on kidney transplant graft failure rates from 23 hospitals.
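As background on the Fay-Herriot model mentioned above, the sketch below shows the shrinkage idea that small-area predictors share: each area's direct estimate is pulled toward a pooled value, and noisier areas are pulled harder. It is a simplified illustration only. It assumes an intercept-only model with a known between-area variance, so it is neither EBLUP (which estimates that variance) nor the authors' OBP, and the function name and numbers are made up rather than taken from the kidney transplant data.

```python
def fay_herriot_blup(direct, sampling_var, area_var):
    """Shrinkage predictor for an intercept-only Fay-Herriot model.

    direct       -- direct estimates y_i, one per small area
    sampling_var -- known sampling variances D_i
    area_var     -- between-area variance A, assumed known here
    """
    # Generalized least squares estimate of the common mean.
    weights = [1.0 / (area_var + d) for d in sampling_var]
    mu_hat = sum(w * y for w, y in zip(weights, direct)) / sum(weights)

    preds = []
    for y, d in zip(direct, sampling_var):
        # Weight on the direct estimate: near 1 when D_i is small,
        # near 0 when the direct estimate is very noisy.
        keep = area_var / (area_var + d)
        preds.append(mu_hat + keep * (y - mu_hat))
    return mu_hat, preds


# Hypothetical rates and variances for four areas.
direct = [0.10, 0.25, 0.18, 0.40]
sampling_var = [0.001, 0.010, 0.002, 0.050]
mu_hat, preds = fay_herriot_blup(direct, sampling_var, area_var=0.005)

for y, p in zip(direct, preds):
    print(f"direct={y:.3f}  predicted={p:.3f}")
```

In this toy example the fourth area, with the largest sampling variance, moves most of the way to the pooled mean, while the first area stays close to its direct estimate.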
It is unlikely these days that the acronym “IPOD” conjures up the phrase “iterative procedure for outlier detection.” However, that is the name given by Yiyuan She and Art Owen to the penalty-based outlier detection method studied in “Outlier Detection Using Nonconvex Penalized Regression.”
Take a regression model Y = XB + E; add a mean-shift parameter for each observation; and fit the resulting highly over-parameterized model using a hard-thresholding, penalty-based regression method. What you get is the authors’ Θ-IPOD procedure (iterative procedure for outlier detection based on a thresholding rule Θ). The method has a single tuning parameter that governs both outlier identification and regression coefficient estimation. In effect, the method performs variable selection on the combined vector of regression coefficients and mean-shift parameters, thus identifying a model that is robust to outliers.
Θ-IPOD methodology extends to high-dimensional modeling with p >> n, provided the coefficient vector is sparse and outliers are rare. While the method gives impressive performance in Monte Carlo studies and proves itself worthy on a number of well-studied real data sets, the authors caution that their method, like the others with which it is compared, depends on having a preliminary robust fit. Although the Θ-IPOD iterations are shown to be very fast (O(np) at most, compared to O(np^2) for standard competing algorithms), the speed of obtaining the preliminary robust fit in n+p dimensions is a limiting factor.
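The mean-shift idea can be sketched in a few lines by alternating between a least squares fit on the shift-adjusted responses and hard thresholding of the residuals. This is a toy illustration under stated assumptions, not the authors' implementation: it uses a single predictor, starts from an ordinary least squares fit rather than the preliminary robust fit the authors call for, and uses a hand-picked threshold. The function names and data are hypothetical.

```python
def ols_line(x, y):
    """Least squares intercept and slope for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def hard_threshold(r, lam):
    """Keep a residual as a mean shift only if it exceeds lam in size."""
    return r if abs(r) > lam else 0.0


def ipod_sketch(x, y, lam, max_iter=200, tol=1e-10):
    """Alternate between fitting the line and updating the mean shifts."""
    gamma = [0.0] * len(y)  # one mean-shift parameter per observation
    for _ in range(max_iter):
        # Fit the regression to responses with current shifts removed.
        adjusted = [yi - gi for yi, gi in zip(y, gamma)]
        intercept, slope = ols_line(x, adjusted)
        # Threshold raw residuals to get the new shift estimates.
        new_gamma = [
            hard_threshold(yi - intercept - slope * xi, lam)
            for xi, yi in zip(x, y)
        ]
        change = max(abs(a - b) for a, b in zip(new_gamma, gamma))
        gamma = new_gamma
        if change < tol:
            break
    return intercept, slope, gamma


# Made-up data: y = 1 + 2x plus small noise, with one gross outlier.
x = [float(i) for i in range(10)]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15, 0.1, -0.05]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
y[5] += 10.0

intercept, slope, gamma = ipod_sketch(x, y, lam=3.0)
outliers = [i for i, g in enumerate(gamma) if g != 0.0]
print(outliers, round(intercept, 2), round(slope, 2))
```

Observations with a nonzero mean shift are flagged as outliers, and the fitted line is effectively estimated from the remaining points. With several clustered outliers, an ordinary least squares start like this one can mask them, which is why the quality of the preliminary robust fit matters.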
The June issue also includes a Review article on forecasting. Ideally, forecasts ought to be probabilistic, taking the form of probability distributions over future quantities or events. However, many situations still require single-valued point forecasts. Tilmann Gneiting, in “Making and Evaluating Point Forecasts,” discusses approaches for comparing and assessing point forecasts. The article introduces scoring functions and develops the ideas of consistency and elicitability for such functions. After a series of examples, the article closes by arguing for a change in current point forecasting practice, so that either a scoring function is specified ex ante or an elicitable target function is named.
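A small numerical sketch shows why the choice of scoring function matters. Under squared error, the best single-valued forecast of an uncertain quantity is its mean; under absolute error, it is the median. The code below searches a grid of candidate forecasts for the one with the lowest average score on a skewed set of outcomes. The data, grid, and function names are assumptions chosen for illustration and do not come from the article.

```python
from statistics import mean, median


def squared_error(forecast, outcome):
    return (forecast - outcome) ** 2


def absolute_error(forecast, outcome):
    return abs(forecast - outcome)


def best_point_forecast(score, outcomes, candidates):
    """Return the candidate forecast with the lowest average score."""
    return min(
        candidates,
        key=lambda c: mean(score(c, y) for y in outcomes),
    )


# A skewed set of hypothetical outcomes: the mean and median differ.
outcomes = [1, 1, 2, 2, 3, 4, 15]
candidates = [k / 10 for k in range(0, 151)]  # 0.0, 0.1, ..., 15.0

best_sq = best_point_forecast(squared_error, outcomes, candidates)
best_abs = best_point_forecast(absolute_error, outcomes, candidates)

print("squared error favors ", best_sq, " (mean =", mean(outcomes), ")")
print("absolute error favors", best_abs, " (median =", median(outcomes), ")")
```

Two forecasters looking at the same distribution would rationally issue different point forecasts here, 4 under squared error and 2 under absolute error, which illustrates why the scoring function or the target functional should be stated in advance.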
The Review section also includes the usual slate of informative book reviews. View a full list of articles and a list of the books under review. ASA members can access JASA online for free by logging in through Members Only and clicking “My Publications.”