Special Issue Features Anomaly Detection
David M. Steinberg, Technometrics Editor
The February 2010 issue of Technometrics features several articles that highlight some of the fascinating applications in which statistical data analysis is used to detect unusual circumstances, or anomalies. The articles explore the methods developed to address these problems and describe challenges that lie ahead.
The first two articles focus on fraud detection. Agus Sudjianto, Sheela Nair, Ming Yuan, Aijun Zhang, Daniel Kern, and Fernando Cela-Díaz describe problems and statistical solutions for detecting financial crime. Although these crimes affect millions of people every year, they are nonetheless rare events. The volume and complexity of financial data require not only effective algorithms, but also efficient training and execution. Criminals deliberately attempt to conceal the nature of their actions and quickly change their strategies over time, resulting in severe class overlap and concept drift. In some cases, legal constraints and investigation delays make it impossible to verify suspected crimes in a timely manner. The authors discuss some of the classic statistical techniques that have been applied, as well as more recent machine learning and data mining algorithms. Many illustrative examples are described, with emphasis on two important types of financial crimes: fraud and money laundering.
The second article—by Richard A. Becker, Chris Volinsky, and Allan R. Wilks—is titled “Fraud Detection in Telecommunications: History and Lessons Learned.” This paper reviews the history of fraud detection at AT&T, which was one of the first companies to address fraud in a systematic way to protect its revenue stream. The authors discuss some of the major fraud schemes and the techniques employed to identify them, leading to general conclusions about fraud detection. Specifically, they advocate simple, understandable models, heavy use of visualization, a flexible computing environment, careful data management, and keeping humans in the loop.
These two papers are followed by a commentary from David Hand, who expands on the contributions and offers his own perspective, drawn from involvement in a number of related applications.
The third, and final, article on anomaly detection, by Galit Shmueli and Howard Burkom, looks at a quite different application area: biosurveillance. This activity involves monitoring a wide range of pre-diagnostic and diagnostic data in order to enhance our ability to detect, investigate, and respond to disease outbreaks. Statistical control charts, which have been a central tool in classic disease surveillance, have migrated into modern biosurveillance. However, the new types of data monitored, the processes underlying these data, and the application context all deviate from the industrial setting for which these tools were originally designed. Assumptions of normality, independence, and stationarity are typically violated in syndromic time series; target values of process parameters are time-dependent and hard to define; and data labeling is ambiguous in the sense that outbreak periods are not clearly defined or known. Additional challenges include multiplicity in several dimensions, performance evaluation, and practical system usage and requirements. The article focuses mainly on the monitoring of time series for early alerting of anomalies, with a brief summary of methods to detect significant spatial and spatiotemporal case clusters.
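To make the control-chart framing concrete, the following is a minimal sketch of a Shewhart-style alerting rule applied to a daily syndromic count series. It is an illustration of the general idea only, not the methods surveyed in the article: the function name, the sliding-baseline window, and the three-sigma threshold are all simplifying assumptions.

```python
import numpy as np

def shewhart_alarms(series, baseline_len=56, k=3.0):
    """Flag days whose count exceeds mean + k*sd of a sliding baseline.

    A classic Shewhart-style rule. Note that with real syndromic data,
    day-of-week effects and trends (non-stationarity) would inflate the
    baseline standard deviation or trigger spurious alarms, which is
    exactly the mismatch with industrial assumptions discussed above.
    """
    series = np.asarray(series, dtype=float)
    alarms = []
    for t in range(baseline_len, len(series)):
        base = series[t - baseline_len:t]
        mu, sd = base.mean(), base.std(ddof=1)
        if sd > 0 and series[t] > mu + k * sd:
            alarms.append(t)
    return alarms
```

On a stationary Poisson-like series with an injected outbreak spike, the rule flags the spike day while producing few false alarms; on data with strong weekly seasonality, the same rule degrades, illustrating the article's point.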
The remaining articles in the issue explore a number of areas. Adrian E. Raftery, Miroslav Kárný, and Pavel Ettler consider the problem of online prediction when it is uncertain which prediction model is best. They develop a method called Dynamic Model Averaging (DMA), in which a state space model for the parameters of each model is combined with a Markov chain model for the correct model. This allows the ‘correct’ model to vary over time. The state space and Markov chain models are both specified in terms of forgetting, leading to a highly parsimonious representation. When the model and parameters do not change, DMA is a recursive implementation of standard Bayesian model averaging. The method is applied to the problem of predicting the output strip thickness for a cold rolling mill, where the output is measured with a time delay. When only a small number of physically motivated models were considered and one was clearly best, the method quickly converged to the best model, at little cost for model uncertainty. When both model uncertainty and the number of models were large, the method kept the resulting penalty small. At the beginning of the process, when control is most difficult, DMA over a large model space led to better predictions than the single best-performing physically motivated model.
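The forgetting-based recursion behind DMA can be sketched in a few lines. The following is a simplified illustration, not the authors' implementation: each candidate linear model is updated by recursive least squares with a forgetting factor `lam` (standing in for the state space model), while the model probabilities are flattened by a forgetting exponent `alpha` (standing in for the Markov chain) and then reweighted by each model's predictive likelihood. The function name, the assumed-known noise variance `v`, and the fixed regressor sets are all simplifying assumptions.

```python
import numpy as np

def dma_predict(X_list, y, lam=0.99, alpha=0.99, v=0.01):
    """One-pass Dynamic Model Averaging over K candidate linear models.

    X_list[k] is the (T x p_k) regressor matrix for model k; y is the
    length-T response. lam is the forgetting factor for the per-model
    parameter updates, alpha the forgetting exponent for the model
    probabilities, and v an assumed-known observation noise variance.
    Returns the one-step-ahead DMA predictions and final model weights.
    """
    T, K = len(y), len(X_list)
    theta = [np.zeros(X.shape[1]) for X in X_list]   # per-model coefficients
    P = [10.0 * np.eye(X.shape[1]) for X in X_list]  # per-model covariances
    pi = np.full(K, 1.0 / K)                          # model probabilities
    preds = np.zeros(T)
    for t in range(T):
        # Prediction step for model probabilities: flatten by forgetting.
        w = pi ** alpha
        w /= w.sum()
        yhat_k = np.array([X_list[k][t] @ theta[k] for k in range(K)])
        preds[t] = w @ yhat_k                         # weighted DMA forecast
        # Update step: reweight each model by its predictive likelihood.
        s = np.array([X_list[k][t] @ P[k] @ X_list[k][t] + v
                      for k in range(K)])             # predictive variances
        lik = np.exp(-0.5 * (y[t] - yhat_k) ** 2 / s) / np.sqrt(2 * np.pi * s)
        pi = w * lik
        pi /= pi.sum()
        # Per-model recursive least squares update with forgetting lam.
        for k in range(K):
            x = X_list[k][t]
            Pk = P[k] / lam
            g = Pk @ x / (v + x @ Pk @ x)
            theta[k] = theta[k] + g * (y[t] - x @ theta[k])
            P[k] = Pk - np.outer(g, x @ Pk)
    return preds, pi
```

With alpha and lam both equal to 1 (no forgetting), the recursion reduces to standard recursive Bayesian model averaging, matching the special case noted above; when the data come from one clearly best model, the weights concentrate on it.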