Additional Features, Journal of the American Statistical Association Highlights

Study of Rankings, Criminal Trajectories Featured in December Issue

1 January 2013

The appeal of statistics for many practitioners is the wide range of areas to which our methods can be applied. The articles we feature from the December issue of the Journal of the American Statistical Association demonstrate that range.

Theory and Methods

Rankings appear frequently in modern media to tell us about the most popular websites, the most common baby names, or even the most downloaded JASA articles. It is natural to wonder how much attention to pay to such lists. For example, is the item appearing 10th on a list really more popular than the item appearing 11th? It is not immediately obvious how to address this question using statistical methods, because our usual results based on independent identically distributed samples do not appear to be relevant.

Justin Dyer and Art Owen provide an answer to this interesting question in “Correct Ordering in the Zipf-Poisson Ensemble.” They propose viewing the rank data as a Zipf-Poisson ensemble, with each count modeled as a Poisson random variable whose mean depends on its position in the list. Using this framework, the authors prove that the number of items we can be confident appear in the correct position can be quite small, even if the rankings are based on a large body of data.

One example they consider is a list of the most frequent words appearing in a large corpus of printed text, the British National Corpus (a body of text comprising approximately 100,000,000 words). The most commonly appearing word is “the,” which accounts for about 6% of the data; this is followed by “be,” “of,” “and,” and “a.” The theoretical work of Dyer and Owen indicates that for a data set of this size and with a frequency distribution like that found in the corpus, the number of correctly ranked items is assured to be 72 or higher. A simulation verifies that this is the case. Beyond that point in the list, the observed ordering may simply reflect random variation.
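To make the idea concrete, here is a minimal simulation sketch of a Zipf-Poisson ensemble. It is not the authors' code. The exponent `alpha = 1`, the number of items, the rule that item i has mean `total * top_share * i ** -alpha`, and the normal approximation used for large Poisson means are all assumptions chosen for illustration; only the corpus size and the 6% share of the top word come from the description above.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson variate: exact for small means, approximate otherwise."""
    if mean < 50:
        # Knuth's multiplication method (exact, but slow for large means).
        threshold = math.exp(-mean)
        k, prod = 0, rng.random()
        while prod > threshold:
            k += 1
            prod *= rng.random()
        return k
    # Assumption: a rounded normal draw is an adequate stand-in for large means.
    return max(0, round(rng.gauss(mean, math.sqrt(mean))))


def leading_correct(counts):
    """Number of top items whose observed count beats every item listed after them."""
    n = len(counts)
    # suffix_max[i] = largest count among items i, i+1, ..., n-1 (-1 past the end)
    suffix_max = [-1] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_max[i] = max(counts[i], suffix_max[i + 1])
    k = 0
    while k < n and counts[k] > suffix_max[k + 1]:
        k += 1
    return k


def simulate(total=100_000_000, top_share=0.06, alpha=1.0, n_items=5000, seed=0):
    """Simulate one ensemble and count the correctly ordered items at the top."""
    rng = random.Random(seed)
    # Hypothetical Zipf law: the item of true rank i has mean total * top_share / i**alpha.
    counts = [
        sample_poisson(total * top_share * i ** -alpha, rng)
        for i in range(1, n_items + 1)
    ]
    return leading_correct(counts)
```

Calling `simulate()` with several seeds gives a feel for how short the reliably ordered head of the list is relative to the number of items, though the values depend on the assumed exponent and need not match the figure reported in the article.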

Applications and Case Studies

Studies of longitudinal data, where measurements on the same individual are obtained repeatedly over time, are common in statistics, especially in medical studies. This issue of JASA includes a fascinating analysis of longitudinal data on individuals' criminal behavior over time. In “Modeling Criminal Careers as Departures from a Unimodal Population Age-Crime Curve: The Case of Marijuana Use,” authors Donatello Telesca, Elena Erosheva, Derek Kreager, and Ross Matsueda introduce a novel statistical approach to address a controversial question in criminology. An individual's age-crime curve records the number of criminal acts that individual commits over time. Such curves are often used in criminology to compare demographic groups. Modeling the curves is difficult because they are typically based on relatively short sequences and appear highly variable (e.g., some individuals appear to peak early and others late), despite a long-standing theoretical framework holding that a single unimodal curve underlies all criminal offenses and is invariant across social groups.

The authors develop a model that is consistent with the long-standing theory and also able to accommodate the large amount of variability in the data. Individual age-crime curves are modeled as random functions distributed about a unimodal population curve. The model allows for individual variation in the amplitude of the curve (i.e., the relative amount of criminal activity) and allows for individual “warping” of the time scale through an individual-specific time-transformation function. The latter allows individuals to behave in a manner that is consistent with the population curve while allowing for acceleration at some time points and deceleration at others. The model is demonstrated on a longitudinal study of self-reported marijuana use among youth in Denver, Colorado, where it provides a rich array of conclusions from data that appear at first glance to be noisy and uninformative.
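The amplitude-and-warping idea can be sketched with a toy example. This is only an illustration, not the authors' model: the Gaussian-shaped population curve, the age window, and the power-law warp `u ** gamma` are all hypothetical choices made here to show how a single unimodal curve can yield early and late peaks.

```python
import math

AGE_MIN, AGE_MAX = 10.0, 25.0  # hypothetical age window


def population_curve(age, peak=17.0, width=3.0):
    """Toy unimodal population curve (a Gaussian bump); the shape is an assumption."""
    return math.exp(-0.5 * ((age - peak) / width) ** 2)


def warp(age, gamma):
    """Monotone time transformation mapping [AGE_MIN, AGE_MAX] onto itself."""
    u = (age - AGE_MIN) / (AGE_MAX - AGE_MIN)
    return AGE_MIN + (AGE_MAX - AGE_MIN) * u ** gamma


def individual_curve(age, amplitude, gamma):
    """Individual curve: the population curve on a warped clock, scaled by amplitude."""
    return amplitude * population_curve(warp(age, gamma))


def peak_age(amplitude, gamma, n_points=301):
    """Age on an evenly spaced grid at which the individual curve is largest."""
    ages = [AGE_MIN + (AGE_MAX - AGE_MIN) * i / (n_points - 1) for i in range(n_points)]
    return max(ages, key=lambda a: individual_curve(a, amplitude, gamma))
```

In this sketch, `gamma = 1` reproduces the population timing, `gamma > 1` slows the individual's clock so the peak arrives later, and `gamma < 1` moves it earlier, while `amplitude` only rescales the height. Each individual curve remains unimodal because the warp is monotone.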


This issue also includes another in our occasional series of review articles, here a review of “Instrumental Variable Estimators for Binary Outcomes,” by Paul Clarke and Frank Windmeijer.

Instrumental variables arise as an approach for assessing the causal effect of a treatment when there is nonignorable selection of those chosen to receive the treatment. It is common to think about the need for adjusting estimated treatment effects in observational studies, in which those receiving treatment may differ in important ways from those who do not. The issue also arises in randomized studies in which lack of compliance can cause differences in exposure to the treatment that are associated with individual characteristics that also affect the outcome.

Over the years, economists and statisticians have developed a range of approaches to this difficult issue, including the instrumental variable approach in which an additional variable that affects the outcome only through the treatment (i.e., has no direct effect on the outcome) can be used to identify the treatment effect.

The authors of this review article survey the different types of instrumental variable approaches that have been proposed for situations with a binary response variable. They highlight the implicit assumptions being made by the approaches and compare their robustness in the face of violations of these assumptions. The findings are illustrated on a re-analysis of a randomized placebo-controlled trial with imperfect compliance in the treatment group.
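The basic instrumental variable logic can be illustrated with a small simulation of a randomized trial with imperfect compliance and a binary outcome. This sketch is not taken from the review. It uses the simple Wald (ratio) estimator, and every number in the data-generating process (the compliance rates, the baseline risk, the effect of the unobserved trait `u`, and a constant risk difference of 0.2) is a hypothetical assumption. Random assignment `z` serves as the instrument: it affects the outcome only through treatment received, `d`.

```python
import random


def simulate_trial(n=100_000, true_effect=0.2, seed=0):
    """Generate (assigned, treated, outcome) triples for a toy trial."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                         # randomized assignment
        u = rng.random() < 0.5                         # unobserved trait
        complier = rng.random() < (0.8 if u else 0.4)  # compliance depends on u
        d = z and complier                             # controls cannot get treatment
        p_y = 0.2 + 0.3 * u + true_effect * d          # outcome risk also depends on u
        y = rng.random() < p_y
        rows.append((z, d, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_estimate(rows):
    """As-treated risk difference, which ignores the selection into treatment."""
    return mean(y for z, d, y in rows if d) - mean(y for z, d, y in rows if not d)


def wald_estimate(rows):
    """Effect of assignment on outcome divided by effect of assignment on uptake."""
    itt_outcome = mean(y for z, d, y in rows if z) - mean(y for z, d, y in rows if not z)
    itt_uptake = mean(d for z, d, y in rows if z) - mean(d for z, d, y in rows if not z)
    return itt_outcome / itt_uptake
```

Under these assumptions, `naive_estimate(simulate_trial())` overstates the effect because people with the trait `u` are both more likely to comply and more likely to have the outcome, whereas `wald_estimate(simulate_trial())` should land near the assumed risk difference of 0.2.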

There are many other informative articles in both sections of the December issue, as well as our usual set of book reviews. The full list of articles and a list of the books under review can be found here, and these three articles will be available for free download for a limited time.
