NIJ’s Real-Time Crime Forecasting Challenge: An Attempt to Encourage Data Scientists from Every Field to Think About Criminal Justice Problems

1 January 2018
Joel Hunt, US Department of Justice

    Earlier this summer, the US Department of Justice’s National Institute of Justice (NIJ) announced the winners of the Real-Time Crime Forecasting Challenge. Four of the winners were students.

    The goal of the challenge was to develop algorithms that could forecast police calls-for-service (CFS) in four crime categories in Portland, Oregon, for five forecast periods. The challenge had the following aims:

    • Harness advances in data science in other fields to advance crime forecasting
    • Encourage scientists from all fields to consider the challenges of crime and justice
    • Conduct the most comprehensive comparative analysis of crime forecasting software and algorithms to date

    An ancillary goal of the NIJ was to broaden awareness in the STEM community of NIJ’s involvement in data science. Applicants could be in one of three categories: student (high school or undergraduate); small team/business; and large business.

    Getting to Know the NIJ
    As the research arm of the US Department of Justice, NIJ invests in scientific research across disciplines to serve the needs of the criminal justice community and has been a driving force in the use of data to address the challenges of crime and justice since the 1980s. NIJ recognizes that rapid advances in data sciences are being used to forecast consumer behavior, detect medical anomalies, and provide informatics about product consumers. These advances have been made by students, professors, scientists, corporations, and individuals across the spectrum of scientific disciplines, including biology, cognitive behavioral research, economics, and statistics.

    The Portland Police Bureau provided five years of calls-for-service data as a training data set; however, contestants were not limited to using the CFS data. The four crime categories were all CFS, burglary (residential and commercial), street crimes, and theft of auto. The five forecast periods were March 1–7, 2017; March 1–14, 2017; March 1–31, 2017; March 1–April 30, 2017; and March 1–May 31, 2017.
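
    As a minimal sketch, the five forecast periods can be written down as inclusive date ranges and used to select the calls that fall inside each window. The period labels and the helper name calls_in_period are illustrative assumptions, not part of the challenge materials.

```python
from datetime import date

# Inclusive (start, end) dates for the five forecast periods.
# The dictionary keys are illustrative labels, not official names.
FORECAST_PERIODS = {
    "1 week": (date(2017, 3, 1), date(2017, 3, 7)),
    "2 weeks": (date(2017, 3, 1), date(2017, 3, 14)),
    "1 month": (date(2017, 3, 1), date(2017, 3, 31)),
    "2 months": (date(2017, 3, 1), date(2017, 4, 30)),
    "3 months": (date(2017, 3, 1), date(2017, 5, 31)),
}


def calls_in_period(call_dates, period):
    """Return the call dates that fall inside the named forecast period."""
    start, end = FORECAST_PERIODS[period]
    return [d for d in call_dates if start <= d <= end]


# Length of each window in days, counting both end points.
period_lengths = {
    name: (end - start).days + 1
    for name, (start, end) in FORECAST_PERIODS.items()
}

# Hypothetical call dates, for illustration only.
sample_calls = [date(2017, 2, 28), date(2017, 3, 5), date(2017, 4, 2)]
first_week_calls = calls_in_period(sample_calls, "1 week")
```

    All five windows start on March 1, so each longer period contains every shorter one.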

    The challenge was designed to test models that incorporated spatial and temporal aspects to forecast future locations of CFS. Winners were determined by the effectiveness and efficiency of their crime forecasting algorithms based on two criteria: Prediction Accuracy Index (PAI) and Prediction Efficiency Index* (PEI*). The PAI measures the effectiveness of the forecasts with the following equation:

    $$\mathrm{PAI} = \frac{n/N}{a/A}$$

    where n equals the number of crimes that occur in the forecast area, N equals the total number of crimes, a equals the forecast area, and A equals the area of the entire study area. The PEI* measures the efficiency of the forecast with the following equation:

    $$\mathrm{PEI}^{*} = \frac{\mathrm{PAI}}{\mathrm{PAI}^{*}}$$

    where PAI* equals the maximum obtainable PAI value for the amount of area forecast, a. As such:

    $$\mathrm{PEI}^{*} = \frac{n/N}{a/A} \Big/ \frac{n^{*}/N}{a/A} = \frac{n}{n^{*}}$$

    where n* equals the maximum obtainable n for the amount of area forecast, a.
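
    The two measures can be sketched in a few lines of code, taking PAI as the share of crimes captured divided by the share of area forecast, and PEI* as n divided by n*. The function names and all of the numbers below are hypothetical and chosen only to make the arithmetic easy to follow; they are not challenge results.

```python
def pai(n, N, a, A):
    """Prediction Accuracy Index: (n / N) / (a / A)."""
    if N <= 0 or a <= 0 or A <= 0:
        raise ValueError("N, a, and A must be positive")
    return (n / N) / (a / A)


def pei_star(n, n_star):
    """Prediction Efficiency Index*: n / n_star."""
    if n_star <= 0:
        raise ValueError("n_star must be positive")
    return n / n_star


# Hypothetical inputs, for illustration only.
N = 1000      # total crimes in the study area
A = 150.0     # area of the entire study area
a = 0.75      # forecast area (0.5% of A)
n = 40        # crimes that occurred inside the forecast area
n_star = 80   # most crimes any forecast of area a could have captured

score_pai = pai(n, N, a, A)            # 4% of crimes in 0.5% of the area, about 8
score_pai_max = pai(n_star, N, a, A)   # best obtainable PAI for this area, about 16
score_pei = pei_star(n, n_star)        # 0.5, the same as score_pai / score_pai_max
```

    Because N, a, and A are the same in PAI and in the maximum obtainable PAI, they cancel in the ratio, which is why PEI* reduces to a ratio of crime counts.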

    In all, 62 submitted algorithms were tested on a single data set. This is the largest known comparative analysis of crime forecasting algorithms to date. The challenge provided insights into the effects of spatial and temporal aggregations of CFS on the ability to forecast CFS.

    The results indicate a clear and significant variance in the ability of the contestants’ algorithms to forecast crime. Further, no one algorithm did well across all categories and forecast periods, though some were more effective, efficient, or both for some crime categories and forecast periods. There is clearly room for improvement to maximize the potential benefit to fighting crime.

    Based on the variance of the PAI and PEI* values of the submissions, the challenge demonstrated that the choice of algorithm can affect how effectively and efficiently a department uses its resources (i.e., allocates them to areas more likely to experience high CFS).

    The results support findings from prior NIJ-funded research. Specifically, to effectively or efficiently forecast burglary, more crimes (and therefore more time) are needed because repeat and near-repeat burglaries are less probable than repeats of other crime types. Further, street crimes and thefts of auto are likely to have higher repeat and near-repeat patterns, allowing for more effective and efficient forecasts, even when fewer crimes are present.

    The challenge results also provided NIJ scientists with information that will enable them to better judge the effectiveness of spatially based policing strategies. Figures 1a and 1b illustrate a pattern present in all combinations of crime categories and forecast periods. The color of each circle indicates the contestant category of the forecast (i.e., blue is student, yellow is small team/business, and red is large business).

    Figure 1a. Results for street crimes for a two-week period. The circle sizes are proportional to the size of the cell, and the circles are positioned according to each forecast’s PAI and PEI* scores.

    Figure 1b. Results for street crimes for a two-week period. The circle sizes are proportional to the amount of area forecast, and the circles are positioned according to each forecast’s PAI and PEI* scores.

    Based on these graphs, it appears forecasts using smaller areal units are more effective, but not more efficient. That there is a difference in effectiveness based on areal size is not surprising. The scale component of the modifiable areal unit problem, in which results change with the size of the spatial units used for aggregation, implies that scores will vary with spatial scale. What is surprising is that the scale component does not affect the efficiency measure.

    This tells us that when evaluating the effectiveness of a spatially based police strategy, departments should consider the effect of choosing and maintaining the unit of spatial analysis. A change to a smaller unit of analysis could show an increase in effectiveness that may or may not be indicative of the impact of their strategy. To best understand the effectiveness of a strategy, keeping the cell size constant is critical. Additionally, a smaller total forecast area results in a more effective, but not more efficient, forecast.

    It is possible the PAI needs an adjusted version, similar to other adjusted measures (e.g., adjusted R-squared, which penalizes the score based on the number of variables in the model). In this case, the adjusted measure could penalize the score based on the size of the cell.

    Practically speaking, when using a smaller cell size, you get a larger percentage of CFS per percentage of area policed. This is an important measure when considering that the percentage of area is akin to the amount of resources (e.g., money, manpower) needed to put the strategy into practice.
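
    One way to see why a finer grid can capture a larger share of CFS in the same share of area is a small worked example. The sketch below uses a made-up 4 x 4 grid of call counts, aggregates it into a 2 x 2 grid, and compares the best PAI obtainable when 25% of the area is selected at each scale. The counts and function names are hypothetical, and the selection is made with hindsight, so this illustrates the scale effect only and is not a forecasting method.

```python
# Hypothetical CFS counts on a 4 x 4 grid; each fine cell has area 1.
fine = [
    [9, 0, 0, 1],
    [0, 1, 8, 0],
    [0, 7, 0, 0],
    [1, 0, 0, 5],
]


def aggregate(grid, factor):
    """Sum non-overlapping factor x factor blocks of a square grid."""
    size = len(grid) // factor
    return [
        [
            sum(
                grid[r][c]
                for r in range(i * factor, (i + 1) * factor)
                for c in range(j * factor, (j + 1) * factor)
            )
            for j in range(size)
        ]
        for i in range(size)
    ]


def best_pai(grid, cell_area, forecast_area):
    """PAI of the best whole-cell selection covering forecast_area (hindsight)."""
    counts = sorted((v for row in grid for v in row), reverse=True)
    k = round(forecast_area / cell_area)   # number of cells that fit in the area
    captured = sum(counts[:k])             # calls in the k busiest cells
    total_area = len(counts) * cell_area
    return (captured / sum(counts)) / (forecast_area / total_area)


coarse = aggregate(fine, 2)                                        # [[10, 9], [8, 5]]
pai_fine = best_pai(fine, cell_area=1.0, forecast_area=4.0)        # 29 of 32 calls, PAI 3.625
pai_coarse = best_pai(coarse, cell_area=4.0, forecast_area=4.0)    # 10 of 32 calls, PAI 1.25
```

    Any set of coarse cells can be reproduced exactly with fine cells, so for the same forecast area the best fine-grid selection always captures at least as many calls as the best coarse-grid selection.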

    Another question raised by the results is why cell size and forecast area do not affect the efficiency. One potential answer is that the measure of effectiveness relies on measures of area forecast, whereas the measure of efficiency relies on measures of CFS.

    In practical terms, the measure of efficiency is also important for police departments to consider because it is the ratio of the CFS actually captured by the forecast to the CFS that could have been captured with that amount of area. When communicating decisions and strategies to the public, it is important to understand the limitations of the analytics guiding those decisions and strategies. The measure of efficiency helps with that.
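
    The efficiency ratio can be sketched at the cell level as follows. The sketch assumes equal-area cells, so that forecasting k cells means the best obtainable result is the total of the k busiest cells. That assumption, the cell identifiers, the counts, and the function name are all illustrative and do not describe how the challenge entries were scored.

```python
def pei_star_from_cells(actual_counts, forecast_cells):
    """PEI* = n / n*, assuming every cell has the same area.

    actual_counts: dict mapping cell id to observed CFS in the period.
    forecast_cells: collection of cell ids flagged by the forecast.
    """
    flagged = set(forecast_cells)
    n = sum(actual_counts[cell] for cell in flagged)
    # n*: the most CFS obtainable with the same number of cells.
    busiest_first = sorted(actual_counts.values(), reverse=True)
    n_star = sum(busiest_first[:len(flagged)])
    if n_star == 0:
        raise ValueError("no CFS observed, so PEI* is undefined")
    return n / n_star


# Hypothetical observed counts and forecast, for illustration only.
observed = {"A1": 12, "A2": 3, "B1": 0, "B2": 7, "C1": 5, "C2": 1}
forecast = ["A1", "C1"]

efficiency = pei_star_from_cells(observed, forecast)   # 17 captured of 19 obtainable
```

    Here the forecast flagged the busiest cell but missed the second busiest, so it captured 17 of the 19 calls that two cells could have captured.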

    These are just the preliminary findings. NIJ scientists will continue to explore the results of the challenge and produce updates on what is learned, which will be available on the challenge website. At that site, you can also sign up for NIJ’s data scientist listserv, which is used to announce publications, events, and funding opportunities.

    Editor’s Note: The opinions, findings, and conclusions or recommendations expressed in this article are those of the author and do not necessarily reflect those of the US Department of Justice.
