
P&C Committee Discusses Privacy Protections for Transportation Data in Big Data Environment

1 November 2018

The Privacy and Confidentiality (P&C) Committee organized an invited session at the 2018 Joint Statistical Meetings that focused on protecting participants’ privacy in large-scale transportation studies. Jacob Bournazian of the Energy Information Administration, a former chair of the P&C Committee, organized the session. Stephanie Shipp of the University of Virginia chaired the session, and Tom Krenzke of Westat was the discussant.

As background, transportation data include detailed spatial and temporal information collected from roadside sensors, vehicles, mobile phones, and navigation devices. Identifying the travel behavior of individuals, families, and communities is an important part of traffic management and planning. These data also support research on and analysis of transport systems, as well as planning for future transportation infrastructure. However, collecting, accessing, or releasing information that identifies an individual’s travel patterns without his or her knowledge or consent may violate privacy under existing laws, even if the information is used for a larger social benefit.

The four presenters described current and proposed applications of methods for protecting privacy interests in large transportation data sets, along with the benefits and privacy risks of researchers accessing these data for statistical use in a big data environment.

The first paper, presented by William Bachman and co-authored by Krenzke and Jane Li (all at Westat at the time), reported the results of a literature review and an evaluation of privacy protection methods for common transportation data-collection technologies such as fixed sensors and mobile sources. The authors concluded that guidance on legal responsibilities is lacking, that a “universal” one-time privacy protection method is not realistic without reducing data utility, and that specific data uses require specific disclosure control algorithms, which they discussed in the context of geospatial data anonymization methods. They recommend minimizing raw data access and protecting not only personally identifiable information (PII) but also pseudoidentifiers, the nonspecific attributes that can be combined to identify an individual. They also recommend eliminating uniqueness in location or trace data using k-anonymity principles such as aggregation, perturbation, and suppression.
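To make the k-anonymity idea concrete, the sketch below (not from the paper) generalizes trip origins to a coarse grid (aggregation) and suppresses any cell shared by fewer than k trips. The column names, grid size, and threshold are illustrative assumptions.

```python
# Minimal sketch of k-anonymity for location data: snap origin coordinates to a
# coarse grid, then release only cells that contain at least K trips.
# Field names, CELL_DEG, and K are assumptions for illustration only.
from collections import Counter

K = 5              # minimum number of trips per released grid cell
CELL_DEG = 0.01    # grid resolution in degrees; coarser cells give more privacy

def to_cell(lat, lon, cell=CELL_DEG):
    """Aggregate a point to the lower-left corner of its grid cell."""
    return (round(lat // cell * cell, 4), round(lon // cell * cell, 4))

def k_anonymize_origins(trips, k=K):
    """Generalize trip origins to grid cells and suppress rare (unique) cells."""
    cells = [to_cell(t["origin_lat"], t["origin_lon"]) for t in trips]
    counts = Counter(cells)
    released = []
    for t, cell in zip(trips, cells):
        if counts[cell] >= k:  # keep only cells shared by at least k trips
            released.append({**t, "origin_cell": cell,
                             "origin_lat": None, "origin_lon": None})
    return released
```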

The next three papers focused on privacy protection using data from the second Strategic Highway Research Program (SHRP2). In this program, 145 research projects focused on safety, renewal, reliability, and capacity were conducted from 2006 to 2015.

Miguel Perez, director of the Center for Data Reduction and Analysis Support at the Virginia Tech Transportation Institute, described naturalistic driving studies, the types of data collected, and the use and potential abuse of the data. Data from naturalistic driving studies, such as SHRP2, are collected in privately owned vehicles using unobtrusive instrumentation so participants drive as they normally would. No experimenters are present. The objectives of the study are to provide detailed pre-crash information, real-life behaviors, and rich databases for research. Through case studies, Perez highlighted uses of the data such as analyzing the start and stop locations of car trips (e.g., home to shopping by distance), typical commuting patterns and the effects of disruptions, and crash and near-crash risks.

Crashes generate a lot of paperwork, some of which is publicly available. Perez’s recommendations to facilitate researcher access to the data while ensuring participant privacy are to replace videos of faces with avatars that retain behaviors and expressions but remove identifying features and/or to code the behaviors so the video is no longer needed.

Christian Richard of Battelle presented potential frameworks to assess and manage participant disclosure risk. He and his coauthors developed a framework for assessing data set utility and privacy risk, identified approaches for protecting privacy while maintaining utility, and conducted a pilot test of the framework on potential public use data sets (PUDs). The authors created three PUDs on different topics: (1) driving events that include summary variables for individual crashes and variables describing roadway characteristics at event locations; (2) trip summaries with approximately 5.5 million trip records that contain summary information about each trip; and (3) travel density maps that include GIS shape files and tables that contain the number of participants and trips for each road segment. Each PUD was implemented with strong privacy measures.
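As a hedged illustration of how a travel density table like PUD (3) might be assembled (this is not the SHRP2 code), the sketch below counts distinct participants and trips per road segment and drops segments observed for too few participants; the threshold and field names are assumptions.

```python
# Build a travel density table: for each road segment, count distinct
# participants and trips, suppressing segments with too few participants.
from collections import defaultdict

MIN_PARTICIPANTS = 3   # illustrative suppression threshold

def travel_density(trip_segments):
    """trip_segments: iterable of (segment_id, participant_id, trip_id) tuples."""
    participants = defaultdict(set)
    trips = defaultdict(set)
    for seg, person, trip in trip_segments:
        participants[seg].add(person)
        trips[seg].add(trip)
    return {
        seg: {"n_participants": len(participants[seg]),
              "n_trips": len(trips[seg])}
        for seg in participants
        if len(participants[seg]) >= MIN_PARTICIPANTS
    }
```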

Applying the National Institute of Standards and Technology (NIST) 2015 Data Privacy Framework, the authors performed threat modeling, evaluated reidentification risk, applied transformations, performed diagnostics, and then re-evaluated the risk and utility tradeoffs. They found that a modified version of the NIST 2015 framework can be applied to the development of SHRP2 Naturalistic Driving Study PUDs, but each PUD has unique requirements, so additional signoffs to use the data may be needed. They also found there are established technical approaches for quantifying risk and utility, transforming data sets, and evaluating the risk-utility tradeoff that can be applied to SHRP2 PUD development. However, they concluded that the unknown privacy risks associated with the SHRP2 data are still too high to move forward with a PUD model at this time.
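As a rough illustration of the kind of risk-utility diagnostics described above (an assumption-laden sketch, not the authors’ implementation), the code below measures reidentification risk as the share of records that are unique on a set of quasi-identifiers and measures utility loss as the relative change in a summary statistic after transformation.

```python
# Simple risk and utility metrics for comparing a raw data set with a
# transformed release candidate. Field names and thresholds are assumptions.
from collections import Counter
from statistics import mean

def uniqueness_risk(records, quasi_ids):
    """Fraction of records that are unique on the quasi-identifier combination."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return sum(1 for r in records
               if counts[tuple(r[q] for q in quasi_ids)] == 1) / len(records)

def utility_loss(original, transformed, field):
    """Relative change in the mean of `field` caused by the transformation."""
    orig = mean(r[field] for r in original)
    new = mean(r[field] for r in transformed)
    return abs(new - orig) / abs(orig)

# A release might be judged acceptable only if, say, risk stays below 5% while
# utility loss stays below 10% -- hypothetical thresholds for illustration:
# ok = (uniqueness_risk(pud, ["road_type", "speed_band"]) < 0.05
#       and utility_loss(raw, pud, "trip_length_km") < 0.10)
```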

Suzie Lee, a project director at the Center for Data Reduction and Analysis Support at the Virginia Tech Transportation Institute, described the current procedures for allowing researchers access to the SHRP2 data while balancing participant privacy protection with data utility. The SHRP2 data provide information about 1,800 crashes, 6,900 near crashes, and 5.4 million trips involving 3,000 participants. Two approaches are used to provide data. Both require institutional review board (IRB) approval and ethics training.

The first is to carve out subsets of high-interest data and make them easily and widely available through the InSight website. The second allows users to specify subsets of data on specific topics and to be issued data use licenses and extractions of data that meet their specifications.

Privacy is protected by restricting use of identifying data to a secure data enclave, and nonidentifying data are securely exported. There are strong administrative controls and consequences for violations. Detection algorithms are set up to prevent users from submitting false or incorrect credentials or scraping the data.
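As a hedged illustration of one such control (an assumption, not a description of the actual system), a detection algorithm of this kind could be as simple as a sliding-window rate limiter that flags query volumes consistent with bulk scraping; the thresholds and names below are invented for the sketch.

```python
# Per-user sliding-window rate limiter: block and flag users whose query
# volume looks like scraping. MAX_QUERIES and WINDOW_SECONDS are illustrative.
import time
from collections import defaultdict, deque

MAX_QUERIES = 100        # maximum queries allowed per window
WINDOW_SECONDS = 3600    # one-hour sliding window

_history = defaultdict(deque)

def allow_query(user_id, now=None):
    """Return False if the user's recent query rate exceeds the limit."""
    now = time.time() if now is None else now
    q = _history[user_id]
    while q and now - q[0] > WINDOW_SECONDS:   # drop requests outside the window
        q.popleft()
    if len(q) >= MAX_QUERIES:
        return False                           # block and log for review
    q.append(now)
    return True
```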

To date, 3,000 distinct users have used the InSight website to conduct their research, and more than 200 data use licenses have been issued that allow researchers access to subsets of the data.

Tom Krenzke summarized the session by highlighting the advances in privacy protection evident in the presentations. He mentioned the European Union’s General Data Protection Regulation, a regulation in EU law on data protection and privacy for all individuals within the European Union, stating that it, and other such regulations, may create paralysis by disclosure analysis without further guidance because of the number of rules and requirements for protecting data. He liked the general approach the Virginia Tech Transportation Institute is taking to allow the SHRP2 data to be used for analysis via its data tools, while maintaining control of the actual data and satisfying the bulk of data users and analyses. He highlighted the risk-utility tradeoff, emphasizing the need for researcher training, ethical behavior, and consequences if rules are violated.
