Privacy Day Webinar 2020: A Summary

1 May 2020
Submitted by the ASA Privacy and Confidentiality Committee

    On January 28, Michael Hawes, senior adviser for data access and privacy in the Research and Methodology Directorate of the US Census Bureau, presented a webinar titled “Differential Privacy and the 2020 Decennial Census.” The webinar was sponsored by the ASA Privacy and Confidentiality Committee.

    The webinar made clear that the Census Bureau is committed to privacy and confidentiality; it’s the law. All information collected by the Census Bureau is protected under Title 13 of the US Code, and in an era of growing privacy concerns, keeping the public’s trust is essential. The challenge is that releasing any statistic calculated from confidential data reveals a small amount of personal information that could be used to reidentify specific individuals. Aggregating data, as in the Census tabular products, is not enough to protect privacy in large-scale data products. Computer algorithms can reconstruct individual-level records from aggregate data tables quite easily, and those reconstructed records can then be linked to available third-party data sources to accurately reidentify the people they describe.
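
    To make the reconstruction risk concrete, the following Python sketch brute-forces a hypothetical three-person block. The block, its published statistics (counts by sex, median and mean age, the number of people under 18, and the mean age of females), and the helper names `tabulate` and `reconstruct` are all invented for illustration; they are not Census data or Census code. The sketch enumerates candidate ages and keeps only the combinations whose tabulations exactly match the published values.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from statistics import median

# Hypothetical published tables for one tiny block (illustrative, not Census data).
PUBLISHED = {
    "females": 2,
    "males": 1,
    "median_age": 36,
    "mean_age": Fraction(85, 3),
    "under_18": 1,
    "mean_age_female": 22,
}

def tabulate(female_ages, male_ages):
    """Compute the same statistics the hypothetical agency publishes for a block."""
    ages = list(female_ages) + list(male_ages)
    return {
        "females": len(female_ages),
        "males": len(male_ages),
        "median_age": median(ages),
        "mean_age": Fraction(sum(ages), len(ages)),
        "under_18": sum(a < 18 for a in ages),
        "mean_age_female": Fraction(sum(female_ages), len(female_ages)),
    }

def reconstruct(published, max_age=100):
    """Return every (female_ages, male_ages) whose tabulations match exactly."""
    matches = []
    ages = range(max_age)
    # The published sex counts fix how many ages to guess for each sex.
    for f in combinations_with_replacement(ages, published["females"]):
        # Prune early with a statistic that involves only the females.
        if Fraction(sum(f), len(f)) != published["mean_age_female"]:
            continue
        for m in combinations_with_replacement(ages, published["males"]):
            if tabulate(f, m) == published:
                matches.append((f, m))
    return matches

print(reconstruct(PUBLISHED))  # [((8, 36), (41,))]
```

    Only one set of ages reproduces all six published values, so these "aggregate" tables reveal every person's age and sex. Realistic attacks express the same idea as a large constraint or integer-programming problem rather than brute force, so they can scale far beyond a toy block.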

    Hawes explained that the 2003 Database Reconstruction Theorem, which shows that publishing too many sufficiently accurate statistics about a confidential database allows that database to be reconstructed, has made statistical agencies increasingly cautious about publishing large amounts of highly accurate and granular data. The Census Bureau has conducted reconstruction studies, including one that reconstructed individuals’ 2010 Census records from the published 2010 data products. For the 2010 Census, the bureau collected a handful of attributes for the approximately 309 million people in the United States, yielding approximately 1.9 billion confidential data points. Yet it published more than 150 billion statistics calculated from those data, roughly 80 for every confidential data point.

    In this experiment, using only publicly available data, the Census Bureau accurately reconstructed individual-level records for all 6 million inhabited Census blocks in the United States. By linking those records to commercially available data from 2010, the bureau confirmed accurate reidentifications of 51 million individuals.
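
    The linkage step can be sketched the same way. The following hypothetical example matches reconstructed records to an invented commercial file on block, age, and sex; which fields an attacker would actually match on is an assumption here, since the webinar summary does not say. The helper `link` and all records are illustrative. A name counts as a putative reidentification only when its combination of block, age, and sex appears exactly once in each file.

```python
from collections import defaultdict

# Hypothetical reconstructed records: (block_id, age, sex). Illustrative only.
reconstructed = [
    ("block-001", 8, "F"),
    ("block-001", 36, "F"),
    ("block-001", 41, "M"),
    ("block-002", 29, "F"),
    ("block-002", 29, "F"),
]

# Hypothetical third-party file attaching names to the same quasi-identifiers.
commercial = [
    ("Alice", "block-001", 36, "F"),
    ("Bob", "block-001", 41, "M"),
    ("Carol", "block-002", 29, "F"),
    ("Dana", "block-003", 52, "F"),
]

def link(reconstructed, commercial):
    """Map each name to its (block, age, sex) key when that key is unique in both files."""
    recon_counts = defaultdict(int)
    for key in reconstructed:
        recon_counts[key] += 1
    names_by_key = defaultdict(list)
    for name, block, age, sex in commercial:
        names_by_key[(block, age, sex)].append(name)
    return {
        names[0]: key
        for key, names in names_by_key.items()
        if len(names) == 1 and recon_counts.get(key) == 1
    }

print(link(reconstructed, commercial))
# {'Alice': ('block-001', 36, 'F'), 'Bob': ('block-001', 41, 'M')}
```

    Carol is not returned because two reconstructed records share her block, age, and sex, so the match is ambiguous. A match found this way is still only putative; the 51 million figure above refers to reidentifications the bureau confirmed as accurate.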

    Although the Census Bureau has been a leader in risk mitigation techniques, including its introduction of data swapping in past decades, Hawes emphasized that alternative data protections are needed. To meet this need, the bureau has adopted differential privacy techniques for the 2020 Census.

    Differential privacy works by injecting a precisely calibrated amount of noise into statistics calculated from the data, which controls the privacy risk associated with each published statistic.
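
    As an illustration of calibrated noise, the sketch below implements the textbook Laplace mechanism for a single count. This is a swapped-in classic technique, not the bureau's production algorithm, and the function names `laplace_noise` and `noisy_count` are hypothetical. Because adding or removing one person changes a count by at most 1 (its sensitivity), adding Laplace noise with scale 1/ε satisfies ε-differential privacy, where ε is the privacy-loss budget. The Laplace draw is built from the difference of two exponential draws so the example needs only the standard library.

```python
import random

def laplace_noise(scale, rng=random):
    """Draw one sample of Laplace(0, scale) noise.

    The difference of two independent exponential draws with rate 1/scale
    is Laplace-distributed with location 0 and the given scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def noisy_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Laplace mechanism for a count (hypothetical helper, not Census code).

    One person can change a count by at most `sensitivity`, so noise with
    scale sensitivity / epsilon gives epsilon-differential privacy.
    """
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Seeded generator so the illustration is repeatable; the value is noisy by design.
rng = random.Random(2020)
print(round(noisy_count(1234, epsilon=0.5, rng=rng), 1))
```

    Floating-point samplers like this one are fine for illustration but have known subtle weaknesses; production systems typically use discrete noise distributions and carefully engineered samplers.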

    What is the optimal balance between privacy and accuracy?
    Hawes explained that this tradeoff, between the amount of noise injected and the accuracy of the results, is a legal and policy decision. If zero privacy loss were the goal, no results could be published at all. A strength of differential privacy is that the privacy-loss budget is built in as a parameter that precisely controls this tradeoff: the higher the budget, the less noise is injected and the more accuracy is favored over privacy protection.
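
    The direction of this tradeoff can be made concrete with the Laplace mechanism, again an illustrative assumption rather than the bureau's method. Its noise has scale b = Δ/ε, where Δ is the sensitivity and ε the privacy-loss budget, and its expected absolute error is exactly b. Under basic sequential composition, a total budget ε split evenly across k published statistics leaves ε/k for each. The hypothetical helpers `expected_abs_error` and `per_statistic_error` below compute both effects.

```python
def expected_abs_error(epsilon, sensitivity=1.0):
    """Expected |noise| for the Laplace mechanism, which equals its scale sensitivity/epsilon."""
    if epsilon <= 0:
        # epsilon = 0 (no privacy loss) would need infinite noise: nothing useful can be released.
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon

def per_statistic_error(total_epsilon, num_statistics, sensitivity=1.0):
    """Basic sequential composition: split total_epsilon evenly over num_statistics releases."""
    return expected_abs_error(total_epsilon / num_statistics, sensitivity)

# Larger budgets mean less noise; spreading one budget over many statistics means more noise each.
for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps}: one statistic {expected_abs_error(eps):g}, "
          f"each of 100 statistics {per_statistic_error(eps, 100):g}")
```

    Raising ε shrinks the error, spreading the same ε over 100 statistics multiplies each statistic's expected error by 100, and ε = 0 is rejected because it would require infinite noise, matching the point that zero privacy loss means nothing can be published.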

    At the time of the webinar, the Census Bureau was about a year away from producing its first differentially private data products. Several policy issues remain to be decided, especially the exact privacy-loss budget, which quantifies the balance between privacy and accuracy. The bureau continues to evaluate its implementation of differential privacy to improve the accuracy and “fitness for use” of the resulting data, especially for smaller areas.
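
    One reason smaller areas are harder is that differentially private noise is calibrated to the privacy-loss budget and the sensitivity, not to the size of the population being counted. The hypothetical sketch below assumes Laplace noise and ignores how the bureau actually allocates budget across geographic levels and post-processes results; it simply compares the expected relative error for a small and a large count under the same ε. The function name and populations are invented for illustration.

```python
def expected_relative_error(true_count, epsilon, sensitivity=1.0):
    """Expected |noise| / true_count for Laplace noise with scale sensitivity/epsilon.

    The noise scale does not depend on true_count, so relative error shrinks as counts grow.
    """
    if true_count <= 0:
        raise ValueError("true_count must be positive")
    return (sensitivity / epsilon) / true_count

# Hypothetical populations, same privacy-loss budget for both.
for name, count in [("small block", 25), ("large county", 250_000)]:
    print(f"{name}: {expected_relative_error(count, epsilon=0.5):.4%}")
# small block: 8.0000%
# large county: 0.0008%
```

    The same absolute error that is negligible for a county can be a large fraction of a block's population.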

    Will there be a measure of accuracy published?
    One of the elegant aspects of differential privacy, compared with traditional disclosure avoidance methods, is that the bureau can be fully transparent about how the algorithm works, what its parameters are, and how the methodology affects the accuracy of the resulting data.
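
    Because the noise distribution and its parameters can be published, so can exact statements about accuracy. Assuming, for illustration only, Laplace noise with scale b = Δ/ε, the tail probability is P(|noise| > t) = exp(-t/b), so the half-width that contains the noise with probability 1 - α is t = b·ln(1/α). The hypothetical helper `laplace_margin_of_error` computes that margin; it describes the injected noise only, and any further processing of the noisy counts would change the error distribution.

```python
import math

def laplace_margin_of_error(epsilon, confidence=0.95, sensitivity=1.0):
    """Half-width t such that P(|noise| <= t) == confidence for Laplace(0, b) noise,
    where b = sensitivity / epsilon. Since P(|noise| > t) = exp(-t / b),
    solving exp(-t / b) = 1 - confidence gives t = b * ln(1 / (1 - confidence))."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    b = sensitivity / epsilon
    return b * math.log(1.0 / (1.0 - confidence))

# Hypothetical example: a single count released with epsilon = 0.5.
print(round(laplace_margin_of_error(epsilon=0.5), 2))  # 5.99
```

    Under these assumptions, a count released with ε = 0.5 lies within about 6 people of the true value 95 percent of the time, whatever the size of the area.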

    Who makes the policy decisions about privacy vs. accuracy?
    These decisions will be made by the Census Bureau’s Data Stewardship Executive Policy Committee.

    Will the Census Bureau be using differential privacy for the American Community Survey (ACS)?
    Recognizing the growing privacy threats posed by the proliferation of third-party data that can be used to reidentify individuals in official statistics, and by increasingly powerful algorithms that can perform those reconstructions and reidentifications, the Census Bureau has committed to modernizing its disclosure avoidance methods for all censuses and surveys on a rolling basis. The ACS will eventually move to differential privacy, but only after extensive consultation with ACS data users about how the method might affect the data’s fitness for use. The earliest this transition could happen is 2025.

    What will the impact of differential privacy be for different types of data uses?
    There are countless ways to use Census data. The published data will be high quality for many uses but less well suited to others, because no single accuracy metric can be optimized for every use. The Census Bureau is committed to giving data users guidance on the data products’ fitness for use in various use cases.

    All webinars can be viewed online.
