
June CNSTAT Workshop Focused on Privacy

1 December 2019
Jerry Reiter, Lars Vilhuber, and Tom Krenzke, ASA Privacy and Confidentiality Committee Members

    For decades, federal statistical agencies have striven to balance the legal and ethical obligations to protect the confidentiality of data subjects with the need to provide informative statistics and access to data for secondary analysis. In recent years, balancing these objectives has become increasingly difficult. The digital revolution has seen an explosion in the growth of available data, both from public and private sources, which ill-intentioned actors could use to compromise confidentiality protections.

    With these challenges in mind, the Committee on National Statistics (CNSTAT) of the National Academy of Sciences held a workshop June 6–7 to discuss new approaches to protecting data confidentiality, with a focus on methods that offer formal guarantees of privacy protection such as differential privacy. The discussions covered policy and implementation issues from both provider and user perspectives, including the promises and limitations of using formal privacy methods.

    The workshop began with examples of potential disclosures from data protected by legacy disclosure protection methods. The examples highlighted how ill-intentioned actors can use information external to the released data to breach data confidentiality. A key takeaway from the introductory talks was how difficult it is to feel confident data are safe, as there are always attack strategies agencies may not conceive of when applying legacy disclosure protection methods. Indeed, one of the appealing promises of formal privacy approaches is to strengthen confidence in data protection. Another takeaway is that it can be useful for agencies to have friendly hackers attack their databases and assess their vulnerabilities.

    Participants heard from agency heads and staff about challenges they face in trying to put disclosure avoidance into practice. One frequently repeated challenge involved human capital and staffing. Agency staff have many responsibilities, and for many agencies, disclosure avoidance is not a primary job assignment. This is a difficult challenge to overcome; however, it may be possible for agencies to leverage expertise in academia and the private sector through partnerships. For example, many of the computer scientists involved in modernizing the disclosure control methods for the 2020 decennial census are on the Census Bureau payroll as part-time, Schedule A employees. Another example of partnerships is the National Census Research Network, which Robert Groves—now of Georgetown University—created with the National Science Foundation when he was the head of the Census Bureau.

    The audience saw a tutorial on differential privacy and heard from a panel of experts about what can and cannot be done. These sessions offered several takeaways. First, differentially private algorithms exist for a wide range of statistical tasks, from counting to regression modeling. Second, differentially private algorithms are not panaceas. In particular, there are data dissemination tasks for which no differentially private algorithms exist that produce outputs with high enough quality to be fit for use. It is clear there is still much research to be done in practical implementations of differential privacy.
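As a concrete illustration of the counting tasks mentioned above (not an example from the workshop itself), the following is a minimal sketch of the classic Laplace mechanism for a differentially private count. The function names and the choice of a counting query are illustrative assumptions; production systems use vetted libraries rather than hand-rolled noise generation.

```python
import math
import random


def laplace_noise(scale):
    """Sample from a Laplace(0, scale) distribution via inverse CDF."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def dp_count(records, predicate, epsilon):
    """Release a count satisfying epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so adding Laplace noise with
    scale 1/epsilon is sufficient.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)
```

Smaller values of `epsilon` mean stronger privacy but noisier counts; the noise scale grows as `1/epsilon`, which is one reason some dissemination tasks cannot reach fit-for-use quality under tight privacy budgets.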

    Another important takeaway, mentioned multiple times, is that agencies should think about differentially private algorithms as supplementary tools in the disclosure protection toolkit, not as all-encompassing replacements. Agencies do not necessarily have to take an all-or-nothing approach. For certain problems, differential privacy is an achievable criterion. For others, agencies may—as a policy matter—use other solutions or hybrid solutions. One presentation provided evidence that modern privacy protection methods may not add much error compared to other sources of error.

    The second day started with a panel involving users and stakeholders. Panel members stressed the importance of involving user communities and stakeholders in the decisions for privacy policies. Agencies must help their user communities and stakeholders understand the impacts of changes in privacy protection methods on the types of analyses and decision tasks they care about. A key theme underlying many of the talks was that it is no longer acceptable for agencies and other data stewards to simply say “here’s the data, we know best how to make it, now you can use it.” Recent agency changes in privacy protection methods have inspired analysts to ask more questions about the impact of the agency’s actions—including editing, imputation, and privacy protection—on inferences.

    Many audience questions focused on what one might call policy decisions. Panelists raised those issues, as well. In particular, who decides what the privacy budget is? What are the implications for having to allocate that budget? If we’re going to run out of the budget for a data set, are we going to have to let that data source expire? One panelist mentioned users were comfortable with deviations from the confidential data estimates in one case. This inspired questions around what it means to have acceptable deviations.
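To make the budget-allocation questions above concrete: under basic sequential composition, the privacy losses of successive releases add up, so an agency that fixes a total budget must track and ration it. The sketch below is a hypothetical illustration of that accounting, not a description of any agency's system; the class and method names are invented for this example.

```python
class PrivacyBudget:
    """Track a total epsilon budget under basic sequential composition.

    Running queries with privacy losses e1, e2, ... consumes
    e1 + e2 + ... of the total budget; once the budget is exhausted,
    no further releases from the data set are permitted.
    """

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Debit one query's privacy loss, refusing if it would overspend."""
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon


budget = PrivacyBudget(total_epsilon=1.0)
budget.charge(0.4)  # first release
budget.charge(0.5)  # second release
# a further charge of 0.2 would exceed the budget and be refused
```

This is exactly the policy dilemma the panelists raised: once `spent` reaches `total`, the remaining choice is to stop releasing from that data source or to accept weaker formal guarantees.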

    The final session comprised lightning talks. Staff from various agencies and other organizations gave five-minute presentations about challenges and solutions they’re working on for issues in privacy.

    CNSTAT plans to continue the conversation on privacy in future activities. This is a critical issue facing all statistical agencies and is at the core of many agencies’ missions.

