Developing a Privacy Policy

Monthly Membership Magazine of the American Statistical Association

1 February 2019

ASA Privacy and Confidentiality Committee Members

Journal of Privacy and Confidentiality Relaunched with Special Issue in Honor of Stephen E. Fienberg

Aleksandra Slavković and Lars Vilhuber

Stephen Fienberg

In addition to Stephen Fienberg’s numerous contributions to the research, education, and practice of statistics, social sciences, and machine learning, he was a co-founder of the Journal of Privacy and Confidentiality (JPC) and, to the end, its editor-in-chief. His absence led to a significant hiatus in the journal’s activities.

The newest issue relaunches the journal. Aleksandra Slavković and Lars Vilhuber introduce the issue, including its special features, in their editorial. Additional editorials by Cynthia Dwork and Vilhuber address what the editorial team has accomplished in the past year and aims to accomplish in the next few years.

The issue is in honor of Fienberg and his impact on the methodology and practice of privacy and confidentiality. It commemorates the intersection between statistics, computer science, privacy, and confidentiality as he envisioned so many years ago. There are six peer-reviewed articles, as well as 10 reminiscences about Fienberg by researchers who worked with him.

JPC is an open-access journal, and all articles are free to view and download.

JPC accepts submissions from any field as long as they relate to privacy and confidentiality.

Every released data file carries some risk of disclosing the identities of individuals (or entities) and their sensitive data. A privacy policy should describe a process that considers whether the release of the data file followed reasonable procedures and a process that applies appropriate data use restrictions and access controls. At the very least, the risk of re-identification should be minimized after considering the possible approaches to re-identify persons using existing technology and public or commercially available information.

Furthermore, it is important to be aware that the privacy of individuals can be affected by a data release whether or not the person is in the file.

The European Union (EU) recently adopted the General Data Protection Regulation (GDPR), which has raised awareness of maintaining data confidentiality. Organizations should therefore address the GDPR in their privacy policies where applicable. For more information about the GDPR, see the July 2018 issue of Amstat News.

Background

In the US, the legal framework for privacy policies is based on various federal and state statutes and case law. The majority of these laws treat the identifiability of persons as the basis and approach for protecting privacy interests. Privacy interests are fragmented across subject matter topics, and privacy protections and remedies vary accordingly.

The current requirements to protect privacy interests place a strong emphasis on whether personally identifiable information (PII) exists in the data file. The definition of PII is so broad that it refers to any piece of information that can be used on its own or with other information to re-identify, contact, or locate a person in a file.

Relying on a vague term such as PII to decide whether a data system requires privacy protection provides no guidance for developing a privacy policy in the digital age. The presence or absence of PII in a data file should not be determinative of whether privacy protections apply.

Recommendation

The current privacy legal framework needs to change and focus on applying appropriate data use restrictions, assessing the risk of re-identification, creating the process for developing the data set, and determining the contents of a data set.

A privacy policy should focus on ensuring the process and procedural actions to protect information are followed. New statutes need to be drafted that require privacy protections regardless of whether PII exists in a file and impose penalties without requiring persons to show harm or damages.

The new legal framework needs to focus on data use restrictions and access controls as the main approach for protecting privacy interests throughout the lifecycle of information, beginning with data collection and ending with record destruction.

Guidance

Privacy interests should apply regardless of whether persons are identifiable in a file and regardless of whether PII exists. The existence of PII in a data file, as a high-risk variable, is only one factor in determining the identifiability of the data. For example, combinations of indirect identifying variables can lead to the identity of an individual, or the full or partial data vector for a record may be linkable to another file containing PII.
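
The point about combinations of indirect identifiers can be illustrated with a minimal Python sketch. The records, variable names (zip, birth_year, sex), and helper functions below are hypothetical and chosen only for illustration; they are not drawn from any real file or from a method prescribed by the committee.

```python
from collections import Counter

def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def unique_records(records, quasi_identifiers):
    """Return the records whose combination of values appears only once in the file."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] == 1]

# Hypothetical file: no names or other direct identifiers are present.
records = [
    {"zip": "22314", "birth_year": 1970, "sex": "F", "income": 52000},
    {"zip": "22314", "birth_year": 1970, "sex": "F", "income": 61000},
    {"zip": "22314", "birth_year": 1985, "sex": "M", "income": 48000},
    {"zip": "20001", "birth_year": 1985, "sex": "M", "income": 75000},
]

# Sex or birth year alone isolates no one in this file...
unique_on_sex = unique_records(records, ["sex"])
unique_on_birth_year = unique_records(records, ["birth_year"])
# ...but the combination of ZIP code and birth year singles out two records.
unique_on_combination = unique_records(records, ["zip", "birth_year"])
```

A record that is unique on such a combination is a candidate for linkage to an external file that carries the same variables alongside PII, even though the released file itself contains none.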

A privacy policy should match the level of access to the level of data protection methods (data protection methodology) to minimize re-identification risk. For example, a data set in a physical enclave with strong access controls may be able to allow access to more raw data than a publicly accessible data set.

Strong privacy protections rely on applying appropriate data protection methods, data use restrictions, and access controls. Privacy rights should be enforceable without the need to show direct or indirect damages.

Privacy violations should be noted when the process described in a privacy policy is violated, the required actions are not applied, or the actions taken do not comply with the process.

At the very least, a data privacy policy should do the following:

Describe allowable uses of the data. Some uses are more problematic and require more controls or restrictions. Determine whether researcher access to microdata will be allowed. Use specific informed consent statements and avoid the use of broad consent forms.
Describe the source(s) of the data and, to measure re-identification risk, review the sample size and whether sample weights are used in the design, the population size, whether high-risk variables exist in the data file that can be matched to external files, and whether multiple records in a data file are known to belong to the same cluster. Longitudinal and panel surveys create a special case of disclosure risk that may be associated with linked files. In this case, the disclosure risk of a microdata file increases if some records on the file are released on another file with more detailed or overlapping recodes (categorizations) of the same variables.
Describe the key re-identification risk elements and potential attack scenarios the policy is designed to protect against.
Determine what data protection methods should be applied to data releases that are commensurate with the proposed use(s).
Describe the data access controls for the data system. Reference whether any privacy protection certificates/certificates of confidentiality apply that reduce the risk of required disclosures of identifiable information.
Verify the organization has the resources, expertise, and capability to provide appropriate guarantees, assurances, or attestations for privacy protection when information is being collected.
Avoid broad consent statements and develop specific consent and use statements for the information collected.
State whether information will be disclosed to third parties for other purposes such as research and/or marketing.
Describe the baseline anonymization process and determine the threshold level for re-identification of data subjects in the file to be publicly released. Consider how to minimize what information is collected and stored to reduce the probability of linking records to re-identify individuals.
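
As a rough illustration of how sample weights and a threshold level might enter a risk review, the Python sketch below uses a naive estimator: the sum of sample weights within a cell of high-risk variables stands in for the number of population members sharing those values, and a record's risk is taken as the reciprocal. The estimator, the variable names, and the threshold of 0.05 are all assumptions made for this example, not values recommended in the text; production disclosure-control tools use more refined models.

```python
from collections import defaultdict

def estimated_record_risk(records, quasi_identifiers, weight_key="weight"):
    """Naive per-record risk: 1 / (sum of sample weights in the record's cell).

    The weight sum is a simple estimate of how many population members
    share the record's combination of quasi-identifier values.
    """
    population_estimate = defaultdict(float)
    for r in records:
        population_estimate[tuple(r[q] for q in quasi_identifiers)] += r[weight_key]
    return [1.0 / population_estimate[tuple(r[q] for q in quasi_identifiers)]
            for r in records]

# Hypothetical weighted sample with two high-risk variables.
sample = [
    {"region": "A", "age_group": "30-39", "weight": 50.0},
    {"region": "A", "age_group": "30-39", "weight": 50.0},
    {"region": "B", "age_group": "80+", "weight": 4.0},
]

THRESHOLD = 0.05  # hypothetical policy threshold, not a recommended value

risks = estimated_record_risk(sample, ["region", "age_group"])
flagged = [r for r, risk in zip(sample, risks) if risk > THRESHOLD]
```

Under these assumptions, the two records in the common cell fall well below the threshold, while the record representing only about four people in the population is flagged for further protection before public release.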

New privacy statutes should provide clear enforcement sanctions via civil damages and criminal penalties for violating the requirements and procedures stated in the policy, without requiring a person to show harm or damages. The important steps for implementing a privacy policy include assessing the data to be released and the risks of disclosing identifiable information, minimizing the data to be released without compromising utility, applying reasonable data protection methods, developing and applying appropriate monitoring, and creating an accountability and breach response plan (e.g., using a checklist or process approach similar to the approach used for developing a privacy impact assessment).

Data Release Policies

A data release policy should be consistent with and support the privacy policy. Factors to consider when drafting a data release policy include the following:

Different approaches for assessing re-identification risk
Different assumptions for determining the sophistication of the data intruder
a. The likelihood the intruder would attempt to re-identify. The likelihood is often unknown, so assumptions need to be stated regarding the sophistication and competency of statistical and computational skills needed for re-identification. Apply a reasonableness standard in terms of technology, software, and auxiliary data files that the data supplier is aware of after making a good faith effort to identify other linkable data files. Auxiliary files that may not appear useful for linking records may have greater value in the future, depending on the growth of available data sources. This requires an annual review to evaluate risk and account for an increase in available data files within a global network and the development of new statistical skills and methods.

b. The amount of effort an intruder would spend trying to re-identify. There may be facts or circumstances the data supplier is aware of that may change these assumptions and require greater protection, such as the likelihood of linkable auxiliary data that can be used for re-identification or the intruder being a foreign government or other entity with significant resources available.
Different types of harm that can arise from an unauthorized disclosure
Differences in the resource capability, expertise, and effort an organization can spend for applying and testing data protection methods
Differences in the utility of the de-identified data; in evaluating the utility of the data, the needs of researcher access to the actual data versus using synthetic data should be considered

The process for releasing data should include the following:

An enforceable commitment not to re-identify
An audit trail on access to data
A measure of re-identification risk before and after data protections are applied
Application of data protection methods to block reasonable efforts to re-identify persons
An anonymization process that applies to all data and follows a ‘reasonableness’ standard
Other access and disclosure controls such as federal privacy certificates, data licensing terms, or memoranda of understanding for enforceable control of data releases
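
Measuring re-identification risk before and after data protections are applied can be sketched in a few lines of Python. The example below assumes a simple worst-case measure (one divided by the size of the smallest group of records sharing the same values) and a hypothetical generalization step that truncates ZIP codes and bins birth years into decades. Both choices are illustrative assumptions; a real release would select measures and methods suited to the data and the intruder assumptions stated in the policy.

```python
from collections import Counter

def max_reidentification_risk(records, quasi_identifiers):
    """Worst-case risk within the file: 1 / size of the smallest equivalence class."""
    sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return 1 / min(sizes.values())

def generalize(record):
    """Hypothetical protection: keep 3 ZIP digits and bin birth year to its decade."""
    protected = dict(record)
    protected["zip"] = record["zip"][:3]
    protected["birth_year"] = record["birth_year"] // 10 * 10
    return protected

# Hypothetical records; every one is unique on (zip, birth_year) as collected.
records = [
    {"zip": "22314", "birth_year": 1971},
    {"zip": "22315", "birth_year": 1974},
    {"zip": "22301", "birth_year": 1982},
    {"zip": "22305", "birth_year": 1988},
]
quasi_identifiers = ["zip", "birth_year"]

risk_before = max_reidentification_risk(records, quasi_identifiers)
protected_records = [generalize(r) for r in records]
risk_after = max_reidentification_risk(protected_records, quasi_identifiers)
```

Recording both figures alongside the release gives the audit trail a concrete before-and-after comparison. Note that this measure only reflects uniqueness within the released file and says nothing about auxiliary files an intruder may hold.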

No single risk policy is appropriate for all types of data releases. The minimum risk tolerance should be based on the probability of re-identifying a person in a data file. More broadly, the risk to consider is that of disclosing an individual’s identity or sensitive characteristics, regardless of whether the person is in the database.
