New Report on US Data Infrastructure: An Interview with Robert Groves

1 November 2022

Robert Groves is chair of the Committee on National Statistics and served as director of the US Census Bureau from 2009 to 2012.

A new report from a National Academies Committee on National Statistics panel, “Toward a Vision for a New Data Infrastructure for Federal Statistics and Social and Economic Research in the 21st Century: Mobilizing Information for the Common Good,” finds that the “US statistical agencies’ reliance on sample-survey data and census data is unsustainable” and presents an expansive vision for a new data infrastructure to produce “more timely, better quality, and more granular statistics that could answer questions of national interest, support more rigorous research, and facilitate evidence-based policymaking.” The central mechanism for the new infrastructure is blending data from a variety of sources: federal statistical, program, and administrative agencies; state, tribal, territory, and local governments; private sector enterprises; nonprofits and academic institutions; and crowdsourced or citizen-science data holders.

In its first of three reports, the panel explains why the US needs a revamped infrastructure and describes its output, needs, attributes, and challenges, as well as how it might be organized. The second and third reports, respectively, will assess the “implications of using multiple data sources for survey programs” and explore the “technology, tools, and capabilities needed for data sharing, use, and analysis.”

To learn more about the panel’s vision, ASA Director of Science Policy Steve Pierson posed the following questions to panel chair Robert M. Groves on behalf of Amstat News.

Please describe what the panel means by “data infrastructure.” What is the relationship of the federal statistical system, including the chief statistician, to it?

First, Steve, thank you for setting up this interview; I hope it will be interesting to Amstat News readers.

The panel chose the word “infrastructure” because the data resources of a country resemble its highways, bridges, and internet backbone. They are absolutely necessary for a functioning, modern society. Inattention to them often produces disasters. These disasters can be avoided through intentional modernization.

The panel envisions a multi-faceted data infrastructure. The key components include the following:

  • Data assets
  • Technologies used to discover, access, share, use, manage, and secure those assets
  • Expertise needed to use the data
  • Rules that govern data access, use, and protection
  • Organizations that manage the data infrastructure
  • Communities whose data is shared and used for statistical purposes

The panel’s vision assumes statistical agencies and other approved users will use data assets for the common good. It viewed the collective data of the country as a national resource, not unlike other infrastructure. Relevant data from federal, state, tribal, territorial, and local governments; private sector enterprises; nonprofits and academic institutions; and crowdsourced and citizen-science data holders would be available only for statistical purposes. The blending of cross-sector data can improve the quality, timeliness, and granularity of statistics, promote research, and support evidence-based policymaking.

The federal statistical system will play a critical role in the envisioned new data infrastructure. Federal statistical agencies will be both important data holders, supplying data for approved statistical uses, and important data users. The federal statistical system also plays an important role in supporting the research infrastructure for empirical social and economic sciences through the Federal Statistical Research Data Center network. We expect the federal statistical system and chief statistician to be important leaders in a new data infrastructure, but specific roles and responsibilities await further evolution of the vision and the existing federal data ecosystem.

The panel makes a compelling, rigorous case for why the US needs a new data infrastructure in Chapter 2. How would you summarize the need to a member of Congress?

In the panel’s judgment, the current national data infrastructure is ill-equipped to meet the data needs of the 21st century. Today, paradoxically, national statistics face both grave threats and a historic opportunity.

Throughout our lives, the federal statistical system has relied on statistical surveys, but declining survey participation poses a severe threat to national statistics. Yet, at the same time, the country produces unprecedented amounts of digital data about the activities of individuals and businesses.

To meet the demands for credible and timely statistical information, the US needs to mobilize the nation’s ever-expanding data assets. It needs to facilitate statistics that blend data from multiple sources. This can improve the nation’s information resources on which Congress, the executive branch, and thousands of state and local officials make decisions. With modern computational techniques, this can be accomplished without new threats to privacy.

Currently, data acquisition, access, and use are siloed, inefficient, and largely uncoordinated. Data is being collected that cannot be effectively used to inform policymakers. Laws and regulations remain major obstacles to accessing and using federal statistical, program, and administrative data, as well as state, local, territorial, and tribal government data. Private-sector data use is bespoke and often costly, with no inherent sustainability. Most data holders have no incentives to contribute or share their data for the common good. Privacy-protecting behaviors of data holders are highly variable and largely unregulated; there is little transparency and accountability regarding data use, and data subjects, data holders, and data users are rarely engaged in data use and data governance decisions.

Is it accurate to describe the envisioned data infrastructure as one that produces the blended data from a variety of sources for enhanced statistical information and evidence-building using new statistical methods and designs, new partnerships, state-of-the-art data management capabilities, and a new entity?

Yes, that’s how the panel sees it. The vision is a necessary first step, but a vision alone is insufficient. The panel recognizes the daunting challenges and obstacles we face and understands implementing the vision will take time and resources. But engaging key stakeholders, forging new partnerships, working toward a shared vision, and reaching consensus on short- and medium-term activities that move us toward the vision are crucial.

Expanding available data assets beyond the federal government raises new challenges, and the panel suggests consideration of different organizational options to facilitate cross-sector blending and identifies unanswered questions related to data governance and data entity roles and responsibilities. These and other components require further study. We expect additional workshops and reports beyond the first three will be needed to fully implement the vision.

How does this report build on important work of the Commission on Evidence-Based Policymaking, Advisory Committee on Data for Evidence Building, 2017 CNSTAT reports, Evidence Act, National Secure Data Service, and other reports or activities?

The panel saw a remarkable synergy among these various developments. These initiatives provided the initial building blocks for a new data infrastructure. The panel’s aim was to complement, build on, and extend their work to inform a vision for a new national data infrastructure for national statistics and social and economic research in the 21st century.

The Evidence Act provided federal statistical agencies with a statutory basis for accessing and using data assets of federal nonstatistical agencies, as well as expanding secure access to statistical agency data assets, unless prohibited by law. The panel supports these provisions and, like the Advisory Committee on Data for Evidence Building, calls for the Office of Management and Budget to implement needed Evidence Act regulations and rulemaking.

The Commission on Evidence-Based Policymaking recommended that state earnings data and state-collected data acquired by federal departments be shared for evidence-building purposes, but the Evidence Act did not address these recommendations. The panel, like the Advisory Committee on Data for Evidence Building, supports these recommendations. But the panel concludes that expanding access beyond just state data to include also tribal, territory, and local government data for statistical purposes can benefit the nation and government entities. All the logic that supports blending of administrative data with statistical survey data to construct better statistical information applies to state and local government data, as well.

Supplementing the Commission on Evidence-Based Policymaking and Evidence Act, the panel sees the merit of blending federal statistical and administrative data with private-sector data. The blending of statistical agency data with private-sector data sources, as well as state and local government, nonprofit and academic institutions, and crowdsourced or citizen-science data, can produce more granular, timely, and relevant statistical information and enhance social and economic research.

The Commission on Evidence-Based Policymaking recommended establishing the National Secure Data Service to facilitate access to and use of information for evidence-building. The Evidence Act was silent regarding the National Secure Data Service. The Advisory Committee on Data for Evidence Building year one report presented a vision for the National Secure Data Service, but in January 2022, OMB suggested a demonstration project be established.

President Biden recently signed the CHIPS and Science Act of 2022, which authorized a National Secure Data Service demonstration project at the National Science Foundation. The demonstration project represents progress, but the panel sees a further value of the service beyond evidence-building for facilitating the blending of diverse data sources, including data from private enterprises. Broadening the scope of data assets complicates decisions related to organizational structures and governance. The panel concludes there are multiple structures that can support a new data infrastructure and different options should be considered.

The development of a consensus vision for a new data infrastructure requires that the country leverage all the expertise and learnings we have accumulated over the past six years. These reports have much in common. Building on, learning from, and extending these initiatives to mobilize information for the common good will be critical in achieving a shared vision and building a new national data infrastructure.

The panel heard reports of work underway by the statistical agencies toward the new data infrastructure, including record linkage and use of private and administrative data sources. What most impressed or encouraged you?

This was one of the great joys of the panel—to see how much energy exists within the agencies to build this new infrastructure!

The panel had two important takeaways. First, the US is not unique. The December 2021 workshop described current efforts by Statistics Canada, the UK Office for National Statistics, and Statistics Netherlands to leverage private-sector and other data assets to improve national statistics. The European Commission recently issued a call for evidence and feedback regarding a proposal to make new data sources available for official statistics and statistical purposes consistent with the panel’s conclusions.

Second, federal statistical agencies already are actively engaged in using private-sector data. At the December 2021 workshops, we learned all but one of the 13 designated statistical agencies are using private-sector data assets.

Workshop participants noted that private-sector data utilization for national purposes might greatly improve the quality, timeliness, and granularity of national statistics, as well as improve knowledge of groups that are not well represented in existing surveys. Private-sector data, of course, has limitations, and workshop participants discussed the challenges and limitations of blending private-sector data with administrative and survey data. But to meet these challenges, statistical agencies are actively sharing best practices and lessons learned.

The panel’s vision emphasizes the broadening role of the statistical system beyond enhanced statistical information to also support research and evidence-building. Do you put these latter roles on equal footing with the production of statistical information?

No, not given the focus of the panel. The panel’s charge was to produce three reports “that will help guide the development of a vision for a new data infrastructure for federal statistics and social and economic research in the 21st century.” While the panel acknowledges a new data infrastructure will enhance evidence-building, the primary focus was on improving national statistics and enhancing research. The panel intentionally did not duplicate the Advisory Committee on Data for Evidence Building’s excellent work on evidence-building.

Federal statistical agencies’ primary responsibility will remain unchanged: “to produce and disseminate relevant and timely information; conduct credible, accurate, and objective statistical activities; and protect the trust of information providers by ensuring confidentiality and exclusive statistical use of their responses.” A new data infrastructure supports statistical agencies blending data from multiple sources to improve existing statistical products and create new ones. But the federal statistical system also contributes to a large research infrastructure that provides access to restricted statistical agency and other data assets for approved social and economic research. The Federal Statistical Research Data Center network—an established and sustained model of collaboration between statistical agencies, the Federal Reserve System, and universities—has facilitated research that has illuminated issues of national interest.

Statistical agencies’ data assets can be used for approved evidence-building purposes, when permitted by law. The National Secure Data Service will likely have responsibility for services and capabilities facilitating evidence-building.

The panel provides short- and medium-term activities for achieving the seven attributes of the envisioned data infrastructure and the options for supporting a data infrastructure. Who will carry out the activities, and is there a lead entity driving or coordinating them?

The panel was a creation of the Committee on National Statistics of the National Academies. It sought to paint a vision of a better world, with full knowledge that building this new world has challenges. It took this step to prompt a wider discussion and deliberation.

Our immediate goal is to share the report with interested stakeholders in the statistical system, Congress, research communities, state and local governments, and the private sector. The report suggests 25 short-term and 40 medium-term activities that could help achieve the full vision, but no entity or organization is currently charged with carrying out these activities. Our goal is to start a discussion about the vision, build support, and identify important next steps to move ahead.

The statistical community served by Amstat News will be key stakeholders in shaping future data infrastructure. The panel encourages independent actions to bring the vision into reality. How can readers contribute to fulfilling the panel’s vision for a new infrastructure?

First, we encourage folks to read the report and view the recording of our October 13 public seminar. That session highlighted key takeaways from the report and plans for future data infrastructure–related activities. Also, alongside the full report, read the report highlights and the brief for policymakers. Finally, visit the project’s interactive website, which provides report content in an easily digestible and accessible format and offers an opportunity to give feedback and suggestions.

