Statistical Scientists Advance Federal Research Initiatives
Steve Pierson, ASA Director of Science Policy
Responding to calls from the National Science Foundation (NSF) and White House Office of Science and Technology Policy (OSTP), three ASA groups have written whitepapers detailing how statisticians can contribute to administration research initiatives and priorities. The whitepapers, profiled at the end of this article, cover the BRAIN Initiative, the Big Data Research and Development Initiative, and climate change. Each was written by a group of about 12 people, with Rob Kass chairing the BRAIN Initiative group, Cynthia Rudin the Big Data group, and Bruno Sanso the climate change group.
The whitepapers are, in part, a response to invitations from the current and former directors of the NSF Division of Mathematical Sciences (DMS)—Michael Vogelius and Sastry Pantula, respectively—to provide whitepapers to inform future NSF budgets. As described in a November ASA Community blog entry by ASA Director of Science Policy Steve Pierson, ASA staff also received various indications from other NSF directorates, OSTP, and the White House Office of Management and Budget of the utility of such whitepapers. Perhaps most compellingly, the computer science community has had enormous success in guiding federal research direction at NSF and OSTP through the Computing Community Consortium (CCC) whitepapers, as described in a November Amstat News column.
As the whitepapers were being developed, ASA staff informed NSF (including DMS) officials of the efforts and received guidance for their content. Since their completion, these discussions have continued and widened to include officials from OSTP and other agencies. The feedback so far has been encouraging. For example, Vogelius commented, “These efforts of the ASA are in my opinion very valuable and will be helpful in advancing the central idea that mathematics and statistics are indispensable tools when dealing with the OSTP research priority areas.”
The common themes of the whitepapers are three-fold:
(i) Statistics/statisticians can help to make important advances on OSTP priorities
(ii) The most productive approach will involve multidisciplinary teams of statisticians, domain scientists, and others (e.g., computer scientists)
(iii) There is a need to attract, train, and retain the next generation of statisticians so they can contribute to these interdisciplinary research challenges
Completing such whitepapers takes the willingness of members of the community to work on them. Asked his motivation for leading the ASA BRAIN Initiative group, Carnegie Mellon University statistics professor Rob Kass cited the opportunity to work with a group of accomplished statisticians with extensive experience in the brain sciences and the window of opportunity to provide guidance, especially to NSF as it develops its funding priorities connected to the BRAIN Initiative for the coming year.
Asked her motivation, MIT professor Cynthia Rudin, who led the ASA Big Data working group, replied, “I was honored to be asked by Marie Davidian to lead this group. I was also surprised, given my background is machine learning—not typically considered to be ‘mainstream’ statistics. I have appreciated the open-mindedness of the wonderful statisticians with whom I have had the pleasure to collaborate—people who share the view of the blurred, porous boundaries between statistics and computer science—and their many synergies. The ‘big tent’ view of statistics holds that ‘statistics’ is very broad, and includes the type of work I do. By asking me to lead this effort, Marie showed that the statistics community is willing to put people who might sometimes be considered at the outer edges of the culture right into the middle. In that sense, how could I not agree to lead this effort?”
Bruno Sanso said his motivation for leading the climate change group was the opportunity to foster collaborations between statisticians and climate scientists, enhance the participation of statistical scientists in federal research funding, increase the visibility of statisticians at NSF, and build awareness among policymakers of the importance of funding research that increases our understanding of the uncertainty in climate change. He also saw the effort as a way to build awareness within the statistical community of the important role statisticians play in the study of climate and the many opportunities the field offers for developing interesting statistical models and implementing sophisticated computational methods.
2013 ASA President Marie Davidian, who has led many of the ASA’s meetings at NSF and OSTP and with CCC, commended members of the three groups for their accomplishments: “I’m very grateful to Cynthia, Bruno, and Rob for their leadership in the writing of these three whitepapers and to all the authors who contributed to them. I view these whitepapers as an important way for the statistical community to convey the message that statisticians play a vital role in advancing the science important to our nation and society.”
For next steps—besides further outreach to NSF, OSTP, and other agencies on these whitepapers—the ASA will explore whitepapers on other topics. The annual OMB/OSTP research and development memo for FY15 includes the following multi-agency priorities:
- Advanced manufacturing
- Clean energy
- Global climate change
- Research and development for informed policymaking and management
- Information technology
- Research and development for national security missions
- Innovation in biology and neuroscience
- Science, technology, engineering, and mathematics (STEM) education
- Innovation and commercialization
If you are interested in helping with a whitepaper on any of these topics, contact Pierson at email@example.com.
Bruno Sanso, chair of the ASA Advisory Committee on Climate Change Policy; chair of the Department of Applied Mathematics and Statistics, University of California, Santa Cruz
Authors: Bruno Sanso, University of California, Santa Cruz (chair); L. Mark Berliner, The Ohio State University; Daniel S. Cooley, Colorado State University; Peter Craigmile, The Ohio State University; Noel A. Cressie, University of Wollongong; Murali Haran, The Pennsylvania State University; Robert B. Lund, Clemson University; Douglas W. Nychka, National Center for Atmospheric Research; Chris Paciorek, University of California, Berkeley; Stephan R. Sain, National Center for Atmospheric Research; Richard L. Smith, Statistical and Applied Mathematical Sciences Institute; Michael L. Stein, The University of Chicago
Climate data sets are increasing in number, size, and complexity and challenge traditional methods of data analysis. Satellite remote sensing campaigns, automated weather monitoring networks, and climate-model experiments have contributed to a data explosion that provides a wealth of new information but can overwhelm standard approaches. Developing new statistical approaches is an essential part of understanding climate and its impact on society in the presence of uncertainty. Experience has shown that rapid progress can be made when Big Data is used with statistics to derive new technologies. Crucial to this success are new statistical methods that recognize uncertainties in the measurements and scientific processes but also are tailored to the unique scientific questions being studied.
This whitepaper makes the case for the National Science Foundation (NSF) to establish an interdisciplinary research program around climate, where statisticians have the opportunity to collaborate with researchers from other disciplines to advance the understanding of the climate system (e.g., quantification of uncertainties, the development of powerful tests of scientific hypotheses). Although NSF supports basic and applied statistical research, these efforts often do not involve scientists and statisticians in partnerships or in teams to address problems in climate science. This program also would address the critical need for training a new generation of interdisciplinary researchers who can tackle challenging scientific problems that require complex data analysis by developing and using the necessary sophisticated statistical methods.
Cynthia Rudin, chair of the ASA Big Data R&D Initiative Working Group; Computer Science and Artificial Intelligence Laboratory and Sloan School of Management, MIT
Authors: Cynthia Rudin, MIT (chair); David Dunson, Duke University; Rafael Irizarry, Harvard University; Hongkai Ji, The Johns Hopkins University; Eric Laber, North Carolina State University; Jeffrey Leek, The Johns Hopkins University; Tyler McCormick, University of Washington; Sherri Rose, Harvard University; Chad Schafer, Carnegie Mellon University; Mark van der Laan, University of California, Berkeley; Larry Wasserman, Carnegie Mellon University; Lingzhou Xue, The Pennsylvania State University
The Big Data Research and Development Initiative is now in its third year and making great strides to address the challenges of Big Data. To further advance this initiative, we describe how statistical thinking can help tackle the many Big Data challenges, emphasizing that often the most productive approach will involve multidisciplinary teams with statistical, computational, mathematical, and scientific domain expertise.
With a major Big Data objective of turning data into knowledge, statistics is an essential scientific discipline because of its sophisticated methods for statistical inference, prediction, quantification of uncertainty, and experimental design. Such methods have helped and will continue to enable researchers to make discoveries in science, government, and industry.
The paper discusses the statistical components of scientific challenges facing many broad areas being transformed by Big Data—including health care, social sciences, civic infrastructure, and the physical sciences—and describes how statistical advances made in collaboration with other scientists can address these challenges. We also emphasize the need to attract, train, and retain the next generation of statisticians necessary to address the research challenges outlined here.
Rob Kass, chair of the ASA BRAIN Initiative Working Group; Department of Statistics, Department of Machine Learning, and Center for the Neural Basis of Cognition, Carnegie Mellon University
Authors: Robert E. Kass, Carnegie Mellon University (chair); Genevera Allen, Rice University; Brian Caffo, The Johns Hopkins University; John Cunningham, Columbia University; Uri Eden, Boston University; Timothy D. Johnson, University of Michigan; Martin A. Lindquist, The Johns Hopkins University; Thomas A. Nichols, University of Warwick; Hernando Ombao, University of California, Irvine; Liam Paninski, Columbia University; Russell T. Shinohara, University of Pennsylvania; Bin Yu, University of California, Berkeley
The BRAIN (Brain Research through Advancing Innovative Neurotechnologies) Initiative aims to produce a sophisticated understanding of the link between brain and behavior and to uncover new ways to treat, prevent, and cure brain disorders. Success in meeting these multifaceted challenges will require scientific and technological paradigms that incorporate novel statistical methods for data acquisition and analysis. Our purpose here is to substantiate this proposition and identify implications for training.
Brain research relies on a wide variety of existing methods for collecting human and animal neural data, including neuroimaging (radiography, fMRI, MEG, PET), electrophysiology from multiple electrodes (EEG, ECoG, LFP, spike trains), calcium imaging, optical imaging, optogenetics, and anatomical methods (diffusion imaging, electron microscopy, fluorescent microscopy). Each of these modalities produces data with its own set of statistical and analytical challenges. As neuroscientists improve these techniques and develop new ones, data are being acquired at very large scales …
These advances have begun to produce exciting breakthroughs. But, to realize their potential, new analysis and computational techniques are needed to optimize data acquisition; manage acquired data on the fly; screen and segment the data; correct for artifacts; and align and register data across multiple time points, multiple experiments, multiple subjects, or different laboratories. In addition, as the data-generation process becomes more complex and the data sets get larger and more varied, it is crucial that reliability and scientific relevance of results be assessed against the backdrop of natural variation and measurement noise. This is the essential role of statistical analysis.