A Call for Participation in XSEDE Computing: Statisticians Needed for Big Data
George Ostrouchov, Oak Ridge National Laboratory and University of Tennessee
In this age of Big Data, The New York Times and The Wall Street Journal have reported that statistics is cool. Yet when it comes to high-performance computing (HPC), a necessary component in dealing with Big Data, the statistics community is still largely absent. ASA President Marie Davidian says that in recent Nature and Science articles on data, “Mentions of statistics or statisticians were scant.”
Big Data are becoming synonymous with just “data” as more people use the term and data sets get bigger, but it is mostly the truly big data on big resources that make the news. “Statistical, mathematical, and computational sciences are a key to innovation and discoveries in this era of Big Data and observation,” said Sastry Pantula, director of the Division of Mathematical Sciences at the National Science Foundation (NSF). “NSF is making significant investments in advanced cyber infrastructure (ACI), and it is important for our communities to take advantage of facilities like XSEDE for their research and for training their undergraduate and graduate students.”
XSEDE stands for Extreme Science and Engineering Discovery Environment and is the NSF’s portal for sharing large computer systems (including supercomputers), data, and expertise. It is easy to apply for resources, which can include computing time on large platforms and expert assistance with running codes such as R.
You don’t have to win the lottery (or a big grant) to buy HPC resources. If you are at a U.S. institution, these resources are available almost “just for the asking.” Jump in yourself, or let your students take the training on the XSEDE website and then teach you. You can also get a “startup” XSEDE allocation. Ask for “extended collaborative support” for expert help on your project.
Working with these machines is becoming much easier, even with R (e.g., Programming with Big Data in R). Once you are familiar with the resources, you can use an “education” allocation to conduct classes and teach students about parallel computing.
The cool kids, described in The New York Times and The Wall Street Journal, are statisticians who are also computationally savvy. Parallel computing has gone mainstream as the technology that replaced increasing clock rates of hardware. Scalability is now the term that describes software’s parallel ability to use higher core counts, bigger co-processors, and cluster computers. Only scalable parallel algorithms can now benefit from new hardware. Participation in XSEDE computing is an avenue for making sure the next generation of statisticians is made up of the cool kids for data.
To some, this may seem a job for those who care more about the hardware and algorithms than about theoretical statistics. The reality is that theoretical statisticians are desperately needed. All forms of parallel computing have a common theme of how to split up the problem and data into pieces and how to assemble the pieces. This is done best by someone who understands the estimation problem and is able to consider pitfalls and alternatives due to how data partitions and data size affect bias and variability.
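The split-and-assemble theme can be sketched in a few lines. The example below is only an illustration with made-up numbers and hypothetical helper names (`summarize`, `combine`); it is not drawn from XSEDE or any particular package. Each partition is reduced to a small summary (count, mean, sum of squared deviations), and the summaries are merged so that the pooled mean and variance match the full-data values. It also shows two ways a careless assembly step goes wrong: averaging partition means over unequal partitions, and taking the median of partition medians.

```python
from statistics import fmean, median, pvariance

def summarize(chunk):
    """Reduce one partition to (n, mean, M2), where M2 is the sum of squared deviations."""
    n = len(chunk)
    m = fmean(chunk)
    m2 = sum((x - m) ** 2 for x in chunk)
    return n, m, m2

def combine(a, b):
    """Merge two partition summaries into the summary of their union."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2

# Made-up data, split into three partitions of unequal size.
data = [1.0, 2.0, 3.0, 4.0, 100.0, 5.0, 6.0, 7.0, 8.0]
chunks = [data[0:2], data[2:5], data[5:9]]

# Assemble the pieces: in a parallel setting each summary would be
# computed on a different worker and only the summaries would be sent.
total = summarize(chunks[0])
for chunk in chunks[1:]:
    total = combine(total, summarize(chunk))
n, pooled_mean, m2 = total
pooled_var = m2 / n  # population variance of the full data

# Two tempting but incorrect assembly rules.
naive_mean = fmean(fmean(c) for c in chunks)           # ignores partition sizes
median_of_medians = median(median(c) for c in chunks)  # not the global median

print("pooled mean:", pooled_mean, "full-data mean:", fmean(data))
print("pooled variance:", pooled_var, "full-data variance:", pvariance(data))
print("mean of partition means:", naive_mean)
print("median of medians:", median_of_medians, "full-data median:", median(data))
```

The mean and variance can be reassembled exactly because they depend on the data only through a few sums. The median has no such summary, so a partitioned estimate of it requires a deliberate choice about the bias and variability one is willing to accept.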
The need for participation by statisticians in HPC goes deeper than just bigger data and faster estimation. The mathematics community in predictive simulation science is heavily involved with HPC. Uncertainty quantification is now a hot topic in that community. This is where statisticians’ understanding of uncertainty, developed over the past century, is particularly strong and needs to be brought to the table. Our presence in HPC will help us intersect more heavily with this community and be more relevant for HPC-based prediction.
To the Editor:
I commend Stephen Stigler for his excellent list of 20 outstanding statisticians for the ASA Hall of Fame (March 2013). However, I was surprised at what seems an important omission. Simon Kuznets, among his many impressive qualifications, had what I believe to be a unique distinction: He served as president of the ASA (1949) and received the Nobel Prize (Economics, 1971). Kuznets made major contributions to economics, demography, and statistics; was president of the American Economic Association (1954); and provided important guidance to the U.S. government during World War II. His outstanding contributions exemplified the value of cross-fertilization across different fields and between theory and practice.
Herbert I. Weisberg
Herbert Weisberg’s nomination of Simon Kuznets for the ASA Hall of Fame has considerable merit. One other statistical connection he does not mention: Simon’s brother, George, was also connected to our field, teaching statistics in agricultural economics at Berkeley (my first statistical job was as a TA for George Kuznets). Others, commenting on the ASA website, have made good suggestions. All of these will be considered when we “elect” the next group of members in 2014, the year of the ASA’s 175th anniversary.
Stephen M. Stigler
The University of Chicago