Astrostatistics: The Re-Emergence of a Statistical Discipline
Joseph M. Hilbe, President of the International Astrostatistics Association and Chair of the International Statistical Institute’s Astrostatistics Committee
The resurrection of the new astrostatistics began in 1991 with the quinquennial Statistical Challenges in Modern Astronomy conferences, originated by Eric Feigelson and G. Jogesh Babu of Pennsylvania State University. The conferences brought together astronomers and statisticians to present papers and discuss how astronomers could benefit from statistics. Later that decade, collaborations developed between members of the astronomy and statistics departments of several universities and research institutions. In Europe, a series of CosmoStat conferences has been convened periodically for some 15 years.
Organizationally, astrostatistics took a giant leap forward at the 2009 ISI World Statistics Congress in Durban, South Africa, where a group of some 40 statisticians and astronomers met to form the first astrostatistics committee authorized by a major astronomical or statistical organization. Led by Joseph Hilbe of Arizona State University/JPL, the committee was authorized under the auspices of the International Statistical Institute (ISI), the world statistics association, in December 2009. Within a month, the ISI Astrostatistics Network was formed, soon growing to more than 200 members.
The independent International Astrostatistics Association (IAA) was approved by the network executive board in January 2012, with the aim of serving as a professional society for researchers having an interest in astrostatistics. With some 430 current members representing 46 nations, the IAA is independent of any other organization, but is formally affiliated with the ISI through the ISI astrostatistics committee and is affiliated with the International Astronomical Union (IAU) through the new IAU Astrostatistics and Astroinformatics Working Group. A similar working group has been created under the scope of the American Astronomical Society.
In 2012, Feigelson secured Penn State University Eberly College of Science sponsorship for an astrostatistics and astroinformatics portal (ASAIP), with a webmaster funded through the college. Under the co-editorship of Feigelson and Hilbe, the portal has a current membership of more than 700, representing more than 50 nations. The portal is available for public outreach and internal use by four organizations: the IAA, the IAU Working Group, the AAS Working Group, and the Information and Statistical Sciences Consortium of the planned Large Synoptic Survey Telescope (LSST/ISSC). The LSST aims to take more than 200,000 pictures, or 1.28 petabytes of data, per year for a 10-year period. The Big Data problems of how best to analyze the data are enormous.
Astrostatistics as a discipline has made huge strides over just a five-year period, from occasional conferences and paper sessions to having its own professional association; working groups authorized by the AAS and IAU; an ISI standing committee; and the ASAIP website, with information about every facet of the discipline, including blogs and job postings, articles posted for review, and future conferences. All give evidence of a discipline coming to fruition. A premier event marking yet another transition in the discipline occurs next May, when the first IAU Symposium on Astrostatistics will be held in Lisbon, Portugal.
Statisticians with an interest in astronomy are welcome to join the IAA and become involved in the development of the discipline. Due to external subsidies, there are no membership dues for the portal or IAA. Contact Hilbe at firstname.lastname@example.org for more information.
If statistics can be generically understood as the science of collecting and analyzing data for the purpose of classification and prediction and of attempting to quantify and understand the uncertainty inherent in phenomena underlying data, surely astrostatistics must be considered one of the oldest, if not the oldest, applications of statistical science to the study of nature. Astrostatistics is the discipline dealing with the statistical analysis of astronomical and astrophysical data. It also has been understood by most researchers in the area to incorporate astroinformatics, which is the science of gathering and digitizing astronomical data for the purpose of analysis.
I mentioned that astrostatistics is a very old discipline, at least if we accept the broad criterion I gave for how statistics can be understood. Egyptian and Babylonian priests who assiduously studied the motions of the sun, moon, planets, and stars as long ago as 1500 BCE classified and attempted to predict future events for the purpose of knowing when to plant, determining when a new year began, and so forth. However, their predictions were infused with the attempt to understand the effects of the celestial motions on human affairs (astrology). Later, Thales (d. 546 BCE), the Ionian Greek reputed to be both the first philosopher and mathematician, apparently began to divorce mythology from scientific investigation. He is credited with predicting an eclipse in 585 BCE, which he allegedly based on studies of previous eclipses from records kept by Egyptian priests.
Hipparchus (190–120 BCE) applied descriptive statistics and a keen mind to the calculation of the precession of the equinoxes, as well as to the length of the tropical year. His calculations were only some six minutes from the value we accept today. As the acknowledged founder of trigonometry, he conjoined it with statistical analysis to calculate the median distance to both the Sun and Moon in terms of Earth radii. His predicted mean distance to the Moon was only 0.2 radii off.
The next technological leap began with Galileo (1564–1642), who, in addition to constructing one of the first astronomical telescopes, asserted that measurement error is symmetrically distributed, with smaller errors occurring with higher frequency than larger errors. He also concluded that the mean of the errors is zero. Galileo seemed to understand fully that astronomical observations came with associated error as well.
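The three properties attributed to Galileo above (symmetry, small errors more frequent than large ones, and a mean of zero) can be illustrated with a short simulation. This is only a sketch: it assumes the errors follow a normal distribution, which is a later formalization and not something Galileo specified, and the sample size, seed, and variable names are arbitrary choices made for illustration.

```python
import random
import statistics

# Assumption: errors are drawn from a standard normal distribution.
# Galileo described the qualitative properties only; the normal law came later.
random.seed(0)
errors = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# Property 1: the mean error is close to zero.
mean_error = statistics.fmean(errors)

# Property 2: errors are symmetric, so about half of them are positive.
positive_fraction = sum(1 for e in errors if e > 0) / len(errors)

# Property 3: small errors are more frequent than large ones.
# Compare two bands of equal width in absolute error.
small = sum(1 for e in errors if abs(e) < 1.0)
large = sum(1 for e in errors if 1.0 <= abs(e) < 2.0)

print(f"mean error: {mean_error:.4f}")
print(f"fraction positive: {positive_fraction:.3f}")
print(f"errors with |e| < 1: {small}, with 1 <= |e| < 2: {large}")
```

Any symmetric, single-peaked distribution centered on zero would show the same three properties; the normal distribution is used here only because it is the familiar case.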
Kepler (1571–1630) spent two years developing an elliptical model of Mars’ orbit around the Sun based on a noisy, unevenly spaced time series. This was a remarkable achievement in model fitting and selection.
It was not until young Carl Gauss (1777–1855) discovered a basic form of least squares regression in 1794, and applied it in 1801 to predict the mean apparent location of the asteroid Ceres as it came into view from behind the Sun, that inferential statistical techniques were applied to astronomical events. This prediction gave Gauss the credentials to become director of the royal observatory of Göttingen, a position he kept for the rest of his life.
With these historical accomplishments, one might think statistics and astronomy would be closely tied together from that point on, but this was not to be the case. Astronomy and physics were being conjoined during this period to form the discipline of astrophysics, and developments in calculus and differential equations were much more useful for astronomical and astrophysical research than least squares regression and most other then-popular statistical interests. Least squares regression requires a matrix inversion, which becomes ever more tedious to carry out by hand as predictors are added to a statistical model.
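The point about matrix inversion can be made concrete. In least squares, the coefficients b solve the normal equations (X^T X) b = X^T y, where X^T X is a p-by-p matrix for a model with p coefficients, so every added predictor enlarges the linear system that has to be solved. The sketch below uses only the Python standard library and solves that system by Gauss-Jordan elimination, roughly the kind of arithmetic a nineteenth-century computer would have done by hand. The function names and the data are made up for illustration; the numbers are not astronomical measurements.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, rhs):
    """Solve a @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ols(X, y):
    """Ordinary least squares coefficients via the normal equations."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)  # p x p matrix; grows with every added predictor
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

# Made-up data: an intercept column plus two predictors,
# with y generated exactly as 1 + 2*x1 - 3*x2 (no noise).
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1 + 2 * row[1] - 3 * row[2] for row in X]

coef = ols(X, y)
print([round(c, 6) for c in coef])
```

Because the illustrative data contain no noise, the fit recovers the generating coefficients up to floating-point rounding. Elimination on a p-by-p system takes on the order of p^3 arithmetic operations, which is why each added predictor made hand calculation noticeably more laborious.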
Statistics definitely advanced from the period of Gauss, Legendre, Laplace, and Poisson. In fact, astronomers were using mathematics to great effect in predicting and locating the new planets of Uranus and Neptune. Regression was occasionally used in prediction, though, such as when Edwin Hubble used least squares regression for predicting galactic distance based on spectral redshift, resulting in the discovery of an expanding universe. For the most part, however, astronomers did not pay much attention to any inferential statistical methods beyond linear regression. This bifurcation between the disciplines largely persisted until near the end of the twentieth century.
The advance and use of digital computers commenced in the mid-twentieth century, greatly enhancing both statistics and astronomy. In fact, computers revolutionized both disciplines, allowing storage and computational capabilities far beyond what was thought possible a half century before. With the first IBM personal computer rolling out in August of 1981, statisticians could interactively model data using a variety of techniques that were simply not feasible before. By the late 1980s, personal computers had become an essential tool for statistical analysis, including data management and graphical design. Many statisticians turned to PCs from their previous reliance on non-interactive mainframes.
Beginning in the early 1990s, there were a few astronomers who perceived how statistics was changing and how the advance in data management and analytic capabilities could benefit astronomical evaluation.
It is clear that the statistical capabilities of the pre-1990 era were not sufficiently sophisticated to attract more than a relatively few astronomers. Huge advances in astronomy/astrophysics were continually being made without the need for statistics or statistical collaborations. This situation, however, began to change toward the end of the last decade. The new data-gathering technologies being developed were generating data in mind-boggling amounts. A number of NASA/JPL astronomers were continually expressing concern that they did not have the statistical expertise to better understand the data from their studies. I heard this many times on JPL conference calls with study project directors. The difference was that many astronomers came to discover how sophisticated statistics had become as a result of greatly enhanced computing capabilities, the advance of nonparametric and time series methodologies, and the growing interest in Bayesian modeling within the statistical community. Astronomers perceived that these were all capabilities important to the analysis of astronomical data, and that, except for a few astronomers, they did not understand these areas of analysis well enough to apply them in their own research.
By the latter part of the last decade, the time was right to develop a program whereby astronomers could learn the following:
- Which statisticians had the interest and ability to collaborate successfully on a study project
- How to carry out the appropriate statistical analysis of their data
- How to identify others who may have similar statistical needs or who previously used a method they may need
- What methods had been developed that they could use for their project analysis, or how best to extend current statistical capabilities into new areas
- How to develop new statistical methods appropriate for their needs
These points could be implemented best by unifying astrostatistics as a discipline in which astronomers/astrophysicists, statisticians, and information scientists could be aware of each other’s abilities and, perhaps, be able to call on each other in the attempt to understand our universe better.
LSST has been a real stimulus to the new discipline of astrostatistics. Located at an altitude of 2682 meters (about 8,800 feet) in northern Chile, the three-mirror reflective telescope will have a wide undistorted view of the sky and take more than 200,000 pictures, or 1.28 petabytes of data, per year for a 10-year period. The Big Data problems of how best to analyze the data are enormous. Currently under construction, LSST is expected to be in full operation by 2022. The tasks, and the opportunities, for statisticians cannot be overstated. Every part of the observable universe will be recorded and available for analysis.
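To get a feel for the scale of those figures, the arithmetic can be worked through directly. The sketch below assumes a decimal petabyte (10^15 bytes) and treats "more than 200,000 pictures" as exactly 200,000, so the per-picture size is a rough average, not an instrument specification. The variable names are illustrative.

```python
# Figures quoted in the text; the petabyte is assumed to be decimal (10**15 bytes).
PICTURES_PER_YEAR = 200_000
BYTES_PER_YEAR = 1_280_000_000_000_000  # 1.28 petabytes
SURVEY_YEARS = 10

# Average size of one picture, in gigabytes (10**9 bytes).
gb_per_picture = BYTES_PER_YEAR / PICTURES_PER_YEAR / 1e9

# Totals over the planned 10-year survey.
total_pictures = PICTURES_PER_YEAR * SURVEY_YEARS
total_petabytes = BYTES_PER_YEAR * SURVEY_YEARS / 1e15

print(f"about {gb_per_picture:.1f} GB per picture on average")
print(f"{total_pictures:,} pictures and {total_petabytes:.1f} PB over {SURVEY_YEARS} years")
```

Under these assumptions, each picture averages several gigabytes and the full survey accumulates on the order of ten petabytes, which is the sense in which the analysis problem is a Big Data problem.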
A host of other data-gathering mechanisms will be in operation during this and the next decade. Studies are and will be ongoing for most areas of astronomy, including telescopes for gathering data from the radio, microwave, infrared, X-ray, gamma ray, and optical regimes. New generation radio interferometers, in particular, produce vast data sets with enormous informatics challenges. Time domain astronomy, where a region of the sky is observed repeatedly, is an increasingly important element in 21st-century astronomy. Statistical challenges abound in all fields of astronomy, from the most distant reaches of cosmology to the nearby exoplanets.
We are now firmly in the age of digital astronomy. The amount of data and its complexity are staggering, and it is all of real events and objects in our universe. New statistical algorithms and techniques will need to be developed to understand the data. Techniques developed by astrophysicists and astrostatisticians will likely be applicable in other areas of statistical application as well (e.g., environmental statistics, ecology, social and political statistics, health outcomes analysis).
Astrostatistics has surely undergone a resurrection from its primitive beginnings in the ancient world and near dormancy during the past two centuries. It promises to be at the forefront of future Big Data management and analysis. The timing has been ideal for astronomers and statisticians with an interest in the science to work together to better understand our universe. In fact, I will argue that developing graduate degree programs in astrostatistics can benefit both astronomy/astrophysics and statistics. Training students from the outset on how to gather astronomical data for the purpose of subjecting it to appropriate statistical analyses may result in our being able to resolve some of the major problems in astrophysics.