
Take Advantage of the ASA’s Data Challenge Opportunities

1 January 2020

Wendy Martinez. Photo courtesy of Studio B Photography Barbi Barnum

One might expect an ASA president's first column to describe the ASA's initiatives for the coming year. I am going to deviate from this practice and save those initiatives for the February issue of Amstat News. Instead, I will use this first column to focus on the ASA's data challenge opportunities.

Many readers might be familiar with the Kaggle competitions, the KDD Cup, and, of course, the famous $1 million Netflix Prize. However, did you know the ASA has a history of issuing data challenges that predate all of these? Since 1983, the Statistical Computing and Statistical Graphics sections have held a Data Exposition competition, with entries presented and judged at the Joint Statistical Meetings. Some of these data sets (such as the airline on-time performance data from Data Expo 2009) are still used to demonstrate and teach statistical machine learning concepts, which illustrates the lasting impact of these challenges.

The ASA also holds an annual Fall Data Challenge for high-school and college students. This challenge typically asks participants to address real problems affecting our society. For example, the 2019 Fall Data Challenge used data from the US Department of Housing and Urban Development (HUD) on Los Angeles, New York City, and Seattle. A spring competition, Statsketball, keeps the excitement going throughout the school year; this contest uses statistics to make predictions about the NCAA Basketball Tournament.

Then there is the Statistical Impact Competition, which was part of 2019 ASA President Karen Kafadar’s impact initiative and JSM theme, “Statistics: Making an Impact.” The goal of this challenge was to use data to illustrate areas that have been, or could be, impacted by the field of statistics. Submissions have been received and will form the foundation of an Innovation Workshop to be held in spring 2020, where participants will share ideas with the aim of building transdisciplinary collaborations that impact our world.

Another data challenge opportunity will be announced in mid-January as part of the Women in Data Science (WiDS) Conference, to be held March 2. WiDS is a global event during which data scientists from around the world come together, virtually and locally, to inspire data scientists regardless of gender. Regional events can be organized, and there has been one in the DC-MD-VA area for the past several years. We are issuing a data challenge as part of the WiDS 2020 DC-MD-VA regional event, although we are still selecting the data set. The challenge will be issued in January, and contestants will present their results at the DC-MD-VA WiDS 2020 event. So, stay tuned to ASA communication channels for details, and think about organizing a WiDS event in your region!

Now, back to the longtime data challenges held at the Joint Statistical Meetings, because now is the time to consider entering. Three ASA sections (Computing, Government, and Graphics) came together to sponsor a now-annual Data Challenge Expo. The contest is open to anyone who is interested in participating, including college students and professionals from the private or public sector. This contest challenges participants to analyze a data set using statistical and visualization tools and methods.

The data set for the Data Challenge Expo 2020 is the Global Historical Climatology Network (GHCN). Public-use data files and documentation are available on the GHCN website. Contestants must use some portion of the GHCN data but are strongly encouraged to combine it with other data sources, such as IPUMS, NASA’s EarthData, the European Data Portal, or the National Agricultural Statistics Service.

There are two GHCN data sets containing climate data from land surface stations around the world, spanning from 175 years ago to the past hour. One (GHCN Monthly) contains monthly mean temperatures that can be used for climate monitoring. The other, the GHCN Daily database, may be more useful for competition entries. For instance, these data could be used to understand changes in growing seasons, assess the frequency of heavy rainfall and other weather patterns, and describe the frequency of heat waves (see “An Overview of the Global Historical Climatology Network-Daily Database” in the Journal of Atmospheric and Oceanic Technology).
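For contestants new to GHCN Daily, a first practical step is reading the station files. The sketch below is a minimal illustration, not official tooling. It assumes the fixed-width ".dly" layout described in the GHCN-Daily readme: an 11-character station ID, a 4-character year, a 2-character month, a 4-character element code (such as TMAX), then 31 eight-character day groups, each a 5-character value followed by three 1-character flags, with missing days coded as -9999 and temperatures in tenths of a degree Celsius. Please confirm these details against the documentation on the GHCN website. The function names are hypothetical.

```python
def parse_dly_line(line):
    """Split one GHCN-Daily record into (station, year, month, element, values).

    Assumes the fixed-width layout from the GHCN-Daily readme; the three
    per-day flag characters are skipped here. Missing days (-9999) become None.
    """
    station = line[0:11]
    year = int(line[11:15])
    month = int(line[15:17])
    element = line[17:21]
    values = []
    for day in range(31):
        start = 21 + 8 * day              # each day group is 8 characters wide
        raw = int(line[start:start + 5])  # 5-character value field
        values.append(None if raw == -9999 else raw)
    return station, year, month, element, values


def monthly_mean_celsius(values):
    """Average the non-missing daily values, converting tenths of a degree to degrees C."""
    present = [v / 10 for v in values if v is not None]
    return sum(present) / len(present) if present else None
```

To use it, loop over the lines of a station file (for example, `for line in open(path)`, where `path` points to a downloaded ".dly" file), keep records whose element is TMAX or TMIN, and aggregate the monthly means into annual series for further analysis.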

Here are some questions to consider for an analysis; however, contestants should not feel constrained by them. They are meant only to get ideas flowing.

  • Is there a long-term trend with respect to temperature? Are there any outliers or anomalies in space or time?
  • Is there a spatial pattern with respect to temperature changes?
  • Are there different geographic regions/clusters that behave differently (e.g., increases, no increases at all, or decreases)?
  • Can you construct a spatio-temporal model that predicts temperatures in 2030 (i.e., a slight extrapolation)? What else might affect temperatures 10 years from now?
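As a simple starting point for the first and last questions, one could fit an ordinary least-squares linear trend to a single station's annual mean temperatures and extrapolate it to 2030. This is a deliberately basic sketch under stated assumptions: it treats one series in isolation and ignores the spatial correlation, autocorrelation, and uncertainty that a real spatio-temporal model would need to address. The function names are hypothetical.

```python
def fit_linear_trend(years, temps):
    """Ordinary least-squares fit of temps = intercept + slope * year.

    Returns (slope, intercept). The slope is in degrees per year if temps are
    annual means in degrees. Assumes at least two distinct years.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(temps) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, temps))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, year):
    """Evaluate the fitted line at a given year (extrapolation beyond the data is risky)."""
    return intercept + slope * year
```

For example, with a hypothetical station whose annual means rise exactly 0.03 degrees per year from 14.0 in 2000 through 2019, `fit_linear_trend` recovers a slope of about 0.03, and `predict(slope, intercept, 2030)` gives about 14.9. With real GHCN data, residual plots and comparisons across neighboring stations would help judge whether a straight line is adequate.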

Contestants will present their results in a speed poster session at JSM and must submit their abstracts to the JSM online system. Note that judging takes place at JSM and is based on the results presented there. Presenters are responsible for their own JSM registration and travel costs, as well as any other costs associated with JSM attendance. Group submissions are acceptable. To enter, contestants must do the following by February 4:

  • Submit an abstract for a speed poster session via the JSM 2020 website. Specify the Statistical Computing Section as the main sponsor. You may include the Government Statistics Section and Statistical Graphics Section as additional sponsors.
  • Forward the JSM abstract submission email to me.

I would like to end this first column by thanking the outgoing ASA Board members—Lisa LaVange (2018 ASA President), David Williamson (vice president), Amarjot Kaur (treasurer), James Lepkowski (Council of Sections representative), Cynthia Bocci (international representative), and Julia Sharp (Council of Chapters representative)—for their service to our profession. Also, of course, I want to welcome our newest board members—Rob Santos (2021 ASA President), Dionne Price (vice president), Ruixiao Lu (treasurer), Rebecca Hubbard (Council of Sections representative), Alexandra Schmidt (international representative), and Ji-Hyun Lee (Council of Chapters representative). And to all of our members, thank you for letting us serve you.
