
Analyze Weather Forecasts for Data Expo 2018

1 January 2018
Wendy Martinez and Jenny Guarino

    The Statistical Computing and Graphics sections will host the Data Expo in 2018.

    The data consist of three years of weather forecasts for 113 cities in the United States harvested from the National Weather Service website. Historical data that do not necessarily match the location of the forecasts are also provided, and contestants are allowed to use additional weather data in their analyses.

    Possible questions for analysis include the following:

    • What is the distribution of the errors in the forecast?
    • Are some locations more stable or variable than others?
    • How has the weather changed over the three years?
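A natural first step for the first two questions is to compute each forecast's error (forecast minus observed value) and summarize those errors by city. The mean error shows bias, and the standard deviation shows how variable a location's errors are. Here is a minimal Python sketch under stated assumptions. The record layout (city, forecast high, observed high), the city names, and the temperatures are hypothetical toy values. They are not the actual Data Expo files or results.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical toy records: (city, forecast high in °F, observed high in °F).
# These values are made up for illustration; they are NOT from the Data Expo data.
records = [
    ("CityA", 70, 68),
    ("CityA", 65, 66),
    ("CityA", 72, 70),
    ("CityB", 50, 58),
    ("CityB", 55, 49),
    ("CityB", 60, 61),
]


def forecast_errors(rows):
    """Group forecast errors (forecast minus observed) by city."""
    errors = defaultdict(list)
    for city, forecast, observed in rows:
        errors[city].append(forecast - observed)
    return dict(errors)


def summarize(errors_by_city):
    """Return (mean error, population standard deviation of error) per city."""
    return {
        city: (mean(errs), pstdev(errs))
        for city, errs in errors_by_city.items()
    }


if __name__ == "__main__":
    errs = forecast_errors(records)
    for city, (bias, spread) in sorted(summarize(errs).items()):
        print(f"{city}: bias={bias:+.2f} F, spread={spread:.2f} F")
```

In this toy example, CityB's errors are far more spread out than CityA's, which is the kind of contrast the second question asks about. With the real data, the forecast lead time and the way forecast dates are matched to observation dates would need careful handling.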

    Contestants must present their results at JSM 2018 in Vancouver, British Columbia. Group entries are welcome. To enter, submit a speed session abstract by February 1, 2018. (It doesn’t need to be perfect or specific. Abstracts can be modified later.) 

    After the abstract is submitted, contestants must send an expression of interest/intention by February 2, 2018, to Radu Herbei and Leanna House. The email should include the submitted abstract and abstract number.

    Going Back … Way Back

    The Statistical Computing and Graphics sections have been sponsoring the Data Exposition (Expo) for many years, during which they have challenged contestants to analyze a given data set. The first challenge took place in 1982 and was sponsored by the then-Committee on Statistical Graphics. The stated purposes of the first exposition were “(1) to provide a forum in which users and providers of statistical graphics technology can exchange information and ideas and (2) to expose those members of the ASA community who are less familiar with statistical graphics to its capabilities and potential benefits.”

    More about past data expos can be found at the ASA Sections on Statistical Computing and Statistical Graphics website.

    It is interesting to see the changes in the data sets over the years. The data set for the 1983 Data Expo had measurements of mpg, number of cylinders, and displacement on 406 automobiles.

    It is not surprising that the size of the data sets has grown. For example, the airline on-time data used in 2009 contains approximately 120 million records (12 gigabytes) consisting of flight arrival and departure details for all commercial flights within the United States from October 1987 to April 2008. The airline data set has become widely used in machine learning and data science research.

    The Government Statistics Section (GSS) started to issue annual data challenges in 2015. The contests were open to anyone interested in participating, including college students and professionals from the private or public sector. These contests challenged participants to analyze a government data set using statistical and visualization tools and methods.

    Mike Jadoo from the Bureau of Labor Statistics participated in two data challenges as a contestant in the professional category. He has this to say about the experience:

    I participated in two data challenges, and in my opinion, the experience was great. From my participation, I gained more skills in programming and analyzing data and was able to bring those abilities back to the office that I work for. I have also shared the skills I attained with the students I teach, which has made a big difference in the classroom experience. Students love to hear how the topics they are learning [about] can actually be applied in different situations.

    Professors have found the data challenges to be good teaching tools. Several entries in the GSS data challenges have come from teams of students in a statistics class whose main focus was the analysis of the challenge data set.

    Eric Kolaczyk of the Center for Information and Systems Engineering at Boston University used the 2017 Data Challenge in a unique way. He held his own contest in the classroom, where each student was asked to learn about the data and conduct their own analysis. The winning student’s project was then submitted as the entry.

    In some cases, contestants continue to interact with the government personnel who provided the data. For instance, Jonathan Auerbach of Columbia University, a winner in the 2016 GSS Data Challenge, was funded by the Evaluation of Low Cost Safety Improvements Pooled Fund Study to present his award-winning paper to 40 state member representatives at its annual Technical Advisor Meeting. According to Roya Amjadi of the Federal Highway Administration, the paper offered a new statistical methodology for highway safety evaluations and presented a fresh perspective on the evaluation of pedestrian safety improvements.

    Contestants in the Data Expo and Data Challenge have also had the opportunity to publish their results in a special issue of the refereed journal Computational Statistics. Editor-in-chief Juergen Symanzik and the co-editors of the special issues are currently working on the 2016 and 2017 challenge issues, making the articles fully reproducible.
