
Statisticians Create COVID-19 Dashboard to Predict Infection

1 June 2020
Collaborators
Lily Wang, Associate Professor in Statistics, Iowa State University
GuanNan Wang, Assistant Professor in Mathematics, College of William & Mary
Lei Gao, Assistant Professor in Finance, Iowa State University
Xinyi Li, Postdoctoral Fellow, Statistical and Applied Mathematical Sciences Institute / The University of North Carolina at Chapel Hill
Shan Yu, PhD Student in Statistics, Iowa State University
Myungjin Kim, PhD Student in Statistics, Iowa State University
Yueying Wang, PhD Student in Statistics, Iowa State University
Zhiling Gu, PhD Student in Statistics, Iowa State University
Yuan Gu, Senior Undergraduate Student in Computer Science and Mathematics, College of William & Mary

A group of statistics professors and students from Iowa State University, The University of North Carolina at Chapel Hill, and the College of William & Mary created a dashboard with multiple embedded R Shiny apps to visualize, track, and predict real-time COVID-19 infections and deaths in the US. The dashboard has received considerable attention, so we asked the group to answer several questions about their work.

Your dashboard has already gotten a lot of positive attention. Where did the idea come from?

Thank you for your positive comments about our dashboard. The original idea of developing a dashboard was to illustrate our research findings on COVID-19.

An essential question in developing a defense against COVID-19 is how far the virus will spread and how many lives it will claim. In the early stage of an outbreak, especially a fast-moving one like the coronavirus, there are many uncertainties, and no one can be sure where the crisis will lead us. One way to answer these questions is through scientific modeling.

We started our work in late January, as the COVID-19 outbreak was unfolding. We thought that data visualization would be a good starting point for helping users understand how far the virus might spread and for illustrating our findings and statistical insights. In addition, the ability to visualize, track, and predict the spread of the coronavirus can help raise awareness and understanding of the virus's impact and ultimately assist in prevention efforts.

What is the purpose of the dashboard?

Our research aims to help local communities and to guide evidence-based decision-making. We established the dashboard to provide a user-friendly tool to visualize, track, and predict COVID-19 infections and deaths in the United States, and it illustrates our research findings on the spread of the virus. It is well known that schools, workplaces, and businesses can contribute to the transmission of COVID-19, so the dashboard can also help evaluate the consequences of disease spread and help policymakers and decision-makers determine actions related to workplaces, businesses, and schools during the pandemic. Finally, the dashboard can facilitate other research efforts to confront COVID-19.

Who are you hoping will use the dashboard?

The potential users are policymakers and other decision-makers, researchers, and the general public. Currently, our dashboard offers two main R Shiny apps. The first is intended to serve local communities: it provides a real-time seven-day forecast of infection and death counts, down to the county level.

The other app offers a four-month projection based on the most recent data and is updated weekly. This app is useful for policymakers and public health leaders who want to understand how the outbreak may unfold over time and space. For example, it can give hospitals an idea of how quickly, and by how much, they need to expand their capacity.

We also included several smaller apps that share our findings and insights and are suited to the general public.

You had a nice big team working together to create this dashboard. Describe your process for working while staying socially distant.

We are a great team; everybody works around the clock and puts a lot of effort into this project. We have held all our group meetings remotely since the outbreak began. Early on, the group spent a lot of time collecting and compiling data, studying the epidemic literature, and discussing methodology. When we found that our method produced accurate predictions, we decided to build the dashboard so local communities and decision-makers could use our research findings.

To develop the dashboard efficiently, we divided the whole team into several sub-groups. We have sub-groups working on data collection and cleaning, implementing statistical models, developing the software packages and shiny apps, and constructing the website.

Since the dashboard launched on March 27, we have held a brief group meeting every day to discuss the release of the new forecast. Because state departments of public health tend to release their data in the evening, we often work from midnight until 8:00 a.m. so we can confidently release accurate, timely forecasts before people start checking the dashboard in the morning.

When did you start working on the dashboard? How long did the entire project take?

We started working as a team in late January, when the first COVID-19 cases were reported in the United States. At that time, we mainly focused on reading the literature and news and studying existing epidemic models together.

In February, we established a new spatiotemporal epidemic modeling (STEM) framework for space-time infected/death count data, and we started to collect data and conduct the analysis using our proposed STEM method.

After many rounds of testing, we launched the dashboard on March 27. Since then, we have received many helpful comments and suggestions on how to improve it. Recently, we have also gotten many requests for a mobile app based on the dashboard, which will be released very soon. We will continue to improve our research and data as the pandemic progresses.

The sources you used to collect the data are publicly available (e.g., news articles, press releases, and published reports from public health agencies). What process did you use to determine whether the sources were accurate?

First, thanks to the contributions of institutions and organizations such as the Center for Systems Science and Engineering at Johns Hopkins University, The New York Times, and The Atlantic, we can access daily confirmed cases and deaths, along with historical data dating back to January 20, 2020. Besides these publicly available repositories, we also collected data from the World Health Organization, the US Centers for Disease Control and Prevention (CDC), each state's or region's health department website, and press releases. We compiled data on the timing of interventions by checking national and state government websites, executive orders, and newly enacted COVID-19 laws. Sometimes the data sources disagree, but we do our best to identify inconsistencies and correct them using other sources, such as official reports and news.

What would you say was the most difficult challenge—technical, logistical, or otherwise—you encountered while building the dashboard? How did you manage to solve it?

One of our biggest challenges is data collection. We have found quality issues such as under-reporting, delayed reporting, and inconsistencies between state and county sources. We have also noticed sudden drops in newly confirmed cases and deaths on weekends, which may be due to reduced testing on weekends. Reporting criteria were another issue: some states combined confirmed and probable cases, while others did not. The CDC addressed this issue by recommending that the total be reported. While no single solution eliminates all these issues, we cross-check sources and make sensible data adjustments and source selections case by case.
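To make the idea of cross-checking state and county sources concrete, here is a minimal Python sketch of one simple consistency check: compare each state's reported total with the sum of its county counts and flag states where the two disagree by more than a chosen tolerance. This is an illustration under stated assumptions, not the team's actual procedure; the function name `flag_state_county_mismatches`, the dictionary data layout, the 5% tolerance, and the example numbers are all hypothetical.

```python
from typing import Dict, List


# Hypothetical helper for illustration; not the team's actual pipeline.
def flag_state_county_mismatches(
    state_totals: Dict[str, int],
    county_counts: Dict[str, Dict[str, int]],
    tolerance: float = 0.05,  # assumed threshold: 5% relative difference
) -> List[str]:
    """Return a sorted list of states whose summed county counts differ from
    the state-reported total by more than `tolerance`, relative to that total."""
    flagged = []
    for state, state_total in state_totals.items():
        # A state with no county data is treated as summing to zero.
        county_sum = sum(county_counts.get(state, {}).values())
        # Guard against division by zero when a state reports no cases.
        denominator = max(state_total, 1)
        if abs(county_sum - state_total) / denominator > tolerance:
            flagged.append(state)
    return sorted(flagged)


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    state_totals = {"IA": 1000, "VA": 500}
    county_counts = {
        "IA": {"Story": 300, "Polk": 690},       # sums to 990: within 5%
        "VA": {"James City": 200, "York": 250},  # sums to 450: 10% off
    }
    print(flag_state_county_mismatches(state_totals, county_counts))  # prints ['VA']
```

In a sketch like this, a flagged state is only a prompt for manual review against official reports and news, in keeping with the case-by-case approach the team describes; the tolerance trades off false alarms against missed inconsistencies.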

Another big challenge was technical. Our dashboard was initially hosted on Amazon Web Services (AWS), and it crashed several times in the first week after launch because traffic was much higher than we expected. With IT support from Iowa State University, we were eventually able to move the dashboard to a university server.

Now that the dashboard is up and running, what happens next? Will you maintain it? If so, how?

We are currently developing an R package and a mobile app. We will continue to maintain the dashboard, updating our short-term forecasts daily and our long-term projections weekly, until we are comfortable stopping.

Brag a little. Of what aspect of the dashboard are you most proud?

We are most proud of the versatility of our dashboard. It is useful not only for visualizing and tracking COVID-19 infections and deaths in the US, but also for its real-time seven-day forecasts of infection and death counts at both the county and state levels. In addition, we provide the corresponding risk analysis and a four-month projection with a prediction band to assess uncertainty. We have also launched a series of “statistical insights” that highlight less visible indicators that nonetheless provide interesting evidence for COVID-19 analysis and policymaking.

What do you think makes your dashboard unique?

As far as we know, our dashboard is the first to provide both short-term forecasts and long-term projections down to the county level, in addition to tracking and visualizing the spread of COVID-19. We also provide prediction intervals to quantify the uncertainty in our predictions. We see this as a distinctly statistical viewpoint: learning the disease pattern from the data.

Looking back at your work on the project, how do you think it has changed the way you’re dealing with the pandemic?

While working on this project, we have gained more insight into COVID-19. In the beginning, we worried about the unknown: how far the virus would spread and how it might affect the economy, public health resources, and our daily lives. As we studied the pandemic, we learned more about the disease. We have collected and compiled data from a combination of public sources (such as public health agencies and open repositories), updated the status by checking official websites and press releases, and actively studied the literature on infectious disease models. All these efforts have contributed to developing our methodology. We validate the effectiveness and efficiency of our model with the data, and our analysis reveals that control measures significantly help to “flatten the curve.” We have also uncovered the spatiotemporal dynamic pattern of COVID-19 and identified critical health care infrastructure, demographic features, and socioeconomic factors that help explain how the transmission rate varies in space and time. Based on these findings, we are able to provide short-term forecasts and long-term projections, which help us assess when the pandemic may end.

Does seeing all the data illustrated give you a different perspective?

Yes, we gradually connected the dots as we worked on the analysis every day. Seeing and analyzing the data helps us understand how the spread dynamics of COVID-19 vary over time and space. We also realized that data quality issues, such as missing data or under-reporting, make accurate forecasting more difficult. We learned how local area characteristics are associated with the spread of the disease, which may provide guidance for local policymaking. Also, by aggregating our results, we can better assess our model's performance at different levels and over different time periods.

Visit the dashboard at https://covid19.stat.iastate.edu/ and read more about the team’s goals and methods.


2 Comments »

  • Robert Pearson said:

    It appears the dashboard is no longer available.

  • Megan Murphy said:

    Checked in again to be sure, and it appears to be up and running now https://covid19.stat.iastate.edu/