The Johns Hopkins University Center for Systems Science and Engineering COVID-19 Dashboard: data collection process, challenges faced, and lessons learned

Ensheng Dong, Jeremy Ratcliff, Tamara D Goyea, Aaron Katz, Ryan Lau, Timothy K Ng, Beatrice Garcia, Evan Bolt, Sarah Prata, David Zhang, Reina C Murray, Mara R Blake, Hongru Du, Fardin Ganjkhanloo, Farzin Ahmadi, Jason Williams, Sayeed Choudhury, Lauren M Gardner

Abstract

On Jan 22, 2020, a day after the USA reported its first COVID-19 case, the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) launched the first global real-time coronavirus surveillance system: the JHU CSSE COVID-19 Dashboard. As of June 1, 2022, the dashboard has served the global audience for more than 30 consecutive months, totalling over 226 billion feature layer requests and 3·6 billion page views. The highest daily record was set on March 29, 2020, with more than 4·6 billion requests and over 69 million views. This Personal View reveals the fundamental technical details of the entire data system underlying the dashboard, including data collection, data fusion logic, data curation and sharing, anomaly detection, data corrections, and the human resources required to support such an effort. The Personal View also covers the challenges, ranging from data visualisation to reporting standardisation. The details presented here help develop a framework for future, large-scale public health-related data collection and reporting.

Conflict of interest statement

Declaration of interests: We declare no competing interests.

Copyright © 2022 Elsevier Ltd. All rights reserved.

Figures

Figure 1
Johns Hopkins University Center for Systems Science and Engineering Dashboard usage and milestones, including the number of requests and views. Requests refer to the number of times a visitor interacts with the dashboard system, such as clicking on a specific country. Views refer to the number of times the dashboard, in either its desktop or mobile version, is loaded on the visitor's end. Global COVID-19 cases and deaths, shown as dashed lines, are reference plots. (A, B) Total usage from Jan 21, 2020, to June 1, 2022. (C, D) Daily usage before June 15, 2020. Daily cases and daily deaths are smoothed by a 7-day moving average.
Figure 2
Graphical summary of the Johns Hopkins University Center for Systems Science and Engineering Dashboard data pipeline. The pipeline can be separated into four main steps. (A) Data sourcing describes the identification and validation of trusted, open-source data sources. (B) Autonomous collection uses web scraping algorithms to collect raw data from open-source data sources. (C) Comprehensive data curation passes the data through several quality control mechanisms, including an in-house designed anomaly detection service. Data fusion services curate the cleaned data into a single production database. (D) Data sharing is the publication of production data into our online data products.
Figure 3
Evolution of the dashboard visualisation. (A) Initially, our efforts were focused on the spread of cases in China. (B) As the virus spread globally, the default view was expanded to include the entire world. (C) In the most current version, vaccination data has been added and the time series has been adjusted from daily to weekly bars.
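
The Figure 1 caption notes that daily cases and deaths are smoothed by a 7-day moving average. As a minimal sketch of that kind of smoothing, the Python function below computes a trailing moving average over a list of daily counts. The function name, the choice of a trailing (rather than centred) window, and the handling of the first six days are assumptions made for illustration; the source does not state which variant was used.

```python
def trailing_moving_average(daily_counts, window=7):
    """Smooth daily counts with a trailing moving average.

    Assumption: each output value averages the current day and up to
    `window - 1` preceding days; the first days average over whatever
    history is available instead of being dropped.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(daily_counts)):
        start = max(0, i - window + 1)      # first day included in the window
        chunk = daily_counts[start:i + 1]   # current day plus preceding days
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


if __name__ == "__main__":
    # Hypothetical counts with weekend under-reporting (not real data).
    example = [100, 120, 130, 125, 140, 20, 10, 210, 150, 160]
    for day, value in enumerate(trailing_moving_average(example), start=1):
        print(f"day {day}: {value:.1f}")
```

A 7-day window is a common choice for this kind of series because it spans one full weekly reporting cycle, so weekend dips and Monday catch-up spikes largely cancel out.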

Source: PubMed
