Modern Data Management and the COVID 19 Pandemic
For a timely example of modern data management, take a look at the Johns Hopkins visualization of the COVID 19 pandemic. For an example of NOT modern data management, take a look at the CDC reporting on the flu. The contrast is stunning. The COVID 19 visualization updates every 15 minutes and shows its source data and methods. The CDC flu page has not been updated since week 8, February 22.
Modern Data Management
(updated every 15 mins, exposes source data and methods)
With 100,000 cases and 3,400 deaths, we can easily calculate a death rate of 3.4%. We know that many governments are not reporting the data accurately, particularly when it comes to cases. So the true death rate may be significantly lower, because the actual number of cases could be much higher.
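The arithmetic above, and its sensitivity to under-reported cases, can be sketched in a few lines of Python. The case and death counts come from the paragraph above; the undercount factors (2x, 5x, 10x) are purely hypothetical values chosen for illustration, not estimates of actual under-reporting.

```python
def death_rate(deaths: int, cases: int) -> float:
    """Return deaths divided by cases, as a fraction (0.034 means 3.4%)."""
    return deaths / cases


reported_cases = 100_000
reported_deaths = 3_400

# Naive rate using only the reported figures.
print(f"reported: {death_rate(reported_deaths, reported_cases):.2%}")

# Hypothetical undercount factors: if true cases were N times the
# reported number (with deaths unchanged), the rate shrinks by N.
for factor in (2, 5, 10):
    adjusted = death_rate(reported_deaths, reported_cases * factor)
    print(f"cases x{factor}: {adjusted:.2%}")
```

The only point of the loop is that the rate scales inversely with the case count, so any uncertainty in the number of cases carries straight through to the death rate.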
It would be interesting to compare the COVID 19 model to the CDC flu model, but so far I have not found a good side by side where the data is presented in similar formats. It appears that the CDC takes some time to compile its estimates of the impact of the flu. In this report, the 2017-2018 and 2018-2019 seasons are still listed as preliminary and subject to future revision. The elevated attention on COVID 19 and the hour by hour real time reporting is a significantly different method from compiling estimates based on death certificates. It is not hard to imagine a scenario where a COVID 19 death goes unreported.
NOT Modern Data Management
(Update interval unknown - weeks or more; source data and methods unknown)
The CDC reporting presents deaths as a percent of all deaths in the US and cases as n per 100,000, making it theoretically possible to calculate the death rate. In the text, the CDC estimates there have been at least 32 million cases, 310,000 hospitalizations, and 18,000 deaths from the flu this season. That works out to a death rate of about 0.00056 (roughly 0.06%, or one-eighteenth of one percent).
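As a rough check, the flu figures quoted above can be run through the same division and set beside the COVID 19 numbers from earlier in this post. This is only a sketch: the two rates are built from differently collected data (real-time reported counts versus CDC season estimates), so the ratio is an illustration of the arithmetic, not an epidemiological comparison.

```python
# Flu figures as quoted from the CDC text ("at least" estimates).
flu_cases = 32_000_000
flu_deaths = 18_000

# COVID 19 figures as quoted from the dashboard earlier in this post.
covid_cases = 100_000
covid_deaths = 3_400

flu_death_rate = flu_deaths / flu_cases
covid_death_rate = covid_deaths / covid_cases

print(f"flu:      {flu_death_rate:.3%}")    # about 0.056%
print(f"COVID 19: {covid_death_rate:.1%}")  # 3.4%

# Naive ratio of the two rates, ignoring differences in how the
# underlying numbers were collected.
ratio = covid_death_rate / flu_death_rate
print(f"ratio:    about {ratio:.0f}x")
```

If the true number of COVID 19 cases is much higher than reported, the ratio shrinks accordingly.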
There is no question that the COVID 19 dashboard also has shortcomings. Most notably, it can only present the data we have available, and, as has been widely reported in the news, the US government has barely been testing anyone. So the number of cases in the US is not accurate, and everyone knows it.
For those interested in pathogen tracing, check out Nextstrain.
Nextstrain, an open-source project tracking pathogen genome data, does a better job of tracking how the virus travels, but does not do as well in presenting the number of cases, their current status, and fatalities.
I am sure this is interesting to epidemiologists. I am not sure what it is telling us, though.
For those who think we should give our governments or the CDC a pass because data work is hard, check out this website put up by a high school student in Mercer Island, WA (near Seattle).