Modeling COVID-19 scenarios for the United States

Authored by nature.com and submitted by mvea

Our analysis strategy supports two main and interconnected objectives: (1) to generate forecasts of COVID-19 deaths, infections and hospital resource needs for all US states; and (2) to explore alternative scenarios on the basis of changes in state-enforced SDMs or population-level mask use. The modeling approach to achieve this is summarized in the Supplementary Information and can be divided into four stages: (1) identification and processing of COVID-19 data, (2) exploration and selection of key drivers or covariates, (3) modeling deaths and cases across three boundary scenarios of SDMs in US states using an SEIR framework and (4) modeling health service utilization as a function of forecast infections and deaths within those scenarios. This study complies with the Guidelines for Accurate and Transparent Health Estimates Reporting statement (Supplementary Information).

IHME forecasts include data from local and national governments, hospital networks and associations, the World Health Organization, third-party aggregators and a range of other sources. Data sources and corrections are described in detail in the Supplementary Information and in the data availability statement. Briefly, daily confirmed case and death numbers due to COVID-19 are collated from the Johns Hopkins University data repository; we supplement and correct this dataset as needed to improve the accuracy of our projections and adjust for reporting-day biases (Supplementary Information). Testing data are obtained from Our World in Data (https://ourworldindata.org/), The COVID Tracking Project (https://covidtracking.com/) and supplemented with data from additional government websites (Supplementary Information). Social distancing data are obtained from a number of different official and open sources, which vary by state (Supplementary Information). Mobility data are obtained from Facebook Data for Good (https://dataforgood.fb.com/docs/covid19/), Google (https://www.google.com/covid19/mobility/), SafeGraph (https://www.safegraph.com/dashboard/covid19-shelter-in-place/) and Descartes Labs (https://www.descarteslabs.com/mobility/; Supplementary Information). Mask-use data are obtained from the Facebook Global Symptom Survey (in collaboration with the University of Maryland Social Data Science Center), the Kaiser Family Foundation, YouGov COVID-19 Behavioural Tracker survey (https://today.yougov.com/covid-19/) and PREMISE (https://www.premise.com/covid-19/; Supplementary Information). Specific sources for data on licensed bed and ICU capacity and average annual utilization in the United States are detailed in the Supplementary Information.

Before modeling, observed cumulative deaths are smoothed using a spline-based smoothing algorithm with randomly placed knots (ref. 37). Uncertainty is derived from bootstrapping and resampling of the observed deaths. The time series of case data is used as a leading indicator of death based on an infection fatality ratio (IFR) and a lag from infection to death. These smoothed estimates of observed deaths by location are then used to create estimated infections based on an age distribution of infections and on age-specific IFRs. The age-specific infections were collapsed into total infections by day and state and used as data inputs in the SEIR model. Detailed descriptions of data smoothing and transformation steps are provided in the Supplementary Information.

Covariates for the compartmental transmission SEIR model are predictors of the β parameter in the model that affects the transition from the susceptible to exposed state; specifically, β represents the contact rate multiplied by the probability of transmission per contact. Covariates were evaluated on the basis of biological plausibility and on the impact on the results of the SEIR model. Given limited empirical evidence of population-level predictors of SARS-CoV-2 transmission, biologically plausible predictors of pneumonia such as population density (percentage of the population living in areas with more than 1,000 individuals per square kilometer), tobacco smoking prevalence, population-weighted elevation, lower respiratory infection mortality rate and particulate matter air pollution were considered. These covariates are representative at a population level and are time invariant. Location-specific estimates for these covariates are derived from the Global Burden of Disease Study 2019 (refs. 38,39,40). Time-varying covariates include pneumonia excess mortality seasonality, diagnostic tests administered per capita, population-level mobility and personal mask use. These are described below.

We used weekly pneumonia mortality data from the National Center for Health Statistics Mortality Surveillance System (https://gis.cdc.gov/grasp/fluview/mortality.html) from 2013 to 2019 by US state. Pneumonia deaths included all deaths classified by the full range of the International Classification of Disease codes in J12–J18.9. We pooled data over available years for each state and found the weekly deviation from the annual, state-specific mean mortality due to pneumonia. We then fit a seasonal pattern using a Bayesian meta-regression model with a flexible spline and assumed annual periodicity (Supplementary Information). For locations outside the United States, we used vital registration data where available. Locations without vital registration data had weekly pneumonia seasonality predicted based on latitude from a model pooling all available data (Supplementary Information).

We considered diagnostic testing for active SARS-CoV-2 infections as a predictor of the ability of a state to identify and isolate active infections. We assumed that higher rates of testing were negatively associated with SARS-CoV-2 transmission. Our primary sources for US testing data were compiled by the COVID Tracking Project (Supplementary Information). Unless testing data existed before the first confirmed case in a state, we assumed that testing was nonzero after the date of the first confirmed case. Before producing predictions of testing per capita, we smoothed the input data by using the same smoothing algorithm used for smoothing daily death data before modeling (previously described). Testing per capita projections for unobserved future days were based on linearly extrapolating the mean day-over-day difference in daily tests per capita for each location. We put an upper limit on diagnostic tests per capita of 500 per 100,000 based on the highest observed rates in June 2020.
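The extrapolation rule for testing can be illustrated with a minimal sketch. It assumes the input is an already-smoothed series of daily tests per 100,000 people and that the mean day-over-day difference is taken over the whole observed series; the function name `project_tests_per_capita` is hypothetical and this is not the authors' implementation.

```python
def project_tests_per_capita(observed, n_future, cap=500.0):
    """Linearly extrapolate daily tests per 100,000 people.

    `observed` is a smoothed daily series (assumption: smoothing is done
    upstream). Future values continue from the last observed value using
    the mean day-over-day difference, and are capped at `cap`.
    """
    if len(observed) < 2:
        raise ValueError("need at least two observed days")
    diffs = [b - a for a, b in zip(observed, observed[1:])]
    mean_diff = sum(diffs) / len(diffs)
    last = observed[-1]
    # Day k+1 after the last observation adds (k+1) mean daily increments.
    return [min(cap, last + mean_diff * (k + 1)) for k in range(n_future)]
```

A series that is still rising near the limit, such as `[480, 490]`, is held at the cap of 500 per 100,000 in every projected day.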

SDMs were not used as direct covariates in the transmission model. Rather, SDMs were used to predict population mobility (see below), which was subsequently used as a covariate in the transmission model. We collected the dates of state-issued mandates enforcing social distancing, as well as the planned or actual removal of these mandates. The measures that we included in our model were: (1) severe travel restrictions, (2) closing of public educational facilities, (3) closure of nonessential businesses, (4) stay-at-home orders and (5) restrictions on gathering size. Generally, these came from state government official orders or press releases.

To determine the expected change in mobility due to SDMs, we used a Bayesian, hierarchical meta-regression model with random effects by location on the composite mobility indicator to estimate the effects of social distancing policies on changes in mobility (Supplementary Information).

We used four data sources on human mobility to construct a composite mobility indicator. Those sources were Facebook, Google, SafeGraph and Descartes Labs (Supplementary Information). Each source takes a slightly different approach to capturing mobility, so before constructing a composite mobility indicator, we standardized these different data sources (Supplementary Information). Briefly, this first involved determining the change in a baseline level of mobility for each location by data source. Then, we determined a location-specific median ratio of change in mobility for each pairwise comparison of mobility sources, using Google as a reference and adjusting the other sources by that ratio. The time series for mobility was estimated using a Gaussian process regression model using the standardized data sources to get a composite indicator for change in mobility for each location day.
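The median-ratio standardization step can be sketched as follows. This is an illustration under stated assumptions: each source is represented as a mapping from a date label to the percent change from that source's own baseline, the ratio is taken as reference divided by other, and the function name `standardize_to_reference` is hypothetical. The Gaussian process regression that follows this step in the text is not shown.

```python
from statistics import median


def standardize_to_reference(reference, other):
    """Rescale `other` onto the scale of `reference` using a median ratio.

    Both arguments map a date label to the change in mobility relative to
    that source's baseline. Only dates present in both sources, with a
    nonzero value in `other`, contribute to the ratio.
    """
    ratios = [
        reference[d] / other[d]
        for d in reference
        if d in other and other[d] != 0
    ]
    if not ratios:
        raise ValueError("no overlapping, nonzero observations")
    scale = median(ratios)
    # Every value of the non-reference source is multiplied by the same ratio.
    return {d: v * scale for d, v in other.items()}
```

Using the median rather than the mean keeps a single day with an unusual ratio from dominating the adjustment.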

We calculated the residuals between our predicted composite mobility time series and input composite time series, and then applied a first-order random walk to the residuals. The random walk was used to predict residuals from 1 January 2020 to 1 January 2021, which were then added to the mobility predictions to produce a final time series with uncertainty: ‘past’ changes in mobility from 1 January 2020 to 28 September 2020 and projected mobility from 28 September 2020 to 1 January 2021.
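A first-order random walk on residuals can be sketched as below. The sketch assumes Gaussian, zero-mean increments whose standard deviation is estimated from the observed day-over-day changes in the residuals; the function name, the number of draws and the seeding are illustrative choices, not details taken from the source.

```python
import math
import random


def simulate_rw1_residuals(residuals, n_future, n_draws=100, seed=0):
    """Simulate future residual paths with r[t] = r[t-1] + eps[t].

    eps[t] is drawn from a zero-mean normal distribution whose standard
    deviation is the root mean square of the observed first differences
    (an assumption made for this sketch).
    """
    steps = [b - a for a, b in zip(residuals, residuals[1:])]
    if not steps:
        raise ValueError("need at least two residuals")
    step_sd = math.sqrt(sum(x * x for x in steps) / len(steps))
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        level = residuals[-1]  # each path starts from the last residual
        path = []
        for _ in range(n_future):
            level += rng.gauss(0.0, step_sd)
            path.append(level)
        draws.append(path)
    return draws
```

Adding each simulated residual path to the mean mobility prediction gives a set of trajectories whose spread widens with the forecast horizon, which is one way to obtain a time series with uncertainty.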

We performed a meta-analysis of 40 peer-reviewed scientific studies in an assessment of mask effectiveness for preventing respiratory viral infections (Supplementary Information). The studies were extracted from a preprint publication (ref. 24). In addition, we considered all articles from a second meta-analysis (ref. 23) and one supplemental publication (ref. 41). These studies included both persons working in health care and the general population, especially family members of those with known infections. The studies indicate overall reductions in infections due to masks preventing exhalation of respiratory droplets containing viruses, as well as some prevention of inhalation by those uninfected. The resulting meta-regression calculated log-transformed relative risks and corresponding log-transformed standard errors based on raw counts and used a continuity correction for studies with zero counts in the raw data (0.001). We included additional specifications and characteristics to account for differences in the characteristics of individual studies and to identify important factors impacting mask effectiveness (Supplementary Information).
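The per-study inputs to such a meta-regression can be sketched as below. The formula for the standard error of a log relative risk is the standard large-sample one; how the 0.001 continuity correction is applied is not specified in this section, so adding it to all four cells of the 2x2 table whenever any cell is zero is an assumption, as is the function name.

```python
import math


def log_relative_risk(events_mask, total_mask, events_nomask, total_nomask, cc=0.001):
    """Return (log relative risk, standard error on the log scale).

    Assumption: when any cell of the 2x2 table is zero, the continuity
    correction `cc` is added to every cell before computing the estimate.
    """
    a, b = events_mask, total_mask - events_mask
    c, d = events_nomask, total_nomask - events_nomask
    if 0 in (a, b, c, d):
        a, b, c, d = (x + cc for x in (a, b, c, d))
    n1, n2 = a + b, c + d
    log_rr = math.log((a / n1) / (c / n2))
    # Standard large-sample standard error of a log relative risk.
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    return log_rr, se
```

With 10 infections among 100 mask wearers and 20 among 100 non-wearers, the relative risk is 0.5 and the log relative risk is about -0.69. A study with zero events in one arm still yields a finite estimate, with a very large standard error that down-weights it.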

We used MR-BRT (meta-regression, Bayesian, regularized and trimmed), a meta-regression tool developed at the Institute for Health Metrics and Evaluation (Supplementary Information), to perform a meta-analysis that considered the various characteristics of each study. We accounted for between-study heterogeneity and quantified remaining between-study heterogeneity into the width of the UI. We also performed various sensitivity analyses to verify the robustness of the modeled estimates and found that the estimate of the effectiveness of mask use did not change significantly when we explored four alternative analyses, including changing the continuity correction assumption, using odds ratio versus relative risk from published studies, using a fixed-effects versus a mixed-effects model and including studies without information on covariates.

We estimated the proportion of people who self-reported always wearing a face mask when outside in public for both US and global locations using data from PREMISE (US), the Kaiser Family Foundation (US), YouGov (non-US) and Facebook (non-US) surveys (Supplementary Information). We used the same smoothing model as for COVID-19 deaths and testing per capita to produce estimates of observed mask use. This smoothing process averaged each data point with its neighbors. The level of mask use starting on 21 September 2020 (the last day of processed and analyzed data) was assumed to be flat. Among states without state-specific data, a within-the-US regional average was used.

Model specification is summarized in a schematic with additional details provided in the Supplementary Information. To fit and predict disease transmission dynamics, we include a SEIR component in our multistage model. In particular, the population of each location is tracked through the following system of differential equations:

$$\begin{array}{l}\frac{{dS}}{{dt}} = - \beta \left( t \right)\frac{{S\left( {I_1 + I_2} \right)^\alpha }}{N}\\ \frac{{dE}}{{dt}} = \beta \left( t \right)\frac{{S\left( {I_1 + I_2} \right)^\alpha }}{N} - \sigma E\\ \frac{{dI_1}}{{dt}} = \sigma E - \gamma _1I_1\\ \frac{{dI_2}}{{dt}} = \gamma _1I_1 - \gamma _2I_2\\ \frac{{dR}}{{dt}} = \gamma _2I_2\end{array}$$

where α represents a mixing coefficient to account for imperfect mixing within each location, σ is the rate at which infected individuals become infectious, \(\gamma_1\) is the rate at which infectious people transition out of the presymptomatic phase and \(\gamma_2\) is the rate at which individuals recover. This model does not distinguish between symptomatic and asymptomatic infections but has two infectious compartments (\(I_1\) and \(I_2\)) to allow for interventions that would avoid focus on those who could not be symptomatic; \(I_1\) is thus the presymptomatic compartment.
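The system of equations can be integrated numerically, as in the following minimal sketch. It uses a forward Euler scheme with a small time step, which is a choice made for brevity, not the authors' solver. The default values of α, σ and \(\gamma_2\) are hypothetical placeholders (the source samples them from literature ranges), and the function names are illustrative.

```python
def seir_derivatives(state, beta, alpha, sigma, gamma1, gamma2, n):
    """Right-hand side of the S, E, I1, I2, R system."""
    s, e, i1, i2, r = state
    new_infections = beta * s * (i1 + i2) ** alpha / n
    return (
        -new_infections,
        new_infections - sigma * e,
        sigma * e - gamma1 * i1,
        gamma1 * i1 - gamma2 * i2,
        gamma2 * i2,
    )


def simulate_seir(beta_of_t, state0, days, alpha=0.95, sigma=0.2,
                  gamma1=0.5, gamma2=0.2, dt=0.1):
    """Forward Euler integration; returns one state tuple per day.

    alpha, sigma and gamma2 defaults are hypothetical illustration values.
    """
    n = sum(state0)
    steps_per_day = round(1 / dt)
    state = tuple(float(x) for x in state0)
    daily = [state]
    for step in range(days * steps_per_day):
        t = step * dt
        deriv = seir_derivatives(state, beta_of_t(t), alpha, sigma, gamma1, gamma2, n)
        state = tuple(x + dt * dx for x, dx in zip(state, deriv))
        if (step + 1) % steps_per_day == 0:
            daily.append(state)
    return daily
```

Because the five derivatives sum to zero, the total population is conserved at every step, which gives a quick sanity check on any implementation. Passing a function for `beta_of_t` allows the transmission intensity to vary over time.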

Using the next-generation matrix approach, we can directly calculate both the basic reproductive number under control, \(R_c(t)\), and the effective reproductive number, \(R_{effective}(t)\), as follows (Supplementary Information):

\(R_c\left( t \right) = \alpha \times \beta \left( t \right) \times \left( {I_1\left( t \right) + I_2\left( t \right)} \right)^{\alpha - 1} \times \left( {\frac{1}{{\gamma _1}} + \frac{1}{{\gamma _2}}} \right)\) and

\(R_{effective}\left( t \right) = R_c\left( t \right) \times \frac{{S\left( t \right)}}{N}\)
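The two expressions translate directly into code. The sketch below is a transcription of the formulas with a hypothetical function name and argument order; the numerical values in the usage note are illustrative only.

```python
def reproductive_numbers(beta, alpha, i1, i2, s, n, gamma1, gamma2):
    """Return (R_c, R_effective) at one time point.

    Note: with alpha < 1 the term (i1 + i2) ** (alpha - 1) is undefined
    when there are no infectious individuals, so i1 + i2 must be positive.
    """
    r_c = alpha * beta * (i1 + i2) ** (alpha - 1) * (1 / gamma1 + 1 / gamma2)
    r_effective = r_c * s / n
    return r_c, r_effective
```

For example, with α = 1, β = 0.3, \(\gamma_1\) = 0.5 and \(\gamma_2\) = 0.2, the mean infectious duration is 2 + 5 = 7 days and \(R_c\) = 0.3 × 7 = 2.1. If half of the population remains susceptible, \(R_{effective}\) = 1.05.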

By allowing β(t) to vary in time, our model is able to account for increases in transmission intensity as human behavior shifts over time (for example, changes in mobility, adding or removing SDMs and changes in population mask use). Briefly, we combine data on cases (correcting for trends in testing), hospitalizations and deaths into a distribution of trends in daily deaths.

To fit this model, we resampled 1,000 draws of daily deaths from this distribution for each state (Supplementary Information). Using an estimated IFR by age and the distribution of time from infection to death (Supplementary Information), we then used the daily deaths to generate 1,000 distributions of estimated infections by day from 10 January to 21 September 2020. We then fit the rates at which infectious individuals may come into contact and infect susceptible individuals (denoted as β(t)) as a function of a number of predictors that affect transmission. Our modeling approach acts across the overall population (that is, no assumed age structure for transmission dynamics), and each location is modeled independently of the others (that is, we do not account for potential movement between locations).

We detail the SEIR fitting algorithm in the Supplementary Information. Briefly, for each draw, we first fit a smooth curve to our estimates of daily new infections. Then, sampling \(\gamma_2\), σ and α from defined ranges from the literature (Supplementary Information) and using \(\gamma _1 = \frac{1}{2}\), we sequentially fit the E, \(I_1\), \(I_2\) and R components in the past. We then algebraically solve the above system of differential equations for β(t).
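The algebraic step follows from the first equation of the system: the rate of new infections equals \(\beta(t) S (I_1 + I_2)^\alpha / N\), so β(t) can be isolated once the compartments are known. The sketch below shows that inversion at a single time point; the function name is hypothetical and the reconstruction of the compartments themselves is not shown.

```python
def solve_beta(new_infections, s, i1, i2, alpha, n):
    """Invert new_infections = beta * S * (I1 + I2)**alpha / N for beta."""
    infectious = i1 + i2
    if s <= 0 or infectious <= 0:
        raise ValueError("beta is not identifiable without susceptible and infectious people")
    return new_infections * n / (s * infectious ** alpha)
```

A round trip is a useful check: generating new infections from a known β with the forward formula and passing them to `solve_beta` should recover the same value.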

The next stage of our model fits relationships between past changes in β(t) and the covariates described above: mobility, testing, masks, pneumonia seasonality and others. The time-varying covariates were forecast from 28 September 2020 to 28 February 2021 (Supplementary Information). The fitted regression was then used to estimate future transmission intensity \(\beta_{pred}(t)\). The final future transmission intensity is then an adjusted version of \(\beta_{pred}(t)\) based on the average fit over the recent past (where the window of averaging varies by draw from 2 to 4 weeks; Supplementary Information).

Finally, we used the future estimated transmission intensity to predict future transmission (using the same parameter values for all other SEIR parameters for each draw). In a reversal of the translation of deaths into infections, we then used the estimated daily new infections to calculate estimated daily deaths (again using the location-specific IFR). We also used the estimated trajectories of each SEIR compartment to calculate \(R_c\) and \(R_{effective}\).

A final step, which combines predicted infections and deaths with a hospital-use microsimulation to estimate hospital resource need for each US state, is described in the Supplementary Information; the results are presented online (https://covid19.healthdata.org/).

Policy responses to COVID-19 can be supported by evaluating the impact of alternative scenarios against a business-as-usual background, to explore fully the potential impact of the available policy levers. Additional details are available in the Supplementary Information.

We estimate the trajectory of the epidemic by state under a mandate-easing scenario that models what would happen in each state if the current pattern of easing SDMs continues and new mandates are not implemented. This should be thought of as a worst-case scenario where, regardless of how high the daily death rate becomes, SDMs will not be reintroduced and behavior (including population mobility and mask use) will not vary before 28 February 2021. In locations where the number of cases is rising, this leads to very high numbers of cases by the end of the year.

As a more plausible scenario, we use the observed experience from the first phase of the pandemic to predict the likely response of state and local governments during the second phase. This plausible reference scenario assumes that in each location the trend of easing SDMs will continue at its current trajectory until the daily death rate reaches a threshold of 8 deaths per million. If the daily death rate in a location exceeds that threshold, we assume that SDMs will be reintroduced for a 6-week period. The choice of threshold (of a daily rate of 8 deaths per million) represents the 90th percentile of the distribution of daily death rate at which US states implemented their mandates during the first months of the COVID-19 pandemic. We selected the 90th percentile rather than the 50th percentile to capture an anticipated increased reluctance from governments to reinstate mandates because of the economic effects of the first set of mandates. In locations that do not exceed the threshold of a daily death rate of 8 per million, the projection is based on the covariates in the model and the forecasts for these to 28 February 2021. In locations where the daily death rate exceeded 8 per million at the time of running our final model (21 September 2020), we assumed that mandates would be introduced within 7 days.
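The reimposition rule in the reference scenario can be sketched as a simple trigger. The sketch treats the daily death rate as a fixed input series, whereas in the full model reinstated mandates feed back into mobility and therefore into later deaths; it also omits the 7-day rule for locations already above the threshold. The function name is hypothetical.

```python
def mandate_schedule(daily_deaths_per_million, threshold=8.0, duration_days=42):
    """Return one boolean per day: True while mandates are in force.

    Mandates switch on the first day the rate exceeds `threshold` and stay
    on for `duration_days` (6 weeks = 42 days). The rule can trigger again
    after a period ends.
    """
    active = []
    remaining = 0
    for rate in daily_deaths_per_million:
        if remaining == 0 and rate > threshold:
            remaining = duration_days
        active.append(remaining > 0)
        if remaining > 0:
            remaining -= 1
    return active
```

For a series that crosses the threshold once and then falls, the schedule is off before the crossing, on for exactly 42 days, and off again afterwards.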

The scenario of universal mask use models what would happen if 95% of the population in each state always wore a mask when they were in public. This value was chosen to represent the highest observed rate of mask use in the world so far during the COVID-19 pandemic (Supplementary Information). In this scenario, we also assumed that if the daily death rate in a state exceeds 8 deaths per million, SDMs will be reintroduced for a 6-week period.

Two additional, derivative scenarios were included to assist understanding and policy resolution of these main framework scenarios: a less comprehensive mask-wearing scenario of 85% public use of masks and a scenario of universal mask use in the absence of any additional NPIs. The less comprehensive mask-wearing scenario evaluated what would happen if 85% of the population in each state always wore a mask when they were in public. As with the universal mask-use scenario, we also assumed that if the daily death rate in a state exceeds 8 deaths per million, SDMs will be reintroduced for a 6-week period. For completeness, we also evaluated universal mask use by 95% of the population in a scenario that assumes no implementation of other NPIs at any threshold value of daily deaths—the results from this scenario, which did not differ notably from the more probable version where states respond to rising numbers of daily COVID-19 deaths by reinstating SDM, are provided in the Supplementary Information and Figs. 2–4. SEIR model vetting plots for scenarios of 95% mask use with mandates (Supplementary Data 1), 95% mask use without mandates (Supplementary Data 2) and 85% mask use with mandates (Supplementary Data 3), as well as detailed regression diagnostics (Supplementary Data 4) and the spatial distribution of select covariates (Supplementary Data 5) are available in the Supporting Information. All scenarios assume an increase in mobility associated with the opening of schools across the country.

OOS predictive performance for IHME SEIR models has been assessed against subsequently observed trends in an ongoing fashion and compared to other publicly available COVID-19 mortality forecasting models in a publicly available framework (ref. 21). The IHME SEIR model described here has consistently demonstrated high accuracy, as measured by a low MAPE, when compared to models from other groups. For example, among models released in June, at 10 weeks of extrapolation, the IHME SEIR model had the lowest MAPE of any observed forecasting group at 20.2%, compared to an average of 32.6% across groups. Numerous other aspects of predictive performance are assessed in our publicly available framework (ref. 21).
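An absolute percent error summary of this kind can be computed as in the sketch below. This section does not expand the acronym MAPE, so the sketch assumes the median of absolute percent errors by default and accepts the mean as an alternative; the function names are hypothetical and the example numbers are illustrative, not taken from the evaluation framework.

```python
from statistics import mean, median


def absolute_percent_errors(observed, predicted):
    """Absolute percent error per location; zero observations are skipped."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [
        100.0 * abs(p - o) / abs(o)
        for o, p in zip(observed, predicted)
        if o != 0
    ]


def mape(observed, predicted, summary=median):
    """Summarize absolute percent errors (median by default, an assumption)."""
    errors = absolute_percent_errors(observed, predicted)
    if not errors:
        raise ValueError("no nonzero observations to evaluate")
    return summary(errors)
```

For observed values `[100, 200, 400]` and predictions `[110, 180, 400]`, the errors are 10%, 10% and 0%, so the median is 10% while the mean is about 6.7%. The two summaries can differ noticeably, so it matters which one a reported figure uses.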

The increasing number of population-based serology surveys conducted also provides a unique opportunity to cross-validate our forecasts with modeled epidemiological outcomes. In Extended Data Fig. 9, we compare these serology surveys (such as the Spanish ENE-COVID study; ref. 42) to our estimated population seropositivity, time indexed to the date that the survey was conducted. In general, across the varied locations that have been reported globally, we note a high degree of agreement between the estimated and surveyed seropositivity. As more serology studies are conducted and published, especially in the United States, this will allow an ongoing and iterative assessment of model validity. Two sensitivity analyses were conducted; the first assessed the importance of specific model assumptions on OOS predictive validity, while the second assessed the robustness of our conclusions to these same model assumptions (Supplementary Information).

Epidemics progress based on complex nonlinear and dynamic biological and social processes that are difficult to observe directly and at scale. Mechanistic models of epidemics, formulated either as ordinary differential equations or as individual-based simulation models, are a useful tool for conceptualizing, analyzing or forecasting the time course of epidemics. In the COVID-19 epidemic, effective policies and the responses to those policies have changed the conditions supporting transmission from one week to the next, with the effects of policies realized typically after a variable time lag. Each model approximates an epidemic, and whether used to understand, forecast or advise, there are limitations on the quality and availability of the data used to inform it and the simplifications chosen in model specification. It is unreasonable to expect any model to do everything well, so each model makes compromises to serve a purpose, while maintaining computational tractability.

One of the largest determinants of the quality of a model is the corresponding quality of the input data. Our model is anchored to daily COVID-19-related deaths, as opposed to daily COVID-19 case counts, due to the assumption that death counts are a less biased estimate of true COVID-19-related deaths than COVID-19 case counts are of the true number of SARS-CoV-2 infections. Numerous biases such as treatment-seeking behavior, testing protocols (such as only testing those who have traveled abroad) and differential access to care greatly influence the utility of case count data. Moreover, there is growing evidence that inapparent and asymptomatic individuals are infectious, as well as individuals who eventually become symptomatic and are infectious before the onset of any symptoms. As such, our primary input data for our model are counts of deaths; death data can likewise be fallible, however, and where available, we combine death data, case data and hospitalization data to estimate COVID-19 deaths.

Beyond the basic input data, a large number of other data sources with their own potential biases are incorporated into our model. Testing, mobility and mask use are all imperfectly measured and may or may not be representative of the practices of those that are susceptible and/or infectious. Moreover, any forecast of the patterns of these covariates is associated with a large number of assumptions (Supplementary Information), and as such, care must be taken in the interpretation of estimates farther into the future, as the uncertainty associated with the numerous submodels that go into these estimates increases in time. Moreover, although our time-invariant covariates are simpler to estimate, some of them may be more associated with disease outcome than transmission potential, and thus their impact on the model may be more muted.

For practical purposes, our transmission model has made a large number of simplifying assumptions. Key among these is the exclusion of movement between locations (for example, importation) and the absence of age structure and mixing within location (for example, we assume a well-mixed population). It is clear that there are large, super-spreader-like events that have occurred throughout the COVID-19 pandemic, and our current model is unable to fully capture these dynamics. Another important assumption to note is that of the relationship between pneumonia seasonality and SARS-CoV-2 seasonality. To date, across both the Northern and Southern Hemispheres, there is a strong association between COVID-19 cases and deaths and general seasonal patterns of pneumonia deaths (Supplementary Information). Our forecasts to the end of February 2021 are strongly influenced by the assumption that this relationship will hold throughout the year and that SARS-CoV-2 seasonality will be well approximated by pneumonia seasonality. While we assess this assumption to the extent possible (Supplementary Information), we have not yet experienced a full year of SARS-CoV-2 transmission, and as such cannot yet know if this assumption is valid. Additionally, our model attempts to account for some of the associated uncertainties in the process but does not fully capture all levels of uncertainty. Future iterations should track uncertainties that arise from more complex processes such as demographic stochasticity. There is also uncertainty (and unidentifiability) surrounding a number of the parameters of the transmission model. Here we have chosen to incorporate this lack of knowledge by drawing key transmission parameters from plausible distributions and then presenting the average result across these potential realities. As more information becomes available, we hope to tune these parameters to each location in turn.

Finally, the model presented herein is not the first model our team has developed to predict current and future transmission of SARS-CoV-2. As the outbreak has progressed, we have attempted to adapt our modeling framework to both the changing epidemiological landscape and the increase in data that could be useful to inform a model. Changes in the dynamics of the outbreak overwhelmed both the initial purpose and some key assumptions of our first model, requiring evolution in our approach. While the current SEIR formulation is a more flexible framework (and thus less likely to need complete reconfiguration as the outbreak progresses further), we fully expect the need to adapt our model to accommodate future shifts in patterns of SARS-CoV-2 transmission. Incorporating movement within and between locations is one example; resolving our model at finer spatial scales and accounting for differential exposure and treatment rates across sexes and races are other dimensions of transmission modeling that we currently do not account for but expect will be necessary additions in the coming months. As we have done before, we will continually adapt, update and improve our model based on need and predictive validity.

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

velonaut on October 24th, 2020 at 04:27 UTC »

Important detail: Their estimate for how many Americans will die from Covid-19 by February 2021 is 511,373, of which ~230K have already died. So 95% mask use is modelled to provide a ~54% reduction in future Covid-19 related deaths.

coldgator on October 24th, 2020 at 02:49 UTC »

Universal PROPER mask use. Right now I'd settle for PSAs that they have to go OVER YOUR NOSE

thelastestgunslinger on October 24th, 2020 at 02:41 UTC »

So, what you’re saying is that in the next 5 months, the United States will suffer an extra 130,000 deaths, over and above the inescapable ones?