Using contributing causes of death improves prediction of opioid involvement in unclassified drug overdoses in US death records

Andrew J Boslett, Alina Denham, Elaine L Hill

Abstract

Background and aims: A substantial share of fatal drug overdoses is missing information on specific drug involvement, leading to under-reporting of opioid-related death rates and a misrepresentation of the extent of the opioid epidemic. We aimed to compare methodological approaches to predicting opioid involvement in unclassified drug overdoses in US death records and to estimate the number of fatal opioid overdoses from 1999 to 2016 using the best-performing method.

Design: This was a secondary analysis of all drug overdose deaths in 1999-2016, obtained from the National Center for Health Statistics Detailed Multiple Cause of Death records.

Setting: United States.

Cases: A total of 632 331 drug overdose decedents. Drug overdoses with known drug classification comprised 78.2% of the cases (n = 494 316) and unclassified drug overdoses (ICD-10 T50.9) comprised 21.8% (n = 138 015).

Measurements: Known opioid involvement was defined using ICD-10 codes T40.0-T40.4 and T40.6 recorded among the contributing causes of death. Opioid involvement in unclassified drug overdoses was predicted using several methodological approaches: logistic regression versus machine learning techniques, with and without contributing causes of death, and with and without county-level characteristics. After selecting the model with the highest predictive ability, we calculated corrected estimates of opioid-related mortality.
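
The outcome definition above can be sketched as a labelling step plus contributing-cause indicators. This is a minimal illustration under stated assumptions, not the authors' code: the string format of the codes (dotted such as "T40.2" or undotted such as "T402"), the `normalize` helper and the predictor vocabulary are all hypothetical. Dropping the outcome-defining opioid codes from the predictor set is likewise an assumption about how the label is kept out of the features. Note that T40.5 (cocaine) is not among the outcome codes.

```python
# Hypothetical sketch: label opioid involvement and build 0/1 cause indicators.
OPIOID_CODES = {"T40.0", "T40.1", "T40.2", "T40.3", "T40.4", "T40.6"}


def normalize(code: str) -> str:
    """Upper-case a code and insert the dot after the 3-character category if missing."""
    code = code.strip().upper()
    if "." not in code and len(code) > 3:
        code = code[:3] + "." + code[3:]
    return code


def opioid_involved(causes: list[str]) -> bool:
    """Outcome label: True if any contributing cause is one of the opioid T-codes."""
    return any(normalize(c) in OPIOID_CODES for c in causes)


def cause_indicators(causes: list[str], vocabulary: list[str]) -> list[int]:
    """0/1 predictor vector over a hypothetical vocabulary of contributing causes.

    Opioid codes are removed from the vocabulary (assumption) because they
    define the outcome and are absent from unclassified overdoses anyway.
    """
    present = {normalize(c) for c in causes}
    predictors = [normalize(v) for v in vocabulary]
    return [int(v in present) for v in predictors if v not in OPIOID_CODES]
```

For example, a record with codes `["X42", "T40.2", "J96.0"]` is labelled opioid-involved, and against the made-up vocabulary `["J96.0", "T40.2", "F11.2"]` it yields the indicator vector `[1, 0]`, since T40.2 is excluded as a predictor.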

Findings: Logistic regression and random forest models performed similarly. Including contributing causes of death substantially improved predictive accuracy, whereas including county characteristics did not. Using the best-performing prediction model, we found that 71.8% of unclassified drug overdoses in 1999-2016 involved opioids, translating into 99 160 additional opioid-related deaths, approximately 28% more than reported. Undercounting of opioid overdoses also varied strikingly across geographic areas.
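
The correction converts model predictions for the 138 015 unclassified overdoses into additional opioid-related deaths. The abstract does not say whether predicted probabilities were summed or thresholded, so the sketch below shows both rules as assumptions; `expected_count`, `classified_count` and `pct_increase` are hypothetical helper names. The final lines only check that 99 160 of 138 015 rounds to the reported 71.8%.

```python
import math


def expected_count(probs: list[float]) -> float:
    """Expected number of opioid-involved deaths: the sum of predicted probabilities."""
    return math.fsum(probs)


def classified_count(probs: list[float], threshold: float = 0.5) -> int:
    """Number of deaths classified as opioid-involved at a probability cut-off."""
    return sum(1 for p in probs if p >= threshold)


def pct_increase(reported: float, additional: float) -> float:
    """Percentage by which adding `additional` deaths raises the reported total."""
    return 100 * additional / reported


# Consistency check on the published figures: 99 160 of 138 015 unclassified deaths.
share = 99_160 / 138_015
print(f"{share:.1%}")  # prints 71.8%
```

With made-up probabilities `[0.9, 0.8, 0.3, 0.6]`, the expected count is 2.6 while a 0.5 cut-off classifies 3 deaths as opioid-involved, so the choice of rule matters for small areas even when national totals are similar.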

Conclusions: In modeling opioid involvement in unclassified drug overdoses, the highest predictive accuracy is achieved using a statistical model (either logistic regression or a random forest ensemble) with decedent characteristics and contributing causes of death as predictors.

Keywords: Contributing causes of death; fatal drug overdoses; machine learning; methods; mortality rates; opioids; prediction.

Conflict of interest statement

Declaration of competing interest: None

© 2020 Society for the Study of Addiction.

Figures

Figure 1:
Trends in drug overdoses from 1999 to 2016 by drug overdose category. Notes: This figure shows drug overdose counts from 1999 to 2016 in three categories: 1) opioid-related overdoses; 2) non-opioid-related overdoses; and 3) overdoses without a specific drug classification. Opioid-related overdoses may also have involved non-opioid drugs, whereas non-opioid-related overdoses involved no opioid drugs. These counts are based on the National Center for Health Statistics’ Multiple Cause of Death Data. The smoothed lines are estimated using local regression and are included to highlight trends in the data over time.
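
The smoothed trend lines in these figures come from local regression. The captions do not give the software, span or polynomial degree, so the following is a generic sketch of a degree-1, LOWESS-style smoother (tricube weights, no robustness iterations), not the authors' implementation; `frac` is a hypothetical span parameter.

```python
import math


def loess(xs: list[float], ys: list[float], frac: float = 0.75) -> list[float]:
    """Degree-1 local regression with tricube weights (no robustness iterations).

    For each x0, the distance to the k-th nearest point (k is about frac * n)
    sets the bandwidth h; points are weighted by (1 - (|x - x0| / h)**3)**3,
    and a weighted least-squares line is evaluated at x0.
    """
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))
    fitted = []
    for x0 in xs:
        # Bandwidth: distance to the k-th nearest neighbour (guard against zero).
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        w = [max(0.0, 1 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
        sw = sum(w)  # at least 1, because x0 itself gets weight 1
        xbar = sum(wi * xi for wi, xi in zip(w, xs)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, xs))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted
```

Applied to annual counts for 1999-2016, `loess(years, counts)` returns one smoothed value per year; a perfectly linear series is reproduced unchanged, which is a quick sanity check on the weighting.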
Figure 2:
Opioid involvement prediction accuracy rates from 1999 to 2016: methodological comparisons. Notes: This figure displays the total predictive accuracy of year-level models of opioid involvement in drug overdoses with known drug classifications. In each year, we partitioned our data into training (80%) and test (20%) sets, estimated the models on the training set and computed accuracy on the test set. The “Naïve Estimate” corresponds to the predictive accuracy one would achieve by assigning every overdose to the most frequent classification; here, this means predicting that every overdose was opioid-involved. Our reference model M0 is a logistic regression model that includes decedent information (Table 1) but no contributing causes of death. The M1 model is a random forest ensemble with the same predictors as the M0 model. The M2 model is a logistic regression model including decedent information and county-level socioeconomic characteristics from Ruhm (7). The M3 model is a logistic regression model with both decedent information and contributing cause indicators. Lastly, the M4 model is a random forest ensemble with both decedent information and contributing cause indicators. The smoothed lines are estimated using local regression and are included to highlight trends in the data over time.
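
The evaluation design in the Figure 2 notes (an 80/20 split within each year, with model accuracy compared against the naïve majority-class estimate) can be sketched with the standard library. The function names and the fixed random seed are assumptions for illustration; the models M0-M4 themselves are not reproduced here.

```python
import random


def train_test_split(records: list, test_frac: float = 0.2, seed: int = 0):
    """Shuffle records reproducibly and hold out test_frac of them for testing."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = round(test_frac * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def naive_accuracy(labels: list[int]) -> float:
    """Accuracy of predicting the most frequent class (1 = opioid-involved) for everyone."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)


def accuracy(predictions: list[int], labels: list[int]) -> float:
    """Share of test cases whose predicted label matches the observed label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

In a year where most classified overdoses are opioid-involved, `naive_accuracy` on the test labels equals the opioid share, which is the baseline every model in the comparison has to beat.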
Figure 3:
Differences between corrected and uncorrected opioid overdose rates in 2016, using the superior methodology. Notes: This figure displays county-level increases in opioid overdose rates when opioid involvement is predicted using contributing causes of drug overdose death, relative to uncorrected estimates for 2016. We used the superior methodology identified in our study: a logistic regression model with decedent characteristics and contributing causes of death as predictors.
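
The county-level comparison in Figure 3 amounts to computing each county's opioid overdose rate with and without the predicted additional deaths. The sketch below assumes rates per 100 000 population and takes the predicted additional deaths per county as an input; the function names and the FIPS-style keys such as "00001" are hypothetical placeholders, not real counties.

```python
def rate_per_100k(deaths: float, population: int) -> float:
    """Deaths per 100 000 population."""
    return 100_000 * deaths / population


def county_rate_increases(counties: dict) -> list:
    """counties: FIPS -> (known_opioid_deaths, predicted_additional_deaths, population).

    Returns (FIPS, corrected minus uncorrected rate) pairs, largest increase first.
    """
    diffs = []
    for fips, (known, added, pop) in counties.items():
        uncorrected = rate_per_100k(known, pop)
        corrected = rate_per_100k(known + added, pop)
        diffs.append((fips, corrected - uncorrected))
    return sorted(diffs, key=lambda item: item[1], reverse=True)
```

Sorting by the increase puts the counties where undercounting matters most at the top, which mirrors the geographic variation the map is meant to show.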
Figure 4:
Trends in opioid-involved drug overdoses from 1999 to 2016 with corrected estimates using the superior methodology. Notes: OD = overdose. This figure displays the number of total overdoses from 1999 to 2016 alongside estimates of opioid involvement projected using the superior methodology identified in the study, i.e., a logistic regression with decedent characteristics and contributing causes of drug overdose death as predictors. The smoothed lines are estimated using local regression and are included to highlight trends in the data over time.

Source: PubMed
