Machine Learning to Identify Persons at High-Risk of Human Immunodeficiency Virus Acquisition in Rural Kenya and Uganda

Laura B Balzer, Diane V Havlir, Moses R Kamya, Gabriel Chamie, Edwin D Charlebois, Tamara D Clark, Catherine A Koss, Dalsone Kwarisiima, James Ayieko, Norton Sang, Jane Kabami, Mucunguzi Atukunda, Vivek Jain, Carol S Camlin, Craig R Cohen, Elizabeth A Bukusi, Mark Van Der Laan, Maya L Petersen

Abstract

Background: In generalized epidemic settings, strategies are needed to prioritize individuals at higher risk of human immunodeficiency virus (HIV) acquisition for prevention services. We used population-level HIV testing data from rural Kenya and Uganda to construct HIV risk scores and assessed their ability to identify seroconversions.

Methods: During 2013-2017, >75% of residents in 16 communities in the SEARCH study were tested annually for HIV. In this population, we evaluated 3 strategies for using demographic factors to predict the 1-year risk of HIV seroconversion: membership in ≥1 known "risk group" (eg, having a spouse living with HIV), a "model-based" risk score constructed with logistic regression, and a "machine learning" risk score constructed with the Super Learner algorithm. We hypothesized that machine learning would identify high-risk individuals more efficiently (fewer persons targeted for a fixed sensitivity) and with higher sensitivity (for a fixed number targeted) than either of the other approaches.

Results: A total of 75 558 persons contributed 166 723 person-years of follow-up; 519 seroconverted. Machine learning improved efficiency. To achieve a fixed sensitivity of 50%, the risk-group strategy targeted 42% of the population, the model-based strategy targeted 27%, and machine learning targeted 18%. Machine learning also improved sensitivity. With an upper limit of 45% targeted, the risk-group strategy correctly classified 58% of seroconversions, the model-based strategy 68%, and machine learning 78%.

Conclusions: Machine learning improved classification of individuals at risk of HIV acquisition compared with a model-based approach or reliance on known risk groups and could inform targeting of prevention strategies in generalized epidemic settings.

Clinical trials registration: NCT01864603.

Keywords: HIV prevention; HIV risk score; PrEP; SEARCH Study; clinical prediction rule.

© The Author(s) 2019. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail: journals.permissions@oup.com.

Figures

Figure 1.
Schematic representation of a targeted prevention strategy with the goal of maximizing the intersection of the population offered intensified prevention (light gray) with the population at risk of seroconversion (medium-gray). Sensitivity is the proportion of individuals at high risk who are correctly identified by the strategy: the number in the dark-gray intersection divided by the number in the medium-gray circle. The rate of positive predictions is the proportion of the population targeted: the number in the light-gray circle divided by the total population size. The number needed to target (equal to 1/positive predictive value) is the number classified as high risk per seroconversion identified: the number in the light-gray circle divided by the number in the dark-gray intersection. Abbreviation: HIV, human immunodeficiency virus.
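The three quantities defined in this caption can be illustrated with a short calculation. The sketch below is not taken from the study; the class name, function names, and the counts in the example are hypothetical and chosen only to show how the ratios relate to the regions of the diagram.

```python
from dataclasses import dataclass


@dataclass
class TargetingCounts:
    """Counts for the regions in the schematic (all values hypothetical)."""
    population: int                   # total population size
    targeted: int                     # light-gray circle: offered intensified prevention
    seroconverted: int                # medium-gray circle: persons who seroconvert
    targeted_and_seroconverted: int   # dark-gray intersection


def sensitivity(c: TargetingCounts) -> float:
    # Dark-gray intersection divided by medium-gray circle.
    return c.targeted_and_seroconverted / c.seroconverted


def rate_of_positive_predictions(c: TargetingCounts) -> float:
    # Light-gray circle divided by total population size.
    return c.targeted / c.population


def number_needed_to_target(c: TargetingCounts) -> float:
    # Light-gray circle divided by dark-gray intersection (equal to 1 / PPV).
    return c.targeted / c.targeted_and_seroconverted


# Illustrative counts only, not study data.
example = TargetingCounts(
    population=1000,
    targeted=200,
    seroconverted=10,
    targeted_and_seroconverted=6,
)
print(sensitivity(example))
print(rate_of_positive_predictions(example))
print(number_needed_to_target(example))
```

In this made-up example, targeting 200 of 1000 persons captures 6 of 10 seroconversions, so roughly 33 persons are classified as high risk for each seroconversion identified.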
Figure 2.
Cross-validated efficiency of each candidate targeting strategy, defined as the proportion of the population that would have been classified as high risk (rate of positive predictions) to achieve 50% sensitivity for correct classification of seroconversions.
Figure 3.
Cross-validated sensitivity for correct classification of seroconversions that would have been achieved by targeting 45% of the overall population.
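The two evaluation criteria in Figures 2 and 3 can both be read off a ranking of individuals by predicted risk. The sketch below is one possible way to compute them and is not the authors' code: the function names and toy data are hypothetical, ties in the risk score are broken arbitrarily, and the calculation is done in-sample, whereas the study reports cross-validated estimates.

```python
import math
from typing import List, Sequence


def rank_by_risk(scores: Sequence[float], outcomes: Sequence[int]) -> List[int]:
    """Return outcomes (1 = seroconverted, 0 = not) ordered from highest to lowest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [outcomes[i] for i in order]


def proportion_targeted_for_sensitivity(
    scores: Sequence[float], outcomes: Sequence[int], target_sensitivity: float
) -> float:
    """Smallest fraction of the population, taken in risk order, that
    captures at least `target_sensitivity` of all seroconversions."""
    if not 0 < target_sensitivity <= 1:
        raise ValueError("target_sensitivity must be in (0, 1]")
    ranked = rank_by_risk(scores, outcomes)
    total_events = sum(ranked)
    if total_events == 0:
        raise ValueError("no seroconversions in the data")
    needed = math.ceil(target_sensitivity * total_events)
    captured = 0
    for k, outcome in enumerate(ranked, start=1):
        captured += outcome
        if captured >= needed:
            return k / len(ranked)
    return 1.0


def sensitivity_at_proportion_targeted(
    scores: Sequence[float], outcomes: Sequence[int], proportion: float
) -> float:
    """Fraction of seroconversions captured when the top `proportion`
    of the population by risk score is targeted."""
    if not 0 <= proportion <= 1:
        raise ValueError("proportion must be in [0, 1]")
    ranked = rank_by_risk(scores, outcomes)
    total_events = sum(ranked)
    if total_events == 0:
        raise ValueError("no seroconversions in the data")
    k = math.floor(proportion * len(ranked))
    return sum(ranked[:k]) / total_events


# Toy data only: 10 persons, 4 seroconversions.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_outcomes = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
print(proportion_targeted_for_sensitivity(toy_scores, toy_outcomes, 0.5))
print(sensitivity_at_proportion_targeted(toy_scores, toy_outcomes, 0.5))
```

A better risk score moves seroconversions toward the top of the ranking, which lowers the first quantity (fewer persons targeted for a fixed sensitivity) and raises the second (more seroconversions captured for a fixed proportion targeted).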

Source: PubMed
