Mapping the spatial distribution of the dengue vector Aedes aegypti and predicting its abundance in northeastern Thailand using machine-learning approach

M S Rahman, Chamsai Pientong, Sumaira Zafar, Tipaya Ekalaksananan, Richard E Paul, Ubydul Haque, Joacim Rocklöv, Hans J Overgaard

Abstract

Background: Mapping the spatial distribution of the dengue vector Aedes (Ae.) aegypti and accurately predicting its abundance are crucial for designing effective vector control strategies and early warning tools for dengue epidemic prevention. Socio-ecological and landscape factors influence Ae. aegypti abundance. Therefore, we aimed to map the spatial distribution of female adult Ae. aegypti and predict its abundance in northeastern Thailand based on socioeconomic factors, knowledge, attitude and practices (KAP) regarding climate change and dengue, and/or landscape factors, using a machine learning (ML)-based system.

Method: A total of 1066 female adult Ae. aegypti were collected from four villages in northeastern Thailand during January-December 2019. Information on household socioeconomics, KAP regarding climate change and dengue, and satellite-based landscape data was also acquired. Geographic information systems (GIS) were used to map the household-based spatial distribution of female adult Ae. aegypti abundance (high/low). Five popular supervised learning models, logistic regression (LR), support vector machine (SVM), k-nearest neighbor (kNN), artificial neural network (ANN), and random forest (RF), were used to predict female adult Ae. aegypti abundance (high/low). The predictive accuracy of each modeling technique was calculated and evaluated. Important variables for predicting female adult Ae. aegypti abundance were also identified using the best-fitted model.
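The high/low labelling and the three reported performance measures (AUC, CA, F1) can be illustrated with a minimal sketch. This is not the authors' pipeline: the mosquito counts, predicted scores, and the 0.5 decision threshold below are made-up toy values, and assigning households exactly at the median to the low class is an assumption, since the abstract does not say how ties were handled.

```python
from statistics import median

def median_split(counts):
    """Label a household 1 (high) if its count is above the median, else 0 (low).
    Assumption: households exactly at the median are labelled low."""
    m = median(counts)
    return [1 if c > m else 0 for c in counts]

def classification_accuracy(y_true, y_pred):
    """Fraction of households whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the high class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def auc(y_true, scores):
    """Probability that a random high household is scored above a random
    low household (ties count as half), which equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): mosquito counts for eight households and the
# scores a classifier might assign to the "high" class.
counts = [0, 1, 2, 3, 5, 8, 12, 20]
scores = [0.1, 0.2, 0.6, 0.3, 0.7, 0.4, 0.8, 0.9]

y_true = median_split(counts)
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

ca = classification_accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
area = auc(y_true, scores)
print(f"CA={ca:.2f}  F1={f1:.2f}  AUC={area:.4f}")
```

The same three functions can be applied to the output of any of the five classifiers, which is what makes the measures comparable across models.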

Results: Urban areas had a higher abundance of female adult Ae. aegypti than rural areas. Overall, study respondents in both urban and rural areas had inadequate KAP regarding climate change and dengue. The average landscape factors per household in urban areas were rice crop (47.4%), natural tree cover (17.8%), built-up area (13.2%), permanent wetlands (21.2%), and rubber plantation (0%), and the corresponding figures for rural areas were 12.1, 2.0, 38.7, 40.1 and 0.1%, respectively. Among all assessed models, RF showed the best prediction performance (socioeconomics: area under curve, AUC = 0.93, classification accuracy, CA = 0.86, F1 score = 0.85; KAP: AUC = 0.95, CA = 0.92, F1 = 0.90; landscape: AUC = 0.96, CA = 0.89, F1 = 0.87) for female adult Ae. aegypti abundance. Combining all factors further improved the predictive accuracy of the RF model (socioeconomics + KAP + landscape: AUC = 0.99, CA = 0.96 and F1 = 0.95). Dengue prevention practices were the most important predictor of female adult Ae. aegypti abundance in the RF model in northeastern Thailand.
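Predictor importance in the RF model is reported as mean decrease in Gini (Fig. 5). As a rough illustration of the idea, the sketch below computes the Gini impurity decrease for a single binary split; in a random forest this quantity is accumulated over every split that uses a given variable and averaged across trees. The feature names, values, labels, and threshold are hypothetical toy data chosen for illustration, not study data or findings.

```python
def gini(labels):
    """Gini impurity of a set of binary labels: 2 * p * (1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def gini_decrease(feature, labels, threshold):
    """Impurity of the parent node minus the size-weighted impurity of the
    two child nodes produced by splitting at `threshold`."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

# Toy data (hypothetical): 1 = high abundance, 0 = low abundance.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
practice_score = [1, 2, 3, 4, 5, 6, 7, 8]   # made-up informative predictor
unrelated_score = [1, 2, 5, 3, 6, 4, 7, 8]  # made-up uninformative predictor

informative = gini_decrease(practice_score, labels, threshold=4)
uninformative = gini_decrease(unrelated_score, labels, threshold=4)
print(informative, uninformative)
```

A predictor whose splits separate the classes well yields a larger decrease, so it ranks higher in a plot such as Fig. 5.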

Conclusion: The RF model was the most suitable of the assessed models for predicting Ae. aegypti abundance in northeastern Thailand. Our study exemplifies that the application of GIS and machine learning systems has significant potential for understanding the spatial distribution of dengue vectors and predicting their abundance. The study findings might help optimize vector control strategies, future mosquito suppression, and the prediction and control of epidemic arboviral diseases (dengue, chikungunya, and Zika). Such strategies can be incorporated into One Health approaches that apply transdisciplinary methods and consider human-vector and agro-environmental interrelationships.

Keywords: ANN, Artificial neural network; AUC, Area under curve; Aedes aegypti; CA, Classification accuracy; DENV, Dengue virus; Dengue; Early warning; GIS, Geographic information systems; HCI, Household crowding index; KAP, Knowledge, attitude, and practice; LR, Logistic regression; ML, Machine learning; PCI, Premise condition index; Prediction; RF, Random forest; SES, Socioeconomic status; SVM, Support vector machine; Supervised learning; kNN, k-nearest neighbor.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

© 2021 The Authors.

Figures

Graphical abstract
Fig. 1
(A) Locations of the four Aedes data collection sites. (B) Spatial distribution of the dengue vector: female adult Ae. aegypti abundance (high vs. low, defined as above vs. below the median, respectively) in 128 households of northeastern Thailand during January–December 2019.
Fig. 2
The pipeline of the ML model workflow. The left side (features) shows the input factors/predictors; the right side (model classifiers and predictive measurements) produces the full dataset model output and the overall predictive performance of each classifier in predicting female adult Ae. aegypti abundance (high/low).
Fig. 3
Predictive performance evaluation parameters of five models to predict Ae. aegypti abundance (high/low). SVM: Support vector machine, LR: logistic regression, kNN: k-nearest neighbor, ANN: Artificial neural network, and RF: Random forest.
Fig. 4
Receiver operating characteristics (ROC) curves predicting female Ae. aegypti abundance (high/low) in 128 households of northeastern Thailand during January–December 2019.
Fig. 5
Mean decrease in Gini of random forest important predictors for female adult Ae. aegypti abundance (high/low) in 128 households of northeastern Thailand during January–December 2019.
Fig. 6
Scatter diagrams (lower left), histograms (diagonal), and correlation coefficients (upper right) of relationships between socioeconomic factors (A), KAP scores (B), landscape factors (C), and female adult Ae. aegypti abundance. Blue and green represent high and low female adult Ae. aegypti abundance, respectively, in the scatter diagrams. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Source: PubMed
