Support vector machine-based differentiation between aggressive and chronic periodontitis using microbial profiles

Magda Feres, Yoram Louzoun, Simi Haber, Marcelo Faveri, Luciene C Figueiredo, Liran Levin

Abstract

Background: The existence of specific microbial profiles for different periodontal conditions is still a matter of debate. The aim of this study was to test the hypothesis that 40 bacterial species could be used to classify patients, utilising machine learning, into generalised chronic periodontitis (ChP), generalised aggressive periodontitis (AgP) and periodontal health (PH).

Method: Subgingival biofilm samples were collected from patients with AgP, ChP and PH and analysed for their content of 40 bacterial species using checkerboard DNA-DNA hybridisation. Machine learning was then performed in two stages. First, we tested whether the composition of bacterial communities differed between PH and disease; second, we tested whether it differed between ChP and AgP. In each analysis, the data were split into 70% training and 30% test sets. A support vector machine (SVM) classifier was used with a linear kernel and a box constraint of 1.
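The two-stage setup described above can be sketched as follows. This is a minimal illustration on synthetic data, assuming scikit-learn; the abstract does not name the software used, and scikit-learn's `C` parameter is taken here as the counterpart of the box constraint. The group sizes, mean shifts, random seeds, stratified splitting and the helper names `make_group` and `fit_stage` are all assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_SPECIES = 40  # panel size stated in the abstract


def make_group(n, shift):
    """Synthetic stand-in for species levels; NOT study data."""
    return rng.normal(loc=shift, scale=1.0, size=(n, N_SPECIES))


# Hypothetical group sizes and mean shifts, chosen only for illustration.
healthy = make_group(60, 0.0)
chp = make_group(120, 1.0)
agp = make_group(80, 2.0)


def fit_stage(X, y, seed=0):
    """70/30 split, then a linear-kernel SVM with C=1.

    Stratifying the split by class is an assumption of this sketch.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    clf = SVC(kernel="linear", C=1.0).fit(X_tr, y_tr)
    return clf, clf.score(X_tr, y_tr), clf.score(X_te, y_te)


# Stage 1: health (0) versus disease (1).
X1 = np.vstack([healthy, chp, agp])
y1 = np.concatenate([np.zeros(len(healthy), int),
                     np.ones(len(chp) + len(agp), int)])
clf1, train_acc1, test_acc1 = fit_stage(X1, y1)

# Stage 2: ChP (1) versus AgP (0), diseased samples only.
X2 = np.vstack([chp, agp])
y2 = np.concatenate([np.ones(len(chp), int), np.zeros(len(agp), int)])
clf2, train_acc2, test_acc2 = fit_stage(X2, y2)

print(f"stage 1 train/test accuracy: {train_acc1:.2f}/{test_acc1:.2f}")
print(f"stage 2 train/test accuracy: {train_acc2:.2f}/{test_acc2:.2f}")
```

Comparing training and test accuracy for each stage is the same check for over-fitting that the Figure 2 caption refers to; the accuracies printed here reflect only the synthetic data.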

Results: Overall, 435 patients (3,915 samples) were included in the analysis (PH = 53; ChP = 308; AgP = 74). The variance of the healthy samples in all principal component analysis (PCA) directions was smaller than that of the periodontally diseased samples, suggesting that PH is characterised by a uniform bacterial composition and that the bacterial composition of periodontally diseased samples is much more diverse. The relative bacterial load could distinguish between AgP and ChP.
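The variance comparison reported above can be illustrated with a short sketch. It assumes PCA computed on the pooled, mean-centred samples via a singular value decomposition, followed by a per-group variance along the leading components. The data are synthetic, and the "healthy" group is deliberately drawn with a smaller spread, so the output demonstrates the computation rather than reproducing the study's finding. Group sizes, spreads and the number of components `K` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N_SPECIES = 40

# Synthetic stand-ins (NOT study data). The "healthy" group is given a
# smaller spread on purpose so the comparison has something to show.
healthy = rng.normal(0.0, 0.5, size=(50, N_SPECIES))
diseased = rng.normal(1.0, 2.0, size=(200, N_SPECIES))

X = np.vstack([healthy, diseased])
is_healthy = np.arange(len(X)) < len(healthy)

# PCA on pooled, mean-centred data: rows of Vt are principal directions.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T  # projection of every sample on every component

K = 3  # number of leading components to inspect (arbitrary choice)
var_healthy = scores[is_healthy, :K].var(axis=0, ddof=1)
var_diseased = scores[~is_healthy, :K].var(axis=0, ddof=1)

for k in range(K):
    print(f"PC{k + 1}: healthy var={var_healthy[k]:.2f}, "
          f"diseased var={var_diseased[k]:.2f}")
```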

Conclusion: An SVM classifier using a panel of 40 bacterial species was able to distinguish between PH, AgP in young individuals and ChP.

Keywords: Plaque; mathematics; oral health; periodontitis; prevention.

© 2017 FDI World Dental Federation.

Figures

Figure 1.
Principal component analysis (PCA) projection of the three conditions (chronic periodontitis, aggressive periodontitis and periodontally healthy). Each row is a different principal component. The diagonal plots show the distribution of the principal component weights. One can clearly see that the healthy state is very different from the two other states. The plots below the diagonal are scatter plots of two principal component vectors. While the healthy condition has a limited variance, the two other conditions have a large variance and mostly overlap. The plots above the diagonal are two-dimensional distributions of the projections on the principal components, presented as contours. One can clearly see a peak in the distribution representing the healthy state, which is very different from any of the disease states. This is also evident from the separate blob in the contour plot.
Figure 2.
Receiver operating characteristic curve for chronic periodontitis (ChP) and aggressive periodontitis (AgP) versus healthy (dashed line) and for ChP versus AgP (solid line) using a linear support vector machine. The training and test results are highly similar, showing that there is no over-fitting. The curves lie above the diagonal, showing that a meaningful differentiation can be obtained. The true-positive and false-positive values specified in the text box are just one possible point along the curve.
Figure 3.
As in Figure 1, but with the healthy state removed. Clear differences can be observed between the two disease states, but the difference is smaller than that between healthy and diseased, as observed in Figure 1. Note that here, too, there is a clear separate cluster of aggressive periodontitis points (see the third principal component).
Figure 4.
Weights of linear support vector machine classifier for chronic periodontitis (ChP) versus aggressive periodontitis (AgP). Positive weights imply that a high level of the bacterium is correlated with ChP, and the opposite for negative weights, correlated with AgP. The absolute value of the weight represents the contribution of the specific bacterium to the classification. It is noteworthy that the values in Figure 4 do not represent prevalence of the bacterial species but rather their ability to differentiate between ChP (positive values) and AgP (negative values).
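The sign convention described for Figure 4 can be made concrete with a small sketch. The species labels and weight values below are invented placeholders, not values from the study, and the function names are hypothetical. The sketch assumes a linear decision function f(x) = w · x + b in which a positive score is read as ChP and a negative score as AgP.

```python
# Hypothetical weights for a linear classifier f(x) = w . x + b.
# Placeholder names and values only; NOT taken from the study.
weights = {
    "species_A": 0.8,
    "species_B": -1.2,
    "species_C": 0.1,
    "species_D": -0.3,
}
bias = 0.0


def rank_by_contribution(w):
    """Order species by |weight|, i.e. by influence on the decision."""
    return sorted(w, key=lambda name: abs(w[name]), reverse=True)


def direction(weight):
    """Positive weights push toward ChP, negative weights toward AgP."""
    return "ChP" if weight > 0 else "AgP"


def decision_value(levels, w, b):
    """Signed score for one sample of species levels."""
    return sum(w[name] * levels[name] for name in w) + b


ranking = rank_by_contribution(weights)
for name in ranking:
    print(f"{name}: weight={weights[name]:+.1f} -> {direction(weights[name])}")

# One made-up sample: a high level of a negatively weighted species
# outweighs the positively weighted ones, so the score falls on the AgP side.
sample = {"species_A": 1.0, "species_B": 2.0, "species_C": 0.5, "species_D": 0.0}
score = decision_value(sample, weights, bias)
print("predicted:", "ChP" if score > 0 else "AgP")
```

A species with a large absolute weight but low abundance can still dominate the decision, which is why the caption stresses that the weights reflect discriminating ability rather than prevalence.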

Source: PubMed
