Machine Learning for the Diagnosis of Orthodontic Extractions: A Computational Analysis Using Ensemble Learning

Yasir Suhail, Madhur Upadhyay, Aditya Chhibber, Kshitiz

Abstract

Extraction of teeth is an important treatment decision in orthodontic practice. An expert system that is able to arrive at suitable treatment decisions can be valuable to clinicians for verifying treatment plans, minimizing human error, training orthodontists, and improving reliability. In this work, we train a number of machine learning models for this prediction task using data for 287 patients, evaluated independently by five different orthodontists. We demonstrate why ensemble methods are particularly suited for this task. We evaluate the performance of the machine learning models and interpret the training behavior. We show that the results for our model are close to the level of agreement between different orthodontists.

Keywords: ensemble methods; machine learning; neural network; orthodontics; random forests.

Conflict of interest statement

The authors and the University of Connecticut have filed U.S. Provisional Patent Application No. 62/915,725 based on this work on October 16, 2019.

Figures

**Figure 1**
Schematic of the procedure followed in this work, from data collection to the machine learning diagnosis.

**Figure 2**
Index of the different extraction options. The diagram on the left shows the locations of the upper and lower premolars. The 14 options on the right list the specific extraction procedures in terms of the locations of the teeth. NE refers to no extraction.

**Figure 3**
Demographic background of patients. (A) Age distribution, and (B) gender distribution.

**Figure 4**
Performance of the single classifiers when considering (A) only the primary diagnosis, and (B) both the primary and alternative diagnoses.
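
The two scoring criteria in this caption can be illustrated with a short sketch. The function name, the label strings other than NE, and the toy data are hypothetical and are not taken from the study; the sketch only assumes that each case has one primary label and at most one alternative label.

```python
from typing import Optional, Sequence


def accuracy(
    predicted: Sequence[str],
    primary: Sequence[str],
    alternative: Optional[Sequence[Optional[str]]] = None,
) -> float:
    """Fraction of predictions that agree with the expert.

    If `alternative` is given, a prediction also counts as correct when it
    matches the expert's alternative plan (None = no alternative recorded).
    """
    if len(predicted) != len(primary) or not primary:
        raise ValueError("inputs must be non-empty and of equal length")
    if alternative is None:
        alternative = [None] * len(primary)
    hits = sum(
        pred == prim or (alt is not None and pred == alt)
        for pred, prim, alt in zip(predicted, primary, alternative)
    )
    return hits / len(primary)


# Hypothetical labels for illustration only (not data from the study).
predicted = ["NE", "opt1", "opt2", "NE"]
primary = ["NE", "opt2", "opt2", "opt3"]
alternative = [None, "opt1", None, "NE"]

strict = accuracy(predicted, primary)                # criterion (A)
lenient = accuracy(predicted, primary, alternative)  # criterion (B)
```

By construction the lenient score can never be lower than the strict one, which is why panel (B) should be read as an upper bound on agreement rather than a separate task.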

**Figure 5**
Training time for the single classifiers. Logistic regression was also trained with product terms (i.e., two-way interactions), which dramatically increased the training time. Because of the large dynamic range needed, the y-axis is drawn using a pseudo-log transform.
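
The caption does not define the pseudo-log transform. A common definition, and the one assumed in this sketch, is the inverse hyperbolic sine scaled to a chosen base; whether the figure used exactly this form, and the parameter names `sigma` and `base`, are assumptions.

```python
import math


def pseudo_log(x: float, sigma: float = 1.0, base: float = 10.0) -> float:
    """Signed pseudo-logarithm of x.

    Approximately linear for |x| much smaller than sigma and approximately
    log_base(x) for large x, so values at or near zero remain plottable.
    """
    return math.asinh(x / (2.0 * sigma)) / math.log(base)


# Hypothetical training times in seconds, spanning several orders of magnitude.
times = [0.0, 0.5, 30.0, 1000.0]
transformed = [pseudo_log(t) for t in times]
```

Unlike a plain logarithm, this transform is defined at zero, which is convenient when some classifiers train almost instantly.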

**Figure 6**
Effect of the weight regularization in the neural network/multinomial regression model on the training set error. The x-axis shows the regularization weight. The top row is for the neural network using the raw input features, while the bottom row is for the network using two-way interactions. The three columns show the results for the weight regularization penalty term formulated as an elastic net, an L1 norm, and an L2 norm.
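
For reference, the three penalty terms named in this caption are conventionally written as below. The symbols are assumptions introduced here: $E(W)$ is the unpenalized training loss, $\lambda$ is the regularization weight on the x-axis, $w_j$ are the model weights, and $\alpha$ is the elastic-net mixing parameter. Scaling conventions differ between software packages (for example, some halve the squared term), so this is a sketch of the general form rather than the exact objective used.

```latex
\begin{aligned}
J(W) &= E(W) + \lambda\, P(W), \\
P_{L_1}(W) &= \sum_j |w_j|, \\
P_{L_2}(W) &= \sum_j w_j^2, \\
P_{\mathrm{EN}}(W) &= \alpha \sum_j |w_j| + (1 - \alpha) \sum_j w_j^2,
\qquad 0 \le \alpha \le 1 .
\end{aligned}
```

Setting $\alpha = 1$ recovers the L1 penalty and $\alpha = 0$ recovers the L2 penalty, so the elastic net interpolates between the other two columns.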

**Figure 7**
Effect of the weight regularization in the neural network/multinomial regression model on the test set error. The x-axis shows the regularization weight, with rows and columns corresponding to the interactions and weight regularization method as in Figure 6.

**Figure 8**
Effect of various training parameters on the random forest model for the prediction of the specific extraction. The minimum node size, the number of features tried at each split, and the number of trees are varied, and the error rates for the training and test splits are plotted. In (A), a prediction is considered an error if it does not agree with the expert’s primary diagnosis; in (B), it is considered an error if it agrees with neither the primary nor the alternative diagnosis.

**Figure 9**
Effect of various training parameters on the random forest model for the binary prediction problem. The minimum node size, the number of features tried at each split, and the number of trees are varied, and the error rates for the training and test splits are plotted. In (A), a prediction is considered an error if it does not agree with the expert’s primary diagnosis; in (B), it is considered an error if it agrees with neither the primary nor the alternative diagnosis.

**Figure 10**
Saturating effect of increasing the number of classifiers in the random forest. The out-of-bag accuracy (an estimate of the test accuracy) is plotted against the number of trees for the random forest model predicting the specific type of extraction. The results are for a minimum node size of 1, with all features tried at every split.
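
The saturating shape can be illustrated with an idealized calculation that is not the random forest itself: a majority vote over independent binary classifiers, each correct with the same probability. Real trees in a forest are correlated and the task here is multi-class, so the independence assumption, the function name, and the value 0.6 are all illustrative assumptions.

```python
from math import comb


def majority_vote_accuracy(n_voters: int, p_correct: float) -> float:
    """Probability that a strict majority of independent binary voters,
    each correct with probability p_correct, reaches the right answer."""
    if n_voters < 1 or n_voters % 2 == 0:
        raise ValueError("n_voters must be a positive odd number (no ties)")
    k_min = n_voters // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_voters, k) * p_correct**k * (1.0 - p_correct) ** (n_voters - k)
        for k in range(k_min, n_voters + 1)
    )


# Accuracy rises quickly at first, then each added voter contributes less.
sizes = [1, 11, 51, 101, 201]
curve = [majority_vote_accuracy(n, 0.6) for n in sizes]
for n, acc in zip(sizes, curve):
    print(f"{n:4d} voters: {acc:.4f}")
```

Under these assumptions the gain from adding voters shrinks as the ensemble grows, which matches the qualitative plateau described in the caption; correlation between trees would lower the plateau further.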

**Figure 11**
Performance of all the classifiers for predicting (A) the primary diagnosis, and (B) where agreement with either the primary or the alternative diagnosis is considered accurate. Here, both the single and ensemble (random forest) classifiers are included.

**Figure 12**
Effect of individual features. The test error for predicting the specific treatment plan using the neural network after independently deleting single features from the dataset.
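
The single-feature deletion procedure described in this caption can be sketched generically. The helper names and the `fit_and_score` callback (which is assumed to train a model on the given columns and return its test error) are hypothetical and stand in for whatever training routine is used.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical callback: trains on (rows, labels) and returns the test error.
FitAndScore = Callable[[List[List[float]], Sequence[str]], float]


def drop_column(rows: Sequence[Sequence[float]], index: int) -> List[List[float]]:
    """Return a copy of the data with one feature column removed."""
    return [[v for j, v in enumerate(row) if j != index] for row in rows]


def single_feature_ablation(
    rows: Sequence[Sequence[float]],
    labels: Sequence[str],
    feature_names: Sequence[str],
    fit_and_score: FitAndScore,
) -> Dict[str, float]:
    """Test error with all features, then with each feature deleted in turn."""
    errors = {"(none removed)": fit_and_score([list(r) for r in rows], labels)}
    for j, name in enumerate(feature_names):
        # Each deletion is independent: only column j is removed.
        errors[name] = fit_and_score(drop_column(rows, j), labels)
    return errors
```

A feature whose removal raises the test error well above the baseline entry is one the model relies on; a small change suggests the information is redundant with other features.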

Source: PubMed