StAR: a simple tool for the statistical comparison of ROC curves

Ismael A Vergara, Tomás Norambuena, Evandro Ferrada, Alex W Slater, Francisco Melo

Abstract

Background: As in many other areas of science and technology, most of the important problems in bioinformatics rely on the proper development and assessment of binary classifiers. A generalized assessment of the performance of a binary classifier is typically carried out by analyzing its receiver operating characteristic (ROC) curve. The area under the ROC curve (AUC) is a popular indicator of the performance of a binary classifier. However, assessing the statistical significance of the difference between the AUCs of two classifiers is not straightforward, because few freely available tools exist for this purpose. Most existing software is either not free, difficult to use, or not easy to automate when many binary classifiers must be compared. Such comparisons are typical when optimizing the parameters of a new classifier and when validating its performance against previous methods.
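
A minimal sketch of how an empirical AUC can be computed, assuming the non-parametric estimate equal to the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counting one half. The function name `auc` is hypothetical and is not part of StAR.

```python
def auc(pos_scores, neg_scores):
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly.

    A pair counts 1 if the positive case scores higher, 1/2 on a tie, 0 otherwise.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    total = 0.0
    for x in pos_scores:          # scores of positive cases
        for y in neg_scores:      # scores of negative cases
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))
```

For instance, with positive scores `[0.9, 0.8, 0.4]` and negative scores `[0.7, 0.3]`, five of the six positive-negative pairs are ranked correctly, so the AUC is 5/6 (about 0.833).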

Results: In this work we describe and release new software to assess the statistical significance of the observed difference between the AUCs of any two classifiers for a common task, estimated from paired data or unpaired balanced data. The software can perform a pairwise comparison of many classifiers in a single run and does not require expert or advanced knowledge to use. It relies on a non-parametric test for the difference of the AUCs that accounts for the correlation of the ROC curves. The results are displayed graphically and can be easily customized by the user. A human-readable report is generated, and the complete data resulting from the analysis can be downloaded for further analysis with other software. The software is released as a web server that can be used from any client platform and as a standalone application for the Linux operating system.
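
A minimal sketch of a non-parametric comparison of two correlated AUCs on paired data. The abstract does not name the test; this sketch assumes the approach of DeLong, DeLong and Clarke-Pearson (1988), which estimates the variances and the covariance of the two AUCs from per-case "structural components" and compares their difference with a normal distribution. The function `delong_test` and its argument layout are hypothetical, only the paired case is handled, and this is not StAR's own implementation.

```python
import math


def _psi(x, y):
    """Mann-Whitney kernel: 1 if the positive outscores the negative, 1/2 on ties."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def _components(pos, neg):
    """DeLong structural components for one classifier."""
    m, n = len(pos), len(neg)
    v10 = [sum(_psi(x, y) for y in neg) / n for x in pos]  # one value per positive case
    v01 = [sum(_psi(x, y) for x in pos) / m for y in neg]  # one value per negative case
    return v10, v01


def _cov(u, v):
    """Unbiased sample covariance of two equal-length lists."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / (len(u) - 1)


def delong_test(pos_a, neg_a, pos_b, neg_b):
    """Two-sided test of AUC_A == AUC_B on paired data.

    pos_a[i] and pos_b[i] are the scores that classifiers A and B assign to
    the same positive case i; neg_a[j] and neg_b[j] likewise for negatives.
    Returns (auc_a, auc_b, z, p_value).
    """
    m, n = len(pos_a), len(neg_a)
    if m < 2 or n < 2 or len(pos_b) != m or len(neg_b) != n:
        raise ValueError("need paired scores with at least 2 positives and 2 negatives")

    v10_a, v01_a = _components(pos_a, neg_a)
    v10_b, v01_b = _components(pos_b, neg_b)
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m

    # Variance of AUC_A - AUC_B; the covariance term accounts for the
    # correlation induced by scoring both classifiers on the same cases.
    var_a = _cov(v10_a, v10_a) / m + _cov(v01_a, v01_a) / n
    var_b = _cov(v10_b, v10_b) / m + _cov(v01_b, v01_b) / n
    cov_ab = _cov(v10_a, v10_b) / m + _cov(v01_a, v01_b) / n
    var_diff = var_a + var_b - 2.0 * cov_ab

    diff = auc_a - auc_b
    if var_diff <= 0.0:
        if diff == 0.0:
            return auc_a, auc_b, 0.0, 1.0  # e.g. identical classifiers
        raise ValueError("zero variance with a nonzero AUC difference")
    z = diff / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return auc_a, auc_b, z, p_value
```

For example, `delong_test([0.9, 0.8, 0.7], [0.1, 0.2, 0.3], [0.6, 0.4, 0.2], [0.5, 0.3, 0.1])` gives AUCs of 1 and about 0.667 with a two-sided p-value of roughly 0.22, so with only three positives and three negatives this difference would not be significant at the 5% level. Comparing many classifiers pairwise amounts to calling such a function once for every pair.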

Conclusion: New software for the statistical comparison of ROC curves is released here as a web server and as standalone software for the Linux operating system.


Source: PubMed
