Indirect calibration between clinical observers - application to the New York Heart Association functional classification system

Milton Severo, Rita Gaio, Patrícia Lourenço, Margarida Alvelos, Paulo Bettencourt, Ana Azevedo

Abstract

Background: Previous studies showed an inter-observer agreement of approximately 55% for the New York Heart Association (NYHA) classification. The aim of this study was to calibrate the NYHA classification system between observers, thereby increasing its reliability.

Results: Among 1136 community-dwellers in Porto, Portugal, aged ≥ 45 years, the 265 who reported breathlessness answered a 4-item questionnaire to characterize symptom severity. The questionnaire was administered by 7 physicians, who also classified each subject's functional capacity according to NYHA. Each subject was assessed by one physician. We calibrated NYHA classifications by the concurrent method, using a 1-parameter logistic graded response model. Discrepancies between observers were assessed by differences in the ability thresholds between NYHA classes I-II and II-III. The ability estimated by the model was used to predict the NYHA classification for each observer. Estimates of the first and second thresholds for each observer ranged from -1.92 to 0.46 and from 1.42 to 2.30, respectively. The agreement between estimated ability and the observers' NYHA classification was 88% (kappa = 0.61).
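To illustrate how discrepant thresholds translate into discrepant classifications, the following is a minimal sketch, not the authors' code. It assumes a logistic cumulative curve with unit discrimination and assigns the class by counting the thresholds for which the probability of a higher class is at least 0.5; the abstract does not state the exact prediction rule. The two observers are hypothetical: the threshold values are the extremes of the reported ranges, but pairing them into observers is an assumption made only for illustration.

```python
import math


def iocc(theta, threshold, discrimination=1.0):
    """Probability of being rated above a class boundary, given ability theta.

    Assumes a logistic cumulative curve; discrimination=1.0 is an
    illustrative assumption, not an estimate from the study.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - threshold)))


def predict_class(theta, thresholds):
    """Predicted NYHA class (1 = I, 2 = II, 3 = III) for one observer.

    Assumed rule: count the thresholds whose cumulative probability
    is at least 0.5, which is the same as counting thresholds <= theta.
    """
    return 1 + sum(iocc(theta, b) >= 0.5 for b in sorted(thresholds))


# Hypothetical observers built from the extremes of the reported ranges;
# the pairing of first and second thresholds is an assumption.
observer_a = (-1.92, 1.42)
observer_b = (0.46, 2.30)

theta = 0.0  # the same underlying severity, seen by both observers
print(predict_class(theta, observer_a))  # class II under observer_a's thresholds
print(predict_class(theta, observer_b))  # class I under observer_b's thresholds
```

Under these assumptions, a subject with the same estimated ability would be placed in different NYHA classes depending on which observer's thresholds apply, which is the kind of discrepancy the calibration is meant to expose.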

Conclusions: The study objectively indicates that the main reason why several studies have reported low inter-observer agreement is the existence of discrepant thresholds between observers in the definition of NYHA classes. The concurrent method can be used to minimize the reliability problem of the NYHA classification.

Figures

Figure 1
Item operation characteristic curves (IOCC) for 4 anchor items (dashed lines) and 7 observers for NYHA classification (solid lines). The IOCC for category k represents the probability of endorsing categories higher than k, conditional on the subject's ability.
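For reference, the curve described in the caption can be written in the usual logistic form of a graded response model. This is a sketch in the standard textbook parameterization, which may differ from the one used in the paper; the symbols a (a discrimination shared by all items and observers, as in a 1-parameter model), b_{jk} (the threshold of item or observer j between categories k and k+1) and θ (the subject's ability) are notation introduced here for illustration.

```latex
\[
P(Y_j > k \mid \theta) \;=\; \frac{1}{1 + \exp\{-a\,(\theta - b_{jk})\}},
\qquad
P(Y_j = k \mid \theta) \;=\; P(Y_j > k-1 \mid \theta) - P(Y_j > k \mid \theta),
\]
with the conventions $P(Y_j > 0 \mid \theta) = 1$ and $P(Y_j > K \mid \theta) = 0$ for the highest category $K$.
```

In this notation, two observers who disagree only in their values of b_{jk} produce curves with the same shape shifted along the ability axis.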

Source: PubMed
