An item response theory evaluation of three depression assessment instruments in a clinical sample

Mats Adler, Jerker Hetta, Göran Isacsson, Ulf Brodin

Abstract

Background: This study investigates whether an analysis based on Item Response Theory (IRT) can be used for initial evaluations of depression assessment instruments in a limited patient sample from an affective disorder outpatient clinic, with the aim of finding major advantages and deficiencies of the instruments.

Methods: Three depression assessment instruments, the depression module of the Patient Health Questionnaire (PHQ9), the depression subscale of the Affective Self Rating Scale (AS-18-D) and the Montgomery-Åsberg Depression Rating Scale (MADRS), were evaluated in a sample of 61 patients with affective disorder diagnoses, mainly bipolar disorder. A '3-step IRT strategy' was used.

Results: In a first step, the Mokken non-parametric analysis showed that PHQ9 and AS-18-D had strong overall scalabilities of 0.510 [C.I. 0.42, 0.61] and 0.513 [C.I. 0.41, 0.63], respectively, while MADRS had a weak scalability of 0.339 [C.I. 0.25, 0.43]. In a second step, a Rasch model analysis indicated large differences in the discriminating capacity of the items, and the Rasch model was therefore considered unsuitable for the data. In a third step, applying a more flexible two-parameter model, all three instruments showed large differences in item information, and the items had a low capacity to reliably measure respondents at low levels of depression severity.
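To make the quantities in the Results concrete, the following is a minimal sketch and not the authors' analysis. It assumes the conventional Mokken rule of thumb for Loevinger's H (below 0.3 unscalable, 0.3 to 0.4 weak, 0.4 to 0.5 medium, 0.5 and above strong), which the abstract does not state explicitly, and it uses the dichotomous two-parameter logistic model as a simplification of whatever polytomous model the study fitted. The item names and the discrimination and location values are hypothetical, chosen only to show why unequal discrimination yields unequal item information and why items located at higher severity contribute little at low severity.

```python
import math


def scalability_label(h):
    """Label an H coefficient using the conventional Mokken rule of thumb (assumed)."""
    if h >= 0.5:
        return "strong"
    if h >= 0.4:
        return "medium"
    if h >= 0.3:
        return "weak"
    return "unscalable"


def prob_2pl(theta, a, b):
    """Probability of endorsing an item under the dichotomous 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item at severity theta: a^2 * P * (1 - P)."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


# Overall scalability values as reported in the abstract.
reported_h = {"PHQ9": 0.510, "AS-18-D": 0.513, "MADRS": 0.339}
for instrument, h in reported_h.items():
    print(f"{instrument}: H = {h:.3f} ({scalability_label(h)})")

# Hypothetical items: same location (b = 1.0), different discrimination (a).
hypothetical_items = {"item_A": (2.0, 1.0), "item_B": (0.7, 1.0)}
for theta in (-2.0, 0.0, 1.0):
    for name, (a, b) in hypothetical_items.items():
        info = item_information_2pl(theta, a, b)
        print(f"theta = {theta:+.1f}  {name}: information = {info:.4f}")
```

In this sketch, information peaks at the item location and scales with the square of the discrimination, so an item with a low discrimination contributes little anywhere on the scale, and an item located at high severity contributes almost nothing for respondents with low severity.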

Conclusions: We conclude that a stepwise IRT approach, as performed in this study, is a suitable tool for studying assessment instruments at early stages of development. Such an analysis can give useful information, even in small samples, for constructing more precise measurements or evaluating existing assessment instruments. The study suggests that the PHQ9 and AS-18-D can be useful for measuring depression severity in an outpatient clinic for affective disorder, while the MADRS shows weak measurement properties for this type of patient.

Figures

Figure 1
An insufficient coverage of the items compared to the severity of depression in a large part of the sample.
Figure 2
An insufficient coverage of the items compared to the severity of depression in a large part of the sample.
Figure 3
An insufficient coverage of the items compared to the severity of depression in a large part of the sample.
Figure 4
The areas under the curves in Figures 4-6 represent the approximate amount of information that can be picked up by the individual items.
Figure 5
The areas under the curves in Figures 4-6 represent the approximate amount of information that can be picked up by the individual items.
Figure 6
The areas under the curves in Figures 4-6 represent the approximate amount of information that can be picked up by the individual items.
Figure 7
The relationship between the item locations in the three instruments.

Source: PubMed
