Rater Training for a Multi-Site, International Clinical Trial: What Mood Symptoms may be most Difficult to Rate?

Martha Sajatovic, Richa Gaur, Curtis Tatsuoka, Susan De Santi, Nathan Lee, Judith Laredo, Sulabh Tripathi

Abstract

Aims: Given resource constraints in conducting clinical trials, it is critical that rater training focus on the scale items that are hardest to standardize. This analysis examined mood disorder symptom ratings submitted in an online rater training program conducted before the start of a multi-site, international mood disorder treatment trial. Ratings were entered online, analyzed for consistency and variability, and compared to established standards (Gold Consensus Ratings, GCRs).

Methods: Raters participated in web-based rater training on the Hamilton Depression Rating Scale (HAM-D), Montgomery-Åsberg Depression Rating Scale (MADRS), and Young Mania Rating Scale (YMRS). Training integrated didactic materials with videos of two bipolar depressed patients interviewed by two U.S. clinicians. Raters viewed the videos and scored each patient on the three mood scales. Inter-rater agreement was assessed using Kappa statistics. Differences between the raters' scores and the GCRs on individual scale items were assessed using the McNemar test for paired binomial proportions.
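The two statistics named in the Methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say which Kappa variant was used, so Fleiss' kappa (a common choice when many raters score the same subjects) is assumed here, and it does not say how ratings were dichotomized for the McNemar test, so the discordant-pair counts `b` and `c` are hypothetical inputs. The function names and the example numbers are illustrative, not taken from the study.

```python
from math import comb


def fleiss_kappa(counts):
    """Fleiss' kappa for many raters (assumed variant, not confirmed by the source).

    counts[i][j] is the number of raters who put subject i into category j.
    Every row must sum to the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number of raters")

    # Share of all assignments that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each subject, then averaged.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_obs = sum(p_subj) / n_subjects
    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)
    return (p_obs - p_exp) / (1 - p_exp)


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b and c count the pairs that disagree in each direction; concordant
    pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical data: 4 scale items, 6 raters, 3 score categories.
    toy_counts = [
        [6, 0, 0],
        [1, 5, 0],
        [0, 2, 4],
        [0, 0, 6],
    ]
    print("Fleiss' kappa:", round(fleiss_kappa(toy_counts), 3))
    # Hypothetical discordant counts for one scale item.
    print("McNemar exact p:", mcnemar_exact(2, 11))
```

The exact (binomial) form of the McNemar test is used because it stays valid when discordant pairs are few; with large counts the chi-square approximation gives nearly the same answer.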

Results: A total of 194 raters from 16 countries and 80 sites, speaking 20 different languages, participated. Inter-rater agreement on the video ratings ranged from moderate to substantial: HAM-D, Kappa = 0.72 (video A) and 0.65 (video B); MADRS, Kappa = 0.65 and 0.47; YMRS, Kappa = 0.75 and 0.64 (all p < 0.001). Agreement did not differ significantly by English proficiency, clinical experience, or country. HAM-D items that differed from the GCR were depressed mood, delayed insomnia, retardation, and anxiety (psychic). MADRS items that differed were apparent sadness, inner tension, concentration difficulties, lassitude, and inability to feel. YMRS items that differed were irritability and disruptive behavior.
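The "moderate" and "substantial" labels in the Results match the widely used Landis and Koch benchmark bands for kappa. The abstract does not name the benchmark it applied, so treating it as Landis and Koch is an assumption; the short sketch below simply maps the six reported Kappa values onto those bands. The function name `landis_koch_label` is illustrative.

```python
def landis_koch_label(kappa):
    """Map a kappa value to its Landis and Koch (1977) agreement band.

    Assumed benchmark: the source does not state which scale it used.
    """
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Kappa values as reported in the Results, keyed by (scale, video).
reported = {
    ("HAM-D", "A"): 0.72,
    ("HAM-D", "B"): 0.65,
    ("MADRS", "A"): 0.65,
    ("MADRS", "B"): 0.47,
    ("YMRS", "A"): 0.75,
    ("YMRS", "B"): 0.64,
}

if __name__ == "__main__":
    for (scale, video), kappa in reported.items():
        print(f"{scale} video {video}: kappa = {kappa:.2f} ({landis_koch_label(kappa)})")
```

Under these bands, only the MADRS rating of video B (0.47) falls in the moderate range; the other five values are substantial.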

Conclusions: Identification of specific rating scale items in which rater variability is greatest may facilitate training approaches that target these areas for more efficient training in international clinical trials.

Keywords: bipolar disorder; clinical trials; depression; rating scales.

Figures

Figure 1
Format for Web-Based Training on 3 Mood Symptom Scales

Source: PubMed
