Accuracy of the PHQ-2 Alone and in Combination With the PHQ-9 for Screening to Detect Major Depression: Systematic Review and Meta-analysis

Brooke Levis, Ying Sun, Chen He, Yin Wu, Ankur Krishnan, Parash Mani Bhandari, Dipika Neupane, Mahrukh Imran, Eliana Brehaut, Zelalem Negeri, Felix H Fischer, Andrea Benedetti, Brett D Thombs, Depression Screening Data (DEPRESSD) PHQ Collaboration, Liying Che, Alexander Levis, Kira Riehm, Nazanin Saadat, Marleine Azar, Danielle Rice, Jill Boruff, Lorie Kloda, Pim Cuijpers, Simon Gilbody, John Ioannidis, Dean McMillan, Scott Patten, Ian Shrier, Roy Ziegelstein, Ainsley Moore, Dickens Akena, Dagmar Amtmann, Bruce Arroll, Liat Ayalon, Hamid Baradaran, Anna Beraldi, Charles Bernstein, Arvin Bhana, Charles Bombardier, Ryna Imma Buji, Peter Butterworth, Gregory Carter, Marcos Chagas, Juliana Chan, Lai Fong Chan, Dixon Chibanda, Rushina Cholera, Kerrie Clover, Aaron Conway, Yeates Conwell, Federico Daray, Janneke de Man-van Ginkel, Jaime Delgadillo, Crisanto Diez-Quevedo, Jesse Fann, Sally Field, Jane Fisher, Daniel Fung, Emily Garman, Bizu Gelaye, Leila Gholizadeh, Lorna Gibson, Felicity Goodyear-Smith, Eric Green, Catherine Greeno, Brian Hall, Petra Hampel, Liisa Hantsoo, Emily Haroz, Martin Harter, Ulrich Hegerl, Leanne Hides, Stevan Hobfoll, Simone Honikman, Marie Hudson, Thomas Hyphantis, Masatoshi Inagaki, Khalida Ismail, Hong Jin Jeon, Nathalie Jetté, Mohammad Khamseh, Kim Kiely, Sebastian Kohler, Brandon Kohrt, Yunxin Kwan, Femke Lamers, María Asunción Lara, Holly Levin-Aspenson, Valéria Lino, Shen-Ing Liu, Manote Lotrakul, Sonia Loureiro, Bernd Löwe, Nagendra Luitel, Crick Lund, Ruth Ann Marrie, Laura Marsh, Brian Marx, Anthony McGuire, Sherina Mohd Sidik, Tiago Munhoz, Kumiko Muramatsu, Juliet Nakku, Laura Navarrete, Flávia Osório, Vikram Patel, Brian Pence, Philippe Persoons, Inge Petersen, Angelo Picardi, Stephanie Pugh, Terence Quinn, Elmars Rancans, Sujit Rathod, Katrin Reuter, Svenja Roch, Alasdair Rooney, Heather Rowe, Iná Santos, Miranda Schram, Juwita Shaaban, Eileen Shinn, Abbey Sidebottom, Adam Simning, Lena Spangenberg, Lesley Stafford, Sharon Sung, 
Keiko Suzuki, Richard Swartz, Pei Lin Lynnette Tan, Martin Taylor-Rowan, Thach Tran, Alyna Turner, Christina van der Feltz-Cornelis, Thandi van Heyningen, Henk van Weert, Lynne Wagner, Jian Li Wang, Jennifer White, Kirsty Winkley, Karen Wynter, Mitsuhiko Yamada, Qing Zhi Zeng, Yuying Zhang

Abstract

Importance: The Patient Health Questionnaire depression module (PHQ-9) is a 9-item self-administered instrument used for detecting depression and assessing severity of depression. The Patient Health Questionnaire-2 (PHQ-2) consists of the first 2 items of the PHQ-9 (which assess the frequency of depressed mood and anhedonia) and can be used as a first step to identify patients for evaluation with the full PHQ-9.

Objective: To estimate PHQ-2 accuracy alone and combined with the PHQ-9 for detecting major depression.

Data sources: MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, PsycINFO, and Web of Science (January 2000-May 2018).

Study selection: Eligible data sets compared PHQ-2 scores with major depression diagnoses from a validated diagnostic interview.

Data extraction and synthesis: Individual participant data were synthesized with bivariate random-effects meta-analysis to estimate pooled sensitivity and specificity of the PHQ-2 alone, separately among studies that used semistructured, fully structured, or Mini International Neuropsychiatric Interview (MINI) diagnostic interviews, and of the PHQ-2 in combination with the PHQ-9 vs the PHQ-9 alone among studies that used semistructured interviews. The PHQ-2 score ranges from 0 to 6, and the PHQ-9 score ranges from 0 to 27.
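As an illustration of the quantities being pooled, the following minimal Python sketch scores the PHQ-2 and PHQ-9 from nine item responses and computes sensitivity and specificity at a given cutoff within a single data set. It assumes each item is scored 0 to 3, which is consistent with the stated score ranges. The function names and the toy data are hypothetical, and the sketch does not implement the bivariate random-effects model used to pool estimates across studies.

```python
from typing import Sequence


def phq_scores(items: Sequence[int]) -> tuple[int, int]:
    """Return (PHQ-2, PHQ-9) totals from nine item responses scored 0-3."""
    if len(items) != 9 or any(not 0 <= x <= 3 for x in items):
        raise ValueError("expected nine item scores, each between 0 and 3")
    # The PHQ-2 is the sum of the first two PHQ-9 items.
    return sum(items[:2]), sum(items)


def sensitivity_specificity(scores, has_mdd, cutoff):
    """Sensitivity and specificity of 'score >= cutoff' in one data set."""
    pairs = list(zip(scores, has_mdd))
    tp = sum(1 for s, d in pairs if d and s >= cutoff)
    fn = sum(1 for s, d in pairs if d and s < cutoff)
    tn = sum(1 for s, d in pairs if not d and s < cutoff)
    fp = sum(1 for s, d in pairs if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (invented for illustration, not from the study).
phq2 = [0, 1, 2, 3, 2, 5]
mdd = [False, False, False, False, True, True]
for cutoff in (2, 3):
    sens, spec = sensitivity_specificity(phq2, mdd, cutoff)
    print(f"PHQ-2 >= {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the cutoff from 2 to 3 in the toy data lowers sensitivity and raises specificity, the same trade-off the pooled estimates describe.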

Results: Individual participant data were obtained from 100 of 136 eligible studies (44 318 participants; 4572 with major depression [10%]; mean [SD] age, 49 [17] years; 59% female). Among studies that used semistructured interviews, PHQ-2 sensitivity and specificity (95% CI) were 0.91 (0.88-0.94) and 0.67 (0.64-0.71) for cutoff scores of 2 or greater and 0.72 (0.67-0.77) and 0.85 (0.83-0.87) for cutoff scores of 3 or greater. Sensitivity was significantly greater for semistructured vs fully structured interviews. Specificity was not significantly different across the types of interviews. The area under the receiver operating characteristic curve was 0.88 (0.86-0.89) for semistructured interviews, 0.82 (0.81-0.84) for fully structured interviews, and 0.87 (0.85-0.88) for the MINI. There were no significant subgroup differences. For semistructured interviews, sensitivity for PHQ-2 scores of 2 or greater followed by PHQ-9 scores of 10 or greater (0.82 [0.76-0.86]) was not significantly different from that for PHQ-9 scores of 10 or greater alone (0.86 [0.80-0.90]); specificity for the combination was significantly but minimally higher (0.87 [0.84-0.89] vs 0.85 [0.82-0.87]). The area under the curve for the combination was 0.90 (0.89-0.91). The combination was estimated to reduce the number of participants needing to complete the full PHQ-9 by 57% (56%-58%).
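The two-step strategy can be sketched in Python as follows, assuming a participant screens positive only when the PHQ-2 score is 2 or greater and the PHQ-9 score is 10 or greater. The function names and toy scores are hypothetical. Because every two-step positive is also positive on the PHQ-9 alone, the combination cannot have higher sensitivity or lower specificity than the PHQ-9 alone within the same data set, which matches the direction of the reported differences.

```python
def two_step_positive(phq2, phq9, phq2_cutoff=2, phq9_cutoff=10):
    """Positive only if the PHQ-2 gate is passed and the PHQ-9 cutoff is met."""
    return phq2 >= phq2_cutoff and phq9 >= phq9_cutoff


def proportion_spared_phq9(phq2_scores, phq2_cutoff=2):
    """Share of participants below the PHQ-2 cutoff, who skip the full PHQ-9."""
    below = sum(1 for s in phq2_scores if s < phq2_cutoff)
    return below / len(phq2_scores)


# Toy (PHQ-2, PHQ-9) pairs, invented for illustration.
participants = [(0, 3), (1, 11), (2, 9), (3, 14), (0, 0), (4, 10)]

two_step = [two_step_positive(p2, p9) for p2, p9 in participants]
phq9_alone = [p9 >= 10 for _, p9 in participants]

print(sum(two_step), "positive with the two-step strategy")
print(sum(phq9_alone), "positive with the PHQ-9 alone")
print(proportion_spared_phq9([p2 for p2, _ in participants]), "spared the full PHQ-9")
```

In the toy data, the participant with scores (1, 11) is positive on the PHQ-9 alone but is never given the full PHQ-9 under the two-step strategy; this is the only way the two strategies can disagree.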

Conclusions and relevance: In an individual participant data meta-analysis of studies that compared PHQ scores with major depression diagnoses, the combination of PHQ-2 (with cutoff ≥2) followed by PHQ-9 (with cutoff ≥10) had similar sensitivity but higher specificity compared with PHQ-9 cutoff scores of 10 or greater alone. Further research is needed to understand the clinical and research value of this combined approach to screening.

Conflict of interest statement

Conflict of Interest Disclosures: None reported.

Figures

Figure 1. Flow Diagram of Study Selection Process
MINI indicates Mini International Neuropsychiatric Interview; PHQ, Patient Health Questionnaire.
Figure 2. Receiver Operating Characteristic (ROC) Plots Comparing Sensitivity and Specificity Estimates for the Patient Health Questionnaire–2 (PHQ-2) Alone, the Patient Health Questionnaire–9 (PHQ-9) Alone, and for PHQ-2 Scores of 2 or Greater Followed By PHQ-9
The figure is for the 44 studies (participants = 10 627; No. with major depression = 1361) that used a semistructured reference standard and had both PHQ-2 and PHQ-9 item scores available. Among the 48 PHQ-2 studies that used a semistructured reference standard, 4 studies did not have PHQ-9 item scores available, and thus could not be included in the comparison of screening strategies. The PHQ-2 line has 7 calculated points (inflections), representing possible scores of 0 (right) to 6 (left). The PHQ-9 alone and PHQ-2 scores of 2 or greater followed by PHQ-9 lines have 28 calculated points (inflections), representing possible scores of 0 (right) to 27 (left). The area under the curve was 0.88 (95% CI, 0.87-0.89) for PHQ-2 alone, 0.92 (95% CI, 0.91-0.93) for PHQ-9 alone, and 0.90 (95% CI, 0.89-0.91) for PHQ-2 scores of 2 or greater followed by PHQ-9.
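The point counts in the caption follow from evaluating one cutoff per possible score. A minimal Python sketch, assuming empirical (unpooled) estimates from a single data set, is shown here; the function names and toy data are hypothetical, and the published curves are based on pooled estimates rather than this direct calculation.

```python
def roc_points(scores, has_mdd, max_score):
    """Return (1 - specificity, sensitivity) for each cutoff 0..max_score."""
    n_pos = sum(1 for d in has_mdd if d)
    n_neg = len(has_mdd) - n_pos
    points = []
    for cutoff in range(max_score + 1):
        tp = sum(1 for s, d in zip(scores, has_mdd) if d and s >= cutoff)
        fp = sum(1 for s, d in zip(scores, has_mdd) if not d and s >= cutoff)
        points.append((fp / n_neg, tp / n_pos))
    return points


def combined_roc_points(phq2, phq9, has_mdd, phq2_cutoff=2, max_score=27):
    """ROC points for 'PHQ-2 >= phq2_cutoff, then PHQ-9 >= cutoff'."""
    # Participants below the PHQ-2 gate get -1, so they are negative at every cutoff.
    gated = [s9 if s2 >= phq2_cutoff else -1 for s2, s9 in zip(phq2, phq9)]
    return roc_points(gated, has_mdd, max_score)


# Toy data, invented for illustration.
phq2 = [0, 1, 2, 3, 2, 5]
phq9 = [1, 4, 6, 9, 12, 20]
mdd = [False, False, False, False, True, True]

print(len(roc_points(phq2, mdd, 6)))             # one point per PHQ-2 score, 0 to 6
print(len(roc_points(phq9, mdd, 27)))            # one point per PHQ-9 score, 0 to 27
print(len(combined_roc_points(phq2, phq9, mdd)))
```

A cutoff of 0 classifies everyone as positive, which places that point at the right end of the PHQ-2 and PHQ-9 curves. For the combined strategy, the rightmost point is instead the PHQ-2 cutoff point, because participants below the PHQ-2 gate are never classified as positive.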


Source: PubMed
