Constructing validity: New developments in creating objective measuring instruments

Lee Anna Clark, David Watson

Abstract

In this update of Clark and Watson (1995), we provide a synopsis of major points of our earlier article and discuss issues in scale construction that have become more salient as clinical and personality assessment has progressed over the past quarter-century. It remains true that the primary goal of scale development is to create valid measures of underlying constructs and that Loevinger's theoretical scheme provides a powerful model for scale development. We still discuss practical issues to help developers maximize their measures' construct validity, reiterating the importance of (a) clear conceptualization of target constructs, (b) an overinclusive initial item pool, (c) paying careful attention to item wording, (d) testing the item pool against closely related constructs, (e) choosing validation samples thoughtfully, and (f) emphasizing unidimensionality over internal consistency. We have added (g) consideration of the hierarchical structures of personality and psychopathology in scale development, discussion of (h) codeveloping scales in the context of these structures, (i) "orphan" and "interstitial" constructs, which do not fit neatly within these structures, (j) problems with "conglomerate" constructs, and (k) developing alternative versions of measures, including short forms, translations, informant versions, and age-based adaptations. Finally, we have expanded our discussions of (l) item-response theory and of external validity, emphasizing (m) convergent and discriminant validity, (n) incremental validity, and (o) cross-method analyses, such as questionnaires and interviews. We conclude by reaffirming that all mature sciences are built on the bedrock of sound measurement and that psychology must redouble its efforts to develop reliable and valid measures. (PsycINFO Database Record (c) 2019 APA, all rights reserved).

References

    1. Achenbach TM, Ivanova MY, & Rescorla LA (2017). Empirically based assessment and taxonomy of psychopathology for ages 1½–90+ years: Developmental, multi-informant, and multicultural findings. Comprehensive Psychiatry, 79, 4–18. doi: 10.1016/j.comppsych.2017.03.006
    1. Achenbach TM, Krukowski RA, Dumenci L, & Ivanova MY (2005). Assessment of adult psychopathology: Meta-analyses and implications of cross-informant correlations. Psychological Bulletin, 131(3), 361–382. doi: 10.1037/0033-2909.131.3.361
    1. Ahmed SR, Fowler PJ, & Toro PA (2011). Family, public and private religiousness and psychological well-being over time in at-risk adolescents. Mental Health, Religion & Culture, 14(4), 393–408. doi: 10.1080/13674671003762685
    1. Allport GW, & Odbert HS (1936). Trait-names: A psycho-lexical study. Psychological Monographs, 47(1), i–171. doi: 10.1037/h0093360
    1. American Educational Research Association, American Psychological Association, National Council on Measurement in Education, & Joint Committee on Standards for Educational and Psychological Testing (AERA, APA, & NCME). (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
    1. American Psychological Association (1985). Standards for educational and psychological testing. Washington, DC: Author.
    1. Angleitner A, & Wiggins JS (1985). Personality assessment via questionnaires. New York: Springer-Verlag.
    1. Boyle GJ (1991). Does item homogeneity indicate internal consistency or item redundancy in psychometric scales? Personality and Individual Differences, 12(3), 291–294. doi: 10.1016/0191-8869(91)90115-R
    1. Briggs SR, & Cheek JM (1986). The role of factor analysis in the development and evaluation of personality scales. Journal of Personality, 54, 106–148. doi: 10.1111/j.1467-6494.1986.tb00391
    1. Brown A, & Maydeu-Olivares A (2013). How IRT can solve problems of ipsative data in forced-choice questionnaires. Psychological Methods, 18(1), 36–52. doi: 10.1037/a0030641
    1. Burisch M (1984). Approaches to personality inventory construction: A comparison of merits. American Psychologist, 39, 214–227. doi: 10.1037/0003-066X.39.3.214
    1. Campbell DT, & Fiske DW (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81–105. doi: 10.1037/h0046016
    1. Carmona-Perera M, Caracuel A, Pérez-García M, & Verdejo-García A (2015). Brief moral decision-making questionnaire: A Rasch-derived short form of the Greene dilemmas. Psychological Assessment, 27(2), 424–432. doi: 10.1037/pas0000049
    1. Clark LA, Livesley WJ, & Morey L (1997). Personality disorder assessment: The challenge of construct validity. Journal of Personality Disorders, 11, 205–231. doi: 10.1521/pedi.1997.11.3.205
    1. Clark LA (2018). The Improving the Measurement of Personality Project (IMPP). Unpublished dataset. University of Notre Dame, Notre Dame, IN.
    1. Clark LA, Simms LJ, Wu KD, & Casillas A (2014). Schedule for Nonadaptive and Adaptive Personality-2nd Edition (SNAP-2): Manual for Administration, Scoring, and Interpretation. Notre Dame, IN: University of Notre Dame.
    1. Clark LA, & Watson DB (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7, 309–319. doi: 10.1037/1040-3590.7.3.309
    1. Comrey AL (1988). Factor-analytic methods of scale development in personality and clinical psychology. Journal of Consulting and Clinical Psychology, 56, 754–761. doi: 10.1037/0022-006X.56.5.754
    1. Connelly BS, & Ones DS (2010). An other perspective on personality: Meta-analytic integration of observers’ accuracy and predictive validity. Psychological Bulletin, 136(6), 1092–1122. doi: 10.1037/a0021212
    1. Costa PT, & McCrae RR (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five Factor Inventory (NEO-FFI) professional manual. Odessa, FL: Psychological Assessment Resources.
    1. Cortina JM (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98–104. doi: 10.1037/0021-9010.78.1.98
    1. Credé M, Tynan MC, & Harms PD (2017). Much ado about grit: A meta-analytic synthesis of the grit literature. Journal of Personality and Social Psychology, 113(3), 492–511. doi: 10.1037/pspp0000102
    1. Cronbach LJ (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334. doi: 10.1007/BF02310555
    1. Cronbach LJ, & Meehl PE (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302. doi: 10.1037/h0040957
    1. Dawes RM, Faust D, & Meehl PE (2002). Clinical versus actuarial judgment. In Gilovich T, Griffin D, & Kahneman D (Eds.), Heuristics and biases: The psychology of intuitive judgment (pp. 716–729). New York, NY: Cambridge University Press.
    1. De Los Reyes A, Augenstein TM, Wang M, Thomas SA, Drabick DAG, Burgers DE, & Rabinowitz J (2015). The validity of the multi-informant approach to assessing child and adolescent mental health. Psychological Bulletin, 141(4), 858–900. doi: 10.1037/a0038498
    1. Dornbach-Bender A, Ruggero CJ, Waszczuk MA, Gamez W, Watson D, & Kotov R (2017). Mapping emotional disorders at the finest level: Convergent validity and joint structure based on alternative measures. Comprehensive Psychiatry, 79, 31–39. doi: 10.1016/j.comppsych.2017.06.011
    1. Duckworth AL, Peterson C, Matthews MD, & Kelly DR (2007). Grit: Perseverance and passion for long-term goals. Journal of Personality and Social Psychology, 92(6), 1087–1101. doi: 10.1037/0022-3514.92.6.1087
    1. Duckworth AL, & Quinn PD (2009). Development and validation of the short grit scale (GRIT–S). Journal of Personality Assessment, 91(2), 166–174. doi: 10.1080/00223890802634290
    1. Fabrigar LR, Wegener DT, MacCallum RC, & Strahan EJ (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272–299. doi: 10.1037/1082-989X.4.3.272
    1. Finch JF, & West SG (1997). The investigation of personality structure: Statistical models. Journal of Research in Personality, 31(4), 439–485. doi: 10.1006/jrpe.1997.2194
    1. Funder DC (2012). Accurate personality judgment. Current Directions in Psychological Science, 21(3), 177–182. doi: 10.1177/0963721412445309
    1. Geisinger KF (2003). Testing and assessment in cross-cultural psychology. In Graham JR, & Naglieri JA (Eds.), Handbook of psychology: Assessment psychology, vol. 10 (pp. 95–117). Hoboken, NJ: John Wiley & Sons.
    1. Glenn JJ, Michel BD, Franklin JC, Hooley JM, & Nock MK (2014). Pain analgesia among adolescent self-injurers. Psychiatry Research, 220(3), 921–926. doi: 10.1016/j.psychres.2014.08.016
    1. Green BF Jr. (1978). In defense of measurement. American Psychologist, 33, 664–670. doi: 10.1037/0003-066X.33.7.664
    1. Green DP, Goldman SL, & Salovey P (1993). Measurement error masks bipolarity in affect ratings. Journal of Personality and Social Psychology, 64(6), 1029–1041. doi: 10.1037/0022-3514.64.6.1029
    1. Guadagnoli E, & Velicer WF (1988). Relation of sample size to the stability of component patterns. Psychological Bulletin, 103(2), 265–275. doi: 10.1037/0033-2909.103.2.265
    1. Haslam N, Holland E, & Kuppens P (2012). Categories versus dimensions in personality and psychopathology: A quantitative review of taxometric research. Psychological Medicine, 42, 903–920. doi: 10.1017/S0033291711001966
    1. Haynes SN, Richard DCS, & Kubany ES (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7(3), 238–247. doi: 10.1037/1040-3590.7.3.238
    1. Hogan RT (1983). A socioanalytic theory of personality. In Page M (Ed.), 1982 Nebraska Symposium on Motivation (pp. 55–89). Lincoln: University of Nebraska Press.
    1. Hu L, & Bentler PM (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55. doi: 10.1080/10705519909540118
    1. Hu L, & Bentler PM (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3(4), 424–453. doi: 10.1037/1082-989X.3.4.424
    1. Hunsley J, & Meyer GJ (2003). The incremental validity of psychological testing and assessment: Conceptual, methodological, and statistical issues. Psychological Assessment, 15, 446–455. doi: 10.1037/1040-3590.15.4.446
    1. Kotov R, Perlman G, Gamez W, & Watson D (2015). The structure and short-term stability of the emotional disorders: A dimensional approach. Psychological Medicine, 45, 1687–1698. doi: 10.1017/S0033291714002815
    1. Krueger RF, Derringer J, Markon KE, Watson D, & Skodol AE (2012). Initial construction of a maladaptive personality trait model and inventory for DSM-5. Psychological Medicine, 42(9), 1879–1890. doi: 10.1017/S0033291711002674
    1. Lee K, Ogunfowora B, & Ashton MC (2005). Personality traits beyond the big five: Are they within the HEXACO space? Journal of Personality, 73(5), 1437–1463. doi: 10.1111/j.1467-6494.2005.00354.x
    1. Linde JA, Stringer DM, Simms LJ, & Clark LA (2013). The Schedule for Nonadaptive and Adaptive Personality Youth version (SNAP-Y): Psychometric properties and initial validation. Assessment, 20(4), 387–404. doi: 10.1177/1073191113489847
    1. Loevinger J (1954). The attenuation paradox in test theory. Psychological Bulletin, 51(5), 493–504. doi: 10.1037/h0058543
    1. Loevinger J (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635–694. doi: 10.2466/PR0.3.7.635-694
    1. Lowe JR, Edmundson M, & Widiger TA (2009). Assessment of dependency, agreeableness, and their relationship. Psychological Assessment, 21(4), 543–553. doi: 10.1037/a0016899
    1. MacCallum RC, Widaman KF, Zhang S, & Hong S (1999). Sample size in factor analysis. Psychological Methods, 4(1), 84–99. doi: 10.1037/1082-989X.4.1.84
    1. MacCann C, & Roberts RD (2010). Do time management, grit, and self-control relate to academic achievement independently of conscientiousness? In Hicks R (Ed.), Personality and individual differences: Current directions (pp. 79–90). Queensland, Australia: Australian Academic Press.
    1. Mansolf M, & Reise SP (2016). Exploratory bifactor analysis: The Schmid-Leiman orthogonalization and Jennrich-Bentler analytic rotations. Multivariate Behavioral Research, 51(5), 698–717. doi: 10.1080/00273171.2016.1215898
    1. Markon KE, Chmielewski M, & Miller CJ (2011a). The reliability and validity of discrete and continuous measures of psychopathology: A quantitative review. Psychological Bulletin, 137(5), 856–879. doi: 10.1037/a0023678
    1. Markon KE, Chmielewski M, & Miller CJ (2011b). “The reliability and validity of discrete and continuous measures of psychopathology: A quantitative review”: Correction to Markon et al. (2011). Psychological Bulletin, 137(6), 1–1093. doi: 10.1037/a0025727
    1. McCrae RR (2015). A more nuanced view of reliability: Specificity in the trait hierarchy. Personality and Social Psychology Review, 19(2), 97–112. doi: 10.1177/1088868314541857
    1. McCrae RR, Costa PT Jr., & Martin TA (2005). The NEO-PI-3: A more readable revised NEO personality inventory. Journal of Personality Assessment, 84(3), 261–270. doi: 10.1207/s15327752jpa8403_05
    1. McDade-Montez E, Watson D, O’Hara MW, & Denburg NL (2008). The effect of symptom visibility on informant reporting. Psychology and Aging, 23(4), 940–946. doi: 10.1037/a0014297
    1. Meehl PE (1945). The dynamics of “structured” personality tests. Journal of Clinical Psychology, 1, 296–303. (no doi)
    1. Meehl PE (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834. doi: 10.1037/0022-006X.46.4.806
    1. Meehl PE, & Golden RR (1982). Taxometric methods. In Kendall PC, & Butcher JN (Eds.), Handbook of research methods in clinical psychology (pp. 127–181). New York: Wiley.
    1. Messick S (1995). Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues and Practice, 14(4), 5–8. doi: 10.1111/j.1745-3992.1995.tb00881.x
    1. Nunnally JC (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
    1. Pilkonis PA, Choi SW, Reise SP, Stover AM, Riley WT, & Cella D (2011). Item banks for measuring emotional distress from the patient-reported outcomes measurement information system (PROMIS®): Depression, anxiety, and anger. Assessment, 18(3), 263–283. doi: 10.1177/1073191111411667
    1. Putnam SP, Rothbart MK, & Gartstein MA (2008). Homotypic and heterotypic continuity of fine-grained temperament during infancy, toddlerhood, and early childhood. Infant and Child Development, 17(4), 387–405. doi: 10.1002/icd.582
    1. Reise SP, Ainsworth AT, & Haviland MG (2005). Item response theory: Fundamentals, applications, and promise in psychological research. Current Directions in Psychological Science, 14(2), 95–101. doi: 10.1111/j.0963-7214.2005.00342.x
    1. Reise SP, & Waller NG (2009). Item response theory and clinical measurement. Annual Review of Clinical Psychology, 5, 27–48. doi: 10.1146/annurev.clinpsy.032408.153553
    1. Rudick MM, Yam WH, & Simms LJ (2013). Comparing countdown- and IRT-based approaches to computerized adaptive personality testing. Psychological Assessment, 25(3), 769–779. doi: 10.1037/a0032541
    1. Russell DW (2002). In search of underlying dimensions: The use (and abuse) of factor analysis in Personality and Social Psychology Bulletin. Personality and Social Psychology Bulletin, 28(12), 1629–1646. doi: 10.1177/014616702237645
    1. SAS Institute, Inc. (2013). SAS/STAT software: Version 9.4. Cary, NC: SAS Institute.
    1. Schmidt FL, Le H, & Ilies R (2003). Beyond alpha: An empirical examination of the effects of different sources of measurement error on reliability estimates for measures of individual-differences constructs. Psychological Methods, 8(2), 206–224. doi: 10.1037/1082-989X.8.2.206
    1. Schwartz SJ, Benet-Martínez V, Knight GP, Unger JB, Zamboanga BL, Des Rosiers SE, . . . Szapocznik J (2014). Effects of language of assessment on the measurement of acculturation: Measurement equivalence and cultural frame switching. Psychological Assessment, 26(1), 100–114. doi: 10.1037/a0034717
    1. Simms LJ, & Watson D (2007). The construct validation approach to personality scale construction. In Robins RW, Fraley RC, & Krueger RF (Eds.), Handbook of research methods in personality psychology (pp. 240–258). New York: Guilford Press.
    1. Simms LJ, Zelazny K, Williams TF, & Bernstein L (2019). Does the number of response options matter? Psychometric perspectives using personality questionnaire data. Psychological Assessment, 31(4), 557–566. doi: 10.1037/pas0000648
    1. Smith GT, & McCarthy DM (1995). Methodological considerations in the refinement of clinical assessment instruments. Psychological Assessment, 7, 300–308. doi: 10.1037/1040-3590.7.3.300
    1. Smith GT, McCarthy DM, & Anderson KG (2001). On the sins of short-form development. Psychological Assessment, 12, 102–111. doi: 10.1037/1040-3590.12.1.102
    1. Soto CJ, & John OP (2017). Short and extra-short forms of the Big Five Inventory–2: The BFI-2-S and BFI-2-XS. Journal of Research in Personality, 68, 69–81. doi: 10.1016/j.jrp.2017.02.004
    1. Spitzer RL, Forman JB, & Nee J (1979). DSM-III field trials: I. Initial interrater diagnostic reliability. The American Journal of Psychiatry, 136(6), 815–817. doi: 10.1176/ajp.136.6.815
    1. Streiner DL (2003). Starting at the beginning: An introduction to coefficient alpha and internal consistency. Journal of Personality Assessment, 80(1), 99–103. doi: 10.1207/S15327752JPA8001_18
    1. Tackett JL, Lahey BB, van Hulle C, Waldman I, Krueger RF, & Rathouz PJ (2013). Common genetic influences on negative emotionality and a general psychopathology factor in childhood and adolescence. Journal of Abnormal Psychology, 122(4), 1142–1153. doi: 10.1037/a0034151
    1. Tellegen A, & Waller NG (2008). Exploring personality through test construction: Development of the multidimensional personality questionnaire. In Boyle GJ, Matthews G, & Saklofske DH (Eds.), The SAGE handbook of personality theory and assessment, vol. 2: Personality measurement and testing (pp. 261–292). Thousand Oaks, CA: Sage Publications. doi: 10.4135/9781849200479.n1
    1. Watson D, Clark LA, & Chmielewski M (2008). Structures of personality and their relevance to psychopathology: II. Further articulation of a comprehensive unified trait structure. Journal of Personality, 76(6), 1485–1522. doi: 10.1111/j.1467-6494.2008.00531.x
    1. Watson D (2012). Objective tests as instruments of psychological theory and research. In Cooper H (Ed.), Handbook of Research Methods in Psychology. Volume 1: Foundations, planning, measures, and psychometrics (pp. 349–369). Washington, DC: American Psychological Association.
    1. Watson D, & Clark LA (1992). On traits and temperament: General and specific factors of emotional experience and their relation to the five-factor model. Journal of Personality, 60(2), 441–476. doi: 10.1111/j.1467-6494.1992.tb00980.x
    1. Watson D, Clark LA, Chmielewski M, & Kotov R (2013). The value of suppressor effects in explicating the construct validity of symptom measures. Psychological Assessment, 25(3), 929–941. doi: 10.1037/a0032781
    1. Watson D, Clark LA, & Harkness AR (1994). Structures of personality and their relevance to psychopathology. Journal of Abnormal Psychology, 103, 18–31. doi: 10.1037/0021-843X.103.1.18
    1. Watson D, Clark LA, & Tellegen A (1984). Cross-cultural convergence in the structure of mood: A Japanese replication and a comparison with U.S. findings. Journal of Personality and Social Psychology, 47, 127–144. doi: 10.1037/0022-3514.47.1.127
    1. Watson D, Nus E, & Wu KD (2019). Development and validation of the faceted inventory of the five-factor model (FI-FFM). Assessment, 26(1), 17–44. doi: 10.1177/1073191117711022
    1. Watson D, O’Hara MW, Simms LJ, Kotov R, Chmielewski M, McDade-Montez E, . . . Stuart S (2007). Development and validation of the inventory of depression and anxiety symptoms (IDAS). Psychological Assessment, 19(3), 253–268. doi: 10.1037/1040-3590.19.3.253
    1. Watson D, Stanton K, & Clark LA (2017). Self-report indicators of negative valence constructs within the research domain criteria (RDoC): A critical review. Journal of Affective Disorders, 216, 58–69. doi: 10.1016/j.jad.2016.09.065
    1. Watson D, Stasik SM, Ellickson-Larew S, & Stanton K (2015). Extraversion and psychopathology: A facet-level analysis. Journal of Abnormal Psychology, 124(2), 432–446. doi: 10.1037/abn0000051
    1. Watson D, Suls J, & Haig J (2002). Global self-esteem in relation to structural models of personality and affectivity. Journal of Personality and Social Psychology, 83(1), 185–197. doi: 10.1037/0022-3514.83.1.185
    1. Zimmermann J, & Wright AGC (2017). Beyond description in interpersonal construct validation: Methodological advances in the circumplex structural summary approach. Assessment, 24(1), 3–23. doi: 10.1177/1073191115621795
    1. Zimmerman M (1994). Diagnosing personality disorders: A review of issues and research models. Archives of General Psychiatry, 51(3), 225–245. doi: 10.1001/archpsyc.1994.03950030061006

Source: PubMed
