Peer-Reviewed Article

Getting Entangled in the Nomological Net: Thoughts on Validity and Conceptual Overlap

2013; Hogrefe Publishing; Volume: 29; Issue: 3; Pages: 157–161; Language: English

DOI: 10.1027/1015-5759/a000173

ISSN

1015-5759 (print); 2151-2426 (online)

Authors

Matthias Ziegler, Tom Booth, Doreen Bensch

Topic(s)

Psychological Well-being and Life Satisfaction

Abstract

Editorial (Free Access). Matthias Ziegler (Humboldt-Universität zu Berlin, Germany), Tom Booth (Centre for Cognitive Ageing and Cognitive Epidemiology, Department of Psychology, The University of Edinburgh, UK), and Doreen Bensch (Humboldt-Universität zu Berlin, Germany). Published online: June 20, 2013. https://doi.org/10.1027/1015-5759/a000173

Psychological research relies heavily on tests and questionnaires to measure constructs and traits. Tests and questionnaires are therefore not only in high demand but are constantly being developed anew. Likewise, researchers regularly propose new traits or constructs, and each such proposal triggers a wave of test and questionnaire development. This investment of time and research resources is necessary to ensure high-quality measurement tools that researchers and practitioners alike can trust to assess the intended trait or construct.

For this reason, journals such as the European Journal of Psychological Assessment publish studies evaluating the psychometric properties of such new measurement tools. Typically, these evaluation studies include some estimate of reliability (Schweizer, 2011) and concentrate on demonstrating the validity of the scores derived from the new measure; a minimal sketch of how such a reliability estimate is commonly computed is given below.
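As a concrete illustration (ours, not the editorial's), the following R sketch shows the kind of reliability estimate such studies typically report. The psych package and the placeholder data frame item_data are assumptions for the example.

```r
# Minimal sketch (our illustration, not from the editorial): estimating the
# reliability of one scale. `item_data` is a hypothetical data frame with
# one column per item response.
library(psych)

rel <- alpha(item_data)        # Cronbach's alpha plus item-level statistics
rel$total$raw_alpha            # the alpha coefficient itself

omega(item_data)$omega.tot     # McDonald's omega total as a model-based alternative
```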
A look at the history of this journal reveals that the proportion of publications applying some form of factor analysis has risen from around 28% in the 1990s to 40% since the year 2000 (Alonso-Arbiol & van de Vijver, 2010). From this one can assume that the factorial validity of the published measurement tools has been a central theme of published research. Factorial validity is of course an extremely important issue and provides information necessary for many scoring procedures. Since it is trait scores (in their various forms) that are most commonly used in applied studies, factorial validity should not be neglected. However, while a newly devised measurement tool may demonstrate factorial validity and produce reliable test scores, its utility in the field is far from assured. Construct validity-related evidence is still necessary to ensure that the new measure truly captures the trait it was intended to capture. Campbell and Fiske (1959) asserted this as follows:

We believe that before one can test the relationships between a specific trait and other traits, one must have some confidence in one's measures of that trait. Such confidence can be supported by evidence of convergent and discriminant validation. (p. 100)

Within this Editorial we would like to outline some problems related to construct validity and suggest some lines of research to solve them. Schweizer (2012) has already discussed problems with convergent validity at length in an editorial in this journal. We will therefore broaden the focus and include some additional issues we deem important.

The Idea of Convergent and Discriminant Validity as Proposed by Campbell and Fiske (1959)

Campbell and Fiske (1959) started their seminal paper by pointing out four aspects important to a validation process. First, convergent validity necessarily requires independent measurement procedures, that is, different measurement approaches must be applied (e.g., paper-pencil and observation). Second, besides convergent validity-related evidence, discriminant validity-related evidence is also required; only then does a full picture of validity emerge. Third, each measure includes variance due to a trait and variance due to method; without disentangling these variance sources, validity estimates for a test score might be inflated. Finally, in order to achieve these goals, it is necessary to employ more than one method and to assess more than one trait. The approach they suggested – the multitrait-multimethod (MTMM) matrix – allows all of these aspects to enter a single analysis. Such a matrix summarizes the correlations among several traits, each assessed with the same set of methods; importantly, more than one method must be used. Within the matrix, Campbell and Fiske differentiate reliability diagonals, validity diagonals, heterotrait-monomethod triangles, and heterotrait-heteromethod triangles. Moreover, by specifying the relationships between the validity diagonals and the triangles, as well as between the correlational patterns within the triangles, Campbell and Fiske defined what evidence is needed to speak of convergent and discriminant validity. Schweizer (2012) outlined some of the problems with this approach, which we need not repeat here. The sketch below illustrates how such a matrix can be assembled and partitioned.
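To make the structure of the matrix concrete, here is a minimal R sketch using simulated data. The three-traits-by-two-methods design, the variable names (t1.m1, ...), and the simulation parameters are our illustrative assumptions, not Campbell and Fiske's example.

```r
# Minimal MTMM sketch with simulated data (our illustration): 3 traits,
# each assessed by 2 methods.
set.seed(1)
n <- 300
# Correlated true scores for the 3 traits.
traits <- MASS::mvrnorm(n, mu = rep(0, 3),
                        Sigma = matrix(c(1, .3, .2,
                                         .3, 1, .4,
                                         .2, .4, 1), nrow = 3))
# One shared method component per method.
methods <- matrix(rnorm(n * 2), nrow = n, ncol = 2)
make_score <- function(trait, method) .7 * trait + .4 * method + rnorm(n, sd = .5)
scores <- data.frame(
  t1.m1 = make_score(traits[, 1], methods[, 1]),
  t2.m1 = make_score(traits[, 2], methods[, 1]),
  t3.m1 = make_score(traits[, 3], methods[, 1]),
  t1.m2 = make_score(traits[, 1], methods[, 2]),
  t2.m2 = make_score(traits[, 2], methods[, 2]),
  t3.m2 = make_score(traits[, 3], methods[, 2])
)
mtmm <- cor(scores)
# Validity diagonal: same trait, different methods (monotrait-heteromethod).
validity_diagonal <- diag(mtmm[1:3, 4:6])
# Heterotrait-monomethod triangle (method 1).
heterotrait_monomethod <- mtmm[1:3, 1:3][lower.tri(mtmm[1:3, 1:3])]
# Heterotrait-heteromethod values.
heterotrait_heteromethod <- mtmm[1:3, 4:6][row(mtmm[1:3, 4:6]) != col(mtmm[1:3, 4:6])]
# Campbell and Fiske's criteria: the validity diagonal should be sizable
# and clearly exceed both heterotrait blocks.
```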
Two Implications From Campbell and Fiske's MTMM Approach

Issue 1: Selecting Traits for the Study of Convergent and Discriminant Validity

We want to focus on two important issues. The first concerns discriminant validity-related evidence. In selecting discriminant traits, Campbell and Fiske emphasized the importance of providing a definition as well as positioning the discriminant trait within a nomological net of the trait to be measured by the new instrument. Such a framework provides the necessary depth of information to select appropriate discriminant traits. Surprisingly, validity studies often include correlations with numerous different operationalizations of the same trait, suggesting convergent validity-related evidence; when it comes to discriminant validity-related evidence, however, there is sometimes no clear rationale for selecting exactly these traits, which makes the selection look arbitrary. Yet, as Campbell and Fiske already pointed out:

When a dimension of personality is hypothesized, when a construct is proposed, the proponent invariably has in mind distinctions between the new dimension and other constructs already in use. One cannot define without implying distinctions, and the verification of these distinctions is an important part of the validational process. (p. 84)

Following this statement and the demands for selecting discriminant traits, it seems necessary to recall the requirement to clearly define the trait to be measured, embed it in a nomological net, and base the selection of discriminant traits on this network. Obtaining discriminant validity-related evidence in this way is more difficult but also more informative: It requires showing that a new measure of a trait can be distinguished from an existing measure capturing a (closely) related trait. Such findings tell us far more about discriminant validity than do correlations with measures assessing very distant traits.

Visualizing the Nomological Net of Personality

Pace and Brannick (2010) conducted a bare-bones meta-analysis (correcting only for sampling error) of different Big Five questionnaires. The underlying assumption was that all of these questionnaires should basically capture the same traits. Pace and Brannick concluded:

Convergent validities were lower than expected, indicating substantial differences among tests. Such a result begs for an explanation of the differences among tests as well as a consideration of the implications of such differences for theory and practice. (p. 674)

In fact, the largest overall convergent correlation was found for Extraversion at .56, whereas the estimated overall reliability of Extraversion measures was .83. Thus, even in the best case, about 50% of an instrument's reliable variance is not shared but unique to the specific questionnaire; the short calculation below shows one way to arrive at this figure.
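The "about 50%" statement can be reproduced with a short, illustrative calculation, assuming as a simplification that both measures have the same reliability of .83:

```r
# Illustrative calculation behind "about 50% of the reliable variance is
# not shared" (assumes both measures have reliability .83).
r_obs <- .56                       # largest convergent correlation (Extraversion)
rel   <- .83                       # estimated overall reliability
r_dis <- r_obs / sqrt(rel * rel)   # disattenuated correlation: ~ .67
r_dis^2                            # shared true-score variance: ~ .46
1 - r_dis^2                        # reliable variance not shared: ~ .54
```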
What the Pace and Brannick (2010) study highlights is what has become known as the "jingle-jangle" fallacy: Scales with the same name may measure different things, and scales with different names may measure the same thing. Here we demonstrate the utility of network diagrams (see Epskamp, Cramer, Waldorp, Schmittmann, & Borsboom, 2012) for representing cross-sectional association matrices, in order to visualize the "jingle-jangle" within personality inventories and to highlight the challenge of selecting traits for discriminant and convergent validity studies.

Figure 1 shows a network representation of the correlation matrix between 113 personality facet scale scores from the NEO-PI-R, HEXACO, 6FPQ, 16PF, MPQ, and JPI, taken from the Eugene-Springfield Community Sample (Goldberg, 2005). Correlations are based on a sample of 459 participants with complete data. Within the figure, each facet scale score is a node (circle), and each correlation between nodes is depicted as an edge (line) whose thickness represents the magnitude of the association. For clarity, associations below r = .35 have been suppressed (a sketch of how such a diagram can be produced follows the figure caption).

Figure 1. A network diagram of the correlations between 113 facet scales from the NEO-PI-R, HEXACO, 6FPQ, 16PF, MPQ, and JPI, derived from the Eugene-Springfield Community Sample (n = 459). The diagram was constructed using the qgraph package in R (Epskamp, Cramer, Waldorp, Schmittmann, & Borsboom, 2012). The graph uses the "spring" layout, in which the length of an edge depends on the weight (correlation) between nodes; this has the visual effect of drawing more closely associated nodes together in the graph layout.
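A diagram like Figure 1 can be produced along the following lines. This is a sketch under our assumptions, not the authors' original script; facet_scores stands in as a placeholder for the 113 facet scale scores.

```r
# Sketch of the Figure 1 workflow (not the authors' original script).
# `facet_scores` is a placeholder data frame: one column per facet scale.
library(qgraph)

cors <- cor(facet_scores, use = "pairwise.complete.obs")
qgraph(cors,
       layout  = "spring",               # edge lengths follow correlation weights
       minimum = 0.35,                   # suppress edges with |r| < .35, as in Figure 1
       labels  = colnames(facet_scores))
```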
Marked in gray are three clusters and two pairs of facets that illustrate a series of situations with respect to the jingle-jangle fallacy and new test construction. First, consider the two pairs of scales on the right-hand side of Figure 1. The two nodes labeled TR are the Traditionalism facet scales of the MPQ and JPI. As might be expected of two scales sharing a label, they are highly associated (r = .77). The two nodes labeled CR and IN, however, are the Creativity facet of the HEXACO and the Innovation scale of the JPI, respectively. Despite being labeled differently, the pairwise association between these scales (r = .76) is nearly identical to that of the Traditionalism scales. Is this correlation, then, evidence for convergent or for discriminant validity?

The same question arises when we consider broader clusters of traits within the nomological net, including traits that are among the most highly researched in the field. For example, consider the cluster of nodes at the bottom center of Figure 1. Two nodes are the Social Boldness (SB; r = .70) facets of the 16PF and HEXACO. The remaining three nodes, which share equivalent associations (mean r = .72; range = .61 to .79) with the other nodes in the cluster, are the Exhibition facet of the 6FPQ (EX), the Social Potency facet of the MPQ (SP), and the Social Confidence facet of the JPI (SC). If we regard the magnitude of the association between the two Social Boldness scales as indicative of their convergent validity, then we have four different labels for the same construct within this single cluster.

Next, consider the cluster of four nodes to the left of Figure 1, which represents the Anxiety (AX) facets from the NEO-PI-R, JPI, and HEXACO, and the Stress Reaction (SR) facet of the MPQ. The situation within this cluster is the same as that found for the sociability scales: The Anxiety scales have a mean correlation of .68, whereas the average correlation of the Stress Reaction scale with the three Anxiety scales is .72.

Finally, consider the cluster of nodes at the top of Figure 1. Two are the Order (OR) facets of the 6FPQ and NEO-PI-R, two are the Perfectionism (PF) facets of the 16PF and HEXACO, and two are the Organization (OG) facets of the JPI and HEXACO. Of note here is that, while most nodes are quite highly related – something we may expect, as they can all be argued to cluster under some (perhaps higher-order) Conscientiousness factor – the two Perfectionism scales show notably different patterns of associations with the other related scales, despite sharing a facet label. As such, when selecting a Perfectionism scale from an extant inventory to study the convergent or discriminant validity of a new measure, the choice of comparison scale may have profound implications for whether we consider the new scale to be distinct or not.

A cursory glance at the rest of the network graph in Figure 1 reveals many other areas of local clustering not emphasized here. Thus, when researchers follow Campbell and Fiske's guidelines and select measures that purportedly capture the same trait in order to ascertain discriminant validity, they might be in for a surprise: The pattern of convergent and discriminant correlations may not be as expected. Test constructors have to be careful when selecting convergent measures and should ensure the highest possible conceptual and statistical overlap. Again, this judgment requires a clearly defined construct embedded in a clearly defined nomological net. Network visualizations of the nomological net of personality facets may greatly aid such decisions during scale development.

Possible Reasons for the Jingle-Jangle

The reasons Pace and Brannick offered for the low convergent validities are item context (e.g., general vs. work-specific context), breadth of the instrument, and test family. The latter refers to differences between instruments from the NEO family and those from the BFI family (see also Miller, Gaughan, Maples, & Price, 2011). The first two reasons, however, bear further implications for assessment-oriented research. It is well documented that changing the context of an item, for example by adding "in school," changes, and mostly improves, test-criterion correlations. Reasons for this might be found within the ideas of Brunswik's lens model (see also Miller et al., 2011). More important here, though, is the question of how this added piece of information might change the construct validity of the measurement tool used. Thus, we need empirical research to investigate these effects.

The second reason for the low convergent correlations was the breadth of the measurement tool. It is no new insight that most traits can be described as hierarchically organized: Below a rather abstract domain lie narrower facets. As before, there is evidence suggesting that such facets improve test-criterion correlations (Brunswik, 1955). However, for most traits there is no common agreement about the number and nature of such facets. Pace and Brannick stated:

Recognition of the facets measured by tests may lead toward understanding similarities and differences among personality tests, and perhaps the nature of any differential prediction by tests. (p. 675)

Issue 2: The Issue of Method Variance

The second issue we want to raise with regard to Campbell and Fiske is method variance. Campbell and Fiske (1959) wrote:

The interpretation of the validity diagonal in an absolute fashion requires the fortunate coincidence of both an independence of traits and an independence of methods, represented by zero values in the heterotrait-heteromethod triangles. ... In practice, perhaps all that can be hoped for is evidence for relative validity, that is, for common variance specific to a trait, above and beyond shared method variance. (p. 84)

This pessimistic conclusion can be mitigated today. There are various methodological approaches to modeling all kinds of method effects (e.g., Eid, Lischetzke, Nussbeck, & Trierweiler, 2003; Podsakoff, MacKenzie, Lee, & Podsakoff, 2003); a schematic example is sketched below. Despite these modeling techniques, Campbell and Fiske's remark should call our attention to the fact that we still do not know enough about the nature of method variance. Method variance is often understood as variance due to the administration mode (e.g., paper-pencil). However, method variance could also reflect social desirability (Ziegler & Bühner, 2009), response sets or styles (Wetzel, Carstensen, & Böhnke, 2013), or acquiescence (Rammstedt & Kemper, 2011), to name just three examples. All of these terms are well known; however, with the possible exception of social desirability (Paulhus, 2002; Ziegler, MacCann, & Roberts, 2011), elaborated theories of such method-variance-producing phenomena are scarce. Researchers with an interest in psychological assessment should therefore strengthen their efforts to shed light on phenomena such as social desirability, response sets and styles, and acquiescence.
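As an example of such modeling, the following lavaan sketch specifies a CT-C(M-1)-style model in the spirit of Eid et al. (2003): Method 1 serves as the reference method, and a method factor is estimated for method 2 only. This is a schematic sketch, not the exact model from the cited paper; the indicator names follow the hypothetical MTMM example given earlier.

```r
# Schematic CT-C(M-1)-style model (in the spirit of Eid et al., 2003; not
# their exact specification). Indicators t1.m1 ... t3.m2 follow the
# hypothetical MTMM example above; method 1 is the reference method.
library(lavaan)

model <- '
  # Trait factors, each measured by both methods
  T1 =~ t1.m1 + t1.m2
  T2 =~ t2.m1 + t2.m2
  T3 =~ t3.m1 + t3.m2
  # Method factor for the non-reference method only
  M2 =~ t1.m2 + t2.m2 + t3.m2
  # Method factor uncorrelated with the trait factors
  M2 ~~ 0*T1 + 0*T2 + 0*T3
'
fit <- cfa(model, data = scores)
summary(fit, standardized = TRUE)   # separates trait loadings from method loadings
```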
Conclusion

In this Editorial we wanted to raise awareness of some of the problems we see in current convergent and discriminant validation efforts. Summarizing the thoughts outlined above, let us stress three aspects that papers reporting validation studies should heed: (1) The trait to be measured should be clearly defined and embedded within a nomological network. (2) Besides convergent validity, discriminant validity is important in order to gain a more complete picture of the validity of an instrument; to this end, different traits have to be assessed with different methods, and the nomological network should guide the selection of the discriminant trait(s). (3) Effects of method variance should be modeled.

Moreover, this Editorial suggests the need for more research on method-variance-producing phenomena (e.g., acquiescence, response sets and styles, and social desirability), on the effects of item context (e.g., items specifically phrased for the school or work context), and on the facet structure underlying and defining broad domains.

We want to end this Editorial with a quote from Campbell and Fiske (1959), which in our opinion is as true today as it was in 1959:

The test constructor is asked to generate from his literary conception or private construct not one operational embodiment, but two or more, each as different in research vehicle as possible. Furthermore, he is asked to make explicit the distinction between his new variable and other variables, distinctions which are almost certainly implied in his literary definition. In his very first validational efforts, before he ever rushes into print, he is asked to apply the several methods and several traits jointly. His literary definition, his conception, is now best represented in what his independent measures of the trait hold distinctively in common. (p. 101)

References

Alonso-Arbiol, I., & van de Vijver, F. J. R. (2010). A historical analysis of the European Journal of Psychological Assessment. European Journal of Psychological Assessment, 26, 238–247. doi:10.1027/1015-5759/a000032

Brunswik, E. (1955). Representative design and probabilistic theory in a functional psychology. Psychological Review, 62, 193–217.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.

Eid, M., Lischetzke, T., Nussbeck, F. W., & Trierweiler, L. I. (2003). Separating trait effects from trait-specific method effects in multitrait-multimethod models: A multiple-indicator CT-C(M-1) model. Psychological Methods, 8, 38–60. doi:10.1037/1082-989X.8.1.38

Epskamp, S., Cramer, A. O. J., Waldorp, L. J., Schmittmann, V. D., & Borsboom, D. (2012). qgraph: Network visualizations of relationships in psychometric data. Journal of Statistical Software, 48, 1–18.

Goldberg, L. R. (2005). The Eugene-Springfield community sample: Information available from the research participants (Technical Report Vol. 45, No. 1). Eugene, OR: Oregon Research Institute.

Miller, J. D., Gaughan, E. T., Maples, J., & Price, J. (2011). A comparison of agreeableness scores from the Big Five Inventory and the NEO PI-R: Consequences for the study of narcissism and psychopathy. Assessment, 18, 335–339. doi:10.1177/1073191111411671

Pace, V. L., & Brannick, M. T. (2010). How similar are personality scales of the "same" construct? A meta-analytic investigation. Personality and Individual Differences, 49, 669–676. doi:10.1016/j.paid.2010.06.014

Paulhus, D. L. (2002). Socially desirable responding: The evolution of a construct. In H. I. Braun, D. N. Jackson, & D. E. Wiley (Eds.), The role of constructs in psychological and educational measurement (pp. 49–69). Mahwah, NJ: Erlbaum.
Podsakoff, P. M., MacKenzie, S. B., Lee, J. Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88, 879–903.

Rammstedt, B., & Kemper, C. J. (2011). Measurement equivalence of the Big Five: Shedding further light on potential causes of the educational bias. Journal of Research in Personality, 45, 121–125.

Schweizer, K. (2011). On the changing role of Cronbach's α in the evaluation of the quality of a measure. European Journal of Psychological Assessment, 27, 143–144. doi:10.1027/1015-5759/a000069

Schweizer, K. (2012). On issues of validity and especially on the misery of convergent validity. European Journal of Psychological Assessment, 28, 249–254. doi:10.1027/1015-5759/a000156

Wetzel, E., Carstensen, C. H., & Böhnke, J. R. (2013). Consistency of extreme response style and nonextreme response style across traits. Journal of Research in Personality, 47, 178–189. doi:10.1016/j.jrp.2012.10.010

Ziegler, M., & Bühner, M. (2009). Modeling socially desirable responding and its effects. Educational and Psychological Measurement, 69, 548–565.

Ziegler, M., MacCann, C., & Roberts, R. (Eds.). (2011). New perspectives on faking in personality assessments. New York: Oxford University Press.

Correspondence

Matthias Ziegler, Institut für Psychologie, Humboldt-Universität zu Berlin, Rudower Chaussee 18, 12489 Berlin, Germany, +49 30 2093-9447, +49 30 2093-9361, zieglema@hu-berlin.de

Tom Booth, Centre for Cognitive Ageing and Cognitive Epidemiology, Department of Psychology, The University of Edinburgh, Edinburgh EH8 9AD, United Kingdom, +44 131 650-8405, tom.booth@ed.ac.uk

Doreen Bensch, Institut für Psychologie, Humboldt-Universität zu Berlin, Rudower Chaussee 18, 12489 Berlin, Germany, +49 30 2093-9447, +49 30 2093-9361, benschd@cms.hu-berlin.de
