Chi‐square for model fit in confirmatory factor analysis
2020; Wiley; Volume: 76; Issue: 9; Language: English
DOI: 10.1111/jan.14399
ISSN: 1365-2648
Authors: Mousa Alavi, Denis Visentin, Deependra Kaji Thapa, Glenn E. Hunt, Roger Watson, Michelle Cleary
Topic(s): Mental Health Research Topics
Abstract: Confirmatory factor analysis (CFA) aims to confirm a theoretical model using empirical data and is an element of the broader multivariate technique of structural equation modelling (SEM; Alavi et al., 2020). CFA is commonly used across clinical research (Brown, 2015; Kääriäinen et al., 2011), including in the development and psychometric evaluation of measurement instruments. The three main uses of CFA in psychometric evaluation studies are construct validity evaluation, response pattern comparison, and competing model comparison (Sun, 2005), with construct validity evaluation being the most widely used application.
A fundamental characteristic of CFA is its hypothesis-driven approach (Brown, 2015). The researcher first establishes a hypothesis regarding the model structure, expressed as particular factor(s) underlying a set of items. Analysis is then performed to determine how much of the covariance between the items is captured by the hypothesized factor structure (Hooper, Coughlan, & Mullen, 2008). In addition to assessing the covariance captured by the model, evaluating the goodness of fit of the proposed model, which reflects how well the model fits the observed data, is a critical step in CFA (Hooper et al., 2008). Goodness of fit is evaluated using a range of model fit indices, which assess the relationship between the observed data and the data that would be expected under the model. Model fit indices can be used with either thresholds or hypothesis testing to reject or retain the proposed model (Costa & Sarmento, 2019). Several statistical software packages are available to estimate model fit, and they report a variety of fit indices (e.g. see Jöreskog & Sörbom, 1989).
There are two types of model fit indices available for CFA: global and local fit indices (Brown, 2015; Kline, 2005). Global model fit indices measure the global recovery of empirical observations without considering the mean and covariance structure.
Local fit indices examine model components including, but not limited to, factor correlations, inter-item residual covariances, and suggested model re-specification statistics. Global model fit indices fall into three categories: absolute, incremental (also known as comparative or relative), and parsimony fit indices (Hooper et al., 2008; Kline, 2005).
Absolute fit indices assess the overall theoretical model against the observed data. They are generated from a test statistic, model residuals, or both, and assess overall fit to the covariance structure of the population. They assess how well the model fits the data compared with no model. In addition to the chi-square (χ²) statistic, other examples of absolute fit indices are the goodness-of-fit index (GFI), adjusted GFI, root mean square error of approximation (RMSEA), root mean square residual, and standardized root mean square residual (SRMR; Jöreskog & Sörbom, 1989; Steiger, 2007).
Incremental fit indices compare a hypothesized model to a baseline or minimal model that specifies no relationships between the variables and contains only variances for the observed variables. Hence, the baseline model represents the hypothesis of no meaningful relationships between variables. Incremental fit indices represent the improvement in fit of the model over the assumption that the variables are independent. Examples are the comparative fit index (CFI), normed fit index (NFI), and non-normed fit index (Bentler, 1990; Bentler & Bonett, 1980).
As parameters are added to a model, the model fit will improve. Parsimonious models with fewer parameters are preferred to complex models, and this preference should be taken into account when determining fit. Parsimony fit indices aim to address this issue by adding a penalty for model complexity. This introduces a trade-off between model fit and degrees of freedom. The parsimony goodness-of-fit index (Mulaik et al., 1989) and the parsimony normed fit index (James, Mulaik, & Brett, 1982) are examples.
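As a rough illustration of how some of the global indices named above are derived from chi-square values, the sketch below computes RMSEA (an absolute index) and NFI, the non-normed fit index, and CFI (incremental indices) from their commonly cited textbook formulas. The function names and the example numbers are hypothetical and are not taken from the article. The RMSEA here assumes the N - 1 convention; some software divides by N instead, so results may differ slightly from a given package.

```python
import math

def rmsea(chi2_model, df_model, n):
    """RMSEA from the model chi-square (assumes the N - 1 convention)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))

def nfi(chi2_model, chi2_baseline):
    """Normed fit index: proportional drop in chi-square from the baseline."""
    return (chi2_baseline - chi2_model) / chi2_baseline

def nnfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Non-normed fit index: compares chi-square/df ratios of the two models."""
    ratio_baseline = chi2_baseline / df_baseline
    ratio_model = chi2_model / df_model
    return (ratio_baseline - ratio_model) / (ratio_baseline - 1.0)

def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index: based on (chi-square - df), floored at zero."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    if d_baseline == 0.0:
        return 1.0  # neither model shows misfit beyond its df
    return 1.0 - d_model / d_baseline

# Hypothetical values, for illustration only.
chi2_m, df_m = 85.3, 40      # hypothesized model
chi2_b, df_b = 1250.0, 55    # baseline (independence) model
n = 300                      # sample size

print("RMSEA:", round(rmsea(chi2_m, df_m, n), 3))
print("NFI:  ", round(nfi(chi2_m, chi2_b), 3))
print("NNFI: ", round(nnfi(chi2_m, df_m, chi2_b, df_b), 3))
print("CFI:  ", round(cfi(chi2_m, df_m, chi2_b, df_b), 3))
```

Because every one of these quantities is a function of the model and baseline chi-square values, any violation of the chi-square assumptions carries over to them.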
The large number of available fit indices complicates both the reporting of results and the appropriate use of each index. It is not expected that all indices would be used or reported in fitting any hypothesized model. The application of each fit index depends on the study purpose and the characteristics of the index.
The chi-square fit index assesses the fit between the hypothesized model and data from a set of measurement items (the observed variables). The model chi-square is the chi-square statistic obtained using the maximum likelihood method. When a model is estimated using maximum likelihood, the likelihood ratio test statistic is commonly used to assess the overall goodness of fit (Jöreskog, 1969; Maydeu-Olivares, Fairchild, & Hall, 2017). Assuming the hypothesized model is correctly specified, the likelihood ratio test statistic asymptotically follows a central chi-square distribution. The chi-square test is the most commonly used global fit index in CFA and is also used to generate other fit indices. It tests whether the covariance matrix derived from the model represents the population covariance matrix. Generally, chi-square is used as an absolute fit index, with a low chi-square value relative to the degrees of freedom (and a higher p-value) indicating better model fit. Since the test is used to reject a null hypothesis representing perfect fit, chi-square is often referred to as a 'badness of fit' or 'lack of fit' index (Kline, 2005).
The chi-square statistic using a likelihood ratio test can also be used to compare nested models, where one model is a subset of an alternative model created by constraining some of the parameters. The difference in fit between the models is expressed as the difference between the chi-square values of the two models, which also follows a chi-square distribution. For nested likelihood ratio tests, the degrees of freedom equal the number of parameters added in the less parsimonious model (Kline, 2005; Tomarken & Waller, 2003).
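The test of overall fit and the nested-model comparison described above can be sketched in a few lines. This is a minimal illustration that assumes the normal-theory maximum likelihood chi-square and integer degrees of freedom; the helper names (`chi2_sf`, `exact_fit_test`, `chi2_difference_test`) and all numbers are hypothetical, and in practice the statistics would come from SEM software.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with
    integer df, using Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)                # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))     # Q(1/2, z)
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)

def exact_fit_test(chi2_model, df_model):
    """Model chi-square, its ratio to df, and the p-value of the fit test."""
    return {
        "chi2": chi2_model,
        "df": df_model,
        "chi2_per_df": chi2_model / df_model,
        "p_value": chi2_sf(chi2_model, df_model),
    }

def chi2_difference_test(chi2_restricted, df_restricted, chi2_full, df_full):
    """Likelihood ratio comparison of two nested models. The restricted model
    has more constraints, hence more df and a chi-square at least as large."""
    delta_chi2 = chi2_restricted - chi2_full
    delta_df = df_restricted - df_full
    if delta_df <= 0 or delta_chi2 < 0:
        raise ValueError("models do not look nested in the stated direction")
    return {
        "delta_chi2": delta_chi2,
        "delta_df": delta_df,
        "p_value": chi2_sf(delta_chi2, delta_df),
    }

# Hypothetical values, for illustration only.
print(exact_fit_test(85.3, 40))
print(chi2_difference_test(112.4, 43, 85.3, 40))
```

A small p-value in the first call rejects the hypothesis of perfect fit; a small p-value in the second suggests that the constraints imposed by the more parsimonious model significantly worsen fit.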
As with all tests, the assumptions of the chi-square model fit index must be met, including multivariate normality of the data, adequate sample size, no systematic missing data, and appropriate specification of the model.
There are limitations to using the chi-square statistic as a model fit index. First, it is sensitive to sample size, with larger samples decreasing the p-value even where there is only trivial misfit (Babyak & Green, 2010). Overemphasis on the model chi-square may lead to a preference for smaller samples, in which the null hypothesis is less likely to be rejected. Small samples make it more likely that poor models are accepted and may yield inaccurate or imprecise parameter estimates. Parameter estimates should be considered alongside model fit indices, as they often hold substantive clinical interest. Adequate sample size in CFA can be assessed in several ways. Cut-offs include a minimum sample size of 200, a ratio of sample size to model variables of ≥10, or a ratio of sample size to the number of model parameters of ≥5 (Myers, Ahn, & Jin, 2011).
It is important that the model assumptions of chi-square are assessed when using this fit index. The model statistic does not always follow a chi-square distribution, particularly where data are not multivariate normal or the sample size is small. In addition, as with any statistical test, it is often interpreted as a binary result, in this case a fit or no-fit decision resulting in the model being retained or rejected. Assessment of the test statistic itself, which indicates the degree to which a model is discrepant, should be preferred. The chi-square model fit index does not account for parsimony, and hence model fit improves as the model size increases (Schermelleh-Engel, Moosbrugger, & Müller, 2003). Increasing the number of parameters may produce unnecessarily complex models, which are more likely to be accepted than parsimonious ones.
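The three sample size cut-offs quoted above (Myers, Ahn, & Jin, 2011) are simple enough to check mechanically. The sketch below is only a convenience for applying those rules of thumb; the function name, argument names, and example figures are hypothetical, and passing a check does not by itself establish that a sample is adequate for a particular model.

```python
def sample_size_checks(n, n_variables, n_parameters):
    """Apply the three rule-of-thumb cut-offs for CFA sample size.

    n            -- number of respondents
    n_variables  -- number of observed variables (items) in the model
    n_parameters -- number of freely estimated model parameters
    """
    return {
        "n_at_least_200": n >= 200,
        "n_per_variable_at_least_10": n / n_variables >= 10,
        "n_per_parameter_at_least_5": n / n_parameters >= 5,
    }

# Hypothetical study: 250 respondents, 12 items, 27 free parameters.
for rule, passed in sample_size_checks(250, 12, 27).items():
    print(rule, "->", "met" if passed else "not met")
```

Reporting which cut-offs are met, rather than a single yes or no, keeps the basis for the sample size judgement visible to readers.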
The complexity of the model needs to be considered when assessing model fit using chi-square. Given that the chi-square fit statistic is affected by large samples, the ratio of the chi-square statistic to its degrees of freedom (χ²/df) is preferred (Wheaton, Muthen, Alwin, & Summers, 1977). A ratio of ≤2 indicates a superior fit between the hypothesized model and the sample data (Cole, 1987).
Nevertheless, the chi-square statistic can be useful when a CFA model fails to fit. It is then common to enter an exploratory phase, which involves inspecting the modification indices of all pairs of error terms and correlating the pairs with the largest indices until the model fits (Watson et al., 2013). As better-fitting models are achieved, the fit indices improve. The chi-square statistic should decrease, but with large sample sizes it will most probably remain statistically significant.
Considering the sensitivity of the chi-square statistic to sample size, a wide variety of other indices have been suggested to assess model adequacy. In practice, the chi-square test is "not always the final word in assessing fit" (West, Taylor, & Wu, 2012, p. 211). Kline (2005) suggests that, at a minimum, the following indices should be reported and assessed in combination: chi-square, RMSEA, CFI, and SRMR. The use of multiple fit indices provides a more holistic view of goodness of fit, accounting for sample size, model complexity, and other considerations relevant to the particular study.
No conflict of interest was declared by the authors in relation to the study itself. Note that Roger Watson is a JAN editor.
All authors have agreed on the final version and meet at least one of the following criteria recommended by the ICMJE (http://www.icmje.org/recommendations/): substantial contributions to conception and design, acquisition of data, or analysis and interpretation of data; drafting the article or revising it critically for important intellectual content.