DSM-5: How Reliable Is Reliable Enough?
2012; American Psychiatric Association; Volume: 169; Issue: 1; Language: English
10.1176/appi.ajp.2011.11010050
ISSN 1535-7228
Authors: Helena C. Kraemer, David J. Kupfer, Diana E. Clarke, William E. Narrow, Darrel A. Regier
Topic(s): Musculoskeletal pain and rehabilitation
Abstract:

Perspectives

Helena Chmura Kraemer, Ph.D., David J. Kupfer, M.D., Diana E. Clarke, Ph.D., William E. Narrow, M.D., M.P.H., and Darrel A. Regier, M.D., M.P.H.

Published online: 1 Jan 2012. https://doi.org/10.1176/appi.ajp.2011.11010050

DSM-5 is being developed for clinical decision making to provide the greatest possible assurance that those with a particular disorder will have it correctly identified (sensitivity) and that those without it will not have it mistakenly identified (specificity).
Clinical diagnoses differ from diagnoses for other purposes: they are not necessarily sensitive enough for epidemiological studies or specific enough for basic and clinical research. We previously commented in these pages (1) on the need for field trials. Our purpose here is to set out realistic expectations concerning that assessment.

In setting those expectations, one contentious issue is whether it is important that the prevalence for diagnoses based on proposed criteria for DSM-5 match the prevalence for the corresponding DSM-IV diagnoses. However, to require that the prevalence remain unchanged is to require that any existing difference between true and DSM-IV prevalence be reproduced in DSM-5. Any effort to improve the sensitivity of DSM-IV criteria will result in higher prevalence rates, and any effort to improve the specificity of DSM-IV criteria will result in lower prevalence rates. Thus, there are no specific expectations about the prevalence of disorders in DSM-5. The evaluations primarily address reliability.

A DSM-5 field trial at a large clinical site is designed to draw a sample that is representative of the site's patient population. These patients are to be evaluated independently by two clinicians, who are new to the patients, within an interval during which the presence or absence of the disorder is unlikely to have changed (i.e., between 4 hours and 2 weeks) to assess the test-retest reliability of the proposed diagnostic criteria. The clinicians are trained to use DSM-5 with training methods that would be available to any clinician. Reliability will be assessed using the intraclass kappa coefficient κI (2). For a categorical diagnosis with prevalence P, among subjects with an initial positive diagnosis, the probability of a second positive diagnosis is κI+P(1–κI), and among the remaining subjects, it is P(1–κI). The difference between these probabilities is κI (3).
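
The two conditional probabilities just given can be checked numerically. The following Python sketch is illustrative only: the function name `second_positive_probabilities` and the example values (prevalence 0.10, κI = 0.5) are assumptions chosen for illustration and do not come from the field-trial protocol.

```python
from typing import Tuple


def second_positive_probabilities(prevalence: float, kappa: float) -> Tuple[float, float]:
    """Return (P(second + | first +), P(second + | first -)).

    Uses the expressions from the text: kappa + P(1 - kappa) after a
    positive first diagnosis, and P(1 - kappa) after a negative one.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("kappa is assumed to lie between 0 and 1 here")
    after_negative = prevalence * (1.0 - kappa)
    after_positive = kappa + after_negative
    return after_positive, after_negative


if __name__ == "__main__":
    # Hypothetical example values, not taken from the article.
    pos, neg = second_positive_probabilities(prevalence=0.10, kappa=0.5)
    print(f"after a positive first diagnosis: {pos:.2f}")
    print(f"after a negative first diagnosis: {neg:.2f}")
    print(f"difference (should equal kappa):  {pos - neg:.2f}")
```

For any prevalence strictly between 0 and 1, the difference between the two returned values equals the kappa that was passed in, which is the property stated in the text.
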
Thus κI=0 means that the first diagnosis has no predictive value for a second diagnosis, and κI=1 means that the first diagnosis is perfectly predictive of a second diagnosis.

Reliability is essentially a signal-to-noise ratio indicator. In diagnosis, there are two major sources of "noise": the inconsistency of expression of the diagnostic criteria by patients and the application of those criteria by the clinicians. It is all too easy to exaggerate reliability by removing some of that noise by design. Instead of a representative sample, as in DSM-5 field trials, one might select "case subjects" who are unequivocally symptomatic and "control subjects" who are unequivocally asymptomatic, omitting the ambiguous middle of the population for whom diagnostic errors are the most common and most costly. That approach would hide much of the patient-generated noise.

Moreover, there are three major types of reliability assessments that can be used depending on which sources of "noise" are permitted by design to affect diagnosis. Intrarater reliability requires that the same rater be asked to "blindly" review the same patient material two or more times. Noise related both to patients and to raters is removed. Interrater reliability requires that two or more different raters review the same patient material. Now the noise related to clinicians is included, but the noise related to patients is removed. Test-retest reliability requires that the same patients be observed separately by two or more raters within an interval during which the clinical conditions of the patients are unlikely to have changed. Now the noise related to both patients and to clinicians is included, as it would be in clinical practice. Consequently, for any diagnosis, intrarater reliability will be greater than interrater reliability, which will in turn be greater than test-retest reliability.
It is test-retest reliability that reflects the effect of the diagnosis on clinical decision making and that is the focus of the DSM-5 field trials.

In addition, many reliability studies report "percentage agreement," which substantially exaggerates reliability and fails to take into account agreement by chance. If diagnoses are assigned at random, each positive with probability P, percentage agreement is at least 50% and approaches 100% as P approaches 0 or 1. For example, when P=0.95, chance agreement would be about 90%. The intraclass kappa is percentage agreement with chance agreement taken into account.

It is unrealistic to expect that the quality of psychiatric diagnoses can be much greater than that of diagnoses in other areas of medicine, where diagnoses are largely based on evidence that can be directly observed. Psychiatric diagnoses continue to be based on inferences derived from patient self-reports or observations of patient behavior. Nevertheless, we propose that the standard of evaluation of the test-retest reliability of DSM-5 be consistent with what is known about the reliability of diagnoses in other areas of medicine. Intrarater reliability is almost never assessed for psychiatric diagnosis because it is difficult to ensure blinding of two diagnoses by the same clinician viewing, for example, the same diagnostic interview. However, where intrarater reliability has been assessed for standard medical diagnostic procedures, it is common to see intrarater kappa values between 0.6 and 0.8 (4, 5), but there are exceptions (e.g., 0.54 for assessment of hand films for osteoarthrosis [4]).

Most medical reliability studies, including past DSM reliability studies, have been based on interrater reliability: two independent clinicians viewing, for example, the same X-ray or interview. While one occasionally sees interrater kappa values between 0.6 and 0.8, the more common range is between 0.4 and 0.6 (4, 5).
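
The chance-agreement arithmetic described earlier in this passage, and the idea of kappa as agreement corrected for chance, can be sketched in Python. This is a minimal illustration under the stated assumption that both diagnoses are positive with the same probability P, so that chance agreement is P² + (1–P)²; the function names are hypothetical and are not part of any published software.

```python
def chance_agreement(prevalence: float) -> float:
    """Probability that two independent random diagnoses agree.

    Assumes each diagnosis is positive with the same probability P,
    so agreement happens when both are positive or both are negative.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    return prevalence ** 2 + (1.0 - prevalence) ** 2


def chance_corrected_agreement(observed_agreement: float, prevalence: float) -> float:
    """Rescale observed agreement so that chance maps to 0 and perfect agreement to 1."""
    expected = chance_agreement(prevalence)
    if expected >= 1.0:
        raise ValueError("undefined when prevalence is exactly 0 or 1")
    return (observed_agreement - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Chance agreement is lowest at P = 0.5 and rises toward 1 at the extremes.
    for p in (0.5, 0.8, 0.95):
        print(f"P = {p:.2f}: chance agreement = {chance_agreement(p):.3f}")
    # Hypothetical case: 95% observed agreement on a diagnosis with P = 0.95.
    print(f"corrected: {chance_corrected_agreement(0.95, 0.95):.2f}")
```

The last line of the demonstration uses made-up numbers to show why raw percentage agreement can mislead: an observed agreement of 95% looks high, but when chance alone already yields about 90%, the chance-corrected value is under one half.
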
For instance, in evaluating coronary angiograms, Detre et al. (6) reported that "the level of observer agreement for most angiographic items (of 15 evaluated) [was] found to be approximately midway between chance expectation and 100% agreement" (i.e., κI around 0.5).

Examples in the medical literature of test-retest reliability are rare. The diagnosis of anemia based on conjunctival inspection was associated with kappa values between 0.36 and 0.60 (7), and the diagnosis of skin and soft-tissue infections was associated with kappa values between 0.39 and 0.43 (8). The test-retest reliability of various findings of bimanual pelvic examinations was associated with kappa values from 0.07 to 0.26 (9).

From these results, to see a κI for a DSM-5 diagnosis above 0.8 would be almost miraculous; to see κI between 0.6 and 0.8 would be cause for celebration. A realistic goal is κI between 0.4 and 0.6, while κI between 0.2 and 0.4 would be acceptable. We expect that the reliability (intraclass correlation coefficient) of DSM-5 dimensional measures will be larger, and we will aim for between 0.6 and 0.8 and accept between 0.4 and 0.6. The validity criteria in each case mirror those for reliability.

Generally, the lower kappa values are likely to occur with the rarer diagnoses. Thus, for a diagnosis with prevalence 0.05 and κI=0.2, 24% of those with a positive first diagnosis and 4% of those with a negative first diagnosis will be positive on the second diagnosis (a risk ratio of 6.0).
For a diagnosis with prevalence 0.5, our target would be closer to κI=0.5, in which 75% with a positive first diagnosis and 25% with a negative first diagnosis would be positive on the second diagnosis (a risk ratio of 3.0).

The Lancet (10) once described the evaluation of medical diagnostic tests as "the backwoods of medical research," pointing out that many books and articles have been written on the methods of evaluation of medical treatments, but little attention has been paid to the evaluation of the quality of diagnoses. Only recently has there been attention to standards for assessing diagnostic quality (11–13). Yet the impact of diagnostic quality on the quality and costs of patient care is great. Many medical diagnoses go into common use without any evaluation, and many believe that the rates of reliability and validity of diagnoses in other areas of medicine are much higher than they are. Indeed, psychiatry is the exception in that we have paid considerable attention to the reliability of our diagnoses. It is important that our expectations of DSM-5 diagnoses be viewed in the context of what is known about the reliability and validity of diagnoses throughout medicine and not be set unrealistically high, exceeding the standards that pertain to the rest of medicine.

From Stanford University, Palo Alto, Calif.; University of Pittsburgh School of Medicine, Pittsburgh; and American Psychiatric Institute for Research and Education, American Psychiatric Association, Arlington, Va.

Address correspondence to Dr. Kupfer ([email protected]edu).

Commentary accepted for publication May 2011.

The authors report no financial relationships with commercial interests.

References

1. Kraemer HC, Kupfer DJ, Narrow WE, Clarke DE, Regier DA: Moving toward DSM-5: the field trials. Am J Psychiatry 2010; 167:1158–1160
2. Kraemer HC, Periyakoil VS, Noda A: Kappa coefficients in medical research. Stat Med 2002; 21:2109–2129
3. Kraemer HC: Measurement of reliability for categorical data in medical research. Stat Methods Med Res 1992; 1:183–199
4. Koran LM: The reliability of clinical methods, data, and judgments (first of two parts). N Engl J Med 1975; 293:642–646
5. Koran LM: The reliability of clinical methods, data, and judgments (second of two parts). N Engl J Med 1975; 293:695–701
6. Detre KM, Wright E, Murphy ML, Takaro T: Observer agreement in evaluating coronary angiograms. Circulation 1975; 52:979–986
7. Wallace DE, McGreal GT, O'Toole G, Holloway P, Wallace M, McDermott EW, Blake J: The influence of experience and specialization on the reliability of a common clinical sign. Ann R Coll Surg Engl 2000; 82:336–338
8. Marin JR, Bilker W, Lautenbach E, Alpern ER: Reliability of clinical examinations for pediatric skin and soft-tissue infections. Pediatrics 2010; 126:925–930
9. Close RJ, Sachs CJ, Dyne PL: Reliability of bimanual pelvic examinations performed in emergency departments. West J Med 2001; 175:240–244
10. The value of diagnostic tests (editorial). Lancet 1979; 1:809–810
11. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Moher D, Rennie D, de Vet HC, Lijmer JG; Standards for Reporting of Diagnostic Accuracy: The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Ann Intern Med 2003; 138:W1–W12
12. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie D, de Vet HC: Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Ann Intern Med 2003; 138:40–44
13. Meyer GJ: Guidelines for reporting information in studies of diagnostic test accuracy: the STARD initiative. J Pers Assess 2003; 81:191–193
Volume 169, Issue 1, January 2012, Pages 13-15. Accepted 1 May 2011. Published online 1 January 2012. Published in print 1 January 2012.