Misleading Negative Chest Radiographs: Should We ADHERE to the Conclusions?

Letter, peer reviewed

2005; Elsevier BV; Volume: 47; Issue: 1; Language: English

10.1016/j.annemergmed.2005.11.030

ISSN

1097-6760

Authors

Richelle J. Cooper

Topic(s)

Ultrasound in Clinical Applications

Abstract

In this issue, Collins et al present data from the Acute Decompensated Heart Failure Registry (ADHERE) and suggest that the initial emergency department (ED) chest radiograph may be insensitive to predict a hospital discharge diagnosis of acutely decompensated heart failure [1: Collins S, Lindsell CJ, Storrow AB, et al. Prevalence of negative chest radiography in the emergency department patient with decompensated heart failure. Ann Emerg Med. 2006;47:13-18]. The assertion that the chest radiograph is not that sensitive (for many disorders) is not new; however, the idea that 1 in 5 congestive heart failure patients has a false-negative chest radiograph in the ED seems inconsistent with clinical practice. If the chest radiograph does not show signs of heart failure, are we really missing clinically important cases? The face validity of these conclusions seems incongruent with our practice. Either the results are correct and we have been wrong in a number of cases for many years, or the data and assumptions used by Collins et al to infer these conclusions are invalid. I will make an argument for why the latter is likely true.

Limitations of Registry Databases and ADHERE

The number of patient registry databases and literature based on predominantly convenience sample case series (such as this one) has grown in recent years. Proponents of registries argue that we can obtain more information about actual practice and patient outcomes (effectiveness) as opposed to the efficacy reported from highly structured randomized controlled trials. However, effectiveness can only be measured if there is no selection bias and if the care delivered at the centers that participate in these registries represents typical practice. It is not evident that either of these is true [2: Armstrong D, Kline-Rogers E, Jani SM, et al. Potential impact of the HIPAA privacy rule on data collection in a registry of patients with acute coronary syndrome. Arch Intern Med. 2005;165:1125-1129; 3: Tu JV, Willison DJ, Silver FL, et al, for the Investigators in the Registry of the Canadian Stroke Network. Impracticability of informed consent in the Registry of the Canadian Stroke Network. N Engl J Med. 2004;350:1414-1421]. In fact, there are many features inherent in registries, including ADHERE, either by design or convenience, that threaten the internal validity of analyses.

Medical record review, a common method of data collection in registries, as well as other research, is best when performed with standardized methods [4: Gilbert EH, Lowenstein SR, Koziol-McLain J, et al. Chart reviews in emergency medicine research: where are the methods? Ann Emerg Med. 1996;27:305-308; 5: Schwartz RJ, Panacek EA. Basics of research, part 7: archival data research. Air Med J. 1996;15:119-124]. The ADHERE design includes good medical record review methods, but the retrospectively collected data are still limited to the quality and accuracy of information recorded in the medical chart [6: Lowenstein SR. Medical record reviews in emergency medicine: the blessing and the curse. Ann Emerg Med. 2005;45:452-455].

A registry's potential benefits often relate to its large, multicenter patient recruitment, but the large sample also leads to a greater potential for misinterpretation of significance. Just as a study that is underpowered may fail to recognize a true difference, when a sample is large, the chance that a difference, whether meaningful or not, will be found during analyses is much greater.
With more than 100,000 hospital encounters in ADHERE [7: Adams KF, Fonarow GC, Emerman CL, et al. Characteristics and outcomes of patients hospitalized for heart failure in the United States: rationale, design, and preliminary observations from the first 100,000 cases in the Acute Decompensated Heart Failure Registry (ADHERE). Am Heart J. 2005;149:209-216], it is easy to find significant statistical differences or to report estimates which appear to be very precise with narrow confidence intervals. However, any statistical test or confidence interval calculation assumes perfect data, without any bias in the data collection or analysis [8: Maclure M, Schneeweiss S. Causation of bias: the episcope. Epidemiology. 2001;12:114-122; 9: Schriger D. Problems with current methods of data analysis and reporting, and suggestions for moving beyond incorrect ritual. Eur J Emerg Med. 2002;9:203-207]. Because this is not true in many registries, the potential to misinterpret the veracity and even the importance of a difference noted is much greater. Research based on data from trial registries should be interpreted with caution and should not be confused with prospective, observational trials. Registry database studies are usually best as descriptive reports that help provide insights into developing a hypothesis for prospective research.

A registry is only representative of the patients enrolled. Although that statement seems obvious, simple, and even intuitive, the consequences of bias in the selection of patients for the registry are frequently not considered in analyses or conclusions. Research costs time and money, and it may therefore be reasonable or even necessary to enroll a convenience sample.
Selection bias associated with convenience sampling can be minimized if every potential case has an equal probability of being selected or consecutive cases are enrolled. That is not always the case, and in more than 1 registry (including ADHERE, in which participating centers only need to submit a monthly quota of selected cases), there is no assurance of consecutive or random sampling. Selection bias makes it impossible to ensure accurate population estimates. In addition, ADHERE allows repeated enrollment of the same patient (without any means to account for this in analysis) that further threatens the accuracy of the population estimates of patient demographics and outcomes [7].

Limitations to the Analysis of Test Characteristics of the ED Chest Radiograph

If we ignore the general registry problems and biases and pretend ADHERE data were perfect, we must accept 2 more key assumptions to believe the results of the Collins et al analysis of ADHERE: that the criterion standard is correct and that there is no selection bias with regard to ED cases. The validity of these assumptions is dubious.

The diagnosis of decompensated heart failure is the criterion standard in the Collins et al analysis. What does it mean that the patient has decompensated heart failure, and what is the best way to extrapolate this to a criterion standard for the purposes of diagnostic research?
Because ADHERE data are based on medical record review, we do not know how the clinicians decided the patients' diagnosis, whether there was a standard evaluation of all patients, or even how the definition may have varied at each participating center. This lack of a criterion standard is a fundamental issue that produces bias in diagnostic research that is hard to adjust for [10: Mower WR. Evaluating bias and variability in diagnostic test results. Ann Emerg Med. 1999;33:85-91; 11: Knottnerus JA, van Weel C, Muris JWM. Evidence base of clinical diagnosis: evaluation of diagnostic procedures. BMJ. 2002;324:477-480].

In practice, the diagnosis of heart failure is usually based on clinical criteria. Echocardiography, radionuclide studies, and other ancillary laboratory and radiographic tests may be used to assess cardiac function in patients who present with symptoms of heart failure, but how they were used in ADHERE patients is unknown. Presumably, some diagnoses were made based on clinical criteria, and in other cases ancillary tests suggesting cardiac dysfunction influenced the final discharge diagnosis. If the criterion standard (diagnosis of heart failure at hospital discharge) was based on the results of ancillary and functional tests, then only those patients for whom this evaluation was performed would be identified (verification bias), and the relevance to ED practice is uncertain [10]. If the diagnosis of congestive heart failure is in part determined by the chest radiograph (incorporation bias), then it is not valid to evaluate the sensitivity of the imaging study for the diagnosis.
This type of circular reasoning, in which the chest radiograph is used in practice to establish the discharge diagnosis of congestive heart failure and then later analyzed for its accuracy to detect the same criterion standard (discharge diagnosis), results in overly optimistic estimates of sensitivity. Ultimately, the accuracy of the discharge diagnosis in ADHERE cannot be confirmed.

The evaluation of false-negative chest radiographs not only assumes the discharge diagnosis of congestive heart failure (the criterion standard) is correct but also that the ED diagnosis is incorrect, an assumption not supported by any data. The authors' analysis fails to consider the possibility that a change in the patient's disease occurred during the hospitalization. The ED and the hospital discharge diagnosis may be different, and both may be correct. The patients who were “missed” in the ED were often diagnosed with a condition that may develop into heart failure as a complication of the underlying disease process (eg, arrhythmias or myocardial ischemia). It is likely that in some cases in which the ED diagnosis does not match the discharge diagnosis, the heart failure was not present during the ED evaluation, resulting in the misclassification of ED radiograph interpretations as false-negative.

In ADHERE, the hospital discharge diagnosis of congestive heart failure is not just the criterion standard but is simultaneously the means to identify the convenience sample of enrolled cases. This methodology creates an additional selection bias pertinent to the Collins et al research questions because ADHERE does not include all ED patients with congestive heart failure. ED patients discharged home after ED treatment or those identified with heart failure in the ED but without verification of their disease or change in diagnosis by the admitting physician are not included.
Thus, not only does ADHERE not capture a representative sample of the hospitals' discharged heart failure patients but also the registry does not capture the EDs' population of heart failure patients. If the study involves a selected sample, do the results really apply to our patients and our practice? The test is not assessed in the patients (all ED patients with heart failure) for whom we will use the results [12: Jaeschke R, Guyatt G, Sackett DL. Users' guides to the medical literature, III: how to use an article about a diagnostic test, A: are the results of the study valid? Evidence-Based Medicine Working Group. JAMA. 1994;271:389-391; 13: Jaeschke R, Guyatt GH, Sackett DL. Users' guides to the medical literature, III: how to use an article about a diagnostic test, B: what are the results and will they help me in caring for my patients? Evidence-Based Medicine Working Group. JAMA. 1994;271:703-707; 14: [No authors listed]. How to read clinical journals, II: to learn about a diagnostic test. Can Med Assoc J. 1981;124:703-720]. Failure to include all ED patients with an ED diagnosis of congestive heart failure makes the accuracy of the results sought in the Collins et al research question (the sensitivity and false-negative rate of the ED chest radiograph to detect the hospital discharge diagnosis of congestive heart failure) and the premise of their article (the prevalence of negative chest radiographs in ED patients with congestive heart failure) unknowable. To accurately answer these questions, we need an unbiased sample of all ED patients across the spectrum of the disease.

It is not clear what information we will be able to learn from the ADHERE database. It can provide some information about the care and outcomes of the selected patients identified at hospital discharge with decompensated heart failure.
The prevalence of negative ED chest radiographs cannot be accurately determined with this database. Collins et al [1] dutifully record many of their study's limitations but then offer a conclusion that is predicated on the assumption that none of the limitations matter [15: Schriger DL. Suggestions for improving the reporting of clinical research: the role of narrative. Ann Emerg Med. 2005;45:437-443]. If we made different assumptions based on the spectrum of patients treated in the ED and different assumptions about the accuracy of the diagnosis based on differential evaluation, the true number of patients “missed” based on their initial ED radiograph likely would not be 1 in 5 but would be far more infrequent.

So what does the emergency physician do with this information? The initial ED chest radiograph might be insensitive to the discharge diagnosis in a convenience sample of patients admitted to the hospital. However, the radiograph is a simple test that provides useful clinical information. It not only helps in defining the disease in patients with clinical signs of heart failure but also can reveal complicating features and detect other etiologies associated with a vague or unclear presentation. Although it is important to consider the diagnosis of heart failure even if there are no classic radiographic signs, the alternative diagnoses that produce chest discomfort and dyspnea or hypoxia are equally important. As with any test, one needs to consider whether a test's result will change patient outcomes before ordering it.
Until there is better research to suggest differently, I will continue to order a simple chest radiograph and not feel angst that I am missing clinically important cases of heart failure because the radiograph is “negative.” In this issue, Collins et al present data from the Acute Decompensated Heart Failure Registry (ADHERE) and suggest that the initial emergency department (ED) chest radiograph may be insensitive to predict a hospital discharge diagnosis of acutely decompensated heart failure.1Collins S. Lindsell C.J. Storrow A.B. et al.Prevalence of negative chest radiography in the emergency department patient with decompensated heart failure.Ann Emerg Med. 2006; 47: 13-18Abstract Full Text Full Text PDF PubMed Scopus (156) Google Scholar The assertion that the chest radiograph is not that sensitive (for many disorders) is not new; however, the idea that 1 in 5 congestive heart failure patients has a false-negative chest radiograph in the ED seems inconsistent with clinical practice. If the chest radiograph does not show signs of heart failure, are we really missing clinically important cases? The face validity of these conclusions seems incongruent with our practice. Either the results are correct and we have been wrong in a number of cases for many years, or the data and assumptions used by Collins et al to infer these conclusions are invalid. I will make an argument for why the latter is likely true. Limitations of Registry Databases and ADHEREThe number of patient registry databases and literature based on predominantly convenience sample case series (such as this one) has grown in recent years. Proponents of registries argue that we can obtain more information about actual practice and patient outcomes (effectiveness) as opposed to the efficacy reported from highly structured randomized controlled trials. 
However, effectiveness can only be measured if there is no selection bias and if the care delivered at the centers that participate in these registries represents typical practice. It is not evident that either of these is true.2Armstrong D. Kline-Rogers E. Jani S.M. et al.Potential impact of the HIPAA privacy rule on data collection in a registry of patients with acute coronary syndrome.Arch Intern Med. 2005; 165: 1125-1129Crossref PubMed Scopus (79) Google Scholar, 3Tu J.V. Willison D.J. Silver F.L. et al.for the Investigators in the Registry of the Canadian Stroke RegistryImpracticability of informed consent in the Registry of the Canadian Stroke Network.N Engl J Med. 2004; 350: 1414-1421Crossref PubMed Scopus (347) Google Scholar In fact, there are many features inherent in registries, including ADHERE, either by design or convenience, that threaten the internal validity of analyses.Medical record review, a common method of data collection in registries, as well as other research, is best when performed with standardized methods.4Gilbert E.H. Lowenstein S.R. Koziol-McLain J. et al.Chart reviews in emergency medicine research: where are the methods?.Ann Emerg Med. 1996; 27: 305-308Abstract Full Text Full Text PDF PubMed Scopus (937) Google Scholar, 5Schwartz R.J. Panacek E.A. Basics of research, part 7: archival data research.Air Med J. 1996; 15: 119-124Abstract Full Text PDF PubMed Scopus (11) Google Scholar The ADHERE design includes good medical record review methods, but the retrospectively collected data are still limited to the quality and accuracy of information recorded in the medical chart.6Lowenstein S.R. Medical record reviews in emergency medicine: the blessing and the curse.Ann Emerg Med. 
2005; 45: 452-455Abstract Full Text Full Text PDF PubMed Scopus (48) Google ScholarA registry's potential benefits often relate to its large, multicenter patient recruitment, but the large sample also leads to a greater potential for misinterpretation of significance. Just as a study that is underpowered may fail to recognize a true difference, when a sample is large, the chance that a difference, whether meaningful or not, will be found during analyses is much greater. With more than 100,000 hospital encounters in ADHERE,7Adams K.F. Fonarrow G.C. Emerman C.L. et al.Characteristics and outcomes of patients hospitalized for heart failure in the United States: rationale, design, and preliminary observations from the first 100,000 cases in the Acute Decompensated Heart Failure Registry (ADHERE).Am Heart J. 2005; 149: 209-216Abstract Full Text Full Text PDF PubMed Scopus (1632) Google Scholar it is easy to find significant statistical differences or to report estimates which appear to be very precise with narrow confidence intervals. However, any statistical test or confidence interval calculation assumes perfect data, without any bias in the data collection or analysis.8Maclure M. Schneeweis S. Causation of bias: the episcope.Epidemiology. 2001; 12: 114-122Crossref PubMed Scopus (95) Google Scholar, 9Schriger D. Problems with current methods of data analysis and reporting, and suggestions for moving beyond incorrect ritual.Eur J Emerg Med. 2002; 9: 203-207Crossref PubMed Scopus (18) Google Scholar Because this is not true in many registries, the potential to misinterpret the veracity and even the importance of a difference noted is much greater. Research based on data from trial registries should be interpreted with caution and should not be confused with prospective, observational trials. 
Registry database studies are usually best as descriptive reports that help provide insights into developing a hypothesis for prospective research.A registry is only representative of the patients enrolled. Although that statement seems obvious, simple, and even intuitive, the consequences of bias in the selection of patients for the registry are frequently not considered in analyses or conclusions. Research costs time and money, and it may therefore be reasonable or even necessary to enroll a convenience sample. Selection bias associated with convenience sampling can be minimized if every potential case has an equal probability of being selected or consecutive cases are enrolled. That is not always the case, and in more than 1 registry (including ADHERE, in which participating centers only need to submit a monthly quota of selected cases), there is no assurance of consecutive or random sampling. Selection bias makes it impossible to ensure accurate population estimates. In addition, ADHERE allows repeated enrollment of the same patient (without any means to account for this in analysis) that further threatens the accuracy of the population estimates of patient demographics and outcomes.7Adams K.F. Fonarrow G.C. Emerman C.L. et al.Characteristics and outcomes of patients hospitalized for heart failure in the United States: rationale, design, and preliminary observations from the first 100,000 cases in the Acute Decompensated Heart Failure Registry (ADHERE).Am Heart J. 2005; 149: 209-216Abstract Full Text Full Text PDF PubMed Scopus (1632) Google Scholar The number of patient registry databases and literature based on predominantly convenience sample case series (such as this one) has grown in recent years. Proponents of registries argue that we can obtain more information about actual practice and patient outcomes (effectiveness) as opposed to the efficacy reported from highly structured randomized controlled trials. 
However, effectiveness can only be measured if there is no selection bias and if the care delivered at the centers that participate in these registries represents typical practice. It is not evident that either of these is true.2Armstrong D. Kline-Rogers E. Jani S.M. et al.Potential impact of the HIPAA privacy rule on data collection in a registry of patients with acute coronary syndrome.Arch Intern Med. 2005; 165: 1125-1129Crossref PubMed Scopus (79) Google Scholar, 3Tu J.V. Willison D.J. Silver F.L. et al.for the Investigators in the Registry of the Canadian Stroke RegistryImpracticability of informed consent in the Registry of the Canadian Stroke Network.N Engl J Med. 2004; 350: 1414-1421Crossref PubMed Scopus (347) Google Scholar In fact, there are many features inherent in registries, including ADHERE, either by design or convenience, that threaten the internal validity of analyses. Medical record review, a common method of data collection in registries, as well as other research, is best when performed with standardized methods.4Gilbert E.H. Lowenstein S.R. Koziol-McLain J. et al.Chart reviews in emergency medicine research: where are the methods?.Ann Emerg Med. 1996; 27: 305-308Abstract Full Text Full Text PDF PubMed Scopus (937) Google Scholar, 5Schwartz R.J. Panacek E.A. Basics of research, part 7: archival data research.Air Med J. 1996; 15: 119-124Abstract Full Text PDF PubMed Scopus (11) Google Scholar The ADHERE design includes good medical record review methods, but the retrospectively collected data are still limited to the quality and accuracy of information recorded in the medical chart.6Lowenstein S.R. Medical record reviews in emergency medicine: the blessing and the curse.Ann Emerg Med. 
2005; 45: 452-455Abstract Full Text Full Text PDF PubMed Scopus (48) Google Scholar A registry's potential benefits often relate to its large, multicenter patient recruitment, but the large sample also leads to a greater potential for misinterpretation of significance. Just as a study that is underpowered may fail to recognize a true difference, when a sample is large, the chance that a difference, whether meaningful or not, will be found during analyses is much greater. With more than 100,000 hospital encounters in ADHERE,7Adams K.F. Fonarrow G.C. Emerman C.L. et al.Characteristics and outcomes of patients hospitalized for heart failure in the United States: rationale, design, and preliminary observations from the first 100,000 cases in the Acute Decompensated Heart Failure Registry (ADHERE).Am Heart J. 2005; 149: 209-216Abstract Full Text Full Text PDF PubMed Scopus (1632) Google Scholar it is easy to find significant statistical differences or to report estimates which appear to be very precise with narrow confidence intervals. However, any statistical test or confidence interval calculation assumes perfect data, without any bias in the data collection or analysis.8Maclure M. Schneeweis S. Causation of bias: the episcope.Epidemiology. 2001; 12: 114-122Crossref PubMed Scopus (95) Google Scholar, 9Schriger D. Problems with current methods of data analysis and reporting, and suggestions for moving beyond incorrect ritual.Eur J Emerg Med. 2002; 9: 203-207Crossref PubMed Scopus (18) Google Scholar Because this is not true in many registries, the potential to misinterpret the veracity and even the importance of a difference noted is much greater. Research based on data from trial registries should be interpreted with caution and should not be confused with prospective, observational trials. Registry database studies are usually best as descriptive reports that help provide insights into developing a hypothesis for prospective research. 
A registry is only representative of the patients enrolled. Although that statement seems obvious, simple, and even intuitive, the consequences of bias in the selection of patients for the registry are frequently not considered in analyses or conclusions. Research costs time and money, and it may therefore be reasonable or even necessary to enroll a convenience sample. Selection bias associated with convenience sampling can be minimized if every potential case has an equal probability of being selected or consecutive cases are enrolled. That is not always the case, and in more than 1 registry (including ADHERE, in which participating centers only need to submit a monthly quota of selected cases), there is no assurance of consecutive or random sampling. Selection bias makes it impossible to ensure accurate population estimates. In addition, ADHERE allows repeated enrollment of the same patient (without any means to account for this in analysis) that further threatens the accuracy of the population estimates of patient demographics and outcomes.7Adams K.F. Fonarrow G.C. Emerman C.L. et al.Characteristics and outcomes of patients hospitalized for heart failure in the United States: rationale, design, and preliminary observations from the first 100,000 cases in the Acute Decompensated Heart Failure Registry (ADHERE).Am Heart J. 2005; 149: 209-216Abstract Full Text Full Text PDF PubMed Scopus (1632) Google Scholar Limitations to the Analysis of Test Characteristics of the ED Chest RadiographIf we ignore the general registry problems and biases and pretend ADHERE data were perfect, we must accept 2 more key assumptions to believe the results of the Collins et al analysis of ADHERE: that the criterion standard is correct and that there is no selection bias with regard to ED cases. The validity of these assumptions is dubious.The diagnosis of decompensated heart failure is the criterion standard in the Collins et al analysis. 
What does it mean that the patient has decompensated heart failure, and what is the best way to extrapolate this to a criterion standard for the purposes of diagnostic research? Because ADHERE data are based on medical record review, we do not know how the clinicians decided the patients' diagnosis, whether there was a standard evaluation of all patients, or even how the definition may have varied at each participating center. This lack of a criterion standard is a fundamental issue that produces bias in diagnostic research that is hard to adjust for.10Mower W.R. Evaluating bias and variability in diagnostic test results.Ann Emerg Med. 1999; 33: 85-91Abstract Full Text Full Text PDF PubMed Scopus (98) Google Scholar, 11Knottnerus J.A. van Weel C. Muris J.W.M. Evidence base of clinical diagnosis: evaluation of diagnostic procedures.BMJ. 2002; 324: 477-480Crossref PubMed Scopus (289) Google ScholarIn practice, the diagnosis of heart failure is usually based on clinical criteria. Echocardiography and radionucleotide and other ancillary laboratory and radiographic tests may be used to assess cardiac function in patients who present with symptoms of heart failure, but how they were used in ADHERE patients is unknown. Presumably, some diagnoses were made based on clinical criteria, and in other cases ancillary tests suggesting cardiac dysfunction influenced the final discharge diagnosis. If the criterion standard (diagnosis of heart failure at hospital discharge) was based on the results of ancillary and functional tests, then only those patients for whom this evaluation was performed would be identified (verification bias), and the relevance to ED practice is uncertain.10Mower W.R. Evaluating bias and variability in diagnostic test results.Ann Emerg Med. 
1999; 33: 85-91Abstract Full Text Full Text PDF PubMed Scopus (98) Google Scholar If the diagnosis of congestive heart failure is in part determined by the chest radiograph (incorporation bias), then it is not valid to evaluate the sensitivity of the imaging study for the diagnosis. This type of circular reasoning, in which the chest radiograph is used in practice to establish the discharge diagnosis of congestive heart failure and then later analyzed for its accuracy to detect the same criterion standard (discharge diagnosis), results in overly optimistic estimates of sensitivity. Ultimately, the accuracy of the discharge diagnosis in ADHERE cannot be confirmed.The evaluation of false-negative chest radiographs not only assumes the discharge diagnosis of congestive heart failure (the criterion standard) is correct but also that the ED diagnosis is incorrect, an assumption not supported by any data. The authors' analysis fails to consider the possibility that a change in the patient's disease occurred during the hospitalization. The ED and the hospital discharge diagnosis may be different, and both may be correct. The patients who were “missed” in the ED were often diagnosed with a condition that may develop into heart failure as a complication of the underlying disease process (eg, arrhythmias or myocardial ischemia). It is likely that in some cases in which the ED diagnosis does not match the discharge diagnosis, the heart failure was not present during the ED evaluation, resulting in the misclassification of ED radiograph interpretations as false-negative.In ADHERE, the hospital discharge diagnosis of congestive heart failure is not just the criterion standard but is simultaneously the means to identify the convenience sample of enrolled cases. This methodology creates an additional selection bias pertinent to the Collins et al research questions because ADHERE does not include all ED patients with congestive heart failure. 
Excluded are ED patients discharged home after ED treatment and those identified with heart failure in the ED whose disease was not verified, or whose diagnosis was changed, by the admitting physician. Thus, ADHERE captures neither a representative sample of the hospitals' discharged heart failure patients nor the EDs' population of heart failure patients. If the study involves a selected sample, do the results really apply to our patients and our practice? The test is not assessed in the patients (all ED patients with heart failure) for whom we will use the results [12: Jaeschke R., Guyatt G., Sackett D.L. Users' guides to the medical literature, III: how to use an article about a diagnostic test, A: are the results of the study valid? Evidence-Based Medicine Working Group. JAMA. 1994;271:389-391; 13: Jaeschke R., Guyatt G.H., Sackett D.L. Users' guides to the medical literature, III: how to use an article about a diagnostic test, B: what are the results and will they help me in caring for my patients? Evidence-Based Medicine Working Group. JAMA. 1994;271:703-707; 14: [No authors listed]. How to read clinical journals, II: to learn about a diagnostic test. Can Med Assoc J. 1981;124:703-720]. Because not all ED patients with an ED diagnosis of congestive heart failure are included, both the quantity sought in the Collins et al research question (the sensitivity and false-negative rate of the ED chest radiograph for the hospital discharge diagnosis of congestive heart failure) and the premise of their article (the prevalence of negative chest radiographs in ED patients with congestive heart failure) are unknowable. To accurately answer these questions, we need an unbiased sample of all ED patients across the spectrum of the disease.

It is not clear what information we will be able to learn from the ADHERE database.
It can provide some information about the care and outcomes of the selected patients identified at hospital discharge with decompensated heart failure. The prevalence of negative ED chest radiographs cannot be accurately determined with this database. Collins et al [1] dutifully record many of their study's limitations but then offer a conclusion that is predicated on the assumption that none of the limitations matter [15: Schriger D.L. Suggestions for improving the reporting of clinical research: the role of narrative. Ann Emerg Med. 2005;45:437-443]. If we made different assumptions based on the spectrum of patients treated in the ED and different assumptions about the accuracy of the diagnosis based on differential evaluation, the true proportion of patients “missed” on their initial ED radiograph would likely be far lower than 1 in 5.

So what does the emergency physician do with this information? The initial ED chest radiograph might be insensitive for the discharge diagnosis in a convenience sample of patients admitted to the hospital. However, the radiograph is a simple test that provides useful clinical information. It not only helps in defining the disease in patients with clinical signs of heart failure but also can reveal complicating features and detect other etiologies associated with a vague or unclear presentation. Although it is important to consider the diagnosis of heart failure even if there are no classic radiograph signs, the alternative diagnoses that produce chest discomfort and dyspnea or hypoxia are equally important.
As with any test, one needs to consider whether a test's result will change patient outcomes before ordering it. Until there is better research to suggest differently, I will continue to order a simple chest radiograph and not feel angst that I am missing clinically important cases of heart failure because the radiograph is “negative.”
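
The argument about the 1-in-5 figure can be made concrete with a small calculation. The sketch below is an illustration only, under stated assumptions: the 800/200 split simply encodes the 1-in-5 proportion discussed in this editorial, while every other number, and the names `Counts`, `miss_rate`, and `adjust`, are hypothetical and are not taken from ADHERE or from Collins et al. It shows how the apparent false-negative rate shifts when some “missed” patients did not yet have heart failure in the ED, and when ED heart failure patients who never entered the registry are counted. It does not attempt to model incorporation bias.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Radiograph results among patients counted as having heart failure."""
    positive: int  # radiograph showed signs of heart failure (true positives)
    negative: int  # radiograph showed no signs (false negatives)


def miss_rate(c: Counts) -> float:
    """False-negative rate = FN / (TP + FN), i.e. 1 - sensitivity."""
    total = c.positive + c.negative
    if total == 0:
        raise ValueError("no patients with the diagnosis")
    return c.negative / total


def adjust(registry: Counts, developed_later: int, unenrolled: Counts) -> Counts:
    """Re-count after two corrections (both hypothetical inputs):

    developed_later: registry 'false negatives' whose heart failure arose
        only after the ED evaluation, so the ED radiograph was not wrong.
    unenrolled: ED heart failure patients absent from the registry
        (for example, those discharged home from the ED).
    """
    if not 0 <= developed_later <= registry.negative:
        raise ValueError("developed_later must lie between 0 and registry.negative")
    return Counts(
        positive=registry.positive + unenrolled.positive,
        negative=registry.negative - developed_later + unenrolled.negative,
    )


# HYPOTHETICAL numbers, chosen only to illustrate the direction of the effect.
registry = Counts(positive=800, negative=200)  # encodes "1 in 5" negative
adjusted = adjust(
    registry,
    developed_later=50,                            # assumption, not data
    unenrolled=Counts(positive=300, negative=20),  # assumption, not data
)

print(f"registry miss rate: {miss_rate(registry):.3f}")
print(f"adjusted miss rate: {miss_rate(adjusted):.3f}")
```

With these made-up inputs the apparent miss rate falls from 0.200 to roughly 0.134. The size of the change depends entirely on the assumed inputs, which is the point: without an unbiased ED sample, those inputs cannot be estimated from the registry.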
