The Levels of Evidence and Their Role in Evidence-Based Medicine
2011; Lippincott Williams & Wilkins; Volume: 128; Issue: 1; Language: English
DOI: 10.1097/prs.0b013e318219c171
ISSN: 1529-4242
Authors: Patricia B. Burns, Rod J. Rohrich, Kevin C. Chung
Topic(s): Digital Imaging in Medicine
Abstract

As the name suggests, evidence-based medicine is about finding evidence and using that evidence to make clinical decisions. A cornerstone of evidence-based medicine is its hierarchical system of classifying evidence, known as the levels of evidence. Physicians are encouraged to find the highest level of evidence to answer clinical questions. Several articles published in plastic surgery journals concerning evidence-based medicine topics have touched on this subject.1–6 Specifically, previous articles have discussed the lack of higher level evidence in Plastic and Reconstructive Surgery and the need to improve the evidence published in the Journal. Before that can be accomplished, it is important to understand the history behind the levels and how they should be interpreted. This article focuses on the origin of the levels of evidence, their relevance to the evidence-based medicine movement, and the implications for the field and everyday practice of plastic surgery.

HISTORY OF LEVELS OF EVIDENCE

The levels of evidence were originally described in a report by the Canadian Task Force on the Periodic Health Examination in 1979.7 The report's purpose was to develop recommendations on the periodic health examination and to base those recommendations on evidence in the medical literature. The authors developed a system for rating evidence (Table 1) when determining the effectiveness of a particular intervention. The evidence was taken into account when grading recommendations. For example, a grade A recommendation was given if there was good evidence to support including a condition in the periodic health examination. The levels of evidence were further described and expanded by Sackett8 in a 1989 article on levels of evidence for antithrombotic agents (Table 2). Both systems place randomized controlled trials at the highest level and case series or expert opinion at the lowest level. The hierarchies rank studies according to the probability of bias. Randomized controlled trials are given the highest level because they are designed to be unbiased and carry less risk of systematic error: by randomly allocating subjects to two or more treatment groups, they also distribute confounding factors that might otherwise bias the results. A case series or expert opinion, by contrast, is often biased by the author's experience or opinions, and there is no control of confounding factors.

Table 1: Canadian Task Force on the Periodic Health Examination's Levels of Evidence
Table 2: Levels of Evidence from Sackett

MODIFICATION OF LEVELS

Since the introduction of the levels of evidence, several other organizations and journals have adopted variations of the classification system. Different specialties ask different questions, and it was recognized that the type and level of evidence needed to be modified accordingly. Research questions are divided into the following categories: treatment, prognosis, diagnosis, and economic/decision analysis. For example, Table 3 shows the levels of evidence developed by the American Society of Plastic Surgeons for prognosis,9 and Table 4 shows the levels developed by the Centre for Evidence-Based Medicine for treatment.10 The two tables highlight the types of studies that are appropriate for each question (prognosis versus treatment) and how the quality of the data is taken into account when assigning a level. For example, randomized controlled trials are not appropriate when looking at the prognosis of a disease. The question in that instance is, "What will happen if we do nothing at all?" Because a prognosis question does not involve comparing treatments, the highest evidence comes from a cohort study or a systematic review of cohort studies. The levels of evidence also take into account the quality of the data: in the Centre for Evidence-Based Medicine chart, for example, a poorly designed randomized controlled trial carries the same level of evidence as a cohort study.

Table 3: Levels of Evidence for Prognostic Studies
Table 4: Levels of Evidence for Therapeutic Studies
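As a rough illustration of how such hierarchies map a study design and question type onto a level, the sketch below encodes a simplified, hypothetical lookup. The categories and sublevels are condensed from the general scheme discussed above rather than copied from Table 3 or Table 4, and the function name and design labels are invented for illustration; the point is only that the question type changes which designs sit at the top, and that poor quality can downgrade a randomized trial to the level of a cohort study.

```python
# Illustrative sketch only: a condensed evidence-level lookup in the spirit of
# the hierarchies discussed above. The exact categories and sublevels in the
# published scales (ASPS, CEBM) differ; these mappings are hypothetical.

TREATMENT_LEVELS = {  # therapeutic questions: randomized comparisons rank highest
    "systematic review of randomized trials": "I",
    "randomized controlled trial": "I",
    "cohort study": "II",
    "case-control study": "III",
    "case series": "IV",
    "expert opinion": "V",
}

PROGNOSIS_LEVELS = {
    # Prognostic questions involve no treatment comparison, so cohort designs
    # sit at the top and randomized trials are not listed.
    "systematic review of cohort studies": "I",
    "cohort study": "I",
    "case-control study": "III",
    "case series": "IV",
    "expert opinion": "V",
}

def assign_level(design: str, question: str, well_conducted: bool = True) -> str:
    """Return a rough evidence level for a given design and question type.

    A poorly conducted randomized trial is downgraded to the level of a
    cohort study, mirroring the quality adjustment described in the text.
    """
    table = TREATMENT_LEVELS if question == "treatment" else PROGNOSIS_LEVELS
    if design == "randomized controlled trial" and not well_conducted:
        design = "cohort study"
    return table.get(design, "unclassified")

print(assign_level("randomized controlled trial", "treatment"))         # I
print(assign_level("randomized controlled trial", "treatment", False))  # II
print(assign_level("cohort study", "prognosis"))                        # I
```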
A grading system that translates evidence into the strength of recommendations has also changed over time. Table 5 shows the Grade Practice Recommendations developed by the American Society of Plastic Surgeons. The grading system is an important component of evidence-based medicine and assists in clinical decision making. For example, a strong recommendation is given when there is level I evidence and consistent evidence from level II, III, and IV studies available. The grading system does not discount lower level evidence when deciding on recommendations if the results are consistent.

Table 5: Grade Practice Recommendations

INTERPRETATION OF LEVELS

Many journals assign a level to the articles they publish, and authors often assign a level when submitting an abstract to conference proceedings. This lets the reader know the level of evidence of the research, but the designated level of evidence does not always guarantee the quality of the research. It is important that readers not assume that level I evidence is always the best choice or appropriate for the research question. This concept will be important for all of us to understand as evidence-based medicine takes hold in plastic surgery. By its nature, our surgical specialty will always have important articles with a lower level of evidence, because innovation and technique articles are needed to move the specialty forward.

Although randomized controlled trials are often assigned the highest level of evidence, not all randomized controlled trials are conducted properly, and their results should be scrutinized carefully. Sackett8 stressed the importance of estimating the types of error and the power of studies when interpreting results from randomized controlled trials. For example, a poorly conducted randomized controlled trial may report a negative result because of low power when in fact a real difference exists between treatment groups. Scales such as the Jadad scale have been developed to judge the quality of randomized controlled trials.11 Although physicians may not have the time or inclination to use a scale to assess quality, some basic items should be taken into account. Items used for assessing randomized controlled trials include randomization, blinding, a description of the randomization and blinding process, a description of the number of subjects who withdrew or dropped out of the study, the confidence intervals around study estimates, and a description of the power analysis. For example, Bhandari et al.12 assessed the quality of surgical randomized controlled trials reported in the Journal of Bone and Joint Surgery from 1988 to 2000 and identified 72 trials during this period. Articles with a score greater than 75 percent were deemed high quality; the mean score was 68 percent, and 60 percent of the articles scored below 75 percent. The main reasons for the low scores were lack of appropriate randomization, lack of blinding, and lack of a description of patient exclusion criteria.
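To make the idea of a percentage quality score concrete, here is a minimal, hypothetical checklist scorer built around the items listed above. It is not the Jadad scale or the instrument used by Bhandari et al.; the item names, equal weighting, and the 75 percent cutoff are illustrative assumptions only.

```python
# Hypothetical checklist scorer, for illustration only. This is not the Jadad
# scale or the instrument of Bhandari et al.; the items, equal weights, and the
# 75% cutoff merely show how a percentage quality score might be computed.

CHECKLIST_ITEMS = [
    "randomized",
    "randomization_method_described",
    "blinded",
    "blinding_method_described",
    "withdrawals_and_dropouts_reported",
    "confidence_intervals_reported",
    "power_analysis_described",
]

def quality_score(trial_report: dict) -> float:
    """Return the percentage of checklist items satisfied by a trial report."""
    satisfied = sum(1 for item in CHECKLIST_ITEMS if trial_report.get(item, False))
    return 100.0 * satisfied / len(CHECKLIST_ITEMS)

def is_high_quality(trial_report: dict, threshold: float = 75.0) -> bool:
    """Flag a trial report as high quality if its score exceeds the threshold."""
    return quality_score(trial_report) > threshold

# A trial that randomized, blinded, and reported confidence intervals, but
# described neither process, nor dropouts, nor a power analysis, scores poorly.
example = {"randomized": True, "blinded": True, "confidence_intervals_reported": True}
print(round(quality_score(example), 1))  # 42.9
print(is_high_quality(example))          # False
```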
Another article found that articles in the Journal of Bone and Joint Surgery with a level 1 rating had the same quality scores as those with a level 2 rating.13 One should therefore not assume that level 1 studies are of higher quality than level 2 studies. Resources that surgeons can use when appraising levels of evidence include the users' guides published in the Canadian Journal of Surgery14,15 and the Journal of Bone and Joint Surgery.16 Similar articles that are not specific to surgery have been published in the Journal of the American Medical Association.17,18

PLASTIC SURGERY AND EVIDENCE-BASED MEDICINE

The field of plastic surgery has been slow to adopt evidence-based medicine. This was demonstrated in an article that assigned levels of evidence to articles published in Plastic and Reconstructive Surgery over a 20-year period.19 The majority of studies (93 percent in 1983) were level IV or V, which denotes case series and case reports. Although the results were disappointing, there was some improvement over time: by 2003, there were more level I studies (1.5 percent) and fewer level IV and V studies (87 percent). A recent analysis looked at the number of level I studies in five plastic surgery journals from 1978 to 2009. The authors defined level I studies as randomized controlled trials and meta-analyses and restricted their search to these designs. The number of level I studies increased from one in 1978 to 32 by 2009.20 From these results, we see that the field of plastic surgery is improving its level of evidence but still has a long way to go, especially in improving the quality of the studies published. For example, approximately one-third of the studies involved double blinding, but the majority did not randomize subjects, describe the randomization process, or perform a power analysis.

Power analysis is another area of concern in plastic surgery. A review of the plastic surgery literature found that the majority of published studies have inadequate power to detect moderate to large differences between treatment groups.21 Regardless of a study's level of evidence, if the study is underpowered, the interpretation of its results is questionable.
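To give a sense of what adequate power demands, the sketch below applies the standard normal-approximation formula for a two-group comparison of means, n per group ≈ 2(z_{1-α/2} + z_{1-β})²/d², where d is the standardized effect size. The figures are generic illustrations and are not drawn from the cited review.

```python
# Illustrative only: approximate patients needed per group to detect a
# standardized effect size d with a two-sided test, using the normal
# approximation n ≈ 2 * (z_{1-a/2} + z_{1-b})^2 / d^2. These numbers are
# generic and are not taken from the review cited as reference 21.
import math
from statistics import NormalDist

def sample_size_per_group(effect_size: float, alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Approximate n per group for a two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# A "moderate" effect (d = 0.5) needs roughly 63 patients per arm and a
# "large" effect (d = 0.8) roughly 25, more than many small series enroll.
print(sample_size_per_group(0.5))  # 63
print(sample_size_per_group(0.8))  # 25
```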
Although the goal is to improve the overall level of evidence in plastic surgery, this does not mean that all lower level evidence should be discarded. Case series and case reports are important for hypothesis generation and can lead to more controlled studies. In addition, in the face of overwhelming evidence to support a treatment, such as the use of antibiotics for wound infections, there is no need for a randomized controlled trial.

CLINICAL EXAMPLES USING LEVELS OF EVIDENCE

To show how the levels of evidence work and to aid the reader in interpreting them, we provide some examples from the plastic surgery literature. The examples also show the peril of basing medical decisions on results from case reports. An association between lymphoma and silicone breast implants was hypothesized on the basis of case reports.22–27 The level of evidence for case reports, depending on the scale used, is IV or V. These case reports were used to generate the hypothesis that a possible association existed. Because of these reports, several large retrospective cohort studies from the United States, Canada, Denmark, Sweden, and Finland were conducted.28–32 The level of evidence for a retrospective cohort study is II. All of these studies had many years of follow-up for a large number of patients. Some of the studies found an elevated risk of lymphoma and others found no risk, but none of the results reached statistical significance; thus, the higher level evidence from the cohort studies did not support a risk of lymphoma. Finally, a systematic review was performed that combined the evidence from the retrospective cohorts.27 It found an overall standardized incidence ratio of 0.89 (95 percent confidence interval, 0.67 to 1.18). Because the confidence interval includes 1, the results indicate no increased incidence. The level of evidence for the systematic review is I. Based on the best available evidence, there is no association between lymphoma and silicone implants. This example shows how studies with a low level of evidence were used to generate a hypothesis, which then led to higher level evidence that disproved the hypothesis. It also demonstrates that randomized controlled trials are not feasible for rare events such as cancer and emphasizes the importance of observational studies for such questions: a case-control study is a better option and provides higher level evidence for testing the prognosis of the long-term effects of silicone breast implants.
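The arithmetic behind that conclusion is simple enough to sketch: a standardized incidence ratio is the observed number of cases divided by the number expected from population rates, and an approximate 95 percent confidence interval can be placed around it (here using Byar's approximation for a Poisson count). The observed and expected counts below are invented for illustration, not the data of the cited review; the point is only that an interval containing 1 is consistent with no increased incidence.

```python
# Illustrative only: a standardized incidence ratio (SIR) with an approximate
# 95% confidence interval via Byar's approximation for a Poisson count. The
# observed/expected counts are invented and are not data from reference 27.
from statistics import NormalDist

def sir_with_ci(observed: int, expected: float, alpha: float = 0.05):
    """Return (SIR, lower, upper) for an observed count versus an expected count."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a 95% interval
    sir = observed / expected
    # Byar's approximation to the exact Poisson limits on the observed count.
    lower = observed * (1 - 1 / (9 * observed) - z / (3 * observed ** 0.5)) ** 3
    upper = (observed + 1) * (1 - 1 / (9 * (observed + 1))
                              + z / (3 * (observed + 1) ** 0.5)) ** 3
    return sir, lower / expected, upper / expected

# Hypothetical numbers: 18 cancers observed where 20 were expected.
sir, low, high = sir_with_ci(observed=18, expected=20.0)
print(f"SIR = {sir:.2f} (95% CI {low:.2f} to {high:.2f})")  # SIR = 0.90 (95% CI 0.53 to 1.42)
print("no increased incidence" if low <= 1.0 <= high else "statistically significant difference")
```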
Another example is the injection of epinephrine into fingers. On the basis of case reports published before 1950, physicians were advised that epinephrine injection could result in finger ischemia.33 In this example, level IV or V evidence was accepted as fact and incorporated into medical textbooks and teaching. Not all physicians accepted this evidence, however, and some continued to inject epinephrine into the fingers with no adverse effects on the hand. Clearly, it was time for higher level evidence to resolve the issue. An in-depth review of the literature from 1880 to 2000 by Denkler33 identified 48 cases of digital infarction, of which 21 had been injected with epinephrine. Further analysis found that the addition of procaine to the epinephrine injection was the cause of the ischemia;34 the procaine used in these injections included toxic acidic batches that were recalled in 1948. In addition, several cohort studies found no complications from the use of epinephrine in the fingers and hand.35–37 The results from these cohort studies raised the level of evidence, and based on the best available evidence, the hypothesis that epinephrine injection harms fingers was rejected. This example highlights the biases inherent in case reports. It also shows the risk when spurious evidence is handed down and integrated into medical teaching.

OBTAINING THE BEST EVIDENCE

We have established the need for randomized controlled trials to improve the evidence in plastic surgery but have also acknowledged the difficulties, particularly with randomization and blinding. Although randomized controlled trials may not be appropriate for many surgical questions, well-designed and well-conducted cohort or case-control studies could boost the level of evidence. Many current studies are descriptive and lack a control group. The way forward seems clear: plastic surgery researchers need to consider a cohort or case-control design whenever a randomized controlled trial is not possible. If designed properly, the level of evidence from observational studies can approach or surpass that of a randomized controlled trial, and in some instances observational studies and randomized controlled trials have yielded similar results.38 If enough cohort or case-control studies become available, the prospect of systematic reviews of these studies will increase, which will raise the overall level of evidence in plastic surgery.

CONCLUSIONS

The levels of evidence are an important component of evidence-based medicine. Understanding the levels and why they are assigned to publications and abstracts helps the reader prioritize information. This is not to say that all level IV evidence should be ignored and all level I evidence accepted as fact. The levels of evidence provide a guide, and the reader needs to be cautious when interpreting results.

ACKNOWLEDGMENTS

This work was supported in part by a Midcareer Investigator Award in Patient-Oriented Research (K24 AR053120) from the National Institute of Arthritis and Musculoskeletal and Skin Diseases (to K.C.C.).