Peer-reviewed Article

Evidence-based Neonatology

2006; American Academy of Pediatrics; Volume: 7; Issue: 9; Language: English

10.1542/neo.7-9-e474

ISSN

1526-9906

Authors

Richard A. Polin, John M. Lorenz, David Bateman

Topic(s)

Health Sciences Research and Education

Abstract

After completing this article, readers should be able to: From the time physicians enter medical school to the time they retire from medicine, they are guided by the principle “Primum Non Nocere”: first, do no harm. Although the origins of that phrase are uncertain, it has been widely attributed to Hippocrates, who said, “I will prescribe regimens for the good of my patients according to my ability and my judgment and never do harm to anyone.” Fundamental to this concept of being a good physician is the acquisition of medical knowledge to provide the highest quality care. Unlike the practice of neonatology 20 years ago, clinicians today are bombarded with incredible amounts of advice and information (scientific meetings, lectures, journals, peers, and the Internet) and ultimately must decide when and if to incorporate a change into clinical practice. In the 1990s, the phrase “evidence-based medicine” became the watchword for the most reliable source of new information. Meta-analyses often are considered the gold standard because an investigator has taken the time to review a topic critically and used a statistical test to confirm or refute the hypothesis. Although meta-analyses are important, they are an imperfect tool and represent only one of several sources of information for the practicing neonatologist. This review critically examines the strengths and weaknesses of new information and provides a primer on commonly used statistical methods in evidence-based medicine. Table 1 lists common definitions for terms used in this article.

If there is a challenge for practicing clinicians in the new millennium, it is filtering and assimilating the enormous amount of information received on a daily basis. New therapies are constantly being recommended for neonates who have a wide variety of problems.
Such recommendations range from changes in care practices to prevent physiologic derangements (eg, covering preterm infants in the delivery room to decrease heat loss) to new therapies for life-threatening or disabling conditions (eg, probiotics to prevent necrotizing enterocolitis). Clinicians, however, should be cautious before incorporating a new treatment into their standard practice. For example, in the early 1950s, oxygen was administered freely to treat a variety of conditions, ranging from apnea of prematurity to birth asphyxia. Oxygen was recommended for apnea because of “observations” in the 1940s that supplemental oxygen could improve periodic breathing. Thousands of preterm infants were maintained in incubators containing more than 50% oxygen, resulting in an increase in retinopathy of prematurity (ROP). That practice was discontinued in 1954 after the first multicenter, randomized clinical trial in neonatal medicine demonstrated that curtailment of this practice reduced the incidence of cicatricial ROP. Unfortunately, the practice of restricting supplemental oxygen to less than 40% was then adopted (but not tested) and led to an increase in neonatal mortality and spastic diplegia. It is concerning that neonatologists now are faced with the decision of whether to limit the amount of oxygen used for resuscitation in term and near-term infants based on a limited amount of data on long-term outcomes.

When deciding whether to incorporate a new treatment into practice, several clinical questions must be addressed. There are four common sources of new information for practitioners: 1) wise clinicians, 2) observational studies, 3) randomized clinical trials, and 4) systematic reviews. Each has advantages and disadvantages and has an important role in the educational process.

Much of what most physicians know has been learned from “wise clinicians.” That process begins in medical school and extends through residency and postresidency years.
A “wise clinician” needs only to be a little smarter than the person he or she is teaching. For the intern, it might mean a second- or third-year resident; for the senior resident, it might be a fellow or an attending physician. There are two major advantages to learning via this approach. First, wise clinicians can provide valuable information about the subtleties of care not evident in a clinical trial. For example, the literature says that Mycoplasma infections can be treated with erythromycin, but the wise clinician may teach that the compliance of individuals taking erythromycin is suboptimal because of abdominal discomfort. Second, learning from wise clinicians is instantaneous and only requires close proximity of the teacher and student. On the other hand, there are two important disadvantages to this form of education. Wise clinicians are not always so “wise,” and even when they know the correct information, they may not be able to convey it effectively. In addition, the spoken word is more likely to be misinterpreted, passing along erroneous “new knowledge.”

In observational studies, clinical practice decisions are made by clinicians and patients based on potential or perceived risks and benefits. The health status of the recipients of the interventions then is observed. If data are not available for a comparison group, the study is descriptive. If data are collected for a comparison group not receiving the intervention of interest, the study is analytical. Analytical studies are classified further by the directionality of the study. In a cohort study, the exposure is the starting point, and exposed and unexposed (or treated and untreated) groups are followed over time to ascertain the health outcome hypothesized to be affected by the treatment. Time can be in the past (retrospective cohort study) or in the present and future (prospective cohort study). In the case-control study, directionality is reversed, and the outcome is the starting point.
Subjects with the outcome of interest are identified, unaffected controls are selected, and a history of exposure to the treatment is sought in both groups. Subjects with the health outcome of interest are cases, and subjects without the outcome are controls. In the cross-sectional study, the exposure and outcome are ascertained at the same time within a study population.

Observational studies have an important role in determining the incidence and natural course of a disease or health state and in generating hypotheses about pathogenesis or treatment response. A cause-and-effect relation can be hypothesized based on an observation of an association. This hypothesis can then be tested in a randomized clinical trial (RCT). Observational analytical studies are generally less costly and more timely and include a broader range of patients than do randomized, controlled trials. The latter aspect increases the likelihood that the study results can be generalized to the population to which the study is purported to apply. The major disadvantage, of course, is that the nonrandom allocation of the intervention increases the likelihood of incorporating bias and confounding into apparent treatment effects.

Cohort studies are the best analytical study design for determining the incidence and natural course of a disease or health state. The prospective cohort study is the most appropriate nonexperimental research design to evaluate an intervention when a randomized, controlled trial is not possible, the outcome is relatively common, and the interval between the intervention and outcome is relatively short. Furthermore, the properly designed and executed prospective cohort study is the analytical study type least susceptible to bias. The possibility of selection bias is minimized if the study is population-based, ie, the study includes all members of a population (usually restricted to a geographic area) at risk for the outcome. Controls should be contemporary.
The use of historical controls in assessments of intervention is very risky. Among other problems, patients treated in the present tend to receive a variety of new interventions other than the one under investigation, and it is hard to “tease out” the singular effect of the treatment under study. A prospective design and execution of cohort studies minimizes ascertainment bias. Direct and contemporary determination of the intervention under study is preferable to data obtained from subject recall or existing medical records. Criteria for the intervention should be objective, strictly and uniformly applied, and (if possible) quantifiable. Quantification of the intervention allows possible investigation of a dose-response relationship. Finally, the outcome should be identified prospectively, and an a priori hypothesis about its relationship to the treatment should be stated. There is room, however, for studies that do not have an a priori hypothesis but are truly hypothesis-generating. Criteria for assigning outcome should be objective, reliable, and (if possible) quantifiable. If all elements of subjectivity cannot be eliminated, evaluators who are blinded to the exposure should determine outcome. When the reliability of the determination of outcome is imperfect, levels of assurance of the outcome (eg, definite, probable, suspect) should be assigned.

To reduce the possibility of confounding, known and potential confounding variables should be measured and their effects appropriately controlled by the experimental design or in analysis of the data. It is not necessary for an extraneous variable to be statistically significantly associated with exposure and outcome for it to confound a relationship. More attention should be paid to the extent to which the odds ratio or relative risk is attenuated by the hypothesized confounder than to statistical significance.
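The attenuation check described above can be sketched numerically: compute the crude odds ratio, then recompute it within strata of a suspected confounder and see how far the crude estimate shrinks. All counts and names below are invented for illustration, not taken from the article.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: (a*d) / (b*c)."""
    return (a * d) / (b * c)

# Hypothetical 2x2 tables (a, b, c, d) within two strata of a suspected
# confounder, eg, low and high gestational age.
stratum_low = (10, 40, 20, 80)    # OR = 1.0 within this stratum
stratum_high = (30, 20, 15, 10)   # OR = 1.0 within this stratum

# Crude table: sum the strata cell by cell, ignoring the confounder.
crude = tuple(x + y for x, y in zip(stratum_low, stratum_high))

print(round(odds_ratio(*crude), 2))   # 1.71
print(odds_ratio(*stratum_low))       # 1.0
print(odds_ratio(*stratum_high))      # 1.0
```

The crude OR of about 1.7 disappears within each stratum, the signature of an apparent effect produced by confounding rather than by the exposure itself.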
At the same time, care must be taken to avoid controlling for variables that are in a causal chain between the exposure and the outcome, which could mask an actual relationship between the exposure and outcome. The possibility of confounding never can be excluded completely in observational studies; it is always possible that unknown confounders exist or that known confounders are not available or adequately described in the data set.

When the outcome is relatively rare or the interval between the exposure and outcome is relatively long, cohort studies may not be feasible or cost-effective. In these circumstances, case-control studies yield more timely information with less resource expenditure. In general, case-control studies are more vulnerable to biases than are cohort studies. Ideally, all cases from the source population should be included in a case-control study. The process used to identify cases should be described, and criteria for inclusion or exclusion should be clearly specified. Unlike cohort studies, the control group in a case-control study includes only a sample of individuals without the outcome from the source population. Thus, the case-control study’s validity is greatly affected by the methods of choosing a control group. The control group should be representative of the portion of the source population at risk for the outcome. Controls should resemble cases in all respects except for their exposure to the intervention under study, which must be free to vary. In general, cases and controls are matched for variables known to increase the risk of the outcome. A good test of the comparability of cases and controls is similarity (in cases and controls) of the prevalence of risk factors for the outcome that were unmatched in the design and that are independent of the intervention. However, “overmatching” is a potential risk.
If cases and controls are matched for a variable that is very closely related to the exposure of interest, the relationship between the exposure and outcome may be obscured. Case-control studies are vulnerable to bias and incompleteness in ascertainment of exposure because exposure status usually is determined retrospectively. As for all study designs, documentation of exposure should be objective, reliable, and (if possible) quantified. The investigators determining the exposure should be blinded to the subject’s outcome, and methods to detect exposure should be equally rigorous for cases and controls. Methods to achieve comparable exposure information on cases and controls include having interviewers and record reviewers blind to case or control status and not revealing the central hypothesis of the study to interviewed subjects.

Because of their limitations, observational studies rarely should direct clinical practice. Neonatology is replete with examples in which clinical practices established on the basis of observational studies were later shown by RCTs to be ineffective or even harmful. In certain situations, observational studies can be used to direct clinical practice, but four criteria must be met. First, this approach should be limited to situations in which a randomized, controlled trial is unethical, not feasible, or not cost-effective (ie, so expensive compared with the value of the information to be gained that it is not worth conducting). Second, the observational study should be optimally designed and executed to address the issue. Descriptive observational studies and cross-sectional studies very rarely are sufficient to inform treatment decisions. Third, if the interval between the intervention and outcome is short, or if few other variables affect the risk of the outcome and these variables are well known (and easily controlled in the study design or analysis), the likelihood that the observed association is due to confounding decreases.
Fourth, additional evidence should support the interpretation that the association between the intervention and the health outcome is very likely to be causal. Six criteria frequently are cited to judge the probability that an association emerging in observational research is causal: 1) the exposure is known with certainty to have preceded the outcome, 2) the magnitude of the association is large, 3) a dose-response relationship can be demonstrated between the intervention and the outcome, 4) the intervention causes a specific outcome and not many outcomes, 5) there is consistency among studies, and 6) biologic plausibility for the association (ie, known molecular mechanisms, studies in animals, knowledge of routes of exposure) supports the reasonableness of the causal association posited.

The RCT is considered the “gold standard” in the evaluation of efficacy of new interventions. A prospectively designed, carefully executed controlled trial (with sufficient statistical power to detect a clinically important difference in the outcome) offers the clinician a high degree of reassurance that the conclusions of the study are valid. Randomization is the most effective method of minimizing biases and confounding. In addition, RCTs permit application of statistical theory based on random sampling. However, there are two disadvantages to RCTs. First, if expected differences in study outcome have been incorrectly estimated, the study may not be large enough to answer the questions posed. The “Hawthorne effect” may be an important variable in determining the adequacy of study size predictions. The “Hawthorne effect” was described initially during a research project (1927 to 1932) at the Hawthorne Plant of the Western Electric Company in Cicero, Illinois. The major finding of the study was that, almost regardless of the experimental manipulation employed, the production of the workers seemed to improve.
In other words, the workers were pleased to receive the attention of the investigators and, therefore, improved productivity. The Hawthorne effect may be evident in clinical trials in the neonatal intensive care unit because of the attention paid by nurses and physicians to the care of control infants, leading to better outcomes in the control group. Second, even RCTs are susceptible to potential selection, performance, ascertainment, and exclusion biases (Table 1). It is important to acknowledge that even randomized trials are themselves not always definitive, particularly when small.

Over the last 20 years, meta-analyses (quantitative systematic reviews) have become one of the most valued sources of information. Meta-analyses increase the statistical power lacking in smaller trials and allow clinicians greater security in accepting or rejecting treatment differences based on the trials. Although this method of analysis is used in a wide variety of specialties, it was applied first to perinatology. Within neonatology, the Cochrane Neonatal Review Group (CNRG) is a repository for these meta-analyses (http://www.update-software.com/Abstracts/NEONATALAbstractindex.htm). More than 115 meta-analyses are currently listed in the Cochrane Collaboration. Unlike qualitative reviews, quantitative systematic reviews have a prospectively designed protocol, a comprehensive and explicit search strategy, strict criteria for inclusion of studies, and standard definitions for outcomes.

Similar to the other sources of information, meta-analyses have several disadvantages. First, results may be statistically significant but not clinically significant. For example, the CNRG analysis of the benefit of intravenous immunoglobulin (IvIg) for preventing nosocomial sepsis (19 studies) notes that IvIg decreases the incidence of hospital-acquired infections by 3% to 4%.
Although that reduction is statistically significant (P<0.02), it is difficult to justify the cost of IvIg for such a minimal clinical benefit. Second, the results of a systematic review may not apply to a specific nursery or set of infants. For example, if the incidence of bronchopulmonary dysplasia in an individual nursery is extremely low, it may not justify the use of intramuscular vitamin A three times weekly (a proven effective intervention). Third, meta-analyses incorporate the biases of the original studies and add the additional biases of study selection and publication bias. The reader of a meta-analysis must depend on the reviewers’ ability to screen all relevant articles. Furthermore, studies with negative results often are more difficult to publish and may never be accepted for publication. Fourth, meta-analyses, almost by definition, are plagued by heterogeneity. An example is the meta-analysis of nasal intermittent positive pressure ventilation (NIPPV) versus nasal continuous positive airway pressure (NCPAP) following extubation. The conclusion is that NIPPV is effective in preventing failure of extubation after a period of mechanical ventilation in preterm infants younger than 37 weeks’ gestation. However, the criteria for offering rescue treatment appeared to be at clinicians’ discretion, and the proportions who were offered rescue NIPPV varied among the three trials. Therefore, the outcome of endotracheal intubation had a different meaning for each trial. In spite of little variation in enrollment criteria, infants were extubated in two of the studies at a median age of less than 1 week, whereas infants in the third study were extubated at median ages of 18.5 and 21 days in the two groups. In addition, the three trials used different levels of NCPAP and different devices to provide NCPAP. Finally, the reader of a meta-analysis always should have a healthy degree of skepticism.
Although the Cochrane Collaboration is a source of high-quality systematic reviews, a review of meta-analyses published in 1987 and 1992 reported that only 28% of 86 studies reviewed satisfied criteria believed to be important for content and reporting. Moreover, a recent study reported only fair agreement (kappa=0.35) between the results of published meta-analyses based on smaller trials and single RCTs of 1,000 subjects or more published in major journals. LeLorier and associates concluded that if there had been no subsequent RCT, the meta-analysis would have led to the adoption of an ineffective treatment in 32% of cases. This finding may reflect chance, publication bias (favoring the publication of small studies with positive results), differences in the expertise of caregivers between small and large trials, or differences in patient risk or other factors.

Many of the measures commonly used to summarize the results of analyses can be understood with reference to a 2×2 table (Fig. 1). In this table, the columns define the presence or absence of a condition (eg, a disease or outcome), and the rows define the presence or absence of an intervention or risk factor possibly associated with that condition. Data that fit into a 2×2 table are called categorical data, specifically binary categorical data, because the variables in the columns and rows have only two alternatives, positive and negative. Another name is “count data,” because the cell numbers are made up of “counts,” the numbers of individuals who possess or lack the characteristics indicated by the variables.

The chi-square test often is used to decide whether there is any association between the disease and the risk factor. The chi-square test is used with count data and determines whether the observed cell values in a table differ from their expected values. The larger the chi-square value, the greater the differences between observations and expectations.
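The observed-versus-expected computation can be sketched for a hypothetical 2×2 table; the counts and the helper function below are invented for illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table (no Yates' correction).

    a, b, c, d follow the Figure 1 layout: rows are risk factor
    present/absent, columns are disease present/absent.
    """
    n = a + b + c + d
    # Expected count for each cell = (row total * column total) / n
    expected = [
        (a + b) * (a + c) / n, (a + b) * (b + d) / n,
        (c + d) * (a + c) / n, (c + d) * (b + d) / n,
    ]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: 20/100 diseased among the exposed,
# 10/100 among the unexposed.
print(round(chi_square_2x2(20, 80, 10, 90), 2))  # 3.92
```

With these invented counts the statistic is about 3.92, just above the 3.84 threshold for significance in a 2×2 table discussed in the text.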
Small differences produce small chi-square values, which is expected if there is no association between risk factor and disease. Larger differences are less likely to be due to chance and imply a possible association between risk factor and disease. For a 2×2 table, a chi-square value that exceeds 3.84 corresponds to P<0.05; that is, a difference that large would arise by chance less than 5% of the time if there were no real association. However, the chi-square value by itself does not indicate whether the association is positive or negative or due to a hidden cause (eg, a confounder).

Some statisticians argue that the chi-square value obtained from a 2×2 table slightly overestimates the actual probability of an association. The Yates’ correction sometimes is used to compensate for this possibility. Fisher’s exact test, which is based on the hypergeometric distribution, generally is used to analyze 2×2 tables when the expected value of one or more of the cells is less than 5.

The term “risk” refers to the probability of some event happening. In a 2×2 table, risk is expressed best in conditional language: “given the presence of a risk factor, the probability of a disease is… .” Using the notation in Figure 1:

Risk = a/(a+b)

where “a” is the number of individuals with the disease who have the risk factor in question and “a+b” is the total number of individuals who are risk factor-positive (with and without the disease). This differs from the odds that an event will occur, in which the denominator is merely the alternative to a positive event:

Odds = a/b

The differences between “risk” and “odds” can be made clearer
using the language of baseball. In baseball, the “risk” or probability that a 0.300 hitter will reach base on a hit is 3/10, given a legitimate plate appearance. The odds that he will get a hit are 3/7.

Relative risk (RR) compares the risk or chance of an occurrence (eg, getting a disease) given the presence of a risk factor (R′=a/(a+b)) with the chance of an occurrence given the absence of the risk factor (R″=c/(c+d)). This is expressed as:

RR = [a/(a+b)] ÷ [c/(c+d)]

RR is used in the type of study in which the presence or absence of a risk factor is defined in advance (eg, exposure to a toxin) and data are collected over time to see if a particular disease or outcome develops. This type of study design is called “prospective” and is the foundation of a randomized, controlled trial. Although RR can be calculated for a retrospective cohort study, it cannot be calculated for a case-control study.

RR is an attractive measure because it is easy to explain and interpret. An RR of approximately 1 means that the particular outcome is no more likely to be associated with the exposure than with the lack of exposure. An RR larger than 1 means that the outcome is more likely to occur with the exposure than without the exposure. An RR of less than 1 means the outcome is more likely to be associated with absence of exposure.

Several variations on the presentation of RR may be used. Risk difference (RD) is the difference between the probability of the event with and without the exposure.
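The risk, odds, and relative-risk definitions above can be sketched in code; the study counts used here are hypothetical.

```python
def risk(a, b):
    """Probability of the event given the risk factor: a / (a + b)."""
    return a / (a + b)

def odds(a, b):
    """Odds of the event given the risk factor: a / b."""
    return a / b

def relative_risk(a, b, c, d):
    """RR = risk in the exposed divided by risk in the unexposed."""
    return risk(a, b) / risk(c, d)

# The baseball example from the text: 3 hits per 10 plate appearances.
print(risk(3, 7))   # 0.3
print(odds(3, 7))   # 3/7, about 0.43

# Hypothetical prospective study: 20/100 exposed subjects develop the
# outcome versus 10/100 unexposed, so RR = 0.20 / 0.10 = 2.0.
print(relative_risk(20, 80, 10, 90))  # 2.0
```

Note that the same counts give a risk of 0.3 but odds of about 0.43, which is why the two measures diverge as events become common.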
Using the notation in the 2×2 table in Figure 1:

RD = a/(a+b) − c/(c+d)

RDs may be positive or negative. A positive RD suggests a positive association between risk factor and disease; a negative RD suggests that the disease is associated with absence of the risk factor. If RD is close to 0, there is likely to be no association. RDs cannot be larger than 1 or smaller than −1.

If a prospective trial shows that the use of a drug (the exposure) produces improvement in the disease (the outcome) compared with the use of placebo (no exposure), the trial provides evidence in favor of a potentially helpful medication. However, if the medication is expensive or has a high rate of complications, persons who took the drug but whose disease did not improve might be harmed more than if they had not received the drug. This scenario suggests the need to weigh the relative benefits of exposure to the medication.

One approach is to calculate the number of subjects who will need to be given the medication for each subject who will actually benefit compared with the number who will improve on placebo. The “number needed to treat” (NNT) is the inverse of the RD. The calculated value should be rounded up to the nearest whole number.
NNT = 1/RD

An example of the use of a 2×2 table is shown in Figure 2.

The odds ratio (OR) usually is defined as the ratio of the odds of a risk factor being present given the presence of a disease to the odds of the risk factor being present given no disease:

OR = (a/c) ÷ (b/d)

The OR can be calculated for both case-control and cohort studies. When the outcome is relatively rare, the OR obtained from a case-control study approximates the value of RR obtained from a prospective study. An example is shown in Figure 3.

Confidence intervals are used commonly to quantify the uncertainty of the estimated value of an effect measure. A 95% confidence interval reflects 95% certainty that the true value of the measure lies within the bounds of the interval. Remember that the true value is fixed; it is the interval, based on the estimate of the true value using the data in a particular sample, that varies among different data samples. Wide confidence intervals mean the estimation is broad and imprecise. Confidence intervals for ratio measures (OR, RR) that include 1 indicate uncertainty about the existence of an association between risk and outcome.

The 95% confidence intervals for RR and the OR in Examples 1 and 2 have been calculated. In the first example (RR), the interval given is (1.002, 5.186). Based on the sample chosen for the study, this interval has a 95% chance of including the true RR. The interval does not include 1 (although not by much), meaning that there is 95% certainty that the association between outcome and risk factor is significant.
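A minimal sketch of NNT, the OR, and a conventional log-based (Woolf) 95% confidence interval for the OR; the counts, and the choice of the Woolf method, are assumptions for illustration rather than the article's own worked examples.

```python
import math

def number_needed_to_treat(a, b, c, d):
    """NNT = 1 / |RD|, rounded up to the next whole number."""
    rd = a / (a + b) - c / (c + d)
    return math.ceil(1 / abs(rd))

def odds_ratio(a, b, c, d):
    """OR = (a/c) / (b/d), equivalently (a*d) / (b*c)."""
    return (a * d) / (b * c)

def odds_ratio_ci95(a, b, c, d):
    """Woolf 95% CI: exp(ln OR +/- 1.96 * sqrt(1/a + 1/b + 1/c + 1/d))."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se)

# Hypothetical trial: 25% of treated versus 50% of placebo infants have
# the outcome, so RD = -0.25 and 4 infants must be treated per one helped.
print(number_needed_to_treat(25, 75, 50, 50))  # 4

# Hypothetical case-control table: OR = (20*90)/(80*10) = 2.25.
print(odds_ratio(20, 80, 10, 90))              # 2.25
low, high = odds_ratio_ci95(20, 80, 10, 90)
print(low < 1 < high)  # this interval (narrowly) includes 1
```

With these invented counts the point estimate is 2.25, but the interval just reaches below 1, so the association would not be declared significant at the 5% level.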
This corresponds to the P value (<0.05) for the chi-square analysis.

As noted previously, meta-analysis is used to summarize the results of a series of similar studies, each of which has a similar binary outcome associated with a single binary risk factor presented as an OR. Each of the collection of 2×2 tables that form these studies, when given an appropriate weight, may be considered a stratum of a single larger study. The Mantel-Haenszel method is a specialized type of chi-square test that often is used to estimate the pooled OR for all strata.

When outcome data are represented by numbers measured on a continuous scale (eg, weight, distance, or blood pressure values), different types of analyses are required. Student’s t-test is used to determine whether there is a significant difference between the average (mean) values of a continuous variable in two groups. “Student,” whose real name was William Gosset (1876 to 1937), developed the t-test as part of his work at the Guinness brewery in Dublin. Gosset took the pseudonym “Student” because Guinness, fearful that its trade secrets would be divulged, forbade publication by any of its employees.

Following is an example of the use of the t-test in learning whether the average height of 8th-grade boys and girls differs in a particular school. The 8th grade in this school contains nearly 1,000 students, and there are insufficient resources or time to measure all of them. One approach is to measure the heights of a random sample of 8th graders and compare the average values of boys and girls, although the two averages are likely to differ somewhat by chance alone.
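The comparison just described can be sketched with a pooled two-sample t statistic; the height samples below are invented, not data from the article.

```python
import math

def t_statistic(xs, ys):
    """Pooled-variance two-sample Student's t (assumes equal variances)."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    # Unbiased sample variances
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    return (mx - my) / math.sqrt(pooled * (1 / nx + 1 / ny))

# Hypothetical height samples (cm) from the 8th-grade example.
boys = [160, 158, 165, 162, 170, 167]
girls = [155, 159, 157, 160, 152, 158]
print(round(t_statistic(boys, girls), 2))  # 3.12
```

Here t is about 3.12 with 10 degrees of freedom, well beyond the two-tailed 5% critical value (about 2.23), so these sample means would be judged significantly different despite the sampling variability.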
