Letter · Open access · Peer-reviewed

Are Unadjusted Analyses of Clinical Trials Inappropriately Biased Toward the Null?

2009; Lippincott Williams & Wilkins; Volume: 40; Issue: 3; Language: English

10.1161/strokeaha.108.532051

ISSN

1524-4628

Authors

David M. Kent, Thomas A. Trikalinos, Michael D. Hill

Topic(s)

Meta-analysis and systematic reviews

Abstract

David M. Kent, Thomas A. Trikalinos, and Michael D. Hill

From the Center for Predictive Medicine Research (D.M.K., T.A.T.), the Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, Mass; and Foothills Hospital (M.D.H.), University of Calgary, Calgary, AB, Canada.

Stroke. 2009;40:672–673. Originally published January 22, 2009.

See related article, pages 888–894.

One of the delights of clinical practice has proven to be a major nuisance for clinical research: patients are nonidentical. Indeed, patients have multiple characteristics that influence the likelihood of the outcome of a disease, which can make it difficult in the extreme to accurately discern the effects of therapy from casual clinical experience or even careful observational studies. Randomization, a process by which patients are assigned to a treatment arm by chance rather than by choice, was a brilliant innovation that has made possible causal inferences regarding a treatment's effect. Although randomization is not perfect in practice, it is remarkably effective at ensuring the comparability between treatment groups, so much so that it has almost tricked us into thinking that patient differences in outcome risks have been rendered irrelevant in the context of clinical trials.

However, randomization only ensures similarity of the outcome risks between treatment groups; it does nothing to mitigate the between-patient differences in outcome risks within treatment groups.
These differences can lead to clinically important differences in treatment effects across patients such that the summary results of a trial may not apply to all, or even most, patients in the trial.1,2 Heterogeneity of risk can create an even more fundamental, and even less appreciated, problem with the summary results of clinical trials: even in the absence of any heterogeneity of treatment effect (ie, when all patients get an identical treatment benefit), and in the absence of confounding and bias, risk heterogeneity can still play a very mischievous role such that an unadjusted (crude) analysis may be both inefficient and yield an inaccurate estimate of the summary treatment effect.

In this issue of Stroke, Gray et al perform one of the more comprehensive analyses of the effects of risk adjustment on statistical power and sample size requirements using the unique Optimizing Acute Stroke Trials (OAST) database. Using 23 different trials that provide data on baseline characteristics and have a nonneutral treatment effect, they find a consistent increase in statistical power or (alternatively) a consistent decrease in the sample size required when comparing risk-adjusted analysis with conventional (unadjusted) analysis.

This study adds to the growing literature showing that risk-adjusted analyses can make trials more efficient, reducing the required sample size on the order of 15% to 30%.3–9 This effect is not widely understood and has been attributed to an increase in "precision" or a reduction in variance. However, across these studies, there has also been a consistent change in the magnitude of the estimated treatment effect in the risk-adjusted compared with the crude analysis; the risk-adjusted OR5,6,8,9 or hazard ratio7 always shows a larger treatment effect than the crude analysis.

Given these results, one might expect that routine risk adjustment of clinical trial results would be taken up immediately, because pharmaceutical companies (and even academics) are not exactly well known for ignoring trial costs, and even less for biasing their trials toward the null. Yet there remain barriers to routine risk-adjusted analyses, which are more complex and less transparent. A crude analysis relies simply on counting those with and without the outcome in each arm of the study and reporting the ratio; how could this be biased? On the other hand, the results of a risk-adjusted analysis are conditional on the selected covariates; how can we trust study results when the outcome of the analysis depends on the particular variables the investigators decide to control for? Surely this must increase, not decrease, the opportunity for bias.

In fact, reporting a crude OR or a crude hazard ratio to summarize a treatment effect is arguably inappropriate, because these measures have a property referred to as noncollapsibility. That is, the OR for the total cohort will not be a weighted average of the stratum-specific ORs.10 This is true even in the simplest example, when all patients, regardless of risk, experience a consistent treatment effect. Such an example is shown in the Table, which depicts a trial enrolling patients who belong to 4 different risk/severity strata for whom treatment yields a consistent improvement of their odds of a good outcome by 50% (ie, a uniform OR of 1.5 for all risk strata). Surprisingly, if one calculates the OR for the overall results, one finds that treatment increased the odds of a good outcome by only 38%. This represents an underestimation of the within-stratum treatment effect of almost 25%.

Table. Unadjusted Summary Results Underestimate Stratum-Specific Odds Ratios Even When Treatment Benefit Is Uniform

                               Good Functional Outcome
Risk/Severity Strata           Control Rate    Treatment Rate    OR
Severe stroke                  20.0%           27.3%             1.5
                               40.0%           50.0%             1.5
                               60.0%           69.2%             1.5
Mild stroke                    80.0%           85.7%             1.5
Summary of trial result        50.0%           58.1%             1.38
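The arithmetic behind the Table can be checked in a few lines. The following minimal sketch (illustrative Python, not from the editorial; it assumes four equally sized strata, consistent with the Table's summary row) applies the uniform odds ratio of 1.5 to each stratum's control rate and then computes the crude odds ratio from the pooled group rates.

```python
# Reproduce the Table: four equal-size risk strata, a uniform within-stratum
# odds ratio of 1.5, and the crude (unadjusted) odds ratio obtained by pooling.

def apply_or(p_control, odds_ratio):
    """Treatment-arm event probability implied by a control rate and an odds ratio."""
    odds = p_control / (1 - p_control) * odds_ratio
    return odds / (1 + odds)

control_rates = [0.20, 0.40, 0.60, 0.80]   # good-outcome control rates, severe to mild stroke
uniform_or = 1.5                           # identical treatment effect in every stratum

treatment_rates = [apply_or(p, uniform_or) for p in control_rates]

# Crude analysis: pool the (equally sized) strata and compare group averages.
p_c = sum(control_rates) / len(control_rates)        # 0.500
p_t = sum(treatment_rates) / len(treatment_rates)    # ~0.581
crude_or = (p_t / (1 - p_t)) / (p_c / (1 - p_c))     # ~1.38, not 1.5

for pc, pt in zip(control_rates, treatment_rates):
    print(f"stratum: control {pc:.1%}  treatment {pt:.1%}  OR {uniform_or}")
print(f"summary: control {p_c:.1%}  treatment {p_t:.1%}  crude OR {crude_or:.2f}")
```

Running the same calculation with the complementary bad-outcome rates and a uniform within-stratum OR of 0.75 yields a crude OR of approximately 0.79, which matches the bad-outcome example discussed in the next paragraph.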
The property of noncollapsibility is related to the nonlinearity of the OR. It has been shown mathematically that, in the presence of heterogeneity, the crude OR will always be more conservative (ie, closer to 1) than the within-strata OR.11,12 If, for example, the outcome rates in the Table represented bad outcomes instead of good outcomes, and the within-strata treatment effect was 0.75 (a 25% reduction in the odds of the outcome), the crude OR would be 0.79 (a 21% reduction in the odds of the outcome). Risk adjustment corrects this "foreshortening" of the crude OR-based treatment effect by comparing like to like.

When is noncollapsibility important to consider? First, it is only an issue when the measure of effect is nonlinear, as with the OR and the hazard ratio; thus, when relative risk is the effect measure, it is not an issue. Second, when outcome rates within all strata are low, the effect will be negligible. However, when the outcome rates within one or more strata are high, as is typical in stroke, this effect can be substantial. Outcome rates within strata as high as those shown in the Table are not unusual for stroke. According to the Stroke-Thrombolytic Predictive Instrument (TPI),13 the probability of a good outcome (modified Rankin Scale ≤1) would be approximately 80% for patients who are male and 60 years old with a National Institutes of Health Stroke Scale score of 5 or 6. Indeed, among patients enrolled in thrombolytic trials, the expected control outcome rate in the quintile of patients with the best prognosis is approximately 70%.13 Still, even in trials in which the average outcome rates are relatively low, as in many cardiovascular trials, the presence of a group at high risk for the outcome can cause the crude OR to be conservative compared with the risk-adjusted OR.

When the crude OR differs from the risk-adjusted OR, which one should be preferred? From the perspective of trial efficiency, it has been demonstrated consistently that risk adjustment leads to diminished sample size requirements. In terms of transparency, some might argue that using the group results is the simplest and most understandable approach. However, this point is certainly arguable, because the treatment effect estimated from group averages is conditional on the degree of heterogeneity in the sample. Even where all patients get the same treatment effect (as defined by the OR), a large, simple, broadly inclusive clinical trial will counterintuitively yield a more modest treatment effect estimate than the average of several clinical trials targeted to specific risk groups. The risk-adjusted analysis also has the advantage of estimating the more clinically relevant patient-level effect size; in simulations, which permit one to specify a "true" treatment effect, results show that a crude analysis will consistently underestimate this "true" effect, whereas adjusted analyses are more accurate.6,7
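The simulation finding cited above is easy to reproduce in outline. The sketch below is an illustrative example (not the simulations of references 6 and 7); it assumes a normally distributed baseline severity covariate and uses numpy and statsmodels. Every simulated patient receives the same conditional treatment effect (an OR of 1.5), yet the crude logistic model returns an estimate noticeably closer to the null than the covariate-adjusted model does.

```python
# Minimal simulation sketch: identical "true" conditional OR of 1.5 for every
# patient, with outcome risk varying by a baseline severity covariate.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200_000                                  # large n so sampling noise is small
treat = rng.integers(0, 2, n)                # 1:1 randomization
severity = rng.normal(0.0, 1.5, n)           # assumed scale of baseline risk heterogeneity

true_log_or = np.log(1.5)                    # identical conditional effect for every patient
logit_p = severity + true_log_or * treat
y = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

crude = sm.Logit(y, sm.add_constant(np.column_stack([treat]))).fit(disp=0)
adjusted = sm.Logit(y, sm.add_constant(np.column_stack([treat, severity]))).fit(disp=0)

print("crude OR:   ", np.exp(crude.params[1]))      # attenuated toward 1, well below 1.5
print("adjusted OR:", np.exp(adjusted.params[1]))   # close to the true 1.5
```

In such simulations, the adjusted treatment coefficient is typically larger relative to its standard error than the crude one, which is the mechanism behind the power gains and sample-size reductions reported in the studies cited above.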
On the other hand, real life is more complicated than our example and other simulations, and one cannot automatically ascribe discrepancies between the risk-adjusted and crude effects to noncollapsibility alone. After all, despite randomization, residual imbalances across treatment arms may persist for both observed and unknown factors alike. While risk adjusting rebalances for the known factors, its effect in any given trial on the myriad unknown factors that influence outcomes remains beyond scrutiny, as does the influence of these unknown factors on the treatment effect. Although this should not systematically introduce new biases, it would seem that using collapsible (linear) measures of effect, which avoid the issue of noncollapsibility and thus the need to risk adjust, may be the best choice; but this may not always be an available option, especially in time-to-event analyses.

The problem of noncollapsibility in effect measures used in clinical trials remains underappreciated, its causes buried rather deeply in the literature. We have found that even experienced statisticians are frequently surprised and delighted when confronted with the paradox shown in the Table. However, can this statistical parlor trick really have important consequences for the outcome of clinical trials; can it explain in part the lack of progress in stroke trials? Although poor translation in stroke therapeutics clearly has other important causes,14 it is becoming apparent that inefficiencies in effect measures may not be trivial and that failure to account for risk heterogeneity when using nonlinear effect measures may be important.

The opinions in this editorial are not necessarily those of the editors or of the American Heart Association.

Source of Funding

Drs Kent and Trikalinos are partially supported by a grant from the National Institutes of Health (NIH/NCRR 1UL1 RR025752).

Disclosures

None.

Footnotes

Correspondence to David M. Kent, MD, MS, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, #63, 800 Washington Street, Boston, MA 02111. E-mail [email protected]

References

1. Kent DM, Hayward RA. Limitations of applying summary results of clinical trials to individual patients: the need for risk stratification. JAMA. 2007;298:1209–1212.
2. Kent DM, Hayward RA. When averages hide individual differences in clinical trials. Am Scientist. 2007;95:60–68.
3. Gray LJ, Bath P, Collier T. Should stroke trials adjust functional outcome for baseline prognostic factors? Stroke. 2009;40:888–894.
4. Choi SC. Sample size in clinical trials with dichotomous endpoints: use of covariables. J Biopharm Stat. 1998;8:367–375.
5. Johnston KC, Connors AF, Wagner DP, Haley EC. Risk adjustment effect on stroke clinical trials. Stroke. 2004;35:e43–e45.
6. Hernandez AV, Steyerberg EW, Butcher I, Mushkudiani N, Taylor GS, Murray GD, Marmarou A, Choi SC, Lu J, Habbema JDF, Maas AIR. Adjustment for strong predictors of outcome in traumatic brain injury trials: 25% reduction in sample size requirements in the IMPACT study. J Neurotrauma. 2006;23:1295–1303.
7. Hernandez AV, Eijkemans MJC, Steyerberg EW. Randomized controlled trials with time-to-event outcomes: how much does prespecified covariate adjustment increase power? Ann Epidemiol. 2006;16:41–48.
8. Hernandez AV, Steyerberg EW, Habbema JDF. Clinical trials with dichotomous end-points: covariate adjustment increases power and potentially reduces sample size. J Clin Epidemiol. 2004;57:454–460.
9. Steyerberg EW, Bossuyt PMM, Lee KL. Clinical trials in acute myocardial infarction: should we adjust for baseline characteristics? Am Heart J. 2000;139:745–751.
10. Greenland S. Interpretation and choice of effect measures in epidemiologic analysis. Am J Epidemiol. 1987;125:761–768.
11. Gail MH, Wieand S, Piantadosi S. Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika. 1984;71:431–444.
12. Doi M, Nakamura T, Yamamoto E. Conservative tendency of the crude odds ratio. J Japan Statist Soc. 2001;31:53–65.
13. Kent DM, Selker HP, Ruthazer R, Bluhmki E, Hacke W. The stroke–thrombolytic predictive instrument: a predictive instrument for intravenous thrombolysis in acute ischemic stroke. Stroke. 2006;37:2957–2962.
14. Savitz SI, Fisher M. Future of neuroprotection for acute stroke: in the aftermath of the SAINT trials. Ann Neurol. 2007;61:396–402.

Keywords: clinical trials; statistical analysis
