Peer-Reviewed Letter

The Utility of Futility

2005; Lippincott Williams & Wilkins; Volume 36; Issue 11; Language: English

DOI: 10.1161/01.str.0000185722.99167.56

ISSN: 1524-4628

Author: Bruce Levin

Topic(s): Optimal Experimental Design Methods

Abstract

The Utility of Futility

Bruce Levin, PhD

From the Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY.

Originally published 13 Oct 2005. https://doi.org/10.1161/01.STR.0000185722.99167.56. Stroke. 2005;36:2331–2332.

See related article, pages 2410–2414.

In this issue, Palesch et al1 discuss the single-arm, phase II futility study design and illustrate how its use might have avoided 3 large (and costly) but negative phase III therapeutic trials for ischemic stroke patients. The authors offer strong arguments to support their conclusion that use of this design as a strategy in phase II development "could permit the testing of a wider array of promising treatments at a fraction of the cost of taking all treatments directly to phase III trials." In a nutshell, they argue that there is utility in futility testing.

Although common in early-phase oncology trials, the futility study (single- or double-armed) may be less familiar to readers of this journal, and careful scrutiny of the design, especially of the formulation of null and alternative hypotheses, is worthwhile. Briefly, in a futility study, the null hypothesis states that the experimental therapy is sufficiently promising to warrant definitive, phase III testing, whereas the alternative hypothesis states that the experimental therapy lacks the prespecified superiority. Thus, the futility design reverses the logical status of null and alternative hypotheses as most often formulated in the traditional efficacy design.
Whereas in the latter design, sufficient evidence is required to declare a therapeutic effect statistically significant, in the futility design, there is a presumption of benefit, and sufficient evidence is required to declare a significant shortfall from that benefit, such that it would be futile to proceed to large-scale testing with the given therapy. The authors argue that this formulation, with its null presumption of benefit, is appropriate in phase II research on the grounds that of the 2 types of error that can be committed—declaring a truly superior therapy futile or declaring a truly nonsuperior therapy worthy of continued testing—the former is the more important. One should therefore view it as a type I error with appropriate control of the error rate (α). This the futility design accomplishes.

I suspect the main attraction of the authors' proposal will be the relatively small number of patients required to conduct the experiment in comparison to the sample sizes required for the traditional phase III randomized, controlled trial. Indeed, the authors illustrate 5-fold, 10-fold, and even larger potential reductions in sample size. Three key elements serve to achieve this efficiency. The first is the one-sided nature of the hypotheses (the null states there is a real therapeutic benefit; the alternative denies this directional superiority). The second key element is the use of somewhat more liberal values of alpha, eg, 0.10 (1-tailed), compared with the conventional 0.05 level (2-tailed, or 0.025 1-tailed). This is arguably appropriate for phase II testing. The third key element, which is the most influential and perhaps the most controversial, is the use of only a single (experimental) arm.
For this, one needs a clear clinical notion of benefit, based on historical-control data, to quantify the "minimally worthwhile improvement" required for the single-arm futility design.

The authors focus on the single-arm design, but there is no intrinsic reason why the futility design cannot be applied with a concurrent control arm. Indeed, the ongoing NINDS-funded QALS study of high-dose coenzyme Q10 in patients with amyotrophic lateral sclerosis2 uses a 2-arm futility design. Although this is not the appropriate forum to debate the pros and cons of single-arm studies, the following points may help to avoid some pitfalls of the single-arm futility design.

Sample size plays an important role in statistical power, as always, but here statistical power refers to the probability that a therapy that truly does not achieve the minimally worthwhile improvement (eg, something no better than a placebo) is actually deemed futile. Because a therapy will be deemed futile only if its outcomes are significantly worse than the prespecified minimally worthwhile improvement, care should be taken to use sample sizes sufficiently large that the point at which significance (futility) is declared is no worse than the (historical control) placebo effect. If this were not the case, one could find oneself in the situation of failing to declare a therapy futile (and therefore worthy of continued testing) whose success rate is actually worse than the (historical control) placebo success rate. This embarrassment can be most easily avoided by requiring statistical power of 50% or more to declare a placebo-rate therapy futile.

If the minimally worthwhile improvement is set too high, truly beneficial therapies may be deemed futile—an obvious point, perhaps, but worth keeping in mind.
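The boundary-and-power check just described can be sketched for a single-arm study with a binary success endpoint. The design values below are hypothetical, not taken from the article: a historical-control success rate of 0.30 and a minimally worthwhile improvement of 0.10 give a null success rate of 0.40, tested at one-sided α = 0.10.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), via the exact sum."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def futility_boundary(n, p_null, alpha):
    """Largest count c such that P(X <= c | p = p_null) <= alpha;
    observing c or fewer successes out of n declares futility."""
    c = -1
    while binom_cdf(c + 1, n, p_null) <= alpha:
        c += 1
    return c

p0, delta, alpha = 0.30, 0.10, 0.10  # hypothetical: placebo rate, improvement, 1-sided alpha
p_null = p0 + delta                  # null hypothesis: true success rate >= 0.40

for n in (30, 50, 100):
    c = futility_boundary(n, p_null, alpha)
    power_at_placebo = binom_cdf(c, n, p0)  # P(declare futility | p = p0)
    adequate = (c / n >= p0) and (power_at_placebo >= 0.5)
    print(f"n={n}: futile if <= {c} successes (boundary rate {c/n:.2f}, "
          f"power at placebo {power_at_placebo:.2f}, adequate={adequate})")
```

If the boundary success rate falls below the historical placebo rate, or the power to declare a placebo-rate therapy futile is under 50%, the sample size fails the adequacy check above and should be increased.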
In the same vein, the authors point out that when designing a phase II futility study, investigators should choose a value of the minimally worthwhile improvement as close as possible to the "clinically meaningful effect size" they would use in the future phase III trial to provide a reasonable test of the futility hypothesis. This is sage advice, because if the minimally worthwhile improvement is set too low, a larger sample size would be required to keep the same rejection region of futility, or, if the sample size remained the same, as mentioned above, the chances increase of failing to reject the null hypothesis with a therapy that would otherwise have been ruled out as futile.

Using a single-arm design risks a version of type I error that is not encompassed in the type I error rate (α level) of the futility design. If the placebo success rate in the current study population would be much lower than the historical control value used to determine the minimally worthwhile improvement for the futility study—because, say, the current population is at higher risk—then it is entirely possible for a treatment that would be truly worthwhile for further study in this population to be deemed futile in the single-arm futility study because the historical control rate is too high. In the opposite direction, use of a historical control success rate that is too low compared with what would be true in the current study population—because, say, the historical control studies are out of date and patients generally do better today than they used to—could lead to inefficacious treatments (for this population) being brought to phase III testing, thus thwarting the purpose of the phase II futility design. Some confidence that the historical control data apply to the current study population would thus appear necessary to avoid ambiguities in the interpretation of the study.

This provocative article raises larger issues.
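Before turning to those issues, the historical-control pitfall just described can be made concrete with a short Monte Carlo sketch. All rates and the futility boundary below are hypothetical: the historical benchmark implies a null success rate of 0.40, but the current, higher-risk population has a true placebo rate of only 0.20, so even a therapy adding a worthwhile 12 percentage points is frequently declared futile.

```python
import random

random.seed(1)

n = 50            # patients in the single-arm study
c = 15            # illustrative futility boundary: <= 15 successes out of 50
                  # rejects a null success rate of 0.40 at one-sided alpha ~ 0.10
p_treated = 0.32  # true rate: current placebo 0.20 plus a worthwhile 0.12 improvement

trials = 10_000
futile = sum(
    sum(random.random() < p_treated for _ in range(n)) <= c
    for _ in range(trials)
)
print(f"P(worthwhile therapy declared futile) ~ {futile / trials:.2f}")
```

Under these (made-up) numbers, a therapy that is genuinely worthwhile for the current population is declared futile in a large fraction of studies, purely because the historical control rate overstates the current placebo rate.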
One knotty problem is this: should researchers and funding agencies devote the time and resources to screen out futile therapies at all, or should we move directly from early-phase research to more definitive phase III testing? In other words, are we wasting precious time in our search for effective therapies by conducting futility studies? If the patient pool and funding resources are each adequate, there may be some merit to proceeding directly to phase III testing. However, in disease domains where patients are rare, where funding is limited, or both, some organized screening program such as the authors suggest may be the more prudent and cost-effective approach.

Viewing the futility design as a screening tool leads us to ask, in the language of the screening paradigm: what are the positive and negative predictive values of futility testing? If we define a "negative" result as a finding of futility and a "positive" result as a finding of "nonfutility" (more precisely, a failure to reject the null hypothesis of superiority), then negative predictive value refers to the proportion of all therapies deemed futile that are truly less efficacious than the minimally worthwhile improvement, and positive predictive value refers to the proportion of all therapies deemed "nonfutile" that truly exceed the minimally worthwhile improvement. Interestingly, the authors' collection of examples shows a high negative predictive value (of a finding of futility) but only a 1-in-3 positive predictive value (of a finding of nonfutility). Insofar as there is a high prior likelihood of futility (for neuroprotective agents at this point in time), a not-so-high sensitivity of 1−α=0.90 may still yield a high negative predictive value. However, positive predictive value may remain low. For example, an alpha of 0.10 and power (1−β=specificity) of 0.85 implies that a result of "nonfutility" may increase the prior odds on a worthwhile improvement only 6-fold, because (1−α)/β=0.90/0.15=6.
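The likelihood-ratio arithmetic in the last sentence can be checked directly; the prior odds values below are merely illustrative.

```python
alpha = 0.10  # one-sided type I error rate (sensitivity of the screen = 1 - alpha)
beta = 0.15   # type II error rate (specificity = 1 - beta = 0.85)

# A "nonfutile" finding multiplies the prior odds of a worthwhile
# therapy by the likelihood ratio (1 - alpha) / beta.
lr_nonfutile = (1 - alpha) / beta  # 0.90 / 0.15 = 6

for prior_odds in (1 / 9, 1 / 6, 1 / 3):
    posterior_odds = prior_odds * lr_nonfutile
    ppv = posterior_odds / (1 + posterior_odds)
    print(f"prior odds {prior_odds:.3f} -> posterior odds {posterior_odds:.2f}, "
          f"positive predictive value {ppv:.2f}")
```

Prior odds of exactly 1 to 6 are the break-even point at which a nonfutility finding yields even posterior odds.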
If the prior odds on a worthwhile improvement are less than one to 6, the odds of bringing a truly worthwhile therapy into the phase III trial will still be less than 50/50. That may or may not be deemed acceptable, but my point is that, as attractive as the futility design is, it does not assure the identification of minimally worthwhile therapies.

The futility design is set squarely within the paradigm of statistical hypothesis testing, yet it can be argued that its purpose is also to select which of several therapies to bring forward to phase III testing. Indeed, the NINDS Parkinson Disease Network is currently conducting just such a selection process using the futility testing approach.3 When viewed this way, another paradigm comes to mind, to wit, the "statistical selection" paradigm. At least 2 ongoing NINDS-funded phase II studies are using such techniques to select among several therapeutic doses: the previously mentioned QALS trial and the Phase 2B Study of Tenecteplase (TNK) in Acute Stroke (TNK-S2B). Statistical selection procedures also have much to offer in terms of reduced sample sizes, because they are less concerned with testing null hypotheses under tight control of type I error and more concerned with selecting the best of several competing therapies with a high probability of correct selection when there is a truly best therapy. The textbook by Bechhofer et al4 is a good source of information on such techniques.

It is gratifying that the field of statistics continues to provide methods as innovative as that of Palesch et al. The authors are to be congratulated for their stimulating contribution.

The opinions expressed in this editorial are not necessarily those of the editors or of the American Heart Association.

Footnotes

Correspondence to Bruce Levin, PhD, Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 West 168th Street, Room 626a, New York, NY 10032.
E-mail [email protected]

References

1 Palesch Y, Tilley BC, Sackett DL, Johnston KC, Woolson R. Applying a phase II futility study design to therapeutic stroke trials. Stroke. 2005; 36: 2410–2414.

2 Levy G, Kaufmann P, Buchsbaum R, Montes J, Barsdorf A, Arbing R, Battista V, Zhou X, Mitsumoto H, Levin B, Thompson JLP. A two-stage design for a phase II clinical trial of coenzyme Q10 in ALS. Neurology. In press.

3 Elm JJ, Goetz CG, Ravina B, Shannon K, Wooten GF, Tanner CM, Palesch YY, Huang P, Guimaraes P, Kamp C, Tilley BC, Kieburtz K; NET-PD Investigators. A responsive outcome for Parkinson's disease neuroprotection futility studies. Ann Neurol. 2005; 57: 197–203.

4 Bechhofer RE, Santner TJ, Goldsman DM. Design and Analysis of Experiments for Statistical Selection, Screening, and Multiple Comparisons. New York: John Wiley & Sons; 1995.
Article Information

Stroke. November 2005, Vol 36, Issue 11. https://doi.org/10.1161/01.STR.0000185722.99167.56. PMID: 16224096. Manuscript received August 15, 2005; manuscript accepted August 15, 2005; originally published October 13, 2005.

Keywords: futility studies; stroke
