Letter, peer-reviewed

Answering important questions reliably—GISSI Heart Failure, a factorially designed trial with composite (co)primary outcome measures

2004; Elsevier BV; Volume: 6; Issue: 5; Language: English

10.1016/j.ejheart.2004.06.003

ISSN

1879-0844

Authors

Nick Freemantle, Melanie Calvert

Topic(s)

Statistical Methods in Clinical Trials

Abstract

In 1984, Yusuf et al. 1 declared a need for large, simple randomised trials, which they claimed would be capable of answering important clinical questions with a high degree of reliability. At that time, we awaited the completion and publication of ISIS-1 2, and the potential of such large-scale simple trials was yet to be established. In this issue of the journal, the rationale and design for the latest large trial from the highly respected GISSI group 3, the GISSI Heart Failure study (GISSI-HF), are published, examining in a 2×2 factorial design the effectiveness of n−3 polyunsaturated fatty acids (n−3 PUFA) and rosuvastatin in moderate to severe heart failure. The justification for asking these important clinical questions is clear and the design of the trial is unambiguously stated; however, the design also raises important questions currently under wider debate and prompts us to ask: how far has the potential for large simple trials been realised in practice?
Factorially designed trials allow investigators to examine the effects of multiple treatments in a single trial and, crucially, the way effects are modified when treatments are used in combination (known statistically as interactions). Even when there is good reason to presume that the effects of two different treatments are unlikely to interact, it is always good practice to undertake a statistical test for interaction, examining the extent to which the effects of treatment differ in the presence of an alternative study treatment 4. Reassuringly, the protocol for GISSI-HF specifies that a test for interaction should be undertaken. It is well established that there is no such thing as a free lunch, even in statistics, and this is true of factorially designed trials.
The trialists suggest that, because they are asking two separate clinical questions from two separate randomisations, there is no need to adjust the primary outcomes for multiplicity between treatments. However, the primary analyses consider the same outcomes assessed in largely the same population, and by acknowledging the need to test statistically for an interaction the investigators also acknowledge the potential for a lack of independence between the treatments. Whilst it is common practice to ignore this multiplicity in factorially designed trials, it does mean that, since there are two tests and two opportunities for a positive outcome, the overall critical p value for the study (to be split amongst analyses) is not the conventional p=0.05, but instead approximately p=0.1. Within each treatment comparison there are two identified co-primary outcome measures: mortality, and the composite of mortality and cardiovascular hospitalisation. It is hard to resist the conclusion that, in seeking a positive result, the trialists are hoping to achieve a mortality benefit, but are also including alongside mortality a softer and more common outcome (cardiovascular hospitalisation), which is more likely to achieve a significant result 5. The available statistical power is divided between the co-primary outcome measures to account for multiple testing 6, with most statistical power allocated to mortality (critical p=0.045) and the remainder allocated to the composite outcome. A more appropriate way to preserve an overall α spend of 5% (i.e., a conventional critical p value of 0.05) whilst accounting for multiplicity of treatments might have been to divide this into 2.5% for each randomisation, and further divide this between the dual primary outcomes (e.g., 2% for mortality and 0.5% for the composite outcome).
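The arithmetic behind these figures can be checked with a minimal sketch. It is illustrative only: the helper `familywise_error` and the allocation table are hypothetical names introduced here, and the product formula assumes the two tests are independent, which the preceding paragraph suggests may not hold exactly. The simple sum (the Bonferroni bound) is valid whatever the dependence between the tests.

```python
def familywise_error(alphas):
    """Probability of at least one false positive, ASSUMING independent tests."""
    p_no_false_positive = 1.0
    for alpha in alphas:
        p_no_false_positive *= 1.0 - alpha
    return 1.0 - p_no_false_positive


# Two treatment comparisons, each tested at the conventional 0.05.
unadjusted = [0.05, 0.05]
fwer_if_independent = familywise_error(unadjusted)  # about 0.0975
bonferroni_bound = sum(unadjusted)                  # 0.10, an upper bound under any dependence

# Split described in the text: 2.5% per randomisation, then 2% + 0.5%.
# The dictionary layout is a hypothetical illustration, not the trial protocol.
allocation = {
    ("n-3 PUFA", "mortality"): 0.02,
    ("n-3 PUFA", "composite"): 0.005,
    ("rosuvastatin", "mortality"): 0.02,
    ("rosuvastatin", "composite"): 0.005,
}
total_alpha = sum(allocation.values())  # 0.05, up to floating point rounding

print(f"Two tests at 0.05, independent: {fwer_if_independent:.4f}")
print(f"Two tests at 0.05, upper bound: {bonferroni_bound:.2f}")
print(f"Total alpha under the split:    {total_alpha:.3f}")
```

Either way of counting gives an overall false positive rate close to 0.1 for two unadjusted tests, while the four-way split keeps the total at the conventional 0.05.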
Although a very useful discipline, particularly in a regulatory context, the arbitrary nature of dividing statistical power between primary outcome measures, the so-called α spending approach, is highlighted by the CAPRICORN study of carvedilol in patients with left ventricular dysfunction after acute myocardial infarction 7. Because the event rate was lower than anticipated at the planning stage, the Data Safety and Monitoring Committee recommended to the Trial Steering Committee that they consider changing the primary outcome measure from all cause mortality. The Steering Committee responded to this advice by creating co-primary outcome measures, allocating the majority of the available statistical power (a critical p value of 0.045) to the new co-primary composite of all cause mortality and cardiovascular hospital admission. The remaining statistical power (p=0.005) was allocated to all cause mortality. In the event, the composite outcome of cardiovascular hospitalisation and all cause mortality did not show benefit for carvedilol (p=0.3), and the 3% difference in all cause mortality (p=0.03) was not statistically significant given the allocation of statistical power between the co-primary outcome measures. It is ironic that, had the Steering Committee not taken the advice of the Data Safety and Monitoring Board, the trial would have been statistically significant. In fact, the correct interpretation of the results of CAPRICORN is that it provides a neutral result. It is salutary to recall that there is nothing magical about the conventional critical p value of 0.05, and indeed it is normal for licensing purposes to require two statistically significant phase 3 trials, giving a combined p value of p<0.001.
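The CAPRICORN decision rule can be written out as a short sketch using only the p values and thresholds quoted above. The function and variable names are hypothetical. The final calculation is an assumption added for illustration: it treats the two phase 3 trials as independent, each with a 0.025 chance of a false positive in the favourable direction (half of a two-sided 0.05), which is one common way of arriving at a combined figure below 0.001.

```python
def is_significant(p_value, allocated_alpha):
    """An outcome counts as positive only if p falls below its allocated threshold."""
    return p_value < allocated_alpha


# Observed p values and allocated thresholds as quoted in the text.
capricorn = {
    "composite (mortality or CV admission)": {"p": 0.3, "alpha": 0.045},
    "all cause mortality": {"p": 0.03, "alpha": 0.005},
}
verdicts = {
    outcome: is_significant(v["p"], v["alpha"]) for outcome, v in capricorn.items()
}

# Under the original design, mortality alone would have been tested at 0.05.
original_design_significant = is_significant(0.03, 0.05)

# ASSUMPTION: two independent trials, each with a one-sided 0.025 false positive rate.
two_trial_false_positive = 0.025 ** 2

for outcome, positive in verdicts.items():
    print(f"{outcome}: {'significant' if positive else 'not significant'}")
print(f"Mortality under the original design: {original_design_significant}")
print(f"Chance of two false positive trials: {two_trial_false_positive:.6f}")
```

Neither co-primary outcome crosses its allocated threshold, yet the same mortality p value would have passed the single threshold of the original design, which is the irony the paragraph describes.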
In a trial such as GISSI-HF, with 7000 subjects, a clinically important result should be supported by strong statistical evidence, and we should perhaps not be impressed unduly by effects that only just achieve conventional levels of statistical significance. Very strong statistical evidence for a primary outcome will not be affected qualitatively by the niceties of α spending, since p values of, say, p<0.001 on any of the outcomes would be statistically convincing. Furthermore, the interpretation of composite outcomes can be problematic. How would we interpret the results of GISSI-HF should the trial not prove significant on mortality, but prove significant on the composite outcome measure? Although GISSI-HF would be a positive trial, the victory would be hollow, since the only interpretation of the composite would be through the individual components 5. Yusuf et al. 1 suggested the importance of all cause mortality as an outcome measure, and there remains much to commend that point of view. Finally, whilst we recognise the importance of assessing the impact of n−3 PUFA and rosuvastatin on all cause mortality and cardiovascular morbidity, GISSI-HF, like the GISSI-Prevenzione trial, does not attempt to assess the impact of therapy on patients' health-related quality of life 8. The pleiotropic effects of statins and n−3 PUFA are increasingly recognised and, whilst treatment with statins has been shown to be associated with a significant reduction in cardiovascular events in high-risk patients 9–12, the impact of statin use on cognition, mood, behaviour, and quality of life remains controversial 13 and the impact of n−3 PUFA on such outcomes is unknown. Findings from the Heart Protection Study indicated no impact of simvastatin on cognition 12, although the limited generalisability of such findings has been suggested 14.
Assessment of the impact of a treatment on both quality and quantity of life is particularly important in patients with chronic disease such as heart failure 15 and can be used to generate quality-adjusted life years, which may be used as a measure of clinical effectiveness and in cost–utility analysis 16. Assessment of quality of life using a generic instrument such as the EuroQol EQ-5D 17,18 requires limited resources and is increasingly recognised by health-policy makers such as NICE, who state in their guidance to manufacturers and sponsors submitting technologies for appraisal: "Quality of Life data are generally regarded as more relevant in the treatment of chronic illness, but their collection is desirable in most circumstances." 19. This latest trial from the GISSI group addresses multiple important questions within a single trial population 4. Here, the trialists achieve support from a pool of industrial sponsors, which appears to allow the investigation of a less commercial treatment (n−3 PUFA) alongside one that is more likely to be of interest to industry (rosuvastatin). This sensible strategy may avoid potential bias, such as the inappropriately positive conclusions in support of a sponsor's product which have been noted elsewhere 20. The usefulness of large simple trials is now well established, and GISSI Heart Failure is a very good example. Challenges remain, particularly in avoiding being unduly impressed by marginally significant p values or indeed being impressed by p values at the expense of clinical importance! Composite clinical endpoints are a particular challenge to interpretation 5, and steps must be taken to avoid undue sponsor influence on the science and reporting of clinical trials 20. We await the results of this important trial with interest.

Reference(s)