Review (peer-reviewed)

Why Olanzapine Beats Risperidone, Risperidone Beats Quetiapine, and Quetiapine Beats Olanzapine: An Exploratory Analysis of Head-to-Head Comparison Studies of Second-Generation Antipsychotics

2006; American Psychiatric Association; Volume: 163; Issue: 2; Language: English

10.1176/appi.ajp.163.2.185

ISSN

1535-7228

Authors

Stephan Heres, John M. Davis, Katja Maino, Elisabeth Jetzinger, Werner Kissling, Stefan Leucht

Topic(s)

Electroconvulsive Therapy Studies

Abstract

Reviews and Overviews. Published Online: 1 Feb 2006

OBJECTIVE: In many parts of the world, second-generation antipsychotics have largely replaced typical antipsychotics as the treatment of choice for schizophrenia. Consequently, trials comparing two drugs of this class—so-called head-to-head studies—are gaining in relevance. The authors reviewed results of head-to-head studies of second-generation antipsychotics funded by pharmaceutical companies to determine if a relationship existed between the sponsor of the trial and the drug favored in the study's overall outcome.

METHOD: The authors identified head-to-head comparison studies of second-generation antipsychotics through a MEDLINE search for the period from 1966 to September 2003 and identified additional head-to-head studies from selected conference proceedings for the period from 1999 to February 2004. The abstracts of all studies fully or partly funded by pharmaceutical companies were modified to mask the names and doses of the drugs used in the trial, and two physicians blinded to the study sponsor reviewed the abstracts and independently rated which drug was favored by the overall outcome measures.
Two authors who were not blinded to the study sponsor reviewed the entire report of each study for sources of bias that could have affected the results in favor of the sponsor's drug.

RESULTS: Of the 42 reports identified by the authors, 33 were sponsored by a pharmaceutical company. In 90.0% of the studies, the reported overall outcome was in favor of the sponsor's drug. This pattern produced contradictory conclusions when findings from studies of the same drugs but with different sponsors were compared. Potential sources of bias occurred in the areas of doses and dose escalation, study entry criteria and study populations, statistics and methods, and reporting of results and wording of findings.

CONCLUSIONS: Some sources of bias may limit the validity of head-to-head comparison studies of second-generation antipsychotics. Because most of the sources of bias identified in this review were subtle rather than compelling, the clinical usefulness of future trials may benefit from minor modifications to help avoid bias. The authors make a number of concrete suggestions for ways in which potential sources of bias can be addressed by study initiators, peer reviewers of studies under consideration for publication, and readers of published studies.

A scientific debate about the effectiveness of second-generation antipsychotics, compared to conventional antipsychotics, has been going on for several years. Although not all questions have yet been answered, second-generation antipsychotics are now regarded as the gold standard in most aspects of treatment, at least in highly industrialized countries. As a result, so-called head-to-head comparisons, i.e., randomized, controlled clinical trials with two or more active second-generation antipsychotic comparators, have become increasingly important as new drugs enter the market. Somewhat confusing is the fact that different trials comparing the same two drugs have reached contradictory conclusions (1, 2).
This effect may not be totally unrelated to the funding sources of the trials. Conflicts of interest arising from a pharmaceutical company's sponsorship of clinical trials of a drug it manufactures are obvious (3), and the association of funding and conclusions has been found in numerous medical specialties (4). In this article, we present a summary of head-to-head comparison studies in psychiatry in which we focus on various aspects of potential bias that may arise from such conflicts of interest. To our knowledge, this work is the first examination of potential bias related to study sponsorship of head-to-head comparison studies of antipsychotic medications. We also examined the association of the conclusions of head-to-head comparison studies with the source of funding. Consequently, this study is not a review or a meta-analysis in which the efficacy or tolerability of different second-generation antipsychotics is examined, but an exploratory approach to clarifying partly contradictory study results in the field of schizophrenia treatment.

Method

Search Strategy

We searched MEDLINE (1966–September 2003) for randomized, controlled trials comparing the second-generation antipsychotics aripiprazole, amisulpride, clozapine, olanzapine, quetiapine, risperidone, sertindole, and ziprasidone. The search terms were paired combinations of the second-generation antipsychotics and the term "rand*" (for "random," "randomized," etc.). We excluded reviews, meta-analyses, reports focused solely on laboratory or electrophysiological data, trials with combined drug treatment, and reports on patient populations with diagnoses other than schizophrenia or schizoaffective disorder. Reports on drug efficacy were considered to be the primary publication of a trial, unless the abstract stated otherwise. Secondary publications were excluded in order to avoid multiple inclusions of the source trial in the analysis.
We also screened proceedings of selected conferences for the period from 1999 to February 2004. The conference reports we reviewed were limited to materials from events attended by members of our work group.

Blinded Rating of Abstracts

On the basis of the hypothesis that funding by a pharmaceutical company may influence the outcome of a trial, we checked the reports for information on sponsorship by a "profit-making organization." The abstract of each study was modified to mask the names and doses of the drugs used in the trial, and two physicians (a psychiatrist [K.M.] and an internist [E.J.]), both of whom were blinded to the funding source for the trial and were not involved in the design of the evaluation, read the complete abstract and rated which drug was favored in the overall conclusion. The ratings were made on a 6-point scale proposed by Gilbert et al. (5) and previously used in studies evaluating the association of funding and conclusions in drug trials (4, 6). The scoring method is described in the footnote to Figure 1. For blinding, the second-generation antipsychotic names in the abstracts were replaced by "DRUG A" and "DRUG B" ("DRUG A" was not always the sponsor's drug and vice versa), and the total dose/dose range was replaced by "x." A separate sensitivity analysis that included only peer-reviewed publications was carried out. Two-sided binomial sign tests were used to test the hypothesis of a potential influence of the sponsor on the study outcome, and Cohen's kappa was used to measure interrater reliability. Statistical significance was defined at an alpha level of <0.05.

Identifying Potential Sources of Bias

The trial reports were read independently by two authors who were not blinded to the sponsor of the trial (S.H., S.L.) to identify potential sources of bias that could have influenced the results in favor of the sponsor's drug.
We focused on several factors that have been discussed as potential sources of bias, including features of study design, dose ranges, titration schedules, statistics, reporting of results, and wording of findings (4, 7, 8). If the conclusions of the two reviewers differed, consensus was achieved by discussion. The second author (J.D.) checked and approved the findings. As a reference for dose ranges, we used the following range recommendations included in the American Psychiatric Association Practice Guideline for the Treatment of Patients With Schizophrenia, second edition (9): 10–30 mg/day of aripiprazole, 150–600 mg/day of clozapine, 10–30 mg/day of olanzapine, 300–800 mg/day of quetiapine, 2–8 mg/day of risperidone, 120–200 mg/day of ziprasidone, and 5–20 mg/day of haloperidol. For amisulpride, we used the following dose ranges suggested in the drug company's product information: 400–800 mg/day for acutely ill patients and 50–300 mg/day for patients with predominantly negative symptoms.

Results

Search Results

From 146 publications found in the MEDLINE search, we excluded 61 reviews, 22 reports of additional data from previously published trials or preliminary results, 17 reports of laboratory or electrophysiological data, five reports of add-on therapy with other drugs, four reports on alternative diagnoses, 11 reports of studies that did not include a direct head-to-head comparison, and one report on combined antipsychotic treatment, which left 25 publications for analysis. The complete trial report for one of the 25 publications could not be obtained, and that study was excluded. Thirteen conference presentations of head-to-head drug comparisons were identified, and during the analysis, another four publications and one report in press were identified, for a total of 42 trial reports. Of the 42 reports, 32 were fully or partly funded by a pharmaceutical company that manufactured one of the drugs used in the trial (1, 2, 10–39).
One of the 42 studies was conducted with supplemental funding from a pharmaceutical company, although the acquisition and reporting of the data were implemented with no input from the company (40); this study was not included in the blinded rating of abstracts, but it was included in the analysis of sources of bias. Nine of the 42 studies were not funded by a pharmaceutical company (41–49). Two reports of sponsored studies did not include an abstract (10, 36). Thus, 30 trials were included in the blinded rating of study abstracts.

Sponsorship and Outcome as Reported in Study Abstracts

According to the ratings by the two physicians, the overall outcome reported in the study abstracts was in favor of the sponsor's drug in 90.0% of the abstracts (N=27 of 30) (p<0.001, binomial sign test) (Figure 1). For each abstract, the scores of the two raters were the same or differed by only 1 point, and the two raters did not differ in whether the outcome was judged to be in favor of the sponsor's drug (a score of 4, 5, or 6) or the comparator (a score of 1, 2, or 3). According to the criteria of Landis and Koch (50), the interrater agreement was "moderate" (kappa=0.44, p≤0.001) for the numeric rating and "almost perfect" (kappa=1.0, p<0.001) for the outcome category. Figure 1 shows the distribution of the scores for both raters. In the sensitivity analysis that included only the abstracts that underwent peer review (N=21), the result was virtually identical, with 90.5% (19 of 21) rated as having an outcome in favor of the sponsor's drug (p<0.001, binomial sign test). The interrater agreement was "substantial" (kappa=0.61, p<0.001) for the numeric rating and "almost perfect" (kappa=1.0, p<0.001) for the outcome category. Table 1 summarizes the ratings for studies comparing pairs of drugs by whether one or the other manufacturer sponsored the study. Only three of these 21 reports did not favor the sponsor's drug.
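The two statistics behind these results (a two-sided binomial sign test against chance and Cohen's kappa for interrater agreement) can be sketched in a few lines of Python. This is an illustrative reimplementation under our own naming, not the authors' analysis code:

```python
from math import comb

def sign_test_two_sided(successes: int, n: int) -> float:
    """Two-sided binomial sign test against a 50:50 chance split.

    Doubles the probability of observing at least `successes`
    favorable outcomes in `n` trials under p = 0.5.
    """
    tail = sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def cohens_kappa(ratings_a: list, ratings_b: list) -> float:
    """Cohen's kappa for two raters over the same items.

    (Assumes the raters use more than one category, so that chance
    agreement stays below 1 and the ratio is defined.)
    """
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n)
        for c in categories
    )
    return (observed - expected) / (1 - expected)
```

With 27 of 30 abstracts favoring the sponsor's drug, `sign_test_two_sided(27, 30)` comes out near 8 × 10⁻⁶, consistent with the reported p<0.001.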
These pairwise comparisons revealed contrasting outcomes, depending on the sponsor of the study, although the outcomes were derived from trials involving the same two drugs.

Possible Effects of Sponsorship on Trial Outcome and Reporting

Two authors who were not blinded to the sponsor of the trial reviewed the study reports and identified potential sources of bias in the following areas: dose and dose escalation, entry criteria and study population, statistics and methods, and reporting and wording of results. The characteristics of the individual trials and the potential sources of bias are summarized in a separate table (Data Supplement 1) available from the first author and with the online version of this article at http://ajp.psychiatryonline.org. We classified potential sources of bias as debatable or clear. For example, in several instances, we identified debatable sources of bias in dose ranges for risperidone, for which the appropriate range may still be arguable. We identified clear sources of bias in instances involving obviously inappropriate choices of dose, design, reporting, etc. We emphasize that although at least some of the biases we identified seemed very obvious, our analysis remains speculative, and there is no proof that the factors we identified really influenced the results. The biases we identified are described in the following sections.

Doses and Dose Escalation

Dose ranges and dose escalation are crucial factors that potentially influence trial outcome. In numerous trials, dose ranges are scheduled according to the manufacturer's package insert, which is problematic with antipsychotic drugs. For example, in trials with risperidone, doses up to 10 mg/day or even 12 mg/day are frequently possible in flexible titration schedules, although this dosing level may diminish both the efficacy and tolerability of the drug.
After the introduction of risperidone to the market, several studies in the mid-1990s yielded evidence of an optimal dose range of 4–8 mg/day, with an increasing risk of extrapyramidal side effects at higher doses without any gain in efficacy (51, 52). At the time of the earliest studies included in this summary (1), these data were presumably not yet accessible, but in more recent trials, the dose ranges should have been adapted to maintain a fair level of comparison. Trials that did not include the 4-mg/day dose, recently referred to as the advisable dose (53), and trials that allowed doses of up to 12 mg/day (10, 12, 34, 37, 40) are problematic. Choosing 4 mg/day as the lower limit of the dose range is also problematic, as downward dose adjustment in case of side effects is not possible. Although a dose range of 2–6 mg/day was used in trials sponsored by the manufacturer (2, 18), and even lower doses were used in elderly patients with schizophrenia in trials sponsored by the manufacturer (19, 21), competitors consistently used higher doses.

Dose ranges are also problematic in comparisons involving other drugs. Dose ranges of clozapine, especially in trials that included patients with treatment-resistant schizophrenia, often appear to be too strictly limited (53), resulting in relatively low mean daily doses (<400 mg/day) (13, 14, 39). These levels are in contrast to data revealing that doses up to 600 mg/day (54) or even 900 mg/day (55, 56) of clozapine proved highly efficacious in treatment-resistant schizophrenia. In comparisons involving olanzapine, the upper limit of the dose range is often set at 15 mg/day (16, 20, 38), thus excluding the most effective 20-mg/day dose. Use of this limited dose range possibly reduces olanzapine's efficacy and may result in a misleading conclusion of the competitor's therapeutic superiority or equality.
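These dose-range criticisms amount to a simple screening rule: compare each trial arm's allowed range against the guideline range cited in the Method section. A toy Python sketch follows; the flag logic and the example trial ranges are ours, and only the guideline numbers come from the text:

```python
# Recommended dose ranges in mg/day, as quoted from the APA practice
# guideline in the Method section of this article.
RECOMMENDED = {
    "aripiprazole": (10, 30),
    "clozapine": (150, 600),
    "olanzapine": (10, 30),
    "quetiapine": (300, 800),
    "risperidone": (2, 8),
    "ziprasidone": (120, 200),
    "haloperidol": (5, 20),
}

def dose_range_flags(drug: str, trial_low: float, trial_high: float) -> list:
    """Flag ways a trial's allowed dose range deviates from the guideline."""
    rec_low, rec_high = RECOMMENDED[drug]
    flags = []
    if trial_high > rec_high:
        flags.append("allows doses above the recommended maximum")
    if trial_high < rec_high:
        flags.append("excludes the upper part of the recommended range")
    if trial_low > rec_low:
        flags.append("blocks downward adjustment for side effects")
    return flags

# A hypothetical risperidone arm of 4-12 mg/day triggers both
# criticisms raised in the text (12 mg/day ceiling, 4 mg/day floor):
dose_range_flags("risperidone", 4, 12)
```

Likewise, an olanzapine arm capped at 15 mg/day would be flagged for excluding the upper part of the 10–30 mg/day range, mirroring the point made above.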
The optimum dose range of amisulpride in patients with predominantly negative symptoms is 50–300 mg/day (57), but in a study comparing amisulpride with another antipsychotic, it should have been ensured that the patients did not have significant positive symptoms at study entry, because higher amisulpride doses (400–800 mg/day) are necessary for treatment of positive symptoms (30).

Finding the optimum dose escalation schedules for both compounds in a study is difficult and may be another source of bias (2, 12, 16, 18–20, 24, 28, 34, 40, 58). In some cases, the bias may derive from the fact that titration is mandatory for some drugs (risperidone, clozapine, sertindole), while the comparator (for example, olanzapine) does not require a stepwise dose escalation. Slow titration can prolong the time to the full onset of the therapeutic effect of a drug, and the optimal dose of the comparator may be reached earlier. This difference plays a major role in studies evaluating efficacy over a brief period of time. On the other hand, side effects might be more likely to appear with fast dose escalation. The attempt to escape the escalation problem by using a fixed-dose regimen raises other problems. Studies with fixed-dose regimens lack naturalistic plausibility because the unrealistic limits imposed do not reflect the therapeutic flexibility required in the treatment of schizophrenia (16, 23, 32, 33, 44, 45).

Entry Criteria and Study Population

Because the second-generation antipsychotics became available on the market one by one over the last decade, a trial's entry criteria with respect to previous drug treatment have to be chosen carefully. Risperidone had been in use for more than 5 years when newer drugs such as amisulpride (32, 37), quetiapine (24, 29), olanzapine (17), sertindole (11), and ziprasidone (10) became comparators in trials.
Exclusion of patients who previously were nonresponders to risperidone or any other comparator (16) is seldom explicitly stated in reports of head-to-head trials, although this feature could have a critical effect on observations of the efficacy of or response to antipsychotic treatment.

For trials involving schizophrenic patients with predominantly negative symptoms, questions about the accurate definition of the study population may be raised. Even if appropriate scales for measuring negative symptoms, such as the negative syndrome subscale of the Positive and Negative Syndrome Scale (PANSS) or the Scale for the Assessment of Negative Symptoms, are applied, there is still the need for information on positive symptoms, as they might also be present at study entry. An entry criterion of a difference of 6 points between the PANSS negative and positive subscale scores may ensure that subjects have a predominance of negative symptoms, but it leaves room for speculation about the effect of positive symptoms if baseline information about positive symptoms is not presented (30). Correspondingly, in trials involving patients with treatment-resistant illness, transparent criteria for inclusion and exclusion of participants are also required (54), although no universally accepted definition of treatment-resistant schizophrenia exists (59). Studies in which antipsychotic treatment nonresponse and intolerance are allowed as alternative entry criteria (14) may have results that are difficult to interpret. If results derived from such studies are presented in terms of efficacy in treatment-resistant patients, even if the study is not explicitly focused on this population, misunderstandings are foreseeable (13).

Statistics and Methods

In recent years, studies with a noninferiority design have become a reasonable alternative to placebo-controlled trials for comparison of the efficacy of antipsychotic agents (60).
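The margin logic at issue in such noninferiority comparisons can be made concrete with a small, hypothetical Python sketch. All numbers are invented for illustration, and the check is deliberately simplified (real analyses test a one-sided confidence bound on the between-group difference):

```python
def noninferior(effect_new: float, effect_ref: float,
                ci_halfwidth: float, retained_fraction: float) -> bool:
    """Declare noninferiority if the lower confidence bound on the new
    drug's effect stays above a preset fraction of the reference effect.

    effect_new, effect_ref: mean PANSS reductions in points.
    ci_halfwidth: half-width of the confidence interval for effect_new.
    retained_fraction: the a priori margin; 0.6 means the new drug may
    lose up to 40% of the comparator's effect and still be declared
    "equivalent".
    """
    margin = retained_fraction * effect_ref
    return effect_new - ci_halfwidth >= margin

# Hypothetical trial: the comparator reduces the PANSS total score by
# 20 points, the new drug by 15 points, confidence half-width 2 points.
noninferior(15.0, 20.0, 2.0, 0.60)  # lenient 60% margin: passes
noninferior(15.0, 20.0, 2.0, 0.80)  # stricter 80% margin: fails
```

The same hypothetical data pass a lenient margin and fail a stricter one, which is why the a priori choice of threshold matters so much.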
In a study designed to prove a drug's superiority over an active comparator, large sample sizes are usually required. Equivalence can be shown with less effort in a one-sided noninferiority design, depending on the predicted threshold for equivalence, although with a narrow range of equivalence the required sample size may exceed that of a superiority design. Consequently, a basic requirement is to define a priori the extent of the difference between the treatments that is considered acceptable for declaring noninferiority (61). It seems very arguable to assume equivalent antipsychotic efficacy of a drug at a threshold of just over 60% of the treatment effect achieved by the active comparator, as measured by the reduction in the PANSS total score (10) or the PANSS negative subscale score (30). Other equivalence thresholds yield findings of more clinical relevance, but the thresholds differ between comparable studies (28, 32, 37, 39).

For multiple comparisons, such as those that occur with the use of test batteries in cognition studies, an adjustment for multiple testing may be necessary, but no generally accepted approach to this statistical problem exists. One work group may confuse the reader by applying an adjustment for multiple testing in one study (18, 20) and not in a comparable trial (19). In some studies, the application of an adjustment was not explicitly mentioned or adequately discussed, despite the presence of multiple comparisons (1, 16, 31, 62).

Another source of potential bias is a study design in which an acute-phase trial of up to 8 weeks is followed by a continuation phase of up to 12 months that is focused on long-term maintenance of the treatment effect. After the acute phase, patients who are nonresponders are discontinued from the study, and only those who meet the response criteria are included in the maintenance phase (63).
This design may be acceptable for relapse studies but leads to problems in response trials. Selecting only responders for continuation in a trial that is focused on response (as measured, for example, with the mean reduction of the PANSS score from baseline to endpoint) as well as further improvement alters the study population radically, necessitating careful interpretation of the results in the follow-up (10).

Reporting and Wording of Results

A complete disclosure of all results of the head-to-head comparison would appear to be mandatory but is not always provided. Results favoring the drug manufactured by the sponsor are often presented in detail, and unfavorable results are often mentioned in a brief sentence at the very end of the report's results section or not mentioned at all (1, 12). Accordingly, the report's authors may choose to present only data from observed cases or only data from a last-observation-carried-forward analysis, depending on the resulting outcomes. If the last-observation-carried-forward analysis showed no significant difference between drugs, the results from the observed cases may be displayed in detail and presented as a significant outcome of the study (11). The relevant population for evaluation of the primary outcome should be stated a priori in the protocol and made transparent to the reader.

Furthermore, reporting of adverse events seems to be selective (34, 36, 38, 62), and the corresponding level of significance for comparisons of rates of adverse events may not be consistently stated (21, 29). Information on side effects that are very likely to occur, such as sedation and weight gain with olanzapine (15, 64) or elevation of prolactin levels with amisulpride (28), may be lacking. In addition, in reports of extrapyramidal symptoms, detailed information on the mean daily dose of anticholinergic medication and the number of patients who received at least one dose of anticholinergic medication should be provided.
If this information is omitted, the reported frequency of occurrence of extrapyramidal symptoms gives only a vague impression of the likelihood of these side effects (23, 28).

Poster Reports and Multiple Publishing

Phrasing of abstracts is difficult, because much information has to be made transparent to the reader in only a few lines. Although the abstracts of many head-to-head studies adhere to widely accepted structural standards (65), the results stated are often highly selective. For example, in the abstract of one study (29), a significant difference in rates of extrapyramidal symptoms that favored the sponsor's drug is reported in detail, but the side effects unfavorable to the drug were mentioned without corresponding levels of significance. Preliminary results of trials are often presented as poster reports at conferences. Presentation of multiple poster reports on the same trial with different first authors can lead to the impression that independent studies have been conducted (10, 66). If data from a previously published trial are later used as the basis for reports focusing on subpopulations or secondary objectives, the abstracts of the later studies should contain a cross-reference to disclose the source of the data at a glance (62–64, 67, 68). Stand-alone publication of data deriving from another trial without a reference to the earlier trial gives the impression that separate trials have been conducted (18, 19).

Discussion

The first part of our analysis revealed a clear link between sponsorship and study outcome as reported in the abstract, as 90.0% of the abstracts were rated as showing an overall superiority of the sponsor's drug. This finding is in accordance with numerous previous reports of a similar effect in other medical fields (3, 4, 6, 69). Even more striking were our findings for pairwise comparison of different trials that examined the effects of the same two drugs (Table 1).
We found that different comparisons of the same two antipsychotic drugs led to contradictory overall conclusions, depending on the sponsor of the study. On the basis of these contrasting findings in head-to-head trials, it appears that whichever company sponsors the trial produces the better antipsychotic drug. This peculiar result led us to take a closer look at various design and reporting features. Indeed, a number of potential reasons for the association between drug-company-sponsored trials and favorable results were identified.

Limitations to Our Approach

A first limitation is that we did not retrieve all trials that were presented at conferences. Because no databases for such presentations exist, we were limited to the posters from conferences attended by members of our work group. The conference presentations we included are therefore not necessarily representative of all conference publications. We did not, however, want to exclude this material completely, because conferences are an important way for companies to distribute information. We made no selection among the available reports.

The main limitation of our exploratory analysis is that it must remain speculative by nature. Although in some cases—for example, the trial in which the optimal risperidone dose of 4 mg/day was explicitly excluded (10)—it is quite obvious that the factor we identified may have biased the results, there is no proof that it really did. Only a "remake" of the study factoring out the source of bias could test the impact. Furthermore, other readers may have different opinions, especially about the more subtle potential sources of bias.
Finally, we emphasize that most of the identified factors were indeed rather subtle and did not reflect an attempt by the drug trial sponsors to intentionally misinterpret their findings or to willfully mislead readers.

Benefit From Industry-Sponsored Trials

In many respects the industry-sponsored studies included in our review met high methodological standards (26, 27) and often surpassed non-industry-sponsored trials in the quality of research methods (6, 70). Industry-independent studies are not necessarily free of bias and are often too underpowered to find statistically significant differences or to allow any generalization (46, 47, 71). In our review, the sample size per group of the nine studies not funded by a for-profit organization ranged from nine to 113 patients. Other factors that contribute to the excellent methodological standards of industry-sponsored trials are valid central randomization, the high quality of data acquisition and management, regular auditing processes, and the detailed knowledge that the pharmaceutical company's researchers have about the drug (6, 70).

There is also no doubt that the development of the second-generation antipsychotics was a major step forward. For the first time, antipsychotic drugs with clearly defined dose ranges were made available, while the optimum dose of even the standard conventional antipsychotic haloperidol is still in doubt. Industry-organized trials also markedly improved our knowledge about general clinical questions such as medication switching strategies (72), the treatment of patients with refractory disorders (34), and the overall effectiveness of new and conventional antipsychotics for treatment of negative symptoms (73).
However, if all studies by drug companies report positive outcomes, the findings may lose credibility.

Suggestions for Potential Improvement

Given the unique opportunities of industry for organizing methodologically sound, large-scale trials, the association between outcome and sponsor found in the rating of abstracts in our study is unsatisfactory. We believe, however, that in the case of many of the problematic points raised in the Results section, relatively simple measures could improve the situation to an appreciable extent.

Sponsorship and outcome as reported in the abstract

Our results show that reading only the abstract of a study is insufficient for a complete understanding of the study findings. However, lack of time makes it difficult even for scientific experts to read all trial reports in detail. Therefore, peer reviewers of studies being considered for publication should pay close attention to the conclusions stated in study abstracts. Overall, we found that the structure of the abstracts in the current review adhered to widely accepted standards (65), but the selection of the results and the phrasing used to convey the results needed to be carefully scrutinized. To avoid bias in this crucial section of trial reporting, we suggest that peer reviewers verify whether the abstract really summarizes the overall results of the trial in a balanced way. Detailed guidelines in this area for peer reviewers would be useful.

Dose and dose escalation

In head-to-head trials, dose ranges and escalation schemes have a major effect on the outcome. To avoid potential bias, study initiators could ask the competitor to provide
