Editorial · Open access · Peer reviewed

Transferability/generalizability deserves more attention in ‘retest’ studies in Diagnosis and Prognosis

2015; Elsevier BV; Volume: 68; Issue: 3; Language: English

10.1016/j.jclinepi.2015.01.007

ISSN

1878-5921

Authors

Peter Tugwell, J. André Knottnerus

Topic(s)

Statistical Methods in Epidemiology

Abstract

Diagnosis: one of the basic concepts we teach in clinical epidemiology is to insist on a test-retest evaluation for diagnostic sensitivity/specificity studies, and similarly for studies of prognosis, with most arguing that the retest be carried out in a new cohort of individuals [1]. There has been insufficient attention to the case-mix of the 'retest'. Justice et al. [2] delineate two different but equally important components: the 'retest' study or studies need to address not only (a) reproducibility but also (b) 'transportability', what we would call generalizability. In this issue, Debray et al. develop a pragmatic framework for doing this and demonstrate how to address both aspects of the retest study by evaluating the corresponding case-mix differences. They illustrate this as a three-step framework with a prediction model for diagnosing deep venous thrombosis, using three validation samples with varying case-mix. JCE would welcome more studies testing out this model.

There are three other articles on diagnostic test and prognosis methods.

Diagnostic tests and indirect comparisons: direct within-study head-to-head comparisons are often not available, either for diagnosis or for therapy, and there is an increasing literature supporting the use of indirect comparisons in therapy [3,4]. Leeflang et al. now address this for assessing diagnostic tests but caution against its use: in a cohort of 32 studies of ovarian reserve, they found that comparative results of test accuracy obtained through indirect comparisons are often not consistent with those obtained through direct comparisons. Even with individual patient data, it was not possible to achieve concordance.

Likelihood ratios are an important metric for the clinical use of diagnostic tests. Different results are often found for different subgroups, so consensus is needed on how to assess the magnitude and importance of this heterogeneity/diversity. Cohen et al. propose that the Cochran Q test be used. Reanalyzing data from six articles that showed within-study heterogeneity in diagnostic accuracy, they demonstrate that, compared with the confidence interval approach, the Cochran Q test performs better overall and can better detect a twofold difference in likelihood ratio in studies with at least 300 participants.
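To make the proposal concrete, here is a minimal sketch of how Cochran's Q can be applied to subgroup-specific positive likelihood ratios: each subgroup's LR+ is log-transformed, inverse-variance weighted, and the weighted squared deviations are referred to a chi-square distribution. The 2x2 counts are invented for illustration, the variance formula is the standard large-sample one, and this reflects our reading of the generic method rather than Cohen et al.'s exact implementation.

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 counts per subgroup: (TP, FN, FP, TN).
subgroups = {
    "age < 60": (90, 30, 20, 160),
    "age >= 60": (70, 10, 35, 85),
}

log_lr, weights = [], []
for tp, fn, fp, tn in subgroups.values():
    sens = tp / (tp + fn)
    spec = tn / (fp + tn)
    log_lr.append(np.log(sens / (1 - spec)))        # ln(LR+)
    # Large-sample variance of ln(LR+): 1/TP - 1/(TP+FN) + 1/FP - 1/(FP+TN)
    var = 1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn)
    weights.append(1 / var)

log_lr, weights = np.array(log_lr), np.array(weights)
pooled = np.sum(weights * log_lr) / np.sum(weights)  # fixed-effect pooled ln(LR+)
q = np.sum(weights * (log_lr - pooled) ** 2)         # Cochran's Q statistic
p = stats.chi2.sf(q, df=len(log_lr) - 1)             # reference: chi-square, k-1 df
print(f"Q = {q:.2f}, p = {p:.3f}")
```

A significant Q here would indicate that the likelihood ratio differs across the subgroups by more than chance, which is the heterogeneity signal Cohen et al. are concerned with.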
Modeling methods to better estimate the prognosis of patients are evolving. As Quantin et al. point out, simulated data suggest that baseline data alone are insufficient and that models in clinical prognostic studies should include flexible analyses of time-varying covariates and time-dependent effects. These authors demonstrate the utility of this approach in a cohort of patients with multiple sclerosis, where updating the number of recent attacks during follow-up significantly improves prediction of the development of MS disability over the next 2 years.

A pair of articles address how to reduce the confusion arising from the fact that medical journals permit a whole range of different metrics for effect size. We should surely be making it as easy as possible for health professionals to compare and contrast the clinical/substantive importance of relative and absolute effect sizes. Mirzazadeh et al. reviewed 55 HIV studies of testing and counseling, with 473 effect sizes reported, and found that five different metrics were used: Pre-Post Proportion (70.6%), Odds Ratio (14.0%), Mean Difference (10.2%), Risk Ratio (4.4%), and Relative Risk Reduction (0.9%). Despite its low frequency of use, they argue that the Relative Risk Reduction is the easiest to interpret and apply in making health decisions, and they provide an algorithm for converting the other four measures to Relative Risk Reductions. In an accompanying commentary, Busse and Guyatt agree that the Relative Risk Reduction is more interpretable than metrics that use odds, but also argue that, as recommended by GRADE [5], for the results to be useful for health decisions the presentation of any relative effect size such as the Relative Risk Reduction must be accompanied by an Absolute Risk Reduction, both to allow the trade-offs of desirable and undesirable treatment effects to be assessed and because relative effects on their own give inflated impressions of treatment effect.
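As a concrete illustration of what such conversions involve (a sketch built from standard textbook formulas, not Mirzazadeh et al.'s published algorithm), the snippet below converts an odds ratio to a risk ratio given an assumed control-group risk, derives the Relative Risk Reduction from it, and also reports the Absolute Risk Reduction that GRADE asks to see alongside it; all numbers are invented.

```python
def or_to_rr(odds_ratio: float, control_risk: float) -> float:
    """Approximate a risk ratio from an odds ratio given the control-group
    event risk p0 (Zhang & Yu): RR = OR / (1 - p0 + p0 * OR)."""
    return odds_ratio / (1 - control_risk + control_risk * odds_ratio)

# Hypothetical trial result: OR = 0.60 with a 20% control-group event risk.
p0 = 0.20
rr = or_to_rr(0.60, p0)
rrr = 1 - rr            # Relative Risk Reduction
arr = p0 - rr * p0      # Absolute Risk Reduction (reported alongside RRR, per GRADE)
print(f"RR = {rr:.2f}, RRR = {rrr:.1%}, ARR = {arr:.1%}")
```

The example also shows why GRADE's advice matters: the same 35% relative reduction corresponds to an absolute reduction of only about 7 percentage points at a 20% baseline risk, and far less at lower baseline risks.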
Interrupted time series analyses are starting to attract more attention in clinical epidemiology; we have two articles in this issue and another in press [6]. Fretheim et al. extend their previous analysis of a single study to nine datasets to assess the concordance between interrupted time series analyses and pragmatic trials. They found concordance (overlap of 95% confidence limits) in eight of the nine datasets. This supports the conclusion that impact evaluations of health system interventions should routinely use time series methods when suitable data are available, whether or not randomization is feasible. In a second paper on interrupted time series, examining the effect of guidelines on statin use, Bijlsma et al. present data supporting the argument that, when the individuals observed differ across time points, the confounding variable termed 'birth cohort' (because birth cohorts may differ in their perception of preventive measures, may differ physiologically, or may differ in prescription and adherence culture) is of major importance: including this variable changed the estimated effect by as much as two-thirds.
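For readers unfamiliar with the method, the sketch below fits the basic segmented regression commonly used for interrupted time series: a pre-intervention level and trend, plus a level change and a trend change at the interruption. The monthly series and interruption point are simulated, and this is a generic illustration rather than either paper's analysis; a real analysis would also need to examine autocorrelation and seasonality.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, t0 = 48, 24                        # 48 monthly points, interruption at month 24
t = np.arange(n)
post = (t >= t0).astype(float)

# Simulated outcome: rising baseline trend, then a level drop and a flattening slope.
y = 100 + 0.5 * t - 8.0 * post - 0.7 * post * (t - t0) + rng.normal(0, 2, n)

# Segmented regression: y = b0 + b1*t + b2*post + b3*(t - t0)*post
X = sm.add_constant(np.column_stack([t, post, post * (t - t0)]))
fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 3})  # autocorrelation-robust SEs
b0, b1, b2, b3 = fit.params
print(f"immediate level change = {b2:.2f}, change in trend = {b3:.2f}")
```

Here b2 estimates the immediate level change at the interruption and b3 the change in slope, which together describe the intervention effect under the usual interrupted time series assumptions.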
This month there are also three manuscripts on systematic reviews, with implications for systematic review organisations such as AHRQ, Cochrane, Campbell, EPPI, and the HTA organisations.

Prioritising systematic reviews for updating, so that their conclusions remain current and clinically useful, is a major challenge that has been identified in previous JCE papers [7-9]. Cates et al. describe how, without additional resources, they used a variety of pragmatic approaches to prioritise 30 of 270 Cochrane Airways reviews for updating.

As pointed out by Boers [10] and Williamson [11], there are a number of reasons to argue for core sets of pre-specified patient-important outcomes, including allowing the results of one study to be compared with another, allowing results to be combined in meta-analyses, and guarding against selective outcome reporting in trials. In this issue, Smith et al. surveyed 788 Cochrane reviews encompassing 6,127 outcomes and found that core outcome sets are rarely used in Cochrane reviews.

The need for, versus the wastage of, replication of systematic reviews is attracting increasing attention among systematic review organisations and journals [12]. In this issue, Lucenteforte et al. report that almost a quarter of 153 systematic reviews of interventions for myocardial infarction were multiple reviews of the same question, and that over 30% of these gave different quantitative estimates of effect. Although it is reassuring when independent reviews are consistent, there are legitimate questions of 'research wastage' [13] when multiple reviews address exactly the same question. Lucenteforte et al. found that an important number of the differing results were due to poor methodology; this needs to be addressed, since it may be clinically detrimental in delaying the adoption of useful interventions.

The review by Hind et al. provides a nice documentation of higher recruitment rates in treatment trials than in prevention trials. In studies evaluating the same interventions (metformin monotherapy or exercise for the prevention or treatment of type 2 diabetes), the percentage of people randomised from those screened averaged 5-6% and 5% in 50 prevention studies, compared with 43-51% for treatment. This needs repeating in other conditions but has major logistic implications for those planning new trials.

Wong et al. are to be congratulated on their detailed methodological quality review of the more than 30 published candidate generic and condition-specific health-related quality of life instruments that have been validated for use in patients with colorectal cancer. This is a nice practical demonstration of the application of the COSMIN criteria [14], which are achieving increasing acceptance.

Finally, is journalology a new field of investigation for clinical epidemiology? Journalology (the scientific study of writing for publication, manuscript peer review, and scientific journal editing and publishing) is an emerging discipline for which clinical epidemiology is at the vanguard. The review by Galipeau et al. shows that little is known about the effectiveness of peer-review training programs, and no studies were found on the training of journal editors. They make a plea for more and better-quality primary studies on training for authors, peer reviewers, and editors, as well as for exploring new methods of training professionals in all areas of journalology.

Reference(s)

[1] Knottnerus JA, Muris JW. Assessment of the accuracy of diagnostic tests. J Clin Epidemiol 2003;56:1118-1128.
[2] Justice AC, Covinsky KE, Berlin JA. Assessing the generalizability of prognostic information. Ann Intern Med 1999;130:515-524.
[3] Haines TP, Hill AM. Inconsistent results in meta-analyses for the prevention of falls are found between study-level data and patient-level data. J Clin Epidemiol 2011;64:154-162.
[4] Tan SH, Cooper NJ, Bujkiewicz S, Welton NJ, Caldwell DM, Sutton AJ. Novel presentational approaches were developed for reporting network meta-analysis. J Clin Epidemiol 2014;67:672-680.
[5] Guyatt GH, Oxman AD, Santesso N, Helfand M, Vist G, Kunz R, et al. GRADE guidelines: 12. Preparing summary of findings tables-binary outcomes. J Clin Epidemiol 2013;66:158-172.
[6] Jandoc R, Burden AM, Mamdani M, Lévesque LE, Cadarette SM. Interrupted time series analysis in drug utilization research is increasing: a systematic review and recommendations. J Clin Epidemiol, in press.
[7] Tsertsvadze A, Maglione M, Chou R, Garritty C, Coleman C, Lux L, et al. Updating comparative effectiveness reviews: current efforts in AHRQ's Effective Health Care Program. J Clin Epidemiol 2011;64:1208-1215.
[8] Chung M, Newberry SJ, Ansari MT, Yu WW, Wu H, Lee J, et al. Two methods provide similar signals for the need to update systematic reviews. J Clin Epidemiol 2012;65:660-668.
[9] Nasser M, Ueffing E, Welch V, Tugwell P. An equity lens can ensure an equity-oriented approach to agenda setting and priority setting of Cochrane Reviews. J Clin Epidemiol 2013;66:511-521.
[10] Boers M, Kirwan JR, Wells G, Beaton D, Gossec L, d'Agostino M, et al. Developing core outcome measurement sets for clinical trials: OMERACT Filter 2.0. J Clin Epidemiol 2014;67:745-753.
[11] Williamson PR, Altman DG, Blazeby JM, Clarke M, Devane D, Gargon E, et al. Developing core outcome sets for clinical trials: issues to consider. Trials 2012;13:132.
[12] Siontis KC, Hernandez-Boussard T, Ioannidis JP. Overlapping meta-analyses on the same topic: survey of published studies. BMJ 2013;347:f4501.
[13] Ioannidis JP, Hlatky MA, Khoury MJ, Macleod MR, Moher D, Schulz KF, et al. Increasing value and reducing waste in research design, conduct, and analysis. Lancet 2014;383:166-175.
[14] Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol 2010;63:737-745.