Article Open access Peer reviewed

Systematic Review and Evidence Integration for Literature-Based Environmental Health Science Assessments

2014; National Institute of Environmental Health Sciences; Volume: 122; Issue: 7 Language: English

10.1289/ehp.1307972

ISSN

1552-9924

Authors

Andrew A. Rooney, Abee L. Boyles, Mary S. Wolfe, John R. Bucher, Kristina A. Thayer

Topic(s)

Climate Change and Health Impacts

Abstract

Published: 1 July 2014

Background: Systematic-review methodologies provide objectivity and transparency to the process of collecting and synthesizing scientific evidence in reaching conclusions on specific research questions. There is increasing interest in applying these procedures to address environmental health questions.

Objectives: The goal was to develop a systematic-review framework to address environmental health questions by extending approaches developed for clinical medicine to handle the breadth of data relevant to environmental health sciences (e.g., human, animal, and mechanistic studies).

Methods: The Office of Health Assessment and Translation (OHAT) adapted guidance from authorities on systematic review and sought advice during development of the OHAT Approach through consultation with technical experts in systematic review and human health assessments, as well as scientific advisory groups and the public.
The method was refined by considering expert and public comments and through application to case studies.

Results and Discussion: Here we present a seven-step framework for systematic review and evidence integration for reaching hazard identification conclusions: 1) problem formulation and protocol development, 2) search for and select studies for inclusion, 3) extract data from studies, 4) assess the quality or risk of bias of individual studies, 5) rate the confidence in the body of evidence, 6) translate the confidence ratings into levels of evidence, and 7) integrate the information from different evidence streams (human, animal, and “other relevant data” including mechanistic or in vitro studies) to develop hazard identification conclusions.

Conclusion: The principles of systematic review can be successfully applied to environmental health questions to provide greater objectivity and transparency to the process of developing conclusions.

Citation: Rooney AA, Boyles AL, Wolfe MS, Bucher JR, Thayer KA. 2014. Systematic review and evidence integration for literature-based environmental health science assessments. Environ Health Perspect 122:711–718; http://dx.doi.org/10.1289/ehp.1307972

Introduction

Systematic-review methodologies increase the objectivity and transparency in the process of collecting and synthesizing scientific evidence on specific questions. The product of a systematic review can then be used to inform decisions, reach conclusions, or identify research needs. There is increasing interest in applying the principles of systematic review to questions in environmental health [European Food Safety Authority (EFSA) 2010; National Research Council (NRC) 2011, 2013a; Rhomberg et al. 2013; Woodruff and Sutton 2011].

Although systematic-review methodologies are well established in clinical medicine to assess data for reaching health care recommendations [Agency for Healthcare Research and Quality (AHRQ) 2013; Guyatt et al. 2011a; Higgins and Green 2011; Viswanathan et al. 2012], these approaches are most developed for human clinical trials and therefore typically consider small data sets of similar study design in developing conclusions. Questions in environmental health require the evaluation of a broader range of relevant data including experimental animal and mechanistic studies as well as observational human studies. Also, there is a need to integrate data from multiple evidence streams (human, animal, and “other relevant data” including mechanistic or in vitro studies) in order to reach conclusions regarding potential health effects from exposure to substances in our environment.

The National Toxicology Program (NTP) Office of Health Assessment and Translation (OHAT) conducts literature-based evaluations to assess the evidence that environmental chemicals, physical substances, or mixtures (collectively referred to as “substances”) cause adverse health effects and provides opinions on whether these substances may be of concern given levels of current human exposure (Bucher et al. 2011). Building on a history of rigorous and objective scientific review, OHAT has been working to incorporate systematic-review procedures in its evaluations since 2011 through a process that has included adoption of current practice, as well as methods development (Birnbaum et al. 2013; NTP 2012a, 2012b, 2013e). Here we explain the framework developed by OHAT that uses procedures to integrate multiple evidence streams including observational human study findings, experimental animal toxicology results, and other relevant data in developing hazard identification conclusions or state-of-the-science evaluations regarding health effects from exposure to environmental substances.
The seven-step framework outlines methods to increase transparency and consistency in the process, but it also presents opportunities to increase efficiencies in data management and data display that facilitate the process of reaching and communicating hazard identification conclusions.

Methods

In 2011, OHAT began exploring systematic-review methodology as a means to enhance transparency and increase efficiency in summarizing and synthesizing findings from studies in its literature-based health assessments. OHAT used a multipronged strategy to develop the OHAT Approach, working with advisors to adapt and extend existing methods from clinical medicine and obtaining input from technical experts and the public on early drafts (see Supplemental Material, Table S1). The methods-development process is described in detail in Supplemental Material (“Process for developing the OHAT Approach,” pp. 2–7). In brief, OHAT reviewed guidance from authoritative systematic-review groups (AHRQ 2013; Guyatt et al. 2011a; Higgins and Green 2011) in developing an initial draft and sought additional advice through web-based discussions and consultation with technical experts, the NTP Executive Committee, the NTP Board of Scientific Counselors, and the public (NTP 2012a, 2012b, 2013b, 2013c, 2013d, 2013e). The resulting OHAT Approach has been refined based on the input received and through application to case studies.

Results

The OHAT framework is a flexible seven-step process (Figure 1) tailored to the complexity of the research question. It includes all of the recommended elements for conducting and reporting a systematic review [outlined in the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement (Moher et al. 2009)].
The specific procedures for performance of each step are described in a detailed protocol developed for each evaluation (NTP 2013a, 2013f).

Figure 1. The OHAT Approach for systematic review and evidence integration for literature-based environmental health science assessments.

Step 1: Problem Formulation and Protocol Development

Prior to conducting an evaluation, the scope and focus of the topic are defined through consultation with subject-matter experts. For OHAT, the objective is typically to identify a potential health hazard or assess the state of the science in order to identify research needs on topics of importance to environmental health. The objectives of the evaluation must be clearly stated, including the key questions to be addressed. The evaluation is structured to answer these key questions that guide the systematic-review process for the literature search, study selection, data extraction, and synthesis. The questions define the populations, exposures, comparators, outcomes, timings, and settings of interest (PECOTS) eligibility criteria for the evaluation (e.g., see discussion in AHRQ 2013). PECOTS is the environmental equivalent of AHRQ’s PICOTS expansion of the original PICO approach, which was developed for clinical evaluations, focuses on interventions rather than exposures, and did not initially include timing or setting in the inclusion criteria (Whitlock et al. 2010).

A concept document (or brief proposal) and a specific, detailed protocol for OHAT evaluations are developed through an iterative process in which information is obtained by outreach to federal partners, technical experts, and the public and through consultation with the NTP Board of Scientific Counselors (NTP 2013g). Through this process, the protocol is developed a priori, and guidance in the protocol forms the basis for scientific judgments throughout the evaluation.
However, it is important to acknowledge that the protocol can be modified to address unanticipated issues that might arise while conducting the review (e.g., see Food and Drug Administration 2010; Khan et al. 2001). Revisions to the protocol are documented and justified with notation of when in the process the revisions were made.

Step 2: Search for and Select Studies for Inclusion

Search for studies. A comprehensive search of the primary scientific literature is performed. The search covers multiple databases (including, but not limited to, PubMed, TOXNET, Scopus, and Embase) with sufficient details of the search strategy documented in the protocol such that it could be reproduced. The protocol also lists the dates of the search, frequency of updates, and any limits placed on the search (e.g., language, date of publication). The protocol establishes requirements for consideration of data from meeting abstracts or other unpublished sources. If a study that may be critical to the evaluation has not been peer reviewed and the authors agree to make all study materials available, the NTP will have it peer reviewed by independent scientists with relevant expertise. The peer-review requirement assures that studies considered in the evaluation have been reviewed by subject-matter experts, and the information from this review would be available in step 4 when evaluating individual study quality.

Select studies for inclusion. All references identified in the search are screened for relevance to the key question(s) of the evaluation based on the PECOTS eligibility criteria established when formulating the problem in step 1. The protocol establishes criteria for including or excluding references based on, for example, applicable outcomes, relevant exposures, and types of studies. These criteria contain sufficient detail to develop an inclusion and exclusion checklist in order to limit the use of scientific judgment during the literature-selection process.
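As an illustration only, an inclusion and exclusion checklist of this kind could be encoded roughly as follows. This is a minimal sketch, not OHAT's tooling: the function `screen`, the dictionary layout, and every example value (such as "substance X") are hypothetical and chosen just to show how PECOTS criteria can be applied mechanically while recording the reasons for exclusion.

```python
# Hypothetical PECOTS checklist for an illustrative evaluation. All field
# values below are invented; a real protocol defines them a priori.
PECOTS_FIELDS = ("population", "exposure", "comparator",
                 "outcome", "timing", "setting")


def screen(reference, criteria):
    """Apply the checklist to one reference.

    Returns (include, reasons), where reasons lists every criterion the
    reference failed, so that exclusions can be documented.
    """
    reasons = []
    for field in PECOTS_FIELDS:
        allowed = criteria.get(field)
        if allowed is None:
            # This hypothetical protocol places no restriction on the field.
            continue
        value = reference.get(field)
        if value not in allowed:
            reasons.append(f"{field}: {value!r} is not eligible")
    return (not reasons, reasons)


criteria = {
    "population": {"human", "rodent"},
    "exposure": {"substance X"},
    "outcome": {"immune function"},
}

cohort = {"population": "human", "exposure": "substance X",
          "outcome": "immune function"}
fish_study = {"population": "zebrafish", "exposure": "substance X",
              "outcome": "growth"}

print(screen(cohort, criteria))      # eligible, no reasons recorded
print(screen(fish_study, criteria))  # excluded, with documented reasons
```

Returning the full list of failed criteria, rather than stopping at the first failure, is a design choice made here so that the documented reasons for exclusion are complete.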
If major limitations in a specific study type or design for addressing the question are known in advance (e.g., unreliable methods to assess exposure or health outcome), the basis for excluding those studies must be described a priori in the protocol.

The protocol also outlines the specific plans for reviewing studies for inclusion, resolving conflicts between reviewers, and documenting the reasons that studies were excluded. Two reviewers independently screen all references at the title and abstract level and resolve differences by reaching agreement through discussion. References that meet the inclusion criteria are retrieved for full text review, as are those with insufficient information to determine eligibility from just the title and abstract. Procedures for full text review are tailored to the scope of the review and follow procedures established in the protocol. Creating a flow diagram to show the number of references retrieved, duplicates removed, and studies excluded as references move through the screening process is one of several required elements for reporting based on the PRISMA statement (Liberati et al. 2009; Moher et al. 2009) that we have included in this framework.

Step 3: Extract Data from Studies

Relevant data from individual studies selected for inclusion are extracted or copied from the publication to a database to facilitate critical evaluation of the results, including data summary and display using separate data collection forms for human, animal, and in vitro studies. For each study, one member of the evaluation team performs the data extraction, and quality assurance procedures are undertaken as specified in the protocol (e.g., review and confirmation by another team member).
Following completion of an evaluation, the data extracted and summarized will be made publicly available in the NTP Chemical Effects in Biological Systems (CEBS) database (NTP 2014a).

Step 4: Assess the Quality or Risk of Bias of Individual Studies

Despite the critical importance of assessing the credibility of individual studies when developing literature-based evaluations, the meaning of the term “quality” varies widely across the fields of systematic review, toxicology, and public health (see discussion in Viswanathan et al. 2012). Broadly defined, study quality includes a) reporting quality (how well or completely a study was reported); b) internal validity or risk of bias (how credible the findings are based on the design and apparent conduct of a study); and c) external validity or directness and applicability (how well a study addresses the topic under review) (see Cochrane Collaboration 2013 for detailed definitions). Study quality assessment tools that mix different aspects of study quality or provide a single summary score are discouraged (Balshem et al. 2011; Higgins and Green 2011; Liberati et al. 2009; Viswanathan et al. 2012).

The OHAT risk-of-bias tool adapts guidance from the AHRQ (Viswanathan et al. 2012). Individual risk-of-bias questions are designated as applicable only to certain types of study designs (e.g., human controlled trials, experimental animal studies, cohort studies, case–control studies, cross-sectional studies, case series or case reports), with a subset of the questions applying to each study design (Table 1).

Table 1. OHAT risk-of-bias questions.

Each entry gives the question (with its explanation, where provided) followed by the applicable study designs.

Selection bias

Was administered dose or exposure level adequately randomized? Randomization requires that each human subject or animal had an equal chance of being assigned to any study group, including controls (e.g., use of random number table or computer generated randomization).
Applicable study designs: ExA(a), HCT(b)

Was allocation to study groups adequately concealed? Allocation concealment requires that research personnel do not know which administered dose or exposure level is assigned at the start of a study. Human studies also require that allocation be concealed from human subjects prior to entering the study. Note: a) a question under performance bias addresses blinding of personnel and human subjects to treatment during the study; b) a question under detection bias addresses blinding of outcome assessors.
Applicable study designs: ExA, HCT

Were the comparison groups appropriate? Comparison group appropriateness refers to having similar baseline characteristics between the groups aside from the exposures and outcomes under study.
Applicable study designs: Coh(c), CaC(d), CrS(e)

Confounding bias

Did the study design or analysis account for important confounding and modifying variables? Note: a parallel question under detection bias addresses reliability of the measurement of confounding variables.
Applicable study designs: All(f)

Did researchers adjust or control for other exposures that are anticipated to bias results?
Applicable study designs: All

Performance bias

Were experimental conditions identical across study groups?
Applicable study designs: ExA

Did researchers adhere to the study protocol?
Applicable study designs: All

Were the research personnel and human subjects blinded to the study group during the study? Blinding requires that study scientists do not know which administered dose or exposure level the human subject or animal is being given (i.e., study group). Human studies require blinding of the human subjects when possible.
Applicable study designs: ExA, HCT

Attrition/exclusion bias

Were outcome data complete without attrition or exclusion from analysis? Attrition rates are required to be similar and uniformly low across groups with respect to withdrawal or exclusion from analysis.
Applicable study designs: ExA, HCT, Coh, CaC, CrS

Detection bias

Were the outcome assessors blinded to study group or exposure level? Blinding requires that outcome assessors do not know the study group or exposure level of the human subject or animal when the outcome was assessed.
Applicable study designs: All

Were confounding variables assessed consistently across groups using valid and reliable measures? Consistent application of valid, reliable, and sensitive methods of assessing important confounding or modifying variables is required across study groups. Note: a parallel question under selection bias addresses whether design or analysis account for confounding.
Applicable study designs: All

Can we be confident in the exposure characterization? Confidence requires valid, reliable, and sensitive methods to measure exposure applied consistently across groups.
Applicable study designs: All

Can we be confident in the outcome assessment? Confidence requires valid, reliable, and sensitive methods to assess the outcome and the methods should be applied consistently across groups.
Applicable study designs: All

Selective reporting bias

Were all measured outcomes reported?
Applicable study designs: All

Other

Were there no other potential threats to internal validity (e.g., statistical methods were appropriate)? On a project-specific basis, additional questions for other potential threats to internal validity can be added and applied to study designs as appropriate.
Applicable study designs: Additional items as applicable by study design

Table notes: The OHAT risk-of-bias questions are applied to evaluate the risk of bias of studies on an outcome basis. The study design types to which each risk-of-bias question applies are listed with each question. Answering “yes” indicates lower risk of bias, whereas “no” indicates higher risk of bias for that question. Risk-of-bias ratings are developed by answering each applicable question with one of four options (definitely low, probably low, probably high, or definitely high risk of bias).

Abbreviations: CaC, case–control; CaS, case series; Coh, prospective or retrospective cohort; CrS, cross-sectional; ExA, experimental animal; HCT, human controlled trial.

(a) ExA studies are controlled exposure studies; nonhuman animal observational studies could be evaluated using the design features of observational human studies such as CrS study design.
(b) HCTs are carried out in humans using a controlled exposure, including randomized controlled trials and non-randomized experimental studies.
(c) Coh studies include prospective studies that follow subjects free of disease over time or retrospective studies of subjects with prior information available.
(d) CaC studies enroll subjects based on their disease status and compare exposures across the groups.
(e) CrS studies are conducted at one point in time and include population surveys with individual data [e.g., National Health and Nutrition Examination Survey (NHANES)] and population surveys with aggregate data (i.e., air pollution exposure estimated by ZIP code).
(f) All applies to ExA, HCT, Coh, CaC, and CrS studies, as well as other study design types such as case reports or CaS studies that lack a comparison group within the study.

Published tools do not address risk-of-bias criteria for animal studies because risk-of-bias tools, as with systematic-review methods in general, have been focused on guidelines for clinical medicine. OHAT evaluates risk of bias in experimental animal studies using criteria similar to those applied to human randomized controlled trials, because these study designs are similar in their ability to control timing and dose of exposure and to minimize the impact of confounding factors.
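To make the question-by-design structure of Table 1 concrete, the following minimal sketch encodes which questions apply to which study designs and flags disagreements between two reviewers. It is an illustration under stated assumptions, not the OHAT tool itself: the short question keys (for example `randomization`), the helper names, and the example answers are hypothetical; “All” is represented here by the six abbreviated designs only; and the project-specific “Other” item is omitted.

```python
# Study-design abbreviations follow Table 1; question keys are shorthand
# invented for this sketch.
ALL = frozenset({"ExA", "HCT", "Coh", "CaC", "CrS", "CaS"})
EXPERIMENTAL = frozenset({"ExA", "HCT"})

QUESTIONS = {
    "randomization": EXPERIMENTAL,
    "allocation_concealment": EXPERIMENTAL,
    "comparison_groups": frozenset({"Coh", "CaC", "CrS"}),
    "confounding_accounted_for": ALL,
    "other_exposures_controlled": ALL,
    "identical_conditions": frozenset({"ExA"}),
    "protocol_adherence": ALL,
    "blinding_during_study": EXPERIMENTAL,
    "attrition": frozenset({"ExA", "HCT", "Coh", "CaC", "CrS"}),
    "outcome_assessors_blinded": ALL,
    "confounders_measured_reliably": ALL,
    "exposure_characterization": ALL,
    "outcome_assessment": ALL,
    "selective_reporting": ALL,
}

RATINGS = ("definitely low", "probably low",
           "probably high", "definitely high")


def applicable_questions(design):
    """Return the question keys that apply to one study design."""
    if design not in ALL:
        raise ValueError(f"unknown study design: {design}")
    return [q for q, designs in QUESTIONS.items() if design in designs]


def discrepancies(design, reviewer_a, reviewer_b):
    """List applicable questions on which two reviewers' ratings differ."""
    flagged = []
    for question in applicable_questions(design):
        a, b = reviewer_a[question], reviewer_b[question]
        if a not in RATINGS or b not in RATINGS:
            raise ValueError(f"invalid rating for {question}")
        if a != b:
            flagged.append(question)
    return flagged


# Hypothetical ratings for one cohort study and one outcome.
first = {q: "probably low" for q in applicable_questions("Coh")}
second = dict(first, attrition="probably high")
print(discrepancies("Coh", first, second))
```

The flagged questions are those the two reviewers would need to resolve through discussion.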
Using the same set of questions for all study types, including experimental animal studies, allows for comparison of particular risk-of-bias issues across a body of evidence and facilitates comparison of the strengths and weaknesses of different bodies of evidence.

All references are independently assessed for risk of bias for each outcome of interest by two reviewers who answer all of the applicable questions with one of four options (definitely low, probably low, probably high, or definitely high risk of bias) (CLARITY Group at McMaster University 2013) following prespecified criteria detailed in the protocol. Before proceeding with the risk-of-bias assessment, OHAT recommends evaluating a small subset of studies as a “pilot” to clarify how the protocol-specific criteria will be applied through dialogue among subject matter experts and reviewers. During completion of the risk-of-bias assessment for the full set of studies, discrepancies between the reviewers are resolved by reaching agreement through discussion.

Step 5: Rate the Confidence in the Body of Evidence

For each outcome, the confidence in the body of evidence is rated by considering the strengths and weaknesses of a collection of studies with similar study design features. Ratings reflect confidence that the study findings accurately reflect the true association between exposure and effect including aspects of external validity (or directness and applicability) for the studies. The OHAT method is based on the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group guidelines (GRADE 2014), which have been adopted by the Cochrane Collaboration (Schünemann et al. 2012) and AHRQ approaches (Balshem et al. 2011; Lohr 2012), which are conceptually very similar. The OHAT method uses four descriptors to indicate the level of confidence in the separate bodies of evidence (Table 2).
In the context of identifying research needs, a conclusion of “high confidence” indicates that further research is very unlikely to change the confidence in the apparent relationship between exposure to the substance and the outcome. Conversely, a conclusion of “very low confidence” suggests that further research is very likely to impact confidence in the apparent relationship. Human and nonhuman animal data are considered separately throughout Steps 5 and 6. Conclusions developed in the subsequent steps of the approach are based on the evidence with the highest confidence.

Table 2. Confidence ratings in the bodies of evidence.

High confidence (++++): High confidence in the association between exposure to the substance and the outcome. The true effect is highly likely to be reflected in the apparent relationship.

Moderate confidence (+++): Moderate confidence in the association between exposure to the substance and the outcome. The true effect may be reflected in the apparent relationship.

Low confidence (++): Low confidence in the association between exposure to the substance and the outcome. The true effect may be different from the apparent relationship.

Very low confidence (+): Very low confidence in the association between exposure to the substance and the outcome. The true effect is highly likely to be different from the apparent relationship.

For each outcome, studies are given an initial confidence rating that reflects the presence or absence of key study-design features (Figure 1, step 5). Then studies that have the same number of features are considered together as a group to begin the process of rating confidence in a body of evidence for that outcome. The initial rating of each group is downgraded for factors that decrease confidence and upgraded for factors that increase confidence in the results.
Confidence across all studies with the same outcome is then assessed by considering the ratings for all groups of studies with that outcome, and the highest rating for that outcome moves forward.

Although confidence ratings for each outcome are developed for groups of studies, the number of studies constituting the group will vary, and in some cases this group may be represented by only one study. Therefore, it is worth noting that a single well-conducted study may provide evidence of toxicity or a health effect associated with exposure to the substance in question [e.g., see Germolec (2009) and Foster (2009) for explanations of the NTP levels of evidence for determination of “toxicity” for individual studies]. If a sufficient body of very similar studies is available, a quantitative meta-analysis may be completed to generate an overall estimate of effect, but this is not required. Finally, confidence conclusions are developed across multiple outcomes for those outcomes that are biologically related.

It is recognized that the scientific judgments involved in developing these confidence ratings are inherently subjective. A key advantage of the systematic-review process for this step and throughout an evaluation is that it provides a framework to document and justify the decisions made, and thereby provides for greater transparency in the scientific basis of judgments made in reaching conclusions.

Initial confidence set by key features of study design for each outcome. An initial confidence rating is determined by the ability of the study design to address causality as reflected in the confidence that exposure preceded and was associated with the outcome (Figure 1, step 5).
This ability is reflected in the presence or absence of four key study-design features that determine initial confidence ratings, and studies are differentiated based on whether a) the exposure to the substance is controlled; b) the exposure assessment represents exposures occurring prior to development of the outcome; c) the outcome is assessed on the individual level (i.e., not population aggregate data); and d) a comparison or control group is used within the study. The first key feature, “controlled exposure,” reflects the ability of experimental studies in humans and animals to largely eliminate confounding by randomizing allocation of exposure. Therefore, these studies will usually have all four features and receive an initial rating of “high confidence.” Observational studies do not have controlled exposure and are differentiated by the presence or absence of the three remaining study-design features. For example, prospective cohort studies usually have all three remaining features and receive an initial rating of “moderate confidence,” whereas a case report may have only one key feature and receive an initial rating of “very low confidence” (see Supplemental Material, Table S2, for key features for standard study designs and discussion, pp. 9–11). The presence or absence of these study-design features captures and discriminates among studies on an outcome-specific basis (e.g., experimental, prospective) but does not replace consideration of risk of bias elements or external validity in other steps.

Downgrade confidence rating. Five properties of the body of evidence (risk of bias, unexplained inconsistency, indirectness, imprecision, and publication bias) are considered to determine if the initial confidence rating should be downgraded (Figure 1, step 5). For each of the five properties, a judgment is made and documented regarding whether there are substantial issues that decrease the confidence rating in each aspect of the body of evidence for the outcome.
Factors that would downgrade confidence by one versus two levels are specified in the protocol. The reasons for downgrading confidence may not fit neatly into a single property of the body of evidence. If the decision to downgrade is borderline for two properties, the body of evidence is downgraded once to account for both partial concerns. Similarly, the body of evidence is not downgraded twice for what is essentially the same limitation that could be considered applicable to more than one property of the body of evidence.

Risk of bias of the body of evidence. Risk-of-bias criteria were described in step 4 in which study-quality issues for individual studies are evaluated on an outcome-specific basis. In step 5, the previous risk-of-bias assessments for individual studies now serve as the basis for an overall risk-of-bias conclusion for the entire body of evidence. Downgrading for risk of bias should reflect the entire body of studies; therefore, the decision to downgrade should be applied conservatively. The decision to downgrade should be reserved for cases for which there is substantial risk of bias across most of the studies composing the body of evidence (Guyatt et al. 2011e).

Unexplained inconsistency. Inconsistency, or large variability in the magnitude or direction of estimates of effect across studies that cannot be explained, reduces confidence in the body of evidence. Large inconsistency across studies should be explored, preferably through a priori hypotheses that might explain the heterogeneity.

Indirectness. Indirectness can refer to external validity or indirect measures of the health outcome. Indirectness can lower confidence in the body of evidence when the population, exposure, or outcome(s) measured differs from the population, exposure, or outcome(s) that is of most interest.
Concerns about directness could apply to the relationship between a) a measured outcome and a health effect (i.e., upstream biomarker of a health effect); b) the route of exposure and the typical human exposure; c) the study population and the population of interest (Guyatt et al. 2011c; Lohr 2012); d) the timing of the exposure relative to the appropriate biological window to affect the outcome; or e) the timing of outcome assessment and the length of time required after an exposure for development of the outcome (Viswanathan et al. 2012).

The administered dose or exposure level is not considered a factor under indirectness for developing a confidence rating for the purpose of hazard identification. Although exposure level is an important factor in considering the relevance of study findings to human health effects at known human exposure levels, in the OHAT evaluation process, this consideration occurs after hazard identification as part of reaching a “level of concern” conclusio
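
The Step 5 logic described above (an initial rating from the four key study-design features, downgrading across the five properties, and carrying forward the highest-rated group) can be sketched in a few lines. This is a hedged illustration, not OHAT's procedure: the function names are hypothetical; the text states the outcomes only for four features (“high”), three features (“moderate”), and one feature (“very low”), so the mappings for two and zero features are assumptions; flooring at “very low” is an assumption; and upgrading is left out because its factors are not covered in this excerpt.

```python
# Confidence levels ordered from lowest to highest (Table 2).
LEVELS = ("very low", "low", "moderate", "high")

PROPERTIES = ("risk of bias", "unexplained inconsistency", "indirectness",
              "imprecision", "publication bias")


def initial_confidence(controlled_exposure, exposure_prior_to_outcome,
                       individual_outcome_data, comparison_group):
    """Map the number of key study-design features to an initial rating.

    4 -> high, 3 -> moderate, 1 -> very low follow the text; 2 -> low and
    0 -> very low are assumptions made for this sketch.
    """
    count = sum([controlled_exposure, exposure_prior_to_outcome,
                 individual_outcome_data, comparison_group])
    return LEVELS[max(count - 1, 0)]


def downgrade(rating, levels_by_property):
    """Lower a rating by the judged number of levels (0, 1, or 2) per property.

    The caller is responsible for not counting one limitation under two
    properties; the floor at "very low" is an assumption.
    """
    for prop, levels in levels_by_property.items():
        if prop not in PROPERTIES:
            raise ValueError(f"unknown property: {prop}")
        if levels not in (0, 1, 2):
            raise ValueError(f"downgrade must be 0, 1, or 2 levels: {prop}")
    total = sum(levels_by_property.values())
    return LEVELS[max(LEVELS.index(rating) - total, 0)]


def rating_moving_forward(group_ratings):
    """Return the highest rating among the groups of studies for one outcome."""
    return max(group_ratings, key=LEVELS.index)


# Hypothetical example: one experimental group and one cohort group.
experimental = initial_confidence(True, True, True, True)
cohort = initial_confidence(False, True, True, True)
experimental_final = downgrade(experimental,
                               {"indirectness": 1, "imprecision": 1})
cohort_final = downgrade(cohort, {"risk of bias": 0})
print(experimental_final, cohort_final,
      rating_moving_forward([experimental_final, cohort_final]))
```

Keeping the per-property downgrade judgments in a dictionary mirrors the requirement that each judgment be made and documented separately.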

Reference(s)