Linkage-Disequilibrium-Based Binning Affects the Interpretation of GWASs
2012; Elsevier BV; Volume: 90; Issue: 4; Language: English
10.1016/j.ajhg.2012.02.025
ISSN 1537-6605
Authors: Andrea Christoforou, Michael Dondrup, Morten Mattingsdal, Manuel Mattheisen, Sudheer Giddaluru, Markus M. Nöthen, Marcella Rietschel, Sven Cichon, Srdjan Djurovic, Ole A. Andreassen, Inge Jonassen, Vidar M. Steen, Pål Puntervoll, Stéphanie Le Hellard
Topic(s): Epigenetics and DNA Methylation
Abstract: Genome-wide association studies (GWASs) are critically dependent on detailed knowledge of the pattern of linkage disequilibrium (LD) in the human genome. GWASs generate lists of variants, usually SNPs, ranked according to the significance of their association to a trait. Downstream analyses generally focus on the gene or genes that are physically closest to these SNPs and ignore their LD profile with other SNPs. We have developed a flexible R package (LDsnpR) that efficiently assigns SNPs to genes on the basis of both their physical position and their pairwise LD with other SNPs. We used the positional-binning and LD-based-binning approaches to investigate whether including these "LD-based" SNPs would affect the interpretation of three published GWASs on bipolar affective disorder (BP) and of the imputed versions of two of these GWASs. We show how including LD can be important for interpreting and comparing GWASs. In the published, unimputed GWASs, LD-based binning effectively "recovered" 6.1%–8.3% of Ensembl-defined genes. It altered the ranks of the genes and resulted in nonnegligible differences between the lists of the top 2,000 genes emerging from the two binning approaches. It also improved the overall gene-based concordance between independent BP studies. In the imputed datasets, although the increases in coverage (>0.4%) and rank changes were more modest, even greater concordance between the studies was observed, attesting to the potential of LD-based binning on imputed data as well. Thus, ignoring LD can result in the misinterpretation of the GWAS findings and have an impact on subsequent genetic and functional studies.

Over the past decade, genome-wide association studies (GWASs) have revolutionized the analysis of human complex genetic traits. By scanning hundreds of thousands of genetic variants, typically SNPs, in hundreds or thousands of individuals, they search for the variant(s) that associate with a particular disease or trait. Critical to the development and evolution of GWASs has been the creation of the International HapMap Project [1: International HapMap Consortium, Nature 2005; 437: 1299-1320], which has cataloged the common patterns of human genetic variation, including the linkage disequilibrium (LD) between SNPs. Knowledge of this LD, or nonrandom association of alleles at multiple loci, has made it possible to identify informative subsets of SNPs (i.e., "tagging SNPs") that capture the bulk of genome-wide variation and has resulted in affordable genome-wide genotyping. To date, almost 1,000 GWASs have been published and have tested hundreds of human traits and reported thousands of significant associations (Catalog of Published Genome-Wide Association Studies [2: Hindorff et al., Proc. Natl. Acad. Sci. USA 2009; 106: 9362-9367]). Previously known associations have been confirmed, and new candidates have been implicated [3: Manolio, N. Engl. J. Med. 2010; 363: 166-176]. However, a general sense of disappointment lingers because GWASs have fallen short of the initial expectation that they would unravel the genetic basis of complex traits [4: Bondy, World J. Biol. Psychiatry 2011; 12: 81-88] [5: Gershon et al., Am. J. Psychiatry 2011; 168: 253-256]. Recent analyses reveal that a large proportion of the "missing heritability" [5] [6: Stranger et al., Genetics 2011; 187: 367-383] can be explained by a polygenic model that considers all GWAS SNPs simultaneously [7: Gibson, Nat. Genet. 2010; 42: 558-560] [8: Davies et al., Mol. Psychiatry 2011; 16: 996-1005] [9: Lee et al., Am. J. Hum. Genet. 2011; 88: 294-305], but these studies provide no clues about the identity of the susceptibility variants or the underlying biology of the trait [6]. Thus, much attention has been given to uncovering and characterizing this "missing" or "hidden" heritability [6] [10: Cantor et al., Am. J. Hum. Genet. 2010; 86: 6-22].

In a conventional GWAS, each SNP is considered separately (the "single-marker" approach), resulting in a list of variants ranked according to the statistical significance of their association to the trait (i.e., their p value) [11: McCarthy et al., Nat. Rev. Genet. 2008; 9: 356-369]. The "top hits" are typically reported, and the relevance of each finding, as well as the focus of future work, is primarily based on the functional unit(s), namely gene(s), implicated by the associated SNP. Furthermore, gene-based methods are increasingly being applied as complementary approaches to the analysis of GWAS data. These methods take the gene instead of the individual SNP as the basic unit of association and thus allow aggregation of SNPs of smaller effect, potentially increasing power and reducing the multiple-testing burden [12: Bergen et al., Psychiatr. Genet. 2011; 21: 136-172] [13: Wang et al., Nat. Rev. Genet. 2010; 11: 843-854] [14: Wang et al., Am. J. Hum. Genet. 2007; 81: 1278-1283]. They enable the incorporation of biological knowledge for greater insight into the mechanisms underlying the trait and are essential for subsequent pathway-based approaches [13]. Gene-based methods also facilitate direct comparison of independent studies because they are unaffected by allelic heterogeneity and potential differences in SNP coverage and LD patterns [15: Neale and Sham, Am. J. Hum. Genet. 2004; 75: 353-362].

The success of both single-marker and gene-based approaches is critically dependent on the correct assignment of SNPs to genes. At the single-marker level, the aim is to identify the gene(s) that the associated SNP is tagging. At the gene level, the aim is to attribute all SNPs tagging a particular gene to that gene. Although LD can span hundreds of kilobases [16: Hinds et al., Science 2005; 307: 1072-1079] [17: Lawrence et al., Genome Res. 2005; 15: 1503-1510], when GWAS results emerge, the SNPs of interest are typically assigned to the nearest gene or transcript within a specified distance [14]. In turn, genes are typically represented only by the SNPs that are physically located within the transcribed region or predefined flanking region [13]. It is not systematically taken into consideration that an associated SNP might be in high LD with another SNP (genotyped or not) located hundreds of kilobases away in a different gene or that a genotyped SNP positioned outside the defined boundaries of a gene is tagging that gene.

Here, we show that ignoring LD discards valuable information, potentially leads to the incorrect localization of the association signal, and might mislead the interpretation of GWAS data. We have therefore developed a flexible R package (LDsnpR) that systematically assigns SNPs to genes (or relevant predefined genome "bins") by using SNP association results (e.g., p values), bin definitions, and precalculated pairwise LD data (e.g., r² values) provided by the user (Figure S1, available online). By default, LDsnpR assigns a SNP to a bin if that SNP is located within the physical boundaries of that bin (i.e., the "positional-binning" approach). Then, as a unique feature of this package, the user has the option of also assigning a genotyped SNP to a bin if that SNP is in high pairwise LD with another SNP (genotyped or not) located within the physical boundaries of that bin (i.e., the "LD-based-binning" approach). Although a genotyped SNP cannot be assigned to a particular gene more than once, it can be assigned to more than one gene.

As proof of principle, we used LDsnpR to assess the impact of the LD-based-binning approach (versus the positional-binning approach) on the results of three published GWASs on bipolar disorder (BP), each unimputed and genotyped on a different platform. The three GWASs are (1) the UK-based Wellcome Trust Case Control Consortium (WTCCC) BP GWAS [18: Wellcome Trust Case Control Consortium, Nature 2007; 447: 661-678], (2) the Norwegian Thematically Organized Psychosis (TOP) BP GWAS [19: Djurovic et al., J. Affect. Disord. 2010; 126: 312-316], and (3) a German BP GWAS [20: Cichon et al., Am. J. Hum. Genet. 2011; 88: 372-381] (Table 1). Each GWAS had been previously approved by the relevant local research ethics committees, and all participants had provided written informed consent [18] [19] [20]. In addition, we assessed the impact of LD-based binning on imputed versions of the TOP and German GWASs, in which ungenotyped markers had been statistically inferred [11] on the basis of LD from different reference panels (i.e., HapMap Phase III for TOP; HapMap Phase III and 1,000 Genomes [21: 1000 Genomes Consortium, Nature 2010; 467: 1061-1073] for German) (Table 2).

Table 1. Study Descriptions and Summary of Coverage for Positional-Binning and LD-Based-Binning Approaches for Original, Unimputed Datasets

Study | WTCCC (a) | TOP (b) | German (c)
Sample size (cases/controls) | 1,868/2,938 | 198/336 | 682/1,300
Platform used | Affymetrix 500K | Affymetrix 6.0 | Illumina HumanHap550v3
Number of post-QC SNPs for binning | 468,648 | 615,396 | 511,978

WTCCC (a)
Binning data | Positional binning | LD-based binning | Difference (d)
Number of genes covered (e) | 30,610 (83.4%) | 33,443 (91.1%) | 2,833 (9.3%)
Number of post-QC SNPs binned | 237,869 (50.8%) | 277,534 (59.2%) | 39,665 (16.7%)
Number of SNPs binned to only one gene | 199,752 (84.0%) | 178,544 (64.3%) | 21,208 (10.6%)
Number of SNPs binned to ten or more genes | 135 (0.057%) | 2,537 (0.91%) | 2,402
Mean number of SNPs per bin (median) | 9.4 (4) | 15.2 (10) | 6.6 (4)
Range (min–max) | 1–514 | 1–515 | 0–87
Number of genes with only one SNP | 4,830 (15.8%) | 1,531 (4.6%) | 3,299 (68.3%)

TOP (b)
Binning data | Positional binning | LD-based binning | Difference (d)
Number of genes covered (e) | 31,823 (86.7%) | 33,905 (92.4%) | 2,082 (6.5%)
Number of post-QC SNPs binned | 307,949 (50.0%) | 363,570 (59.1%) | 55,621 (18.1%)
Number of SNPs binned to only one gene | 259,223 (84.2%) | 234,036 (64.4%) | 25,187 (9.7%)
Number of SNPs binned to ten or more genes | 174 (0.057%) | 3,106 (0.85%) | 2,932
Mean number of SNPs per bin (median) | 11.7 (5) | 19.4 (13) | 8.4 (6)
Range (min–max) | 1–687 | 1–701 | 0–112
Number of genes with only one SNP | 3,604 (11.3%) | 992 (2.9%) | 2,612 (72.5%)

German (c)
Binning data | Positional binning | LD-based binning | Difference (d)
Number of genes covered (e) | 31,708 (86.4%) | 33,861 (92.3%) | 2,153 (6.8%)
Number of post-QC SNPs binned | 272,914 (53.3%) | 308,634 (60.2%) | 35,720 (13.1%)
Number of SNPs binned to only one gene | 228,098 (83.6%) | 209,458 (67.9%) | 18,640 (8.2%)
Number of SNPs binned to ten or more genes | 141 (0.052%) | 2,072 (0.67%) | 1,931
Mean number of SNPs per bin (median) | 10.5 (5) | 15.4 (10) | 5.6 (4)
Range (min–max) | 1–655 | 1–665 | 0–64
Number of genes with only one SNP | 3,647 (11.5%) | 595 (1.8%) | 3,052 (83.7%)

The following abbreviation is used: QC, quality control.
(a) The UK-based Wellcome Trust Case Control Consortium (WTCCC) BP GWAS [18].
(b) The Norwegian Thematically Organized Psychosis (TOP) BP GWAS [19].
(c) A German BP GWAS [20].
(d) Percentages indicate percent increase or decrease from positional to LD-based binning.
(e) Ensembl 54 (May 2009) genes (total N = 36,693) tagged by at least one SNP.
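As a check on how the "Difference" columns are read, the short Python sketch below recomputes the WTCCC gene-coverage row of Table 1 from its raw counts. It assumes, following footnote d, that the percentage in the Difference column is the change relative to the positional-binning count, whereas the percentages in the other two columns are fractions of the 36,693 Ensembl 54 gene bins. The function name `coverage_change` is introduced here purely for illustration and is not part of LDsnpR.

```python
TOTAL_GENES = 36_693  # Ensembl 54 gene bins (Table 1, footnote e)


def coverage_change(positional, ld_based, total=TOTAL_GENES):
    """Summarize gene coverage under the two binning approaches.

    positional / ld_based: number of genes tagged by at least one SNP.
    """
    gained = ld_based - positional
    return {
        # Coverage as a share of all gene bins.
        "positional_pct": round(100 * positional / total, 1),
        "ld_based_pct": round(100 * ld_based / total, 1),
        # Genes "recovered" by LD-based binning.
        "gained": gained,
        # Change relative to positional binning (assumed reading of footnote d).
        "gain_vs_positional_pct": round(100 * gained / positional, 1),
    }


# WTCCC counts taken from Table 1.
wtccc = coverage_change(30_610, 33_443)
print(wtccc)
```

Applied to the TOP (31,823 and 33,905) and German (31,708 and 33,861) counts, the same calculation gives relative gains of 6.5% and 6.8%, matching the corresponding Difference entries.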
Table 2. Study Descriptions and Summary of Coverage for Positional-Binning and LD-Based-Binning Approaches for Imputed Datasets

Study | TOP (a) Imputed (b) | German (c) Imputed (b)
Sample size (cases/controls) | 198/336 | 657/1,308
Imputation reference panel | HapMap Phase III (CEU) | 1,000 Genomes (pilot 1, CEU) and HapMap Phase III (CEU)
Post-QC SNPs for binning | 992,161 | 4,825,148

TOP (a) Imputed (b)
Binning data | Positional binning | LD-based binning | Difference (d)
Number of genes covered (e) | 33,242 (90.6%) | 34,193 (93.2%) | 951 (2.9%)
Number of post-QC SNPs binned | 521,720 (52.6%) | 612,316 (61.7%) | 90,596 (17.4%)
Number of SNPs binned to only one gene | 431,808 (43.5%) | 367,671 (37.1%) | 64,137 (14.9%)
Number of SNPs binned to ten or more genes | 267 (0.03%) | 7,967 (0.8%) | 7,700
Mean number of SNPs per bin (median) | 19.3 (9) | 35.9 (25) | 17.1 (12)
Range (min–max) | 1–1,046 | 1–1,062 | 0–214
Number of genes with only one SNP | 1,795 (5.4%) | 651 (1.9%) | 1,144 (63.7%)

German (c) Imputed (b)
Binning data | Positional binning | LD-based binning | Difference (d)
Number of genes covered (e) | 32,116 (87.5%) | 32,259 (87.9%) | 143 (0.4%)
Number of post-QC SNPs binned | 2,394,441 (49.6%) | 2,613,493 (54.2%) | 219,052 (9.1%)
Number of SNPs binned to only one gene | 1,979,660 (41.0%) | 1,855,413 (38.5%) | 124,247 (6.3%)
Number of SNPs binned to ten or more genes | 1,272 (0.03%) | 16,807 (0.3%) | 15,535
Mean number of SNPs per bin (median) | 91.6 (44) | 130.6 (84) | 39.5 (26)
Range (min–max) | 1–5,570 | 1–5,573 | 0–573
Number of genes with only one SNP | 241 (0.8%) | 208 (0.6%) | 33 (13.7%)

The following abbreviation is used: QC, quality control.
(a) The Norwegian Thematically Organized Psychosis (TOP) BP GWAS [19: Djurovic et al., J. Affect. Disord. 2010; 126: 312-316].
(b) Imputation details: the Norwegian TOP dataset was imputed according to the ENIGMA protocol with the use of MACH imputation software [38: Li et al., Genet. Epidemiol. 2010; 34: 816-834] and HapMap Phase III (CEU) as the reference panel. The German dataset was imputed with IMPUTE2 software [39: Howie et al., PLoS Genet. 2009; 5: e1000529] and the 1,000 Genomes Project (Pilot 1, CEU) and HapMap Phase III (CEU) as reference panels.
(c) A German BP GWAS [20: Cichon et al., Am. J. Hum. Genet. 2011; 88: 372-381].
(d) Percentages indicate percent increase or decrease from positional to LD-based binning.
(e) Ensembl 54 (May 2009) genes (total N = 36,693) tagged by at least one SNP.
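The two assignment rules described for LDsnpR (positional binning, and the optional LD-based binning of genotyped SNPs that are in high pairwise LD with a SNP inside the bin) can be illustrated with a small Python sketch. This is not the LDsnpR implementation or its API: the function `bin_snps`, the data layout, and the SNP and gene identifiers are all hypothetical, the 10 kb flank and r² ≥ 0.8 threshold are simply the settings reported in this study, and a real analysis would use interval indexing rather than the linear scans shown here.

```python
from collections import defaultdict

FLANK_BP = 10_000  # flanking region added to each side of a gene bin
R2_MIN = 0.8       # pairwise LD threshold for LD-based binning


def bin_snps(genes, positions, genotyped, ld_pairs,
             use_ld=True, flank=FLANK_BP, r2_min=R2_MIN):
    """Return {gene: set of genotyped SNPs assigned to that gene}.

    genes:     {gene_id: (chrom, start, end)}
    positions: {snp_id: (chrom, pos)} for every SNP in the LD reference,
               genotyped or not
    genotyped: set of SNP ids that have association results
    ld_pairs:  iterable of (snp_a, snp_b, r2) precalculated pairwise LD
    """
    # Index the high-LD partners of every SNP (LD is symmetric).
    partners = defaultdict(set)
    for snp_a, snp_b, r2 in ld_pairs:
        if r2 >= r2_min:
            partners[snp_a].add(snp_b)
            partners[snp_b].add(snp_a)

    bins = {}
    for gene, (chrom, start, end) in genes.items():
        # All reference SNPs physically inside the extended gene bin.
        inside = {snp for snp, (c, pos) in positions.items()
                  if c == chrom and start - flank <= pos <= end + flank}
        # Positional binning: genotyped SNPs located in the bin.
        assigned = inside & genotyped
        if use_ld:
            # LD-based binning: genotyped SNPs in high LD with any SNP
            # (genotyped or not) located in the bin.
            for snp in inside:
                assigned |= partners.get(snp, set()) & genotyped
        bins[gene] = assigned  # a set, so each SNP counts once per gene
    return bins


# Toy data with made-up identifiers and coordinates.
genes = {"GENE_A": ("1", 100_000, 150_000),
         "GENE_B": ("1", 400_000, 450_000)}
positions = {"rs1": ("1", 120_000), "rs2": ("1", 300_000),
             "rs3": ("1", 420_000), "rs4": ("1", 155_000)}
genotyped = {"rs1", "rs2", "rs4"}  # rs3 is not on the array
ld_pairs = [("rs2", "rs3", 0.95), ("rs1", "rs3", 0.40)]

positional = bin_snps(genes, positions, genotyped, ld_pairs, use_ld=False)
ld_based = bin_snps(genes, positions, genotyped, ld_pairs, use_ld=True)
print(positional)
print(ld_based)
```

In this toy example GENE_B has no genotyped SNP within its extended boundaries and is missed by positional binning, whereas LD-based binning assigns the intergenic SNP rs2 to it through its high LD with the ungenotyped rs3. This mirrors, on a very small scale, the "recovery" of genes reported in Tables 1 and 2.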
BP is a severe complex psychiatric disorder that shows high heritability (60%–80%) but for which clear genetic risk factors remain elusive [4]. Although several GWASs on BP have been performed (Catalog of Published Genome-Wide Association Studies [2]), the findings have shown little overlap at both the SNP and gene levels. Also, only a handful of SNPs have achieved genome-wide significance (p < ∼10⁻⁸), and these SNPs explain less than 3% of the heritability [4] [22: So et al., Genet. Epidemiol. 2011; 35: 310-317], suggesting that psychiatric disorders, such as BP, might be less amenable to GWASs than other disorders [5] [23: Neale and Purcell, Am. J. Med. Genet. B Neuropsychiatr. Genet. 2008; 147B: 1288-1294]. However, systematic LD-based gene binning has not been applied to these datasets, possibly contributing to the apparent lack of success.
Thus, we assessed the effects of the LD-based-binning approach relative to the traditional positional-binning approach with respect to (1) gene coverage, (2) changes in the results and, potentially, the interpretation of findings, and (3) pairwise concordance of the findings among the BP GWASs.

In brief, for LDsnpR, gene bin definitions were based on the Human Ensembl release 54 (May 2009) gene identifiers with unambiguous positional information (N = 36,693). We extended these gene bins by another 10 kb on either side to best capture potential regulatory regions [24: Blow et al., Nat. Genet. 2010; 42: 806-810] [25: Vandiedonck et al., Genome Res. 2011; 21: 1042-1054]. The LD data were based on HapMap Phase II release 27 and were restricted to that of the CEU (Utah residents with ancestry from northern and western Europe from the CEPH collection) sample. We set the pairwise LD at the widely accepted threshold of r² ≥ 0.8 [26: Spencer et al., PLoS Genet. 2009; 5: e1000477] to limit the loss of power to detect association at the linked locus [27: Wray, Twin Res. Hum. Genet. 2005; 8: 87-94].

We first compared the extent of coverage between the positional-binning and LD-based-binning approaches in the published, unimputed datasets (Table 1). By allowing us to identify the intergenic SNPs that tag genes, LD-based binning resulted in a ∼13%–18% increase in the number of SNPs included in the gene-binning process. Intergenic SNPs represent ∼40% of GWAS trait-associated SNPs [3]. Notably, LD-based binning "recovered" >2,000 genes (>6%) in all three datasets, increasing the proportion of Ensembl 54 genes tagged by at least one SNP from ∼83% to >91%. Furthermore, there was an increase in the density of coverage; an average of 5.6 to 8.4 (median of four to six) SNPs were added per gene, and there was an overall decrease (>68%) in the number of genes tagged by only one SNP.

The imputed datasets also yielded increased coverage (Table 2) but, as expected, to a lesser extent depending on the reference panel used for imputation. Although HapMap II (i.e., LDsnpR reference panel) is denser than HapMap III [28: Santos et al., Eur. J. Hum. Genet. 2011; 19: 733-734] (i.e., reference panel for the TOP and German studies), imputation on the 1,000 Genomes data (i.e., reference panel for the German study) potentially gives the densest coverage. For the TOP and German imputed datasets, LD-based binning resulted in an increase of 17.4% and 9.1%, respectively, in the number of SNPs included in the gene-binning process and the recovery of 951 (2.9%) and 143 (0.4%) genes, respectively.
Although this is only a small proportion of the total gene coverage, the recovery of these genes enables them to be considered as candidates for BP association and might lead to a better understanding of the biology should the true association stem from them. Also of note, in the German GWAS, LD-based binning alone achieved an overall gene coverage of 92.3% (imputation achieved 87.5% coverage, and imputation combined with LD-based binning achieved 87.9% coverage), suggesting that under some scenarios, LD-based binning alone can offer the most coverage. As with the original GWASs, there was an increase in the density of coverage; an average of 17.1 and 39.5 (median 12 and 26) SNPs were added per gene for the TOP and German imputed datasets, respectively. There was also a decrease in the number of genes tagged by only one SNP in the TOP imputed dataset (63.7%); the decrease was not as notable for the German imputed dataset (13.7%).

We next assessed the effects of the LD-based-binning approach on the results of the three GWASs at both the single-marker and gene levels. At the single-marker le