Review · Open access · Peer reviewed

The Next-Generation Sequencing Revolution and Its Impact on Genomics

2013; Cell Press; Volume: 155; Issue: 1; Language: English

10.1016/j.cell.2013.09.006

ISSN

1097-4172

Authors

Daniel C. Koboldt, Karyn Meltz Steinberg, David E. Larson, Richard K. Wilson, Elaine R. Mardis

Topic(s)

Epigenetics and DNA Methylation

Abstract

Genomics is a relatively new scientific discipline, having DNA sequencing as its core technology. As technology has improved the cost and scale of genome characterization over sequencing’s 40-year history, the scope of inquiry has commensurately broadened. Massively parallel sequencing has proven revolutionary, shifting the paradigm of genomics to address biological questions at a genome-wide scale. Sequencing now empowers clinical diagnostics and other aspects of medical care, including disease risk, therapeutic identification, and prenatal testing. This Review explores the current state of genomics in the massively parallel sequencing era.

Prior to the advent of next-generation sequencing (NGS) technology, genomics initially was concerned with studying genomes that were tractable from the standpoint of size and repetitive content (e.g., viruses and bacteria) and with characterization of single genes associated with disease (e.g., cystic fibrosis, Huntington disease, and cancer). As the ability to construct large clone-based physical maps improved, the subcloned fragments of the genome contributing to physical map construction could be sequenced as individual projects, and their finished sequences melded together to represent entire chromosomes.
Hence, important large genomes, including model organisms and the human genome, were decoded. Indeed, in the era of NGS, the short reads obtained from most platforms absolutely require these reference genomes as a substrate for read alignment prior to variation discovery. The impact of these technologies on genomic variant discovery has been profound, as we will describe. Although we limit the scope of this Review to genomics, an accompanying Review explores the disruptive impact of NGS on studies of the epigenome to further highlight the profound transformation brought on by NGS technology (Rivera and Ren, 2013 [this issue of Cell]).

Although NGS technology initially was used to study whole genomes, a variety of approaches that address defined regions of the genome have emerged. There are essentially two technical preparatory approaches to explore selected regions of the genome with NGS. The first is by PCR, typically involving multiple primer pairs in a mixture that are combined with genomic DNA of interest in a multiplex approach to preserve precious DNA. The use of multiplex primer pairs couples the high throughput of NGS platforms with the fact that each sequence read represents a single DNA product in the mixture due to the nature of the sequencing platforms (Mardis, 2013). Following the PCR, the resulting fragments have platform-specific adapters ligated to their ends to form a library that is suitable for sequencing. The second approach involves hybrid capture, which has been developed by several groups and commercialized (Albert et al., 2007; Gnirke et al., 2009; Hodges et al., 2007). Essentially, hybrid capture takes advantage of the hybridization of DNA fragments from a whole-genome library to complementary sequences that were synthesized and combined into a mixture of probes designed with high specificity for the matching regions in the genome. These probes typically have covalently linked biotin moieties, enabling a secondary “capture” by mixing the probe:library complexes with streptavidin-coated magnetic beads. Hence, the targeted regions of the genome can be selectively captured from solution by applying a magnetic field, whereas most of the remainder of the genome is washed away in the supernatant. Subsequent denaturation releases the captured library fragments from the beads into solution, ready for postcapture amplification, quantitation, and sequencing.
When the probes are designed to capture essentially all of the known coding exons in a genome, the capture approach is referred to as “exome sequencing.” Additional probes may be designed, synthesized, and added to an exome reagent, typically referred to as “exome plus.” When only a subset of the exome or of the genome outside of the exome is targeted, this is called a “targeted panel.”

As important as techniques to produce the NGS data that address biological questions are, analytical approaches are equally critical for successful interpretation of those data. Many analytical approaches depend on the digital nature of NGS data, a consequence of the fact that individual DNA fragments of the library are amplified either on beads or on flat surfaces (platform specific) prior to the sequencing reaction. Hence, each sequence read is equivalent to a single DNA fragment. What follows are selected data analysis techniques from a dizzying number of advances published in just the last 18 months. The pace of innovation in analytical approaches to genome-wide data analysis continues to engage and excite the computational biology community as the number of technical applications continues to grow.

Technological advances have often driven the methods for discovering new disease genes. Early studies leveraged families in which a disease was segregating to identify the genetic causes of the phenotype. These linkage analysis studies were successful for highly penetrant, monogenic diseases such as cystic fibrosis. Standard parametric linkage studies of some complex traits were successful, particularly when sampling from extreme ends of the phenotypic distribution. For example, analyzing families segregating early onset Alzheimer’s disease led to the discovery of multiple genes that contribute significantly to the phenotype and shed light on the biological mechanisms (e.g., plaque formation) of disease progression (Goate et al., 1991; Harrington et al., 1995; Pericak-Vance et al., 1991). Yet, for many complex diseases and traits, this model was not as successful because the genetic predispositions to complex traits are, as their name implies, more difficult to elucidate and require larger numbers of samples to discern signal from noise. Theoretically, it was determined that comparing allele frequencies across the genome between large numbers of cases and controls would be able to capture common disease susceptibility alleles (Risch and Merikangas, 1996), and this ushered in the era of genome-wide association studies (GWAS). It was economically practical to screen thousands of individuals by genotyping hundreds of thousands of common single-nucleotide polymorphisms (SNPs) on microarrays. GWAS are well suited to, and have been successful in, studying population structure (Price et al., 2010b), anthropometric traits (Berndt et al., 2013), targets of natural selection such as variants associated with high-altitude adaptation (Bigham et al., 2009; Bigham et al., 2010; Scheinfeldt et al., 2012), and some complex diseases such as Crohn’s disease (Yamazaki et al., 2005) and age-related macular degeneration (Klein et al., 2005). These studies led to hundreds of replicable associated loci that cannot be fully enumerated in this Review. GWAS has perhaps had the most impact in the area of pharmacogenomics, where robust, highly replicable associations have impacted clinical actions. For example, warfarin dose is routinely adjusted based upon VKORC1, CYP2C9, and CYP4F2 genotypes confirmed by GWAS (Takeuchi et al., 2009), which has significantly improved patient outcomes. Yet, most early GWAS yielded few variants with large effect sizes; this was perhaps to be expected, given the heterogeneity of the phenotypes and sample sizes needed to statistically detect signals of association.

The exponentially decreasing cost of next-generation sequencing data generation has put large-scale investigation of rare variation within reach, and there has been a resultant shift in the field of complex disease genetics over the past 5 years. GWAS data strongly suggest that the vast majority of the heritability of complex traits will not be due to a few common variants with low to moderate effects (Schork et al., 2009). Rare variation with large effect sizes is likely contributing a significant proportion to the “missing heritability” of complex traits and disease (Cohen et al., 2006; Manolio, 2009; Zhu et al., 2010). The common disease-common variant versus common disease-rare variant debate remains unresolved. Questions remain as to whether the genetic contribution to common traits can be attributed to an infinite number of common alleles with small effect, a large number of rare alleles with large effects, or some combination of genes and environment (Gibson, 2011). But the evaluation of rare variants in common disease is ongoing. The advent of NGS has enabled the inquiry of nearly every base in the genome, and thus techniques to reliably interpret and identify millions of variants are being developed. As will be described below, the advantage of sequencing in this regard is that most variants, common and rare, can be discovered with the appropriate sequencing read coverage, algorithmic methods to identify the variants, and sufficiently careful orthogonal validation to distinguish true from false positives. The exception to this discovery potential is due to the reliance on alignment to the Human Genome Reference sequence, which is the first step in the analysis of NGS data, as this reference does not contain the entirety of novel genome content across all humans.
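
The dependence of variant discovery on read coverage can be illustrated with a minimal sketch. The Python below looks at the bases that aligned reads report at a single reference position and flags a candidate variant only when enough reads cover the site and enough of them support a non-reference base. The function name `call_site` and all three thresholds are hypothetical values chosen for illustration; they are not taken from any caller cited in this Review, and real callers typically also weigh base and mapping qualities in a statistical model.

```python
from collections import Counter
from typing import Optional, Sequence


def call_site(
    ref_base: str,
    observed_bases: Sequence[str],
    min_coverage: int = 8,          # hypothetical threshold
    min_alt_reads: int = 2,         # hypothetical threshold
    min_alt_fraction: float = 0.2,  # hypothetical threshold
) -> Optional[dict]:
    """Return a candidate variant at one aligned position, or None.

    observed_bases holds one base per read overlapping the position.
    """
    ref = ref_base.upper()
    coverage = len(observed_bases)
    if coverage < min_coverage:
        # Too few reads to separate a real variant from sequencing error.
        return None

    counts = Counter(base.upper() for base in observed_bases)
    alt_counts = {base: n for base, n in counts.items() if base != ref}
    if not alt_counts:
        return None  # every read agrees with the reference

    # Keep only the best-supported non-reference base.
    alt_base, alt_reads = max(alt_counts.items(), key=lambda item: item[1])
    alt_fraction = alt_reads / coverage
    if alt_reads < min_alt_reads or alt_fraction < min_alt_fraction:
        return None

    return {
        "ref": ref,
        "alt": alt_base,
        "coverage": coverage,
        "alt_reads": alt_reads,
        "alt_fraction": alt_fraction,
    }


if __name__ == "__main__":
    # Ten reads, five supporting G over reference A: reported as a candidate.
    print(call_site("A", list("AAAAAGGGGG")))
    # Three reads only: below the coverage threshold, so nothing is reported.
    print(call_site("A", list("AAG")))
```

The coverage check comes first because a site with few reads cannot support a call in either direction, which is why low-coverage regions limit discovery regardless of the algorithm used.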
Numerous variant-calling algorithms have been developed for the detection and genotyping of germline SNPs (DePristo et al., 2011; Koboldt et al., 2009; Li et al., 2008; McKenna et al., 2010; Shen et al., 2010) and small indels (Emde et al., 2012; Leone et al., 2013; Ye et al., 2009) in high-throughput sequencing data. Once detected, these variants can be analyzed in case-control studies using the same methods that have been developed for GWAS. However, unlike GWAS (which examines common mutations), sequencing facilitates the discovery of rare mutations that, combined with the continuing unexplained genetic contributions to complex phenotypes from GWAS (Manolio et al., 2009), has sparked intense interest in measuring their association with complex phenotypes. This interest has given rise to a variety of statistical tests with varying strategies for detecting association of rare variation with phenotype (Chen et al., 2013; Han and Pan, 2010; Ionita-Laza et al., 2013; Lee et al., 2012a; Lee et al., 2012b; Li and Leal, 2008; Liu and Leal, 2010; Madsen and Browning, 2009; Neale et al., 2011; Oualkacha et al., 2013; Price et al., 2010a; Wu et al., 2011; Zhang et al., 2011).

In any single gene, there are a large number of rare variants due to recent human population growth (Coventry et al., 2010; Nelson et al., 2012; Tennessen et al., 2012), and there may be many nonassociated variants in a gene. Furthermore, even in large cohorts, there may not be enough individuals with a given variant to achieve statistical significance. To deal with the aforementioned challenge, all of these types of tests share the common feature that they group or collapse rare variation, usually by gene, in order to increase statistical power (see Wu et al., 2013 for a recent review). Early tests (such as the cohort allelic sums test [Morgenthaler and Thilly, 2007] and the combined multivariate collapsing method [Li and Leal, 2008]) assumed that each variant had the same direction of effect and, in addition, required a fixed minor allele frequency cutoff to define which variants to include; but these assumptions are not always valid or optimal. Further innovations have allowed for weighting of individual variants (for example, by variant frequency in the weighted sum statistic [Madsen and Browning, 2009] or the data [Han and Pan, 2010; Lin and Tang, 2011; Wu et al., 2011; Zhang et al., 2011]), variants with heterogeneous direction of effect (Han and Pan, 2010; Lin and Tang, 2011; Neale et al., 2011; Wu et al., 2011; Zhang et al., 2011), and selection of the ideal frequency cutoff for rare variants (Price et al., 2010a).

Though this remains an active area of research, the SKAT family of tests (Chen et al., 2013; Ionita-Laza et al., 2013; Lee et al., 2012a; Lee et al., 2012b; Oualkacha et al., 2013; Wu et al., 2011) has emerged as one of the most popular. SKAT and its variants allow for inclusion of covariates for managing both case-control and quantitative data and family or unrelated data, and they are computationally undemanding. Although the initial version of SKAT lost power in cases in which all variants in a gene have the same direction of effect, the newer SKAT-O (Lee et al., 2012a) t
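
The collapsing idea shared by the early gene-based tests can be sketched in a few lines. In the Python below, each individual is scored as a carrier if they hold at least one rare allele anywhere in the gene, and carrier counts in cases and controls are compared with a two-sided Fisher's exact test. This is a simplified illustration in the spirit of a collapsing test, not the published CAST or CMC procedure: the function names, the 0/1/2 minor-allele-count genotype encoding, the choice of Fisher's exact test, and the 0.01 frequency cutoff are all assumptions made for the example.

```python
from math import comb
from typing import List, Sequence


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def collapse_carriers(
    genotypes: Sequence[Sequence[int]],
    minor_allele_freqs: Sequence[float],
    maf_cutoff: float = 0.01,  # hypothetical fixed cutoff
) -> List[bool]:
    """Flag individuals carrying at least one rare allele in the gene.

    genotypes[i][j] is the minor-allele count (0, 1, or 2) of individual i
    at variant j; variants at or above the cutoff are ignored.
    """
    rare = [j for j, freq in enumerate(minor_allele_freqs) if freq < maf_cutoff]
    return [any(person[j] > 0 for j in rare) for person in genotypes]


def collapsing_test(
    case_genotypes: Sequence[Sequence[int]],
    control_genotypes: Sequence[Sequence[int]],
    minor_allele_freqs: Sequence[float],
    maf_cutoff: float = 0.01,
) -> float:
    """Compare rare-variant carrier counts between cases and controls."""
    case_carriers = sum(collapse_carriers(case_genotypes, minor_allele_freqs, maf_cutoff))
    control_carriers = sum(collapse_carriers(control_genotypes, minor_allele_freqs, maf_cutoff))
    return fisher_exact_two_sided(
        case_carriers,
        len(case_genotypes) - case_carriers,
        control_carriers,
        len(control_genotypes) - control_carriers,
    )
```

Because carriers of different rare variants are pooled into a single count, the test gains power when most rare variants push risk in the same direction, and loses it when they do not, which is the limitation the later weighted and kernel-based tests were designed to address.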

Reference(s)