Improved Diagnosis of Rare Disease Patients through Systematic Detection of Runs of Homozygosity
2020; Elsevier BV; Volume: 22; Issue: 9; Language: English
10.1016/j.jmoldx.2020.06.008
ISSN 1943-7811
Authors: Leslie Matalonga, Steven Laurie, Anastasios Papakonstantinou, Davide Piscia, Elisabetta Mereu, Gemma Bullich, Rachel Thompson, Rita Horváth, Luis A. Pérez‐Jurado, Olaf Rieß, Marta Gut, Gert‐Jan B. van Ommen, Hanns Lochmüller, Sergi Beltrán, Alessandra Renieri, Ali Dursun, Antoni Matilla‐Dueñas, Bru Cormand, Carlo Rivolta, Carmen Ayuso, Carmen Espinós, Christian Scerri, Dilek Yalnızoğlu, Doriette Soler, Éva Morava, Fabrizio Barbetti, Francesca Forzano, Francesca Mari, Francesco Muntoni, Frederic Tort, Henry Houlden, María‐Isabel Tejada, Jan Senderek, Javier Benítez, Javier Corral De La Calle, Jordi Serra, José M. Millán, José Carlos Segovia, Juan Ramón Gimeno Blanes, Judith Armstrong, Rıza Köksal Özgül, Laura Vilarinho, Lluís Montoliu, Manuel Posada de la Paz, Maria Antonietta Mencarelli, Marina Mora, Paola Bianchi, Pavel Seeman, Perry Elliott, Alessandra Ferlini, Alexis Brice, Brunhilde Wirth, Francesco Muntoni, Michael G. Hanna, Sarah J. Tabrizi, Thomas Klockgether, Vincent Timmerman, Volker Straub, Semra Hız Kurul, Yavuz Oktay, Serdal Güngör, Ahmet Yaramış, Uluç Yiş, Alfons Macaya, Antònia Ribes, Aurora Pujol, Conxi Lázaro, Daniel Grinberg, Eduardo F. Tizzano, Francesc Cardellach, Francesc Palau, Montserrat Milà, P. Gallano, Rafael Artuch, Ramon MartiSeves, Gonzalo Villanueva, Silvia M. Vidal, Glòria Garrabou, Susana Balcells, Roser Urreizti, Estrella López‐Martín, Ivon Cuscó, Irene Valenzuela, María Sabater‐Molina
Topic(s): Genetics and Neurodevelopmental Disorders
Abstract: Autozygosity is associated with an increased risk of genetic rare disease, thus being a relevant factor for clinical genetic studies. More than 2400 exome sequencing data sets were analyzed and screened for autozygosity on the basis of detection of >1 Mbp runs of homozygosity (ROHs). A model was built to predict if an individual is likely to be a consanguineous offspring (accuracy, 98%), and probability of consanguinity ranges were established according to the total ROH size. Application of the model resulted in the reclassification of the consanguinity status of 12% of the patients. The analysis of a subset of 79 consanguineous cases with the Rare Disease (RD)–Connect Genome-Phenome Analysis Platform, combining variant filtering and homozygosity mapping, enabled a 50% reduction in the number of candidate variants and the identification of homozygous pathogenic variants in 41 patients, with an overall diagnostic yield of 52%. The newly defined consanguinity ranges provide, for the first time, specific ROH thresholds to estimate inbreeding within a pedigree on disparate exome sequencing data, enabling confirmation or (re)classification of consanguineous status, hence increasing the efficiency of molecular diagnosis and reporting on secondary consanguinity findings, as recommended by American College of Medical Genetics and Genomics guidelines.
It is estimated that 350 million individuals worldwide experience one of approximately 7000 existing rare diseases (RDs) [1: Nguengang Wakap S. Lambert D.M. Olry A. Rodwell C. Gueydan C. Lanneau V. Murphy D. Le Cam Y. Rath A. Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database. Eur J Hum Genet. 2020; 28: 165-173]. The low prevalence of each disease and the high heterogeneity and variability of clinical symptoms make diagnosis and access to appropriate treatment a real challenge. As emphasized by the International Rare Diseases Research Consortium, progress in this field requires identification of RDs and their causes to develop appropriate treatments [2: Austin C.P. Cutillo C.M. Lau L.P.L. Jonker A.H. Rath A. Julkowska D. Thomson D. Terry S.F. de Montleau B. Ardigò D. Hivert V. Boycott K.M. Baynam G. Kaufmann P. Taruscio D. Lochmüller H. Suematsu M. Incerti C. Draghia-Akli R. Norstedt I. Wang L. Dawkins H.J.S. International Rare Diseases Research Consortium (IRDiRC). Future of rare diseases research 2017-2027: an IRDiRC perspective. Clin Transl Sci. 2018; 11: 21-27], and as 80% of RDs are thought to have a genetic origin, particular emphasis has been placed on the rapidly expanding development of genomic technologies. The next-generation sequencing era has enabled cost-effective sequencing of RD patients' exomes or genomes, bringing these approaches into diagnostics [3: Boycott K.M. Hartley T. Biesecker L.G. Gibbs R.A. Innes A.M. Riess O. Belmont J. Dunwoodie S.L. Jojic N. Lassmann T. Mackay D. Temple I.K. Visel A. Baynam G. A diagnosis for all rare genetic diseases: the horizon and the next frontiers. Cell. 2019; 177: 32-37]. However, the interpretation of the genome is still a real challenge for molecular geneticists, and innovative bioinformatics solutions combining genomic and clinical data are crucial for reaching a diagnosis [4: Boycott K.M. Hartley T. Biesecker L.G. Gibbs R.A. Innes A.M. Riess O. Belmont J. Dunwoodie S.L. Jojic N. Lassmann T. Mackay D. Temple I.K. Visel A. Baynam G. International cooperation to enable the diagnosis of all rare genetic diseases. Am J Hum Genet. 2017; 100: 695-705; 5: Lochmüller H. Badowska D.M. Thompson R. Knoers N.V. Aartsma-Rus A. Gut I. Wood L. Harmuth T. Durudas A. Graessner H. Schaefer F. Riess O. RD-connect, NeurOmics and EURenOmics: collaborative European initiative for rare diseases. Eur J Hum Genet. 2018; 26: 778-785]. Data sharing and analysis platforms, such as the RD-Connect Genome-Phenome Analysis Platform (GPAP; https://platform.rd-connect.eu, registration required, last accessed May 9, 2020) [5; 6: Thompson R. Johnston L. Taruscio D. Monaco L. Béroud C. Gut I.G. Hansson M.G. 't Hoen P.B. Patrinos G.P. Dawkins H. Ensini M. Zatloukal K. Koubi D. Heslop E. Paschall J.E. Posada M. Robinson P.N. Bushby K. Lochmüller H. RD-connect: an integrated platform connecting databases, registries, biobanks and clinical bioinformatics for rare disease research. J Gen Intern Med. 2014; 29 Suppl 3: S780-S787], have emerged to provide methods and standardized analyses of phenotypic and (gen)omic data to facilitate mutation detection.

Autozygosity, as a result of consanguineous mating, has long been known to be a risk factor for RDs of genetic origin through a variety of effects, such as reduction in genetic variation, increased frequency of homozygous genotypes for deleterious alleles, and lower population viability [7: Ouborg N.J. Pertoldi C. Loeschcke V. Bijlsma R.K. Hedrick P.W. Conservation genetics in transition to conservation genomics. Trends Genet. 2010; 26: 177-187]. The deleterious consequences in populations with higher prevalence of consanguinity, due to physical or cultural isolation, have been widely reported (reviewed in Fareed and Afzal [8: Fareed M., Afzal M. Genetics of consanguinity and inbreeding in health and disease. Ann Hum Biol. 2017; 44: 99-107]), and many rare recessive disease genes have been identified by homozygosity mapping, in which large regions flanking the disease-causing variant are expected to be identical by descent in affected individuals whose parents are related [9: Matthijs G. Rymen D. Millón M.B. Souche E. Race V. Approaches to homozygosity mapping and exome sequencing for the identification of novel types of CDG. Glycoconj J. 2013; 30: 67-76; 10: Vahidnezhad H. Youssefian L. Jazayeri A. Uitto J. Research techniques made simple: genome-wide homozygosity/autozygosity mapping is a powerful tool for identifying candidate genes in autosomal recessive genetic diseases. J Invest Dermatol. 2018; 138: 1893-1900]. In addition, about one-third of autosomal recessive rare disorders occurring in families with no known consanguinity are caused by homozygous variants located in regions likely identical by descent [11: Posey J.E. O'Donnell-Luria A.H. Chong J.X. Harel T. Jhangiani S.N. Coban Akdemir Z.H. et al. Insights into genetics, human biology and disease gleaned from family based genomic studies. Genet Med. 2019; 21: 798-812].

Next-generation sequencing technologies allow precise detection of genomic regions where a reduction in heterozygosity is evident and offer the opportunity to estimate autozygosity at the exome and genome level [12: Pippucci T. Magi A. Gialluisi A. Romeo G. Detection of runs of homozygosity from whole exome sequencing data: state of the art and perspectives for clinical, population and epidemiological studies. Hum Hered. 2014; 77: 63-72]. Several software tools, such as HomozygosityMapper [13: Seelow D. Schuelke M. Hildebrandt F. Nürnberg P. HomozygosityMapper--an interactive approach to homozygosity mapping. Nucleic Acids Res. 2009; 37: W593-W599], PLINK [14: Purcell S. Neale B. Todd-Brown K. Thomas L. Ferreira M.A. Bender D. Maller J. Sklar P. de Bakker P.I. Daly M.J. Sham P.C. PLINK: a tool set for whole genome association and population-based linkage analyses. Am J Hum Genet. 2007; 81: 559-575], HomSI [15: Görmez Z. Bakir-Gungor B. Sagiroglu M.S. HomSI: a homozygous stretch identifier from next-generation sequencing data. Bioinformatics. 2014; 30: 445-447], and H3M2 [16: Magi A. Tattini L. Palombo F. Benelli M. Gialluisi A. Giusti B. Abbate R. Seri M. Gensini G.F. Romeo G. Pippucci T. H3M2: detection of runs of homozygosity from whole-exome sequencing data. Bioinformatics. 2014; 30: 2852-2859], have been developed for the detection of runs of homozygosity (ROHs) from exome and genome sequencing data [12; 17: Gibson J. Morton N.E. Collins. Extended tracts of homozygosity in outbred human populations. Hum Mol Genet. 2006; 15: 789-795]. The identification of autozygous regions through the detection of contiguous homozygous segments of the genome, where the two inherited haplotypes are identical, has been applied in multiple population genomic studies (reviewed in Ceballos et al [18: Ceballos F.C. Joshi P.K. Clark D.W. Ramsay M. Wilson J.F. Runs of homozygosity: windows into population history and trait architecture. Nat Rev Genet. 2018; 19: 220-234]). Different homozygosity mapping software presents specific advantages and limitations, as reviewed in Howrigan et al [19: Howrigan D.P. Simonson M.A. Keller M.C. Detecting autozygosity through runs of homozygosity: a comparison of three autozygosity detection algorithms. BMC Genomics. 2011; 12: 460] and Oliveira et al [20: Oliveira J. Pereira R. Santos R. Sousa M. Homozygosity mapping using whole-exome sequencing: a valuable approach for pathogenic variant identification in genetic diseases. Bioinformatics (Biostec). 2017; 3: 210-216].
Some of these limitations stem from the use of exome sequencing (ES), which by definition fragments genomic data, thus interfering with the identification of homozygous regions. Recently, optimized protocols for homozygosity mapping based on ES and using PLINK software have been published, with promising results [10; 21: Kancheva D. Atkinson D. De Rijk P. Zimon M. Chamova T. Mitev V. Yaramis A. Maria Fabrizi G. Topaloglu H. Tournev I. Parman Y. Parma Y. Battaloglu E. Estrada-Cuzcano A. Jordanova A. Novel mutations in genes causing hereditary spastic paraplegia and Charcot-Marie-Tooth neuropathy identified by an optimized protocol for homozygosity mapping based on whole-exome sequencing. Genet Med. 2016; 18: 600-607; 22: Masingue M. Perrot J. Carlier R.Y. Piguet-Lacroix G. Latour P. Stojkovic T. WES homozygosity mapping in a recessive form of Charcot-Marie-Tooth neuropathy reveals intronic GDAP1 variant leading to a premature stop codon. Neurogenetics. 2018; 19: 67-76].

Herein, we report on the integration of genomic analysis and autozygosity assessment on the basis of the detection of long [>1 megabase (Mb)] ROHs in >2400 ES data sets from the RD-Connect GPAP. A subset of these measurements was used to generate a model to determine the likelihood of an individual being the offspring of consanguineous parents.
To assess this approach, individuals were classified according to the consanguinity ranges derived from the model, and a subset of consanguineous offspring was subsequently analyzed in the RD-Connect GPAP, applying ROH-specific region filtering to identify the disease-causing variant(s). To our knowledge, this is the first study providing thresholds based on total ROH length to estimate consanguinity from ES data regardless of the sequencing center and protocol used, and the largest study attempting to combine ES and ROH detection approaches to identify genetic defects of different types of RDs, reaching a diagnostic yield of 52% in consanguineous probands. The consanguinity ranges defined herein for ES data will facilitate inbreeding estimation in clinical laboratories and enable confirmation, or (re)classification, of consanguineous cases, hence increasing the efficiency of molecular diagnosis and reporting on secondary consanguinity findings, as recommended by American College of Medical Genetics and Genomics (ACMG) guidelines [23: Rehder C.W. David K.L. Hirsch B. Toriello H.V. Wilson C.M. Kearney H.M. American College of Medical Genetics and Genomics: standards and guidelines for documenting suspected consanguinity as an incidental finding of genomic testing. Genet Med. 2013; 15: 150-152].

This study includes clinical and genomic data from 2432 individuals collated within the RD-Connect GPAP (data set C) (Figure 1) and 76 individuals from an independent project, the Undiagnosed Rare Disease Program of Catalonia (URDCAT; https://www.urdcat.cat/home, last accessed May 9, 2020) (data set B) (Figure 1). Clinical information concerning reported consanguinity and ethnicity classification, according to the Ontology of Precision Medicine and Investigation (OPMI; http://www.ontobee.org/ontology/OPMI, last accessed December 1, 2019) database, was obtained for each individual, where available.
As required by the RD-Connect and URDCAT adherence agreements, patient consent allowing the sharing of pseudonymized clinical information with international collaborators and researchers was obtained for all individuals included in this study. This study adheres to the principles set out in the Declaration of Helsinki.

Different data sets and subsets were used to train, test, and apply the model described in this study (Figure 1). Data set A, referred to as the training data set, includes 199 index cases for which the presence or absence of consanguinity was determined by kinship analysis; it was used to define the logistic regression model. Data set B, referred to as the testing data set, includes 76 index cases from URDCAT for which the presence or absence of consanguinity was determined by kinship analysis; it was used to test the model. Data set C, referred to as the whole data set, includes 2432 individuals (index cases and relatives) from the RD-Connect GPAP to which the model was applied. Finally, data set D, referred to as the diagnostic data set, includes 79 index cases from data set C in which genomic data were combined with ROH results to identify the pathogenic variants responsible for different types of RDs.

The 2432 individuals from the RD-Connect GPAP were sequenced using six different exome capture kit protocols [Nextera Rapid Exome (Illumina, San Diego, CA), Nimblegen SeqCap EZ MedExome (Roche, Basel, Switzerland), SureSelect version 5 (Agilent, Santa Clara, CA), Broad Custom Exome (Broad Institute, Cambridge, MA), Nextera Expanded Exome (Illumina), and Illumina TruSeq Expanded Exome], with target capture sizes ranging from 37 to 62 Mb. The 76 individuals from URDCAT were sequenced using five different exome capture kit protocols [Nimblegen SeqCapEZ Exome (Roche) and Agilent SureSelect version 3, version 4, version 5, and version 6 (Agilent)], with target capture sizes ranging from 50 to 64 Mb.
In all cases, sequencing reads were processed using the RD-Connect GPAP standardized analysis pipeline based on GATK 3.6 best practices, as described in Laurie et al [24: Laurie S. Fernandez-Callejo M. Marco-Sola S. Trotta J.R. Camps J. Chacón A. Espinosa A. Gut M. Gut I. Heath S. Beltran S. From wet-lab to variations: concordance and speed of bioinformatics pipelines for whole genome and whole exome sequencing. Hum Mutat. 2016; 37: 1263-1271], and the resultant variant calls were used for ROH detection and made available for analysis through the RD-Connect GPAP. Quality filtering of the processed data (VCF files) was performed to minimize the impact of low-quality variant calls resulting from sequencing artifacts, misalignment, or low coverage. Insertions/deletions were excluded, and only single-nucleotide variants covered by a minimum read depth of 10 reads and a genotype quality of at least 90 were included in the analysis. For each individual, ROHs were identified using the --homozyg option of PLINK version 1.90 [14], applying the optimal parameters defined by Kancheva et al [21].
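The quality filter described above (single-nucleotide variants only, read depth of at least 10, genotype quality of at least 90) can be sketched as follows. This is a minimal illustration, not the RD-Connect pipeline: the `Call` record and the function names are hypothetical, and a real VCF would need handling of multi-allelic sites and missing fields that is left out here.

```python
from dataclasses import dataclass

MIN_DEPTH = 10  # minimum read depth stated in the text
MIN_GQ = 90     # minimum genotype quality stated in the text


@dataclass
class Call:
    """One genotype call; a hypothetical, simplified stand-in for a VCF record."""
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int
    gq: int


def is_snv(call: Call) -> bool:
    # A single-nucleotide variant has one-base REF and ALT alleles;
    # anything longer is treated as an insertion/deletion and dropped.
    return len(call.ref) == 1 and len(call.alt) == 1


def passes_quality_filter(call: Call) -> bool:
    return is_snv(call) and call.depth >= MIN_DEPTH and call.gq >= MIN_GQ


# Made-up calls, for illustration only.
calls = [
    Call("1", 1000, "A", "G", depth=35, gq=99),   # kept
    Call("1", 2000, "AT", "A", depth=40, gq=99),  # indel: dropped
    Call("1", 3000, "C", "T", depth=8, gq=99),    # low depth: dropped
    Call("1", 4000, "G", "A", depth=30, gq=60),   # low genotype quality: dropped
]
kept = [c for c in calls if passes_quality_filter(c)]
```

Only calls that survive this filter would be passed on to ROH detection.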
The protocol of Kancheva et al [21] is designed for whole-exome sequencing data and assumes intronic and intergenic regions to be homozygous when flanked by two detected homozygous coding regions. PLINK was run for each sample to identify ROHs with a minimum length of 1 Mb, to exclude common shorter ROHs. Plots were generated using RStudio version 1.0.143 (RStudio, Boston, MA).

For this study, consanguineous individuals were defined as being the offspring of third-degree (equivalent to being first cousins) or more closely related parents (ie, having a kinship coefficient >0.045) [25: Manichaikul A. Mychaleckyj J.C. Rich S.S. Daly K. Sale M. Chen W.M. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010; 26: 2867-2873]. To build a logistic regression model to predict if an individual is likely to be a consanguineous offspring according to the total ROH size, we first identified a subset of samples for which presence or absence of consanguinity had been clinically reported and subsequently experimentally confirmed by trio kinship analysis using the --relatedness2 option of VCFtools. In total, 98 index cases were confirmed as consanguineous offspring (kinship coefficient > 0.045) and 101 were confirmed as nonconsanguineous (kinship coefficient < 0.045). These cases (199 in total) were included in data set A (Figure 1), referred to as the training data set.
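The model's single predictor, total ROH size per individual, can be sketched as a sum over the segments reported by the ROH step. The example below assumes a whitespace-delimited segment report with `IID` and `KB` columns, which is how PLINK 1.9's `.hom` output is usually laid out; check the header of your own output before relying on it. The function name and the sample rows are hypothetical, and the PLINK parameters taken from Kancheva et al are not reproduced here.

```python
from collections import defaultdict

MIN_ROH_KB = 1000.0  # 1-Mb minimum segment length, as stated in the text


def total_roh_mb(hom_text: str, min_kb: float = MIN_ROH_KB) -> dict:
    """Sum ROH segment lengths (in Mb) per individual from .hom-style text."""
    lines = [ln.split() for ln in hom_text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    iid_col = header.index("IID")  # individual identifier column
    kb_col = header.index("KB")    # segment length in kilobases
    totals = defaultdict(float)
    for row in rows:
        kb = float(row[kb_col])
        if kb >= min_kb:           # ignore segments shorter than 1 Mb
            totals[row[iid_col]] += kb / 1000.0
    return dict(totals)


# Made-up segments, for illustration only.
example = """
FID IID PHE CHR SNP1 SNP2 POS1 POS2 KB NSNP DENSITY PHOM PHET
F1 S1 -9 1 rs1 rs2 1000000 13500000 12500.0 400 31.25 0.99 0.01
F1 S1 -9 2 rs3 rs4 5000000 5600000 600.0 50 12.0 1.0 0.0
F1 S2 -9 3 rs5 rs6 2000000 4200000 2200.0 90 24.4 1.0 0.0
"""
totals = total_roh_mb(example)
```

In this toy input the 600-kb segment of S1 is discarded, so only segments of at least 1 Mb contribute to each individual's total. Individuals with no qualifying segment are absent from the result and would need to be assigned a total of zero.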
Two thirds of data set A, 62 consanguineous cases and 66 nonconsanguineous cases, were used to train a logistic model, and the remaining 71 samples were used to define four consanguinity ranges: nonconsanguineous (consanguinity probability < 0.05), uncertain (0.05 < consanguinity probability < 0.5), probably consanguineous (0.5 < consanguinity probability < 0.95), and consanguineous (consanguinity probability > 0.95). The accuracy of the model is 98%, with a P value of 0.05. The specificity, sensitivity, and accuracy of the model for predicting consanguinity status were evaluated using data set B, referred to as the testing data set (Figure 1), which includes 76 index cases from URDCAT for which presence or absence of consanguinity was determined by kinship analysis.

Genomic data were analyzed using the RD-Connect GPAP, which enables the combination of variant filtering and homozygosity mapping. Identification of putative disease-causing variants in consanguineous or probably consanguineous individuals was achieved by applying the following filters: homozygous variant, minimum depth of coverage of 10, variants classified as having a high (disruptive) or moderate (amino acid change) impact on the protein, according to SnpEff, and observed population allele frequency <0.02, according to the gnomAD [26: Karczewski K.J. Francioli L.C. Tiao G. Cummings B.B. Alföldi J. Wang Q. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020; 581: 434-443], ExAC [27: Lek M. Karczewski K.J. Minikel E.V. Samocha K.E. Banks E. Fennell T. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016; 536: 285-291], and 1000 Genomes Project [28: Auton A. Brooks L.D. Durbin R.M. Garrison E.P. Kang H.M. Korbel J.O. Marchini J.L. McCarthy S. McVean G.A. Abecasis G.R. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015; 526: 68-74] databases. When no candidate variants were identified, other inheritances and genotypes were assessed (eg, autosomal recessive inheritance associated with compound heterozygous variants and X-linked inheritance). Candidate variants were classified following the ACMG standards and guidelines for interpretation [29: Richards S. Aziz N. Bale S. Bick D. Das S. Gastier-Foster J. Grody W.W. Hegde M. Lyon E. Spector E. Voelkerding K. Rehm H.L. ACMG Laboratory Quality Assurance Committee. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015; 17: 405-424] and proposed to the corresponding submitter for confirmation of molecular diagnosis.

A paired-samples Wilcoxon test was performed to compare the number of rare homozygous variants with and without applying a 1-Mb ROH filter. Statistical significance was determined as P < 0.05.

To predict if an individual is likely to be a consanguineous offspring, according to the total ROH size identified from ES data, we analyzed ROH results from data set A, the training data set (Figure 1 and Figure 2A). Consanguinity was defined as unions contracted between individuals biologically related as first cousins (equivalent to third-degree relationship) or closer. We built a logistic regression model to define the probability of consanguinity according to the total ROH size identified by ROH analysis (Figure 2B). We used this model and probabilities of being a consanguineous offspring of 5%, 50%, and 95% to define four consanguinity ranges: nonconsanguineous (total ROH size < 22 Mb), uncertain (22 Mb < total ROH size < 79 Mb), probably consanguineous (79 Mb < total ROH size < 123 Mb), and consanguineous (total ROH size > 123 Mb) (Table 1).
If we consider the percentage of the genome that is homozygous (Froh) assuming a total autosomal genomic length of 2691 Mb for GRCh37/hg19 (https://www.ncbi.nlm.nih.gov/assembly, accession number GRCh38.p13), consanguinity ranges can be extrapolated as follows: nonconsanguineous (Froh < 0.8% of the genome), uncertain (0.8% < Froh < 2.9%), probably consanguineous (2.9% < Froh < 4.6%), and consanguineous (Froh > 4.6%) (Table 1). The robustness of this approach was tested using an independent data set B, defined as the testing data set (Figure 1). The sensitivity (true-positive rate), specificity (true-negative rate), and accuracy (degree of closeness to a true value) of the test were 80%, 99%, and 97%, respectively (Figure 2B and Supplemental Table S1). According to the established thresholds, 194 of the 199 cases included in the training data set were correctly classified, and five nonconsanguineous cases were incorrectly classified as (probably) consanguineous (Supplemental Table S2). Two of the five cases (C4 and C5), having a total ROH size of 82.4 and 80 Mb, respectively, were close to the defined threshold of 79 Mb. All cases presented an ES median coverage between 57 and 94.

Table 1. Consanguinity Classification, According to the Model Described in This Study

ROH interval, Mb    Froh, %*     Experimental offspring consanguinity classification
>123                >4.6         Consanguineous
79–123              2.9–4.6      Probably consanguineous
22–79               0.8–2.9      Uncertain
<22                 <0.8         Nonconsanguineous

Mb, megabase; ROH, run of homozygosity.
* Froh is defined as the percentage of the genome that is homozygous compared with the total autosomal genomic length (approximately 2691 Mb for GRCh37/hg19).
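As a worked illustration of Table 1, the sketch below converts a total ROH size to Froh and assigns one of the four ranges. The thresholds and the 2691-Mb autosomal length come from the text; the function names are hypothetical, and the handling of values that fall exactly on a boundary is an assumption, because the text uses strict inequalities and does not say where such values belong.

```python
AUTOSOMAL_LENGTH_MB = 2691.0  # autosomal length used in the text for Froh

# Range boundaries from Table 1 (total ROH size, Mb).
NONCONSANGUINEOUS_MAX_MB = 22.0
UNCERTAIN_MAX_MB = 79.0
PROBABLY_CONSANGUINEOUS_MAX_MB = 123.0


def froh_percent(total_roh_mb: float) -> float:
    """Percentage of the autosomal genome covered by ROHs."""
    return 100.0 * total_roh_mb / AUTOSOMAL_LENGTH_MB


def classify(total_roh_mb: float) -> str:
    # Assumption: a value exactly on a boundary is placed in the higher range.
    if total_roh_mb < NONCONSANGUINEOUS_MAX_MB:
        return "nonconsanguineous"
    if total_roh_mb < UNCERTAIN_MAX_MB:
        return "uncertain"
    if total_roh_mb < PROBABLY_CONSANGUINEOUS_MAX_MB:
        return "probably consanguineous"
    return "consanguineous"


# Case C4 from the text has a total ROH size of 82.4 Mb.
c4_label = classify(82.4)
c4_froh = round(froh_percent(82.4), 1)
```

Applied to case C4, this places the individual just inside the probably consanguineous range, in line with the text's remark that the case sits close to the 79-Mb threshold.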
In some populations in North Africa, West Asia, or South India, consanguineous marriages are culturally and socially favored. This fact, together with existing consanguinity in isolated populations, results in almost 10% of the world population either being married to a biological relative or being a consanguineous offspring [8]. To determine the extent to which population origin may affect the consanguinity ranges defined above, total ROH size per individual was assessed across the different ethnicities reported in the training data set (Figure 2C). When analyzing the nonconsanguineous cohort across different ethnicities, total ROH size medians were, as expected, within the nonconsanguineous range for European, Latin American, and Middle Eastern individuals (Figure 2C). The median total ROH size was above the nonconsanguineous range in two different ethnicities: Arabs and Asians. Indeed, two of the incorrectly classified cases mentioned above (C2 and C3) (Supplemental Table S2) are of Asian origin. However, because of the small number of individuals in each of these nonconsanguineous data sets (Arabs = 2, and Asians = 3) (Supplemental Table S3), results were not confirmed statistically. Similar tendencies were observed when analyzing the mean length of the homozygous segments by ethnicity (Supplemental Table S3).

The algorithm used herein to identify ROH regions uses a sliding window that scans along single-nucleotide variant data to detect nonheterozygous stretches. ES experiments target specific regions of the genome, and these regions differ between exome capture kits. It was hypothesized that the regions captured by each exome capture kit might affect the total ROH size, and thus interfere with the determination of consanguinity status using the ranges identified above.
Therefore, we analyzed the total ROH size across the exome capture kits from the training data set. Six different exome capture kits with target capture sizes ranging from 37 to 62 Mb were assessed (Figure 2D). When analyzing the nonconsanguineous cohort across the different exome capture kits, total ROH size medians were, as expected, within the nonconsanguineous range for all of the kits tested. Similar results were observed when analyzing the mean length of the segments by exome capture kit (Supplemental Table S3).

The consanguinity ranges defined in the first part of this study were applied to the whole data set (Figure 1), comprising 2432 individuals (index cases and relatives) from the RD-Connect GPAP. Total ROH size was computed for each individual, and individuals were classified by comparing the experimentally inferred consanguinity status with the corresponding clinical record. Individuals were classified as consanguineous or nonc
Reference(s)