Phenotype-aware prioritisation of rare Mendelian disease variants
2022; Elsevier BV; Volume: 38; Issue: 12; Language: English
10.1016/j.tig.2022.07.002
ISSN 1362-4555
Authors: Catherine Kelly, Anita Szabó, Nikolas Pontikos, Gavin Arno, Peter N. Robinson, Julius O.B. Jacobsen, Damian Smedley, Valentina Cipriani
Topic(s): Genomic variations and chromosomal abnormalities
Abstract: Next-generation sequencing technologies have made achieving a molecular diagnosis for a rare genetic disorder increasingly feasible and, in turn, have enabled more personalised clinical management of affected patients and their families. Identifying the one or two variants that are responsible for a certain disease phenotype from the millions identified by sequencing can be time-consuming and expensive. Numerous phenotype-aware variant prioritisation (VP) software tools now exist to help semi-automate the molecular diagnosis process for rare diseases. Although many of the published VP tools have substantial limitations, show a lack of maintenance, and soon become unfit for use, several are up-to-date and demonstrate an impressive capacity in prioritising molecular diagnoses when tested on real patient data. Adopting phenotype-aware VP software tools in diagnostic settings can efficiently assist the multidisciplinary teams of clinicians and scientists in reporting genetic diagnoses for rare disease.

A molecular diagnosis from the analysis of sequencing data in rare Mendelian diseases has a huge impact on the management of patients and their families. Numerous patient phenotype-aware variant prioritisation (VP) tools have been developed to help automate this process and shorten the diagnostic odyssey, but performance statistics on real patient data are limited. Here we identify, assess, and compare the performance of all up-to-date, freely available, and programmatically accessible tools using a whole-exome, retinal disease dataset from 134 individuals with a molecular diagnosis. All tools were able to identify around two-thirds of the genetic diagnoses as the top-ranked candidate, with LIRICAL performing best overall. Finally, we discuss the challenges that must be overcome, given that most cases remain undiagnosed after current, state-of-the-art practices.
With approximately 80% of rare diseases having a genetic origin, identifying the correct causative variants in rare Mendelian single-gene disorders creates a greater potential for informed clinical management through precision medicine or recommendation for drug trials, rather than only treating evident symptoms. Improvements in sequencing genetic information at scale through parallelisation (next-generation sequencing) have enabled greatly increased quantities of genomic data production at lower overall costs, as shown by the recent completion of the 100,000 Genomes Project in the UK [1. Smedley D. et al. 100,000 Genomes pilot on rare-disease diagnosis in health care - preliminary report. N. Engl. J. Med. 2021; 385: 1868-1880]. Whole-exome sequencing (WES) is still the most commonly used method, as the exome (~2% of the human genome) harbours ~85% of currently known disease-causing sequence variants [2. Caspar S.M. et al. Clinical sequencing: from raw data to diagnosis with lifetime value. Clin. Genet. 2018; 93: 508-519].
The candidate variants from a typical WES experiment are often derived from 60 000 to 100 000 variants affecting protein-coding regions, of which nearly all will be benign or unrelated to the disease [3. De La Vega F.M. et al. Artificial intelligence enables comprehensive genome interpretation and nomination of candidate diagnoses for rare genetic diseases. Genome Med. 2021; 13: 153]. However, the filtering and review process can still involve many tens, if not a few hundreds, of candidate variants and is usually both time-consuming and expensive if done via manual analysis by multidisciplinary clinicians and scientists. Around one-third of children born with rare genetic diseases do not live to see their fifth birthday [4. Wright C.F. et al. Paediatric genomics: diagnosing rare disease in children. Nat. Rev. Genet. 2018; 19: 253-268], so it is vital that their molecular diagnosis is rapid; yet the traumatic wait time for patients is often lengthy (e.g., a median of 6 years in the 100,000 Genomes Project) [1. Smedley D. et al. 100,000 Genomes pilot on rare-disease diagnosis in health care - preliminary report. N. Engl. J. Med. 2021; 385: 1868-1880]. VP software offers the possibility of identifying the correct disease-causative variants more efficiently, sometimes within minutes. These tools usually discard large quantities of likely benign, common variants through filtering strategies based on publicly available (e.g., gnomAD) and in-house sequencing databases. The vast majority of VP tools are still only able to prioritise single-nucleotide variants (SNVs) and small insertions/deletions (indels) formatted as variant call format (VCF) files.
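The frequency-based filtering step described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each VCF record carries a population allele frequency in an INFO key named AF and that a cut-off of 0.1% is appropriate; neither the key name nor the threshold comes from the tools reviewed here, and the function names are hypothetical.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'AF=0.0001;DP=50' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info[key] = value
        elif item:
            info[item] = True  # flag entries carry no value
    return info


def filter_rare_variants(vcf_lines, max_af=0.001, af_key="AF"):
    """Keep header lines and variants whose allele frequency is <= max_af.

    Assumption: variants without a frequency annotation are kept, since
    absence from a population database is compatible with rarity.
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        raw_af = parse_info(fields[7]).get(af_key)  # INFO is the 8th column
        if raw_af is None or raw_af is True:
            kept.append(line)
            continue
        # Multi-allelic sites list one value per ALT allele; '.' means missing.
        freqs = [float(x) for x in raw_af.split(",") if x != "."]
        if not freqs or min(freqs) <= max_af:
            kept.append(line)
    return kept


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\tPASS\tAF=0.2",     # common: discarded
    "1\t200\t.\tC\tT\t50\tPASS\tAF=0.0001",  # rare: kept
    "1\t300\t.\tG\tA\t50\tPASS\tDP=10",      # no frequency: kept
]
rare = filter_rare_variants(example)
```

Real pipelines typically combine several population databases and in-house frequencies, so a single key and threshold is a deliberate simplification.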
To determine likely rare disease-causative SNVs/indels, VP tools usually incorporate several existing in silico pathogenicity prediction tools that can restrict the patients' VCF files to variants of interest based on a range of methods. They include function-prediction methods (e.g., MutationTaster, PolyPhen-2, SIFT), which are based on the likelihood of each missense variant causing pathogenic changes to protein structure or function; phylogenetic conservation methods (e.g., GERP++, phastCons, phyloP), which measure the degree of conservation at a given nucleotide site; more recent methods that make tailored use of deep neural networks (e.g., MVP, PrimateAI); and ensemble methods (e.g., CADD, DANN, REVEL), which integrate information from multiple component methods [5. Li J. et al. Performance evaluation of pathogenicity-computation methods for missense variants. Nucleic Acids Res. 2018; 46: 7793-7804]. Despite the availability of this wide range of in silico pathogenicity prediction tools, improvements are still needed to discriminate pathogenic from benign variants: the reported median specificity is 65% and, with sensitivities ranging from 51% to 96% (median, 88%), relying only on algorithm-predicted variant pathogenicity is known to still generate a large number of false positive candidates [5. Li J. et al. Performance evaluation of pathogenicity-computation methods for missense variants. Nucleic Acids Res. 2018; 46: 7793-7804]. With the aim of automating the manual prioritisation of candidate variants made by clinicians and scientists, where the relevance of a certain gene variant to a patient's phenotype is taken into account, virtually all recent VP software tools now enable the incorporation of standardised patient phenotypic terms, drawing from the more than 15 000 terms of the Human Phenotype Ontology (HPO) [6. Köhler S.
et al. The Human Phenotype Ontology in 2021. Nucleic Acids Res. 2020; 49: D1207-D1217]. This has been a significant addition; for example, Exomiser (among the first VP tools of its kind) [7. Bone W.P. et al. Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency. Genet. Med. 2016; 18: 608-617, 8. Robinson P.N. et al. Improved exome prioritization of disease genes through cross-species phenotype comparison. Genome Res. 2014; 24: 340-348] demonstrated an increased top prioritisation of the correctly diagnosed causative variants from 20–77% (using only variant-based filtering) to 96–97% (with the addition of patients' HPO terms) using simulated sequencing data and across different modes of inheritance (MOIs), as well as from 3% to 74% using real patient data and inferred MOIs [9. Cipriani V. et al. An improved phenotype-driven tool for rare Mendelian variant prioritization: benchmarking exomiser on real patient whole-exome data. Genes. 2020; 11: 460]. The VP software tools to date have been tested on (different) simulated and/or very small real patient sequencing datasets, with limited software performance comparison. Strikingly, each specific published tool virtually always claims to outperform the relatively limited number of other tools tested from the literature. Here, we set out to perform a thorough literature review with the aim of identifying up-to-date phenotype-aware VP software tools. Building on a previous benchmarking of the VP tool Exomiser [9. Cipriani V. et al. An improved phenotype-driven tool for rare Mendelian variant prioritization: benchmarking exomiser on real patient whole-exome data. Genes.
2020; 11: 460], we then conducted a relatively unbiased software performance comparison of the selected VP tools using a dataset of 134 whole exomes from individuals affected by a range of rare inherited retinal diseases (IRDs) with a known molecular diagnosis.

A detailed literature search was carried out to determine a list of phenotype-aware VP software tools to use for real patient data benchmarking that would meet the following criteria: (i) directly accepting sequencing data formatted as VCF files; (ii) accepting HPO terms to describe patients' phenotypes; (iii) being relatively up-to-date (last updated or published since 2018); (iv) being freely available for academic use; and (v) offering local, programmatic access (and therefore being safer for use with patient data than web-based access, and allowing processing of data at scale). Literature searches using a combination of keywords (i.e., 'exome', 'genome', 'variant prioritisation', and alternative spelling 'variant prioritization', and 'human phenotype ontology') were conducted in PubMed and returned about 400 peer-reviewed journal articles (11 March 2022) (Figure 1). Articles were screened to identify those publications that involved a VP software tool for rare Mendelian disease. This initially gave a list of 37 candidate VP software tools [3. De La Vega F.M. et al. Artificial intelligence enables comprehensive genome interpretation and nomination of candidate diagnoses for rare genetic diseases. Genome Med. 2021; 13: 153, 8. Robinson P.N. et al. Improved exome prioritization of disease genes through cross-species phenotype comparison. Genome Res. 2014; 24: 340-348, 10. Alemán A. et al. A web-based interactive framework to assist in the prioritization of disease candidate genes in whole-exome sequencing studies. Nucleic Acids Res. 2014; 42: W88-W93, 11. Anderson D.
et al. Personalised analytics for rare disease diagnostics. Nat. Commun. 2019; 10: 5274, 12. Antanaviciute A. et al. OVA: integrating molecular and physical phenotype data from multiple biomedical domain ontologies with variant filtering for enhanced variant prioritization. Bioinformatics. 2015; 31: 3822-3829, 13. Bertoldi L. et al. QueryOR: a comprehensive web platform for genetic variant analysis and prioritization. BMC Bioinform. 2017; 18: 225, 14. Birgmeier J. et al. AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature. Sci. Transl. Med. 2020; 12: eaau9113, 15. Bosio M. et al. eDiVA - classification and prioritization of pathogenic variants for clinical diagnostics. Hum. Mutat. 2019; 40: 865-878, 16. Boudellioua I. et al. DeepPVP: phenotype-based prioritization of causative variants using deep learning. BMC Bioinform. 2019; 20: 65, 17. Boudellioua I. et al. Semantic prioritization of novel causative genomic variants. PLoS Comput. Biol. 2017; 13: e1005500, 18. Chiara M. et al. VINYL: Variant prIoritizatioN by survivaL analysis. Bioinformatics. 2020; 36: 5590-5599, 19. Desvignes J.-P. et al. VarAFT: a variant annotation and filtration system for human next generation sequencing data. Nucleic Acids Res. 2018; 46: W545-W553, 20. Holt J.M. et al. VarSight: prioritizing clinically reported variants with binary classification algorithms. BMC Bioinform. 2019; 20: 1-10, 21. Holtgrewe M. et al. VarFish: comprehensive DNA variant analysis for diagnostics and research. Nucleic Acids Res. 2020; 48: W162, 22. Hombach D.
et al. MutationDistiller: user-driven identification of pathogenic DNA variants. Nucleic Acids Res. 2019; 47: W114-W120, 23. Hunt S.E. et al. Annotating and prioritizing genomic variants using the Ensembl Variant Effect Predictor - a tutorial. Hum. Mutat. 2021; 43: 986-997, 24. Ip E. et al. VPOT: a customizable variant prioritization ordering tool for annotated variants. Genom. Proteom. Bioinform. 2019; 17: 540, 25. James R.A. et al. A visual and curatorial approach to clinical variant prioritization and disease gene discovery in genome-wide diagnostics. Genome Med. 2016; 8: 13, 26. Javed A. et al. Phen-Gen: combining phenotype and genotype to analyze rare disorders. Nat. Methods. 2014; 11: 935-937, 27. Kennedy B. et al. Using VAAST to identify disease-associated variants in next-generation sequencing data. Curr. Protoc. Hum. Genet. 2014; 81: 6, 28. Koile D. et al. GenIO: a phenotype-genotype analysis web server for clinical genomics of rare diseases. BMC Bioinform. 2018; 19: 25, 29. Li M.J. et al. wKGGSeq: a comprehensive strategy-based and disease-targeted online framework to facilitate exome sequencing studies of inherited disorders. Hum. Mutat. 2015; 36: 496-503, 30. Li Q. et al. Xrare: a machine learning method jointly modeling phenotypes and genetic evidence for rare disease diagnosis. Genet. Med. 2019; 21: 2126-2134, 31. Li Z. et al. PhenoPro: a novel toolkit for assisting in the diagnosis of Mendelian disease. Bioinformatics. 2019; 35: 3559-3566, 32. Manshaei R. et al. GeneTerpret: a customizable multilayer approach to genomic variant prioritization and interpretation. BMC Med.
Genet. 2020; 15: 31, 33. Muller H. et al. VCF.Filter: interactive prioritization of disease-linked genetic variants from sequencing data. Nucleic Acids Res. 2017; 45: W567-W572, 34. O'Brien T.D. et al. Artificial intelligence (AI)-assisted exome reanalysis greatly aids in the identification of new positive cases and reduces analysis time in a clinical diagnostic laboratory. Genet. Med. 2022; 24: 192-200, 35. Robinson P.N. et al. Interpretable clinical genomics with a likelihood ratio paradigm. Am. J. Hum. Genet. 2020; 107: 403-417, 36. Seo G.H. et al. Diagnostic yield and clinical utility of whole exome sequencing using an automated variant prioritization system, EVIDENCE. Clin. Genet. 2020; 98: 562-570, 37. Sifrim A. et al. eXtasy: variant prioritization by genomic data fusion. Nat. Methods. 2013; 10: 1083-1084, 38. Singleton M.V. et al. Phevor combines multiple biomedical ontologies for accurate identification of disease-causing alleles in single individuals and small nuclear families. Am. J. Hum. Genet. 2014; 94: 599-610, 39. Stelzer G. et al. VarElect: the phenotype-based variation prioritizer of the GeneCards Suite. BMC Genomics. 2016; 17: 444, 40. Trakadis Y.J. et al. PhenoVar: a phenotype-driven approach in clinical genomics for the diagnosis of polymalformative syndromes. BMC Med. Genet. 2014; 7: 22, 41. Ward A. et al. Clin.iobio: a collaborative diagnostic workflow to enable team-based precision genomics. J. Pers. Med. 2022; 12: 73, 42. Wu C.
et al. Rapid and accurate interpretation of clinical exomes using Phenoxome: a computational phenotype-driven approach. Eur. J. Hum. Genet. 2019; 27: 612-620, 43. Yang H., Wang K. Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR. Nat. Protoc. 2015; 10: 1556-1566, 44. Zemojtel T. et al. Effective diagnosis of genetic disease by computational phenotype analysis of the disease-associated genome. Sci. Transl. Med. 2014; 6: 252ra123] to prune according to the aforementioned criteria. Remarkably, only seven VP software tools passed all five criteria and were selected as final candidates for testing and comparison. Table 1 shows the details of the 37 tools retrieved from the literature search and the corresponding selection process.

Table 1. Selection of phenotype-aware variant prioritisation (VP) software tools based on five suitability criteria (a)
(a) A grey cell indicates that the corresponding feature is not present.
(b) Following the completion of a literature review, seven viable VP software tool candidates that met all five suitability criteria (i.e., directly accepting VCF files; accepting HPO terms; last updated or published since 2018; freely available; with local, programmatic access) were selected for testing and comparison.
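The five suitability criteria amount to a conjunction of simple checks on each candidate tool. A minimal Python sketch of that selection logic follows; the record fields and the three example entries (tool_a, tool_b, tool_c) are hypothetical placeholders, not rows from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """One candidate VP tool described by the five suitability criteria."""
    name: str
    accepts_vcf: bool                # (i) directly accepts VCF files
    accepts_hpo: bool                # (ii) accepts HPO terms
    last_updated: int                # (iii) year last updated or published
    free_for_academic_use: bool      # (iv)
    local_programmatic_access: bool  # (v)


def meets_all_criteria(tool, min_year=2018):
    """Return True only if the tool satisfies every one of the five criteria."""
    return (
        tool.accepts_vcf
        and tool.accepts_hpo
        and tool.last_updated >= min_year
        and tool.free_for_academic_use
        and tool.local_programmatic_access
    )


def select_tools(tools, min_year=2018):
    """Names of the tools that pass all criteria, in input order."""
    return [t.name for t in tools if meets_all_criteria(t, min_year)]


# Hypothetical entries, for illustration only.
candidates = [
    Tool("tool_a", True, True, 2021, True, True),
    Tool("tool_b", True, True, 2015, True, True),   # fails (iii): too old
    Tool("tool_c", True, True, 2020, True, False),  # fails (v): web-only
]
selected = select_tools(candidates)
```

Keeping each criterion as a separate field makes it easy to count how many tools fail each one, which is how the most discriminating criterion can be identified.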
Most of the tools (33) can directly accept sequencing data as a VCF file, which is the standard file format for storing genetic variation data. A total of 28 tools are 'phenotype-aware' as opposed to simply using the genetic variant data for prioritisation; they all allow an integrative analysis of the patients' phenotypes using the HPO, which has become the de facto standard for deep phenotyping in the field of rare disease [6. Köhler S. et al. The Human Phenotype Ontology in 2021. Nucleic Acids Res. 2020; 49: D1207-D1217]. The most discriminating criterion (failed by 23 tools) is our requirement for the VP tool to provide both local and programmatic access. Local installation is usually essential to conform to patient data privacy and security rules. Also, despite the attractive features that some web-based tools may seem to provide, processing of data at scale via programmatic access is usually vital to guarantee efficient analysis pipelines. It should also be noted that 11 tools have not been updated since their publication date or since 2017 (one in 2013, two in 2014, three in 2015, one in 2016, and four in 2017), and the website link is broken for one of them (Table 1). This is largely a reflection of the challenges of maintaining academic software when resources do not exist for such an activity. Finally, Table 2 summarises the different data sources that are leveraged within each of the seven remaining VP tool candidates, to document the type and amount of information each tool relies on and to provide insights into the need to update and/or maintain them.

Table 2. Data sources used within each of the seven selected phenotype-aware variant prioritisation (VP) software tools from the literature review (a)
(a) ZFIN, IMPC, and MGD are used only by Exomiser. SPIDEX, Pfam, Treefam, GIANT, REACTOME, LRT, InterPro, GWAS, Blast, GO, GERP, dbscSNV, and phyloP are used only by Xrare. ARIC, GTEx, and MetaSVM are used only by VARPP.
GWAVA is used only by DeepPVP.
(b) The tools that were successfully downloaded, installed, and tested in the software performance comparison within this study.

Attempts were made to download and install all seven of the selected VP software tools. Further illustrating the problems with long-term maintenance of academic software, this was not possible for three of them due to inaccessible databases, failing Docker containers, or lack of information in ReadMe files. In particular, for DeepPVP [16. Boudellioua I. et al. DeepPVP: phenotype-based prioritization of causative variants using deep learning. BMC Bioinform. 2019; 20: 65], we were unable to follow the installation process, as no phenomenet-vp Docker container exists in Docker Hub and the Dockerfile recipe provided in the GitHub repository does not build in the research computing environment containing our clinical data; for Phenoxome [42. Wu C. et al. Rapid and accurate interpretation of clinical exomes using Phenoxome: a computational phenotype-driven approach. Eur. J. Hum. Genet. 2019; 27: 612-620], the docker pull was successful but there were no further instructions for proceeding with the installation; and for VARPP [11. Anderson D. et al. Personalised analytics for rare disease diagnostics. Nat. Commun. 2019; 10: 5274], download of the required dbNSFP database (version v3.4a) was no longer possible.
The four remaining VP software tools were finally included in this software performance evaluation and comparison using real patient data, as they were each successfully downloaded and installed as reported next, together with a brief description of the corresponding VP rationale and algorithms. The corresponding code for all analyses is available as a repository at https://github.com/whri-phenogenomics/VPSoftware_review.

Exomiser [7. Bone W.P. et al. Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency. Genet. Med. 2016; 18: 608-617, 8. Robinson P.N. et al. Improved exome prioritization of disease genes through cross-species phenotype comparison. Genome Res. 2014; 24: 340-348, 9. Cipriani V. et al. An improved phenotype-driven tool for rare Mendelian variant prioritization: benchmarking exomiser on real patient whole-exome data. Genes. 2020; 11: 460, 45. Smedley D. et al. Next-generation diagnostics and disease-gene discovery with the Exomiser. Nat. Protoc. 2015; 10: 2004-2015] is a freely available Java software tool that automates filtering and prioritisation of variants contained in VCF files from sequencing of rare disease patients (and, if available, their family members). A range of user-defined variant filtering criteria can be applied based on JANNOVAR [46. Jäger M. et al. Jannovar: a java library for exome annotation. Hum. Mutat. 2014; 35: 548-555] functional annotation, minor allele frequency, and expected inheritance pattern, amongst others. Each filtered variant is then prioritised according to a variant score based on its rarity and in silico algorithm-predicted pathogenicity, which is in turn combined with a corresponding gene-specific phenotype score.
The latter is obtained via the PhenoDigm algorithm [37. Sifrim A. et al. eXtasy: variant prioritization by genomic data fusion. Nat. Methods. 2013; 10: 1083-1084] and is calculated based on the semantic similarity between the user-provided HPO-encoded patient's phenotype and the phenotypic annotations of genes in known human diseases, orthologs in mouse and zebrafish model organisms, and phenotypes of protein–protein associated neighbours [7. Bone W.P. et al. Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency. Genet. Med. 2016; 18: 608-617]. The download and installation of Exomiser version 13.0.0 (released on 23 September 2021) were straightforward, following a comprehensive ReadMe file accessed via Exomiser's GitHub page [resource i]. We used a Bash script to create a single-sample-analysis-settings.yml file for each patient from the IRD dataset, starting from the preset-exome-analysis.yml example file provided and containing the Exomiser analysis settings. Exomiser was then run using the following command line for each single-sample analysis of the IRD patient WES dataset (Java version 17.0.0; Exomiser variant and phenotype databases version 2109; default Ensembl transcript annotation):

    java -Xms2g -Xmx4g -jar exomiser-cli-13.0.0.jar --analysis single-sample-analysis-settings.yml

A few representative sections of the HTML output file from the analysis of one single sample are reported in Figure S1 in the supplemental information online.
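The per-patient settings files were generated with a Bash script that is not reproduced here. As a rough illustration of the same idea, the Python sketch below fills a settings template and assembles the command line quoted above. It assumes that the template contains one 'vcf:' line and one 'hpoIds:' line, as the Exomiser example files are understood to do; the function names, the sample file name, the template text, and the HPO identifiers are hypothetical.

```python
import re
import shlex


def make_settings(template_text, vcf_path, hpo_ids):
    """Return template_text with its 'vcf:' and 'hpoIds:' lines replaced.

    Assumption: each key sits on its own line; indentation is preserved.
    """
    hpo_list = ", ".join(f"'{h}'" for h in hpo_ids)
    text = re.sub(
        r"(?m)^(\s*)vcf:.*$",
        lambda m: f"{m.group(1)}vcf: {vcf_path}",
        template_text,
    )
    text = re.sub(
        r"(?m)^(\s*)hpoIds:.*$",
        lambda m: f"{m.group(1)}hpoIds: [{hpo_list}]",
        text,
    )
    return text


def exomiser_command(settings_path, jar="exomiser-cli-13.0.0.jar"):
    """Build the argument list for one single-sample Exomiser run."""
    return ["java", "-Xms2g", "-Xmx4g", "-jar", jar, "--analysis", settings_path]


# Stand-in for the contents of the example settings file (hypothetical).
template = (
    "analysis:\n"
    "    vcf: examples/placeholder.vcf\n"
    "    hpoIds: ['HP:0000001']\n"
)
settings = make_settings(template, "sample_001.vcf", ["HP:0000510", "HP:0000662"])
command = exomiser_command("single-sample-analysis-settings.yml")
print(settings)
print(shlex.join(command))
```

Building the command as a list rather than a single string avoids shell-quoting problems if it is later passed to subprocess.run.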
Tab-separated (tsv) output files containing a variety of relevant information for the filtered and prioritised variants (including functional annotation, allele frequency in publicly available databases, the gene-specific phenotype score, the variant score, and the Exomiser combined score) were also obtained and processed for software performance evaluation and statistical comparison, as described later and in the supplemental information online.

PhenIX (i.e., phenotypic interpretation of exomes) [44. Zemojtel T. et al. Effective diagnosis of genetic disease by computational phenotype analysis of the disease-associated genome. Sci. Transl. Med. 2014; 6: 252ra123] is a computational method that evaluates and ranks variants based on their rarity and predicted pathogenicity, as well as the semantic similarity of the HPO terms used to describe the patients' phenotypes to those of thousands of human Mendelian diseases as reported in OMIM and Orphanet (last updated in 2019). PhenIX is available within Exomiser; it therefore did not require any additional download and installation, can be run in the same way as Exomiser, and produces similar output files. It exploits the same variant filtering framework as Exomiser, while its semantic similarity algorithm is enabled by replacing the Exomiser option 'hiPhivePrioritiser: {}' with 'phenixPrioritiser: {}'.

LIRICAL (i.e., likelihood ratio interpretation of clinical abnormalities) [35. Robinson P.N. et al. Interpretable clinical genomics with a likelihood ratio paradigm. Am. J. Hum. Genet. 2020; 107: 403-417] exploits the likelihood ratio (LR) statistical framework.
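As a generic illustration of the LR framework (not LIRICAL's actual implementation), the odds form of Bayes' theorem converts a pre-test probability into a post-test probability by multiplying the pre-test odds by the LR of each observation. The sketch below assumes the observations are conditionally independent, so that their LRs can simply be multiplied; the function name and the numbers are hypothetical.

```python
from math import prod


def post_test_probability(pretest_probability, likelihood_ratios):
    """Combine a pre-test probability with a series of likelihood ratios.

    post-test odds = pre-test odds * product of the individual LRs
    Assumption: the observations are conditionally independent.
    """
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must lie strictly between 0 and 1")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * prod(likelihood_ratios)
    return posttest_odds / (1.0 + posttest_odds)


# Two observations that each favour the diagnosis tenfold raise a 1%
# pre-test probability to roughly 50%.
p = post_test_probability(0.01, [10.0, 10.0])
print(round(p, 3))  # 0.503
```

An LR above 1 increases the probability of the candidate diagnosis and an LR below 1 decreases it, which is what makes the contribution of each individual observation easy to interpret.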
Not only does LIRICAL ultimately rank the candidate variants, but it also provides an estimate of the post-test probability of candidate diagnoses and calculates the extent (as an LR) to which each provided HPO-encoded abnormality (and, if VCF files are available, the genotype too) is consistent with the diagnosis. LIRICAL version 1.3.4 (released on 26 September 2021) was downloaded by git cloning the corresponding GitHub repository [resource ii] and installed following the clear instructions from the corresponding 'readthedocs' pages [resource iii]. LIRICAL makes use of the Exomiser variant and phenotype databases (we enabled database version 2109). The preferred input format for LIRICAL is Phenopackets [resource iv], an open standard, also adopted within the Global Alliance for Genomics and Health [resource v], for sharing detailed phenotypic descriptions linked with disease, patient, and genetic information. We used a Python script to create a Phenopacket single-sample-phenopacket.json for each patient from the IRD patient WES dataset. LIRICAL was then run using the following command line for each single-sample analysis:

    java -jar LIRICAL.jar phenopacket -p single-sample-phenopacket.json -e path/to/Exomiser-data-directory -x prefixOfOutputFile --tsv --output-directory path/to/output-directory

A few representative sections of the HTML output file from the analysis of one single sample are reported in Figure S1 in the supplemental information online. Tab-separated (tsv) output files containing relevant information for the candidate prioritised diagnoses, together with the corresponding filtered variants (including rank, post-test probability, and LR), were
Reference(s)