Resolving MiSeq-Generated Ambiguities in HLA-DPB1 Typing by Using the Oxford Nanopore Technology
2019; Elsevier BV; Volume: 21; Issue: 5; Language: English
10.1016/j.jmoldx.2019.04.009
ISSN 1943-7811
Authors: Jamie L. Duke, Timothy L. Mosbruger, Deborah Ferriola, Nilesh Chitnis, Taishan Hu, Nikolaos Tairis, David J. Margolis, Dimitri Monos
Topic(s): T-cell and B-cell Immunology
Abstract: The technical limitations of current next-generation sequencing technologies, combined with an ever-increasing number of human leukocyte antigen (HLA) alleles, form the basis for the additional ambiguities encountered at an increasing rate in clinical practice. HLA-DPB1 characterization, in particular, generates a significant percentage of ambiguities (25.5%), posing a challenge for accurate and unambiguous HLA-DPB1 genotyping. Phasing of exonic heterozygous positions between exon 2 and all other downstream exons has been the major cause of ambiguities. In this study, the Oxford Nanopore MinION, a third-generation sequencing technology, was used to resolve the phasing. The accurate MiSeq sequencing data, combined with the long reads obtained from the MinION platform, allowed the tested ambiguities to be resolved.

The introduction of high-throughput single DNA molecule sequencing technologies [namely, next-generation sequencing (NGS)] into the field of immunogenetics has significantly affected the practice of human leukocyte antigen (HLA) typing [1: Duke J.L., Lind C., Mackiewicz K., Ferriola D., Papazoglou A., Gasiewski A., Heron S., Huynh A., McLaughlin L., Rogers M., Slavich L., Walker R., Monos D.S. Determining performance characteristics of an NGS-based HLA typing method for clinical applications. HLA. 2016;87:141-152]. One of the major consequences is the elimination of almost all of the ambiguities observed with legacy typing methods. Nevertheless, as our experience with NGS-based HLA typing expands, it has become apparent that the ever-increasing number of alleles, combined with the technical limitations of Illumina (San Diego, CA) platforms in sequencing and providing credible phase information over distances >800 bases [1] [2: Duke J.L., Lind C., Mackiewicz K., Ferriola D., Papazoglou A., Derbeneva O., Wallace D., Monos D.S. Towards allele-level human leucocyte antigens genotyping - assessing two next-generation sequencing platforms: Ion Torrent Personal Genome Machine and Illumina MiSeq. Int J Immunogenet. 2015;42:346-358] [3: Gandhi M.J., Ferriola D., Huang Y., Duke J.L., Monos D. Targeted next-generation sequencing for human leukocyte antigen typing in a clinical laboratory: metrics of relevance and considerations for its successful implementation. Arch Pathol Lab Med. 2017;141:806-812], creates new ambiguities that interfere with the unequivocal reporting of HLA genotypes.

One locus that is particularly notorious for generating ambiguities is HLA-DPB1. Our laboratory sequences this locus because it is clinically relevant to both allogeneic hematopoietic stem cell transplantation and solid organ transplantation [4: Fleischhauer K., Shaw B.E., Gooley T., Malkki M., Bardy P., Bignon J.-D., Dubois V., Horowitz M.M., Madrigal J.A., Morishima Y., Oudshoorn M., Ringden O., Spellman S., Velardi A., Zino E., Petersdorf E.W., International Histocompatibility Working Group in Hematopoietic Cell Transplantation. Effect of T-cell-epitope matching at HLA-DPB1 in recipients of unrelated-donor haemopoietic-cell transplantation: a retrospective study. Lancet Oncol. 2012;13:366-374] [5: Pidala J., Lee S.J., Ahn K.W., Spellman S., Wang H.-L., Aljurf M., Askar M., Dehn J., Fernandez Viña M., Gratwohl A., Gupta V., Hanna R., Horowitz M.M., Hurley C.K., Inamoto Y., Kassim A.A., Nishihori T., Mueller C., Oudshoorn M., Petersdorf E.W., Prasad V., Robinson J., Saber W., Schultz K.R., Shaw B., Storek J., Wood W.A., Woolfrey A.E., Anasetti C. Nonpermissive HLA-DPB1 mismatch increases mortality after myeloablative unrelated allogeneic hematopoietic cell transplantation. Blood. 2014;124:2596-2606] [6: Thaunat O., Hanf W., Dubois V., McGregor B., Perrat G., Chauvet C., Touraine J.-L., Morelon E. Chronic humoral rejection mediated by anti-HLA-DP alloantibodies: insights into the role of epitope sharing in donor-specific and non-donor specific alloantibodies generation. Transpl Immunol. 2009;20:209-211] [7: Singh P., Colombe B.W., Francos G.C., Martinez Cantarin M.P., Frank A.M. Acute humoral rejection in a zero mismatch deceased donor renal transplant due to an antibody to an HLA-DP alpha. Transplantation. 2010;90:220-221] [8: Mytilineos J., Deufel A., Opelz G. Clinical relevance of HLA-DPB locus matching for cadaver kidney retransplants: a report of the Collaborative Transplant Study. Transplantation. 1997;63:1351-1354] [9: Qiu J., Cai J., Terasaki P.I., El-Awar N., Lee J.-H. Detection of antibodies to HLA-DP in renal transplant recipients using single antigen beads. Transplantation. 2005;80:1511-1513] [10: Laux G., Mansmann U., Deufel A., Opelz G., Mytilineos J. A new epitope-based HLA-DPB matching approach for cadaver kidney retransplants. Transplantation. 2003;75:1527-1532] [11: Goral S., Prak E.L., Kearns J., Bloom R.D., Pierce E., Doyle A., Grossman R., Naji A., Kamoun M. Preformed donor-directed anti-HLA-DP antibodies may be an impediment to successful kidney transplantation. Nephrol Dial Transplant. 2008;23:390-392] [12: Vazirabad I., Chhabra S., Nytes J., Mehra V., Narra R.K., Szabo A., Jerkins J.H., Dhakal B., Hari P., Anderson M.W. Direct HLA genetic comparisons identify highly matched unrelated donor-recipient pairs with improved transplantation outcome. Biol Blood Marrow Transplant. 2019;25:921-931] [13: Nowak J., Nestorowicz K., Graczyk-Pol E., Mika-Witkowska R., Rogatko-Koros M., Jaskula E., et al. HLA-inferred extended haplotype disparity level is more relevant than the level of HLA mismatch alone for the patients survival and GvHD in T cell-replete hematopoietic stem cell transplantation from unrelated donor. Hum Immunol. 2018;79:403-412]. However, when HLA-DPB1 is sequenced in the clinical setting, it is often not possible to report the genotype unambiguously. Over the past 5 years, >5700 samples have been sequenced for DPB1, and on average 25.5% of them have had at least one allele reported ambiguously.
There are two possible types of ambiguity for the DPB1 locus. The first arises because the current assay does not characterize exon 1, so alleles that differ only in exon 1 cannot be distinguished; the second arises from the inability to set phase between the heterozygous positions within the sequenced exons (exons 2 to 5). The rate of reported DPB1 ambiguity in our laboratory has increased dramatically over that period, from an overall rate of 17.8% in 2014 (type 1, 6.1%; type 2, 11.7%) to 34.0% in 2019 (type 1, 7.9%; type 2, 26.1%). The increase is directly related to the growing number of alleles that have been characterized beyond exon 2 and reported to the international ImMunoGeneTics information system HLA (IMGT/HLA) database over the same period [14: Robinson J., Halliwell J.A., Hayhurst J.D., Flicek P., Parham P., Marsh S.G.E. The IPD and IMGT/HLA database: allele variant databases. Nucleic Acids Res. 2015;43:D423-D431]. Herein, we focus on the second type of ambiguity, in which, initially, only one combination of alleles satisfied the set of unphased heterozygous positions in the data. Over time, however, new alleles have been described that satisfy the same set of heterozygous positions in different cis/trans combinations. For example, in 2014 a sample would have been typed unambiguously as DPB1*01:01:01 + DPB1*11:01:01, because no other pair of alleles could explain the set of heterozygous positions present in the data. In 2017, however, the DPB1*654:01 allele was described, and this newer allele combined with DPB1*417:01 yields the same set of heterozygous positions as the DPB1*01:01:01 + DPB1*11:01:01 combination (Figure 1D).
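The cis/trans ambiguity described above can be illustrated with a minimal Python sketch. The four-position haplotype strings are hypothetical toy sequences, not real DPB1 allele sequences, and `unphased_genotype`, `phase_configurations`, and `consistent_pairs` are illustrative helpers rather than part of any typing software. The sketch shows that two different haplotype pairs can produce the same unphased genotype, and that a genotype becomes ambiguous once more than one of its phase configurations consists entirely of cataloged alleles.

```python
from itertools import product


def unphased_genotype(hap1: str, hap2: str) -> tuple:
    """Collapse two haplotypes into the unordered set of bases seen at each position."""
    return tuple(frozenset(pair) for pair in zip(hap1, hap2))


def phase_configurations(genotype: tuple) -> set:
    """Enumerate every unordered haplotype pair that yields this unphased genotype."""
    pairs = set()
    for hap1_bases in product(*(sorted(site) for site in genotype)):
        # At a heterozygous site the second haplotype carries the other base;
        # at a homozygous site both haplotypes carry the same base.
        hap2_bases = [
            next(iter(site - {base})) if len(site) == 2 else base
            for site, base in zip(genotype, hap1_bases)
        ]
        pairs.add(frozenset(("".join(hap1_bases), "".join(hap2_bases))))
    return pairs


def consistent_pairs(genotype: tuple, known_alleles: set) -> list:
    """Keep only the configurations in which both haplotypes are known (cataloged) alleles."""
    return [pair for pair in phase_configurations(genotype) if pair <= known_alleles]


if __name__ == "__main__":
    # Toy data: positions 1-3 stand in for exon 2, position 4 for a downstream exon.
    genotype = unphased_genotype("ACGT", "GTCA")
    print(genotype == unphased_genotype("ACGA", "GTCT"))  # True
    print(len(consistent_pairs(genotype, {"ACGT", "GTCA"})))  # 1
    print(len(consistent_pairs(genotype, {"ACGT", "GTCA", "ACGA", "GTCT"})))  # 2
```

Adding the two hypothetical "newer" alleles to the known set turns an unambiguous call into an ambiguous one without any change in the sequencing data, which mirrors the 2014 versus 2017 example.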
The two allele combinations differ in the phase of a polymorphic position in exon 4 (the transmembrane domain) relative to the polymorphic positions in exon 2, which lie 4.8 kb away; phase between them is broken by a 1.6-kb homozygous region in intron 2 (Figure 1D). Therefore, unless all positions across the sequenced region are phased, the two possible allele combinations cannot be distinguished from each other.

Among the new alleles being published for the DPB1 locus, it is not uncommon to find elements of other, often common, alleles rearranged to form the new alleles. This can occur through multiple mechanisms, including recombination and gene conversion, both of which are well described within the major histocompatibility complex as ways to increase allelic diversity, allowing for the possibility of millions of alleles per locus [15: Carrington M. Recombination within the human MHC. Immunol Rev. 1999;167:245-256] [16: Kotsch K., Blasczyk R. The noncoding regions of HLA-DRB uncover interlineage recombinations as a mechanism of HLA diversification. J Immunol. 2000;165:5664-5670] [17: Högstrand K., Böhme J. Gene conversion can create new MHC alleles. Immunol Rev. 1999;167:305-317] [18: Seemann G.H., Rein R.S., Brown C.S., Ploegh H.L. Gene conversion-like mechanisms may generate polymorphism in human class I genes. EMBO J. 1986;5:547-552] [19: Pease L.R., Schulze D.H., Pfaffenbach G.M., Nathenson S.G. Spontaneous H-2 mutants provide evidence that a copy mechanism analogous to gene conversion generates polymorphism in the major histocompatibility complex. Proc Natl Acad Sci U S A. 1983;80:242-246] [20: Klitz W., Hedrick P., Louis E.J. New reservoirs of HLA alleles: pools of rare variants enhance immune defense. Trends Genet. 2012;28:480-486]. In the case of DPB1, gene conversion has been proposed as a way in which the six hypervariable regions found within exon 2 are shuffled between alleles to generate diversity [21: Bugawan T.L., Horn G.T., Long C.M., Mickelson E., Hansen J.A., Ferrara G.B., Angelini G., Erlich H.A. Analysis of HLA-DP allelic sequence polymorphism using the in vitro enzymatic DNA amplification of DP-alpha and DP-beta loci. J Immunol. 1988;141:4024-4030] [22: Zangenberg G., Huang M.-M., Arnheim N., Erlich H. New HLA-DPB1 alleles generated by interallelic gene conversion detected by analysis of sperm. Nat Genet. 1995;10:407] [23: Huang M.M., Erlich H.A., Goodman M.F., Arnheim N. Analysis of mutational changes at the HLA locus in single human sperm. Hum Mutat. 1995;6:303-310]. However, gene conversion explains only the exchange of small regions of DNA, such as subsegments of exons, and not larger rearrangements, which have presumably been generated through larger recombination events. Recent NGS characterization of the complete genomic sequences of DPB1 alleles shows that, in the region between exon 3 and the 3′ untranslated region (UTR), DPB1 alleles cluster into two distinct clades that are independent of the clustering observed for the same alleles in exon 2, the antigen recognition domain [24: Morishima S., Shiina T., Suzuki S., Ogawa S., Sato-Otsubo A., Kashiwase K., Azuma F., Yabe T., Satake M., Kato S., Kodera Y., Sasazuki T., Morishima Y. Evolutionary basis of HLA-DPB1 alleles affects acute GVHD in unrelated donor stem cell transplantation. Blood. 2018;131:808-817] [25: Klasberg S., Lang K., Günther M., Schober G., Massalski C., Schmidt A.H., Lange V., Schöfl G. Patterns of non-ARD variation in more than 300 full-length HLA-DPB1 alleles. Hum Immunol. 2019;80:44-52]. The two clusters segregate with the single-nucleotide polymorphism (SNP) rs9277534, which serves as a marker for DPB1 expression [26: Thomas R., Thio C.L., Apps R., Qi Y., Gao X., Marti D., Stein J.L., Soderberg K.A., Moody M.A., Goedert J.J., Kirk G.D., Hoots W.K., Wolinsky S., Carrington M. A novel variant marking HLA-DP expression levels predicts recovery from hepatitis B virus infection. J Virol. 2012;86:6979-6985] [27: Petersdorf E.W., Malkki M., O'hUigin C., Carrington M., Gooley T., Haagenson M.D., Horowitz M.M., Spellman S.R., Wang T., Stevenson P. High HLA-DP expression and graft-versus-host disease. N Engl J Med. 2015;373:599-609]. Phase is often broken in intron 2, which is approximately 4 kb long and, depending on the combination of alleles, can be either sparsely or densely populated with heterozygous positions; this is consistent with the observations regarding the evolution of the DPB1 locus [24, 25]. To reduce the rate at which DPB1 ambiguities are reported, phasing across intron 2 is of utmost importance, and short-read technologies cannot completely address this problem.
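Why a long homozygous stretch in intron 2 breaks phase for short-read data can be sketched with a simple model. This is an assumption for illustration, not the algorithm of any typing software: two consecutive heterozygous positions are treated as linked only if a single sequenced molecule of at most `max_span` bases can contain both. The positions below are hypothetical coordinates within a roughly 7-kb amplicon.

```python
def phase_blocks(het_positions, max_span):
    """Group heterozygous positions into blocks that molecules of up to
    `max_span` bases can link; a gap that no molecule can cover starts a new block.

    Simplification: assumes every coverable gap is actually covered by at least
    one informative molecule, which real short-read data do not guarantee.
    """
    positions = sorted(het_positions)
    if not positions:
        return []
    blocks = [[positions[0]]]
    for previous, current in zip(positions, positions[1:]):
        # A linking molecule must contain both bases, i.e. span current - previous + 1 bases.
        if current - previous + 1 <= max_span:
            blocks[-1].append(current)
        else:
            blocks.append([current])
    return blocks


if __name__ == "__main__":
    # Hypothetical positions: three in exon 2, one in intron 2, two in downstream exons.
    hets = [300, 450, 610, 2300, 4900, 5400]
    print(phase_blocks(hets, max_span=800))   # [[300, 450, 610], [2300], [4900, 5400]]
    print(phase_blocks(hets, max_span=7150))  # [[300, 450, 610, 2300, 4900, 5400]]
```

With an Illumina-like limit of roughly 800 bases, the exon 2 block cannot be connected to the downstream block, whereas a single read spanning the whole amplicon links every position.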
We therefore explored long-read sequencing methods to combat the ambiguity problem, keeping in mind that clinical HLA laboratories need methods that are accurate, fast, and easy to use, that allow multiplexing, and that can read long fragments of DNA. Specifically, we evaluated whether combining MiSeq sequencing with Oxford Nanopore Technology (ONT; Oxford, UK) MinION sequencing offers a way to accurately resolve ambiguities in HLA-DPB1 genotyping. In nanopore sequencing, DNA is transported through nanoscale pores, which can be proteinaceous (biological nanopores) or solid (solid-state nanopores) [28: Deamer D., Akeson M., Branton D. Three decades of nanopore sequencing. Nat Biotechnol. 2016;34:518] [29: Sadki E.S., Garaj S., Vlassarev D., Golovchenko J.A., Branton D. Embedding a carbon nanotube across the diameter of a solid state nanopore. J Vac Sci Technol B. 2011;29:053001]. The R9 chemistry of ONT flow cells uses a mutant of CsgG, the Escherichia coli curli production assembly/transport lipoprotein, engineered for DNA translocation. The protein nanopore is embedded in an electrically resistant polymer membrane, and an ionic current is passed through the pore. As a DNA molecule passes through the nanopore, it disturbs the current, and these disturbances are used to identify the bases: each combination of bases occupying the pore produces a different pattern of disturbance, which allows nucleotides to be identified as the DNA moves through the single pore. A major difference between ONT and Illumina is read length. ONT allows large amplicons to be sequenced in their entirety, whereas Illumina platforms have limitations in sequencing long DNA fragments, as noted above, and their read length does not exceed 500 bases, whether as a continuous read or as the paired ends of a fragment.
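The idea that each combination of bases in the pore produces a characteristic current level can be illustrated with a toy decoder. Everything in this sketch is hypothetical: the 2-mer model, the evenly spaced current levels in `LEVELS`, and the nearest-level decoding. A real pore's current reflects several bases at a time, and MinION signals are base called with trained models, not with a lookup table like this one.

```python
from itertools import product

BASES = "ACGT"
K = 2  # toy k-mer size; a real pore's current reflects several bases at once

# Hypothetical mean current (pA) for each of the 16 possible 2-mers, spaced 4 pA
# apart so that every k-mer has a distinct level. Real pore models are fitted.
LEVELS = {
    "".join(kmer): 60.0 + 4.0 * index
    for index, kmer in enumerate(product(BASES, repeat=K))
}


def simulate_events(sequence: str) -> list:
    """One current level per k-mer as the strand steps through the pore one base at a time."""
    return [LEVELS[sequence[i:i + K]] for i in range(len(sequence) - K + 1)]


def decode_events(events: list) -> str:
    """Map each level to the nearest k-mer and stitch the overlapping k-mers together."""
    if not events:
        return ""
    kmers = [min(LEVELS, key=lambda kmer: abs(LEVELS[kmer] - level)) for level in events]
    return kmers[0] + "".join(kmer[-1] for kmer in kmers[1:])


if __name__ == "__main__":
    signal = simulate_events("ACGTTGCA")
    noisy = [level + 1.5 for level in signal]  # shift smaller than half the 4-pA spacing
    print(decode_events(noisy))  # ACGTTGCA
```

Once noise approaches half the spacing between levels, neighboring k-mers become confusable, which is one intuition for why raw nanopore reads are error prone.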
ONT sequencing is also relatively fast. However, the major limitation of sequencing with ONT R9 chemistry is that it is highly error prone, and the instrument makes systematic errors [30: Magi A., Giusti B., Tattini L. Characterization of MinION nanopore data for resequencing analyses. Brief Bioinformatics. 2017;18:940-953]. Others have shown the strength of combining ONT data with Illumina data to produce high-quality data [31: Goodwin S., Gurtowski J., Ethe-Sayers S., Deshpande P., Schatz M.C., McCombie W.R. Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome. Genome Res. 2015;25:1750-1756] [32: Morisse P., Lecroq T., Lefebvre A. Hybrid correction of highly noisy long reads using a variable-order de Bruijn graph. Bioinformatics. 2018;34:4213-4222]. For our phasing purposes, using Illumina MiSeq data to phase distal heterozygous positions requires phasing both intronic and exonic heterozygous positions, often including regions with low-quality sequencing data (such as homopolymers and short tandem repeats), and because Illumina fragment sizes are small (300 to 800 bases), many fragments are needed to tile from one heterozygous position to the next. With ONT MinION data, in contrast, a single read encompasses the entire region of interest, so there is freedom to choose appropriate heterozygous positions and to bypass regions that are difficult to sequence, in particular the low-complexity regions often found in intronic sequences. Overall, our approach combines high-quality but short-fragment Illumina MiSeq data with error-prone but long ONT MinION reads to resolve the ambiguities present within the HLA-DPB1 gene and to aid in novel allele discovery.

HLA-DPB1 ambiguity is a common problem observed in routine HLA genotyping.
Fifteen samples in which a common HLA-DPB1 ambiguity or novelty was observed in routine testing were chosen for ambiguity resolution with Oxford Nanopore sequencing. Thirteen samples were clinical specimens, and two were from a research study of atopic dermatitis. The inclusion of all deidentified subjects in this study was approved by the institutional review board. Initial HLA-DPB1 genotyping for each sample was performed in a pool with other HLA genes using Omixon Holotype V2 HLA kits (Omixon, Budapest, Hungary). Briefly, reagents from Qiagen LR PCR kits (Qiagen, Valencia, CA) were combined with kit primers for amplification, and PCR was performed on a ThermoFisher Veriti thermal cycler (ThermoFisher, Waltham, MA) according to the Holotype V2 protocol. The amplicon excludes exon 1, which encodes the leader peptide: it starts in intron 1 and extends through the 3′ UTR of the gene, covering exon 2 (antigen presentation domain), exon 3 (extracellular domain proximal to the membrane), exon 4 (transmembrane domain), exon 5 (cytoplasmic tail), and the 3′ UTR, which includes SNP rs9277534, a marker for high/low HLA-DPB1 expression. Library preparation for NGS was performed using the standard method and reagents of the Holotype V2 HLA kits. Amplicons were quantitated with the QuantiFluor dsDNA system (Promega, Madison, WI) and diluted to approximately 150 ng/μL, and equal volumes were pooled to a total of 35 μL. After the 35-μL amplicon pools were cleaned with 4 μL ExoSAP Express (ThermoFisher), the amplicons were enzymatically fragmented, end repaired, ligated to indexed adaptors, pooled, concentrated with 1× AMPure XP beads (Beckman Coulter, Indianapolis, IN), and size selected on a BluePippin (Sage Science, Beverly, MA).
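The dilution and pooling arithmetic in this step follows C1 × V1 = C2 × V2. The sketch below is illustrative only: the stock concentration, working volume, and number of pooled amplicons are hypothetical, and only the approximately 150 ng/μL target and the 35-μL pool volume come from the protocol described above.

```python
def dilution_volumes(stock_conc: float, target_conc: float, final_volume: float):
    """Return (stock volume, diluent volume) giving target_conc in final_volume.

    Concentrations share one unit (e.g., ng/uL) and volumes share another (uL).
    """
    if target_conc > stock_conc:
        raise ValueError("stock is more dilute than the target and cannot be diluted to it")
    stock_volume = target_conc * final_volume / stock_conc  # C1 * V1 = C2 * V2
    return stock_volume, final_volume - stock_volume


def equal_volume_share(n_amplicons: int, pool_volume: float = 35.0) -> float:
    """Volume of each diluted amplicon when n amplicons are pooled in equal volumes."""
    return pool_volume / n_amplicons


if __name__ == "__main__":
    # Hypothetical amplicon at 300 ng/uL diluted to 150 ng/uL in 20 uL.
    print(dilution_volumes(300.0, 150.0, 20.0))  # (10.0, 10.0)
    # Hypothetical pool of 5 amplicons into 35 uL.
    print(equal_volume_share(5))  # 7.0
```

Because every amplicon is first brought to the same concentration, pooling equal volumes contributes roughly equal masses of each amplicon to the pool.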
The concentration of the size-selected library was measured by quantitative PCR with KAPA library quantification kits (KAPA, Wilmington, MA), and the library was diluted to 9 pM (picomolar) and sequenced on an Illumina MiSeq using paired-end 2 × 250 V2 sequencing chemistry. The DPB1 loci of these samples were then reamplified and sequenced as individual-locus libraries, as described above, using paired-end 2 × 150 V2 sequencing chemistry to obtain higher depth and, therefore, increase the chances of phasing distant polymorphisms (Results). NGS data were demultiplexed and FASTQ files were generated by MiSeq Reporter on the instrument. FASTQ files were analyzed with Omixon's Twin software version 2.5.1 (IMGT/HLA version 3.29.0.1_5), using 7000 read pairs for the DPB1 locus, and with GenDx's NGSengine software version 2.9.1 (IMGT/HLA version 3.31; GenDx, Utrecht, the Netherlands), using 100,000 read pairs per sample.

DPB1 amplicons from 14 samples with ambiguous DPB1 allele combinations and from one sample with a novelty that could not be phased were then sequenced on an ONT MinION to obtain fully phased sequences. Twelve of these samples were sequenced in a single run using an ONT 1D Native barcoding kit for genomic DNA, following the manufacturer's protocol version NBE_9006_v103_revQ_21Dec2016 from the end-repair step onward; the other three DPB1 amplicons were processed in subsequent MinION runs. In short, 2 μg of each amplicon was end repaired and dA tailed with New England Biolabs (NEB; Ipswich, MA) Ultra II End-prep enzyme and buffer, ligated to ONT native barcodes with NEB Blunt/TA Ligase Master Mix, pooled in equimolar amounts, and then ligated to ONT Barcode Adaptor Mix with NEB NEBNext 5X Quick Ligation Reaction Buffer and Quick T4 DNA Ligase. Between steps of the ONT protocol, reactions were cleaned with AMPure XP beads.
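Converting between mass and molar amounts underlies both the 9-pM MiSeq loading concentration and the equimolar pooling of the ONT barcoded amplicons. The sketch assumes the common approximation of 660 g/mol per base pair of double-stranded DNA; the 4-nM stock and 600-μL final volume in the example are hypothetical, while the 2-μg input and the approximately 7150-bp amplicon length come from the text.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def ng_per_ul_to_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Molar concentration (nM) of dsDNA fragments of the given length."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def mass_to_pmol(mass_ng: float, length_bp: int) -> float:
    """Picomoles of dsDNA molecules in mass_ng of fragments length_bp long."""
    return mass_ng * 1e3 / (AVG_BP_MASS * length_bp)


def pmol_to_mass(pmol: float, length_bp: int) -> float:
    """Nanograms of dsDNA needed to supply pmol of fragments length_bp long."""
    return pmol * AVG_BP_MASS * length_bp / 1e3


def stock_volume_for_dilution(stock_nM: float, target_pM: float, final_volume_ul: float) -> float:
    """Volume of stock library (uL) that gives target_pM in final_volume_ul.

    The stock is converted from nM to pM (x 1000) before applying C1 * V1 = C2 * V2.
    """
    return target_pM * final_volume_ul / (stock_nM * 1e3)


if __name__ == "__main__":
    # 2 ug of an ~7150-bp amplicon is roughly 0.42 pmol.
    print(round(mass_to_pmol(2000, 7150), 2))  # 0.42
    # Hypothetical: a 4 nM library diluted to 9 pM in 600 uL needs ~1.35 uL of stock.
    print(round(stock_volume_for_dilution(4.0, 9.0, 600.0), 2))  # 1.35
```

Because all DPB1 amplicons are approximately the same length, equal masses of amplicon are approximately equimolar.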
The final adaptor-ligated, barcoded pool was loaded onto an ONT SpotON SQK-LSK108 flow cell on the MinION, and amplicon strands were sequenced through the nanopores on the flow cell for 2 hours using MinKNOW software version 1.13.1 (Oxford Nanopore Technologies). Local base calling was performed with MinKNOW, and the resulting FASTQ files were demultiplexed and adapter trimmed with Porechop version 0.2.3 (R. Wick, Melbourne, VIC, Australia; https://github.com/rrwick/Porechop, last accessed May 1, 2018). ONT reads between 6500 and 7400 bp long were retained for further analysis. Illumina FASTQ files were analyzed for DPB1 with Omixon Twin version 2.5.1 (IMGT 3.29.0.1_5) using 7000 read pairs per sample, and the sequence and quality information for reads assigned to DPB1 was extracted from the Omixon HTR file and written out in FASTQ format. Hybrid correction of the ONT reads with the Illumina reads was performed with FMLRC version 0.1.2 using default settings [33: Wang J.R., Holt J., McMillan L., Jones C.D. FMLRC: hybrid long read error correction using an FM-index. BMC Bioinformatics. 2018;19:50]. MAFFT version 7.394 was used to generate a multiple sequence alignment (MSA) of 100 randomly selected corrected ONT reads per sample [34: Katoh K., Standley D.M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772-780]. The --adjustdirection argument was used to account for unstranded reads, and the first read in the input file was chosen to be in the sense orientation to keep the direction of the MSA consistent. The function msaConsensusSequence from the Bioconductor package msa version 1.10.0 was used to generate a consensus sequence from the MAFFT MSA [35: Bodenhofer U., Bonatesta E., Horejš-Kainrath C., Hochreiter S. msa: an R package for multiple sequence alignment. Bioinformatics. 2015;31:3997-3999]. The resulting consensus sequence was cleaned by removing gaps and replacing question marks with N. A total of 1000 corrected ONT reads were aligned to the consensus sequence using minimap2 version 2.10 [36: Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094-3100], and indels in the alignments were left aligned with the LeftAlignIndels function of the Genome Analysis Toolkit version 3.3.0 [37: DePristo M.A., Banks E., Poplin R., Garimella K.V., Maguire J.R., Hartl C., Philippakis A.A., del Angel G., Rivas M.A., Hanna M., McKenna A., Fennell T.J., Kernytsky A.M., Sivachenko A.Y., Cibulskis K., Gabriel S.B., Altshuler D., Daly M.J. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43:491]. Two variants were selected for each ambiguous pair: the last variant in exon 2 and the next exonic variant in exon 3, 4, or 5; intronic variants were ignored. A custom Python script using the pysam module version 0.14 (https://pypi.org/project/pysam, last accessed February 1, 2018) iterated over each aligned read and recorded the observed bases at the two variant positions. The counts of the base combinations representing the ambiguous pairs were used to calculate a Φ (phi) coefficient.

Given the high ambiguity rate in HLA-DPB1 typing with short-read technology, five common ambiguities across 14 samples, as well as one unphased novel allele, were examined using the Oxford Nanopore MinION.
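The read-length filter and the Φ-coefficient phase test described in the methods above can be sketched in Python. This is not the authors' custom script: `keep_by_length`, `bases_at_positions`, and `phi_coefficient` are hypothetical helpers. The pysam calls used (`AlignmentFile`, `fetch`, `get_aligned_pairs`) exist in pysam, but the choice to skip secondary and supplementary alignments and the use of 0-based reference coordinates are assumptions of this sketch. A Φ near +1 supports the candidate allele combination passed in, and a Φ near -1 supports the alternative cis/trans arrangement.

```python
import math
from collections import Counter


def keep_by_length(records, min_len=6500, max_len=7400):
    """Keep (name, sequence, quality) records whose length lies within [min_len, max_len]."""
    return [rec for rec in records if min_len <= len(rec[1]) <= max_len]


def bases_at_positions(bam_path, contig, pos1, pos2):
    """Yield (base at pos1, base at pos2) for each read aligned across both positions.

    Positions are 0-based reference coordinates; the BAM must be sorted and indexed.
    """
    import pysam  # imported lazily so the rest of the sketch runs without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(contig, pos1, pos2 + 1):
            if read.is_secondary or read.is_supplementary or read.query_sequence is None:
                continue
            # Map reference positions to read positions (bases inside indels are excluded).
            ref_to_query = {ref: qry for qry, ref in read.get_aligned_pairs(matches_only=True)}
            if pos1 in ref_to_query and pos2 in ref_to_query:
                seq = read.query_sequence
                yield seq[ref_to_query[pos1]], seq[ref_to_query[pos2]]


def phi_coefficient(base_pairs, haplotype_a, haplotype_b):
    """Phi for the 2 x 2 table of observed (pos1, pos2) base pairs.

    haplotype_a and haplotype_b give the (pos1, pos2) bases of the two haplotypes
    implied by one candidate allele combination, e.g. ("G", "T") and ("A", "C").
    Reads carrying any other base (for example, sequencing errors) are ignored.
    """
    counts = Counter(base_pairs)
    (a1, a2), (b1, b2) = haplotype_a, haplotype_b
    n_aa = counts[(a1, a2)]  # reads consistent with the candidate combination
    n_bb = counts[(b1, b2)]
    n_ab = counts[(a1, b2)]  # reads consistent with the alternative combination
    n_ba = counts[(b1, a2)]
    denominator = math.sqrt((n_aa + n_ab) * (n_ba + n_bb) * (n_aa + n_ba) * (n_ab + n_bb))
    if denominator == 0:
        return math.nan  # undefined when a row or column of the table is empty
    return (n_aa * n_bb - n_ab * n_ba) / denominator


if __name__ == "__main__":
    reads = [("r1", "A" * 7000, "I" * 7000), ("r2", "A" * 3000, "I" * 3000)]
    print(len(keep_by_length(reads)))  # 1
    # Hypothetical counts: most reads support G...T / A...C, two reads support G...C.
    observed = [("G", "T")] * 48 + [("A", "C")] * 50 + [("G", "C")] * 2
    print(round(phi_coefficient(observed, ("G", "T"), ("A", "C")), 3))  # 0.961
```

In practice, `bases_at_positions(...)` would feed `phi_coefficient` directly; the hard-coded list above stands in for a BAM file so the example runs on its own.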
Five allele combinations that were ambiguous because they could not be phased with the Illumina platform were chosen for further study: DPB1*02:01:02 + DPB1*04:02:01 versus DPB1*105:01:01 + DPB1*416:01:01 (Figure 1A); DPB1*03:01:01 + DPB1*04:01:01 versus DPB1*124:01:01 + DPB1*350:01 (Figure 1B); DPB1*04:01:01 + DPB1*04:02:01 versus DPB1*126:01:01 + DPB1*105:01:01 (Figure 1C); DPB1*01:01:01 + DPB1*11:01:01 versus DPB1*417:01 + DPB1*654:01 (Figure 1D); and DPB1*04:01:01 + DPB1*104:01:01 versus DPB1*124:01:01 + DPB1*702:01 (Figure 1E). The final sample included in this study carries a novel mutation in exon 5 and otherwise types as DPB1*02:01:02 + DPB1*04:01:01:13 (Figure 1F); to determine to which allele the novelty belongs, it must be phased to the heterozygous positions in exon 2. In the allele combinations tested, the heterozygous positions in exon 2 could not be phased to those in exon 3, 4, or 5 (Figure 1) because of the long distances between polymorphic positions, >650 bases in the clinical setting. Technical limitations of the Illumina sequencing method make it difficult to sequence DNA fragments >650 bases with enough depth to determine phase. All samples selected for this study were initially sequenced on the Illumina MiSeq with paired-end sequencing during routine testing, and specific polymorphisms remained unphased. The method used to sequence HLA genes specifically aims to provide a wide range of DNA fragment sizes, so that heterozygous positions that are not on the same physical read but are on the same fragment can be phased; fragments ≥650 bases, which can phase positions up to that distance apart, accounted for 11.98% to 20.13% of the total fragments sequenced (Figure 2A). Only one sample, S14 (Figure 1E), was phased with the higher depth of coverage achieved by resequencing: its polymorphic positions, 670 bases apart, could then be phased.
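The fragment-size argument can be made concrete with a short calculation: with short-read data, two heterozygous positions can be phased only by fragments that cover both of them. The fragment lengths and coordinates below are hypothetical; the 650-base threshold comes from the text, and `fragment_summary` and `spanning_fragments` are illustrative helpers rather than part of the Twin or NGSengine software.

```python
from statistics import median


def fragment_summary(fragment_sizes, min_length=650):
    """Return the median fragment size and the percentage of fragments >= min_length."""
    sizes = list(fragment_sizes)
    if not sizes:
        raise ValueError("no fragments supplied")
    long_enough = sum(1 for size in sizes if size >= min_length)
    return median(sizes), 100.0 * long_enough / len(sizes)


def spanning_fragments(fragments, pos1, pos2):
    """Count (start, end) fragments, 0-based and end-exclusive, that cover both positions."""
    left, right = min(pos1, pos2), max(pos1, pos2)
    return sum(1 for start, end in fragments if start <= left and end > right)


if __name__ == "__main__":
    sizes = [300, 400, 464, 500, 700, 900]  # hypothetical fragment lengths
    median_size, percent_long = fragment_summary(sizes)
    print(median_size, round(percent_long, 1))  # 482.0 33.3
    # Hypothetical fragments and two positions 670 bases apart.
    fragments = [(0, 464), (100, 800), (110, 950), (600, 1300)]
    print(spanning_fragments(fragments, 120, 790))  # 2
```

Only the long tail of the fragment-size distribution contributes to phasing distant positions, which is why higher overall depth helped only when the distance was close to the available fragment lengths.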
This sample had the highest number of fragments used for analysis (approximately 69,000), with a median fragment size of 464 bases and 15.87% of fragments ≥650 bases. The overall depth of coverage, coupled with the fragment-size profile, allowed this locus to be phased when all possible fragments were included in the analysis. For all other samples, which had larger distances to phase and fewer fragments available for analysis, reanalysis did not improve phasing. Taken together, these results show that it is difficult, if not impossible, to phase the polymorphic positions in the ambiguous allele combinations using Illumina data alone. To secure phase between exon 2 and all other distal exons of the HLA-DPB1 gene, the samples that presented phasing challenges on the Illumina MiSeq were then sequenced on the ONT MinION. MinION technology allows the DPB1 amplicon, approximately 7150 bases in total, to be sequenced in its entirety without fragmentation, which ensures that the heterozygous positions on each read are in phase. The lengths of the DNA fragments sequenced on the MinION are shown in Figure 2B. All samples had most reads at the expected
References