Complete mitogenome in a population sample from Cameroon
2021; Elsevier BV; Volume: 55; Language: English
10.1016/j.fsigen.2021.102597
ISSN 1878-0326
Authors: S. Olaechea-Lázaro, Óscar García, Rafaela González‐Montelongo, José M. Lorenzo-Salazar, Carlos Flores, Saioa López, Krishna R. Veeramah, Garrett Hellenthal, Mark Thomas, Santos Alonso
Topic(s): Environmental DNA in Biodiversity Studies
Abstract
Mitochondrial DNA (mtDNA) can be a useful tool for forensic applications, particularly when nuclear DNA is limited and/or severely degraded [1: King T.E., Fortes G.G., Balaresque P., Thomas M.G., Balding D., Maisano Delser P., Neumann R., Parson W., Knapp M., Walsh S., Tonasso L., Holt J., Kayser M., Appleby J., Forster P., Ekserdjian D., Hofreiter M., Schurer K. Identification of the remains of King Richard III. Nat. Commun. 2014; 5: 5631]. Its maternal transmission, lack of recombination and high mutation rate have also proved useful in population-genetic studies [2: Amorim A., Fernandes T., Taveira N. Mitochondrial DNA in human identification: a review. PeerJ. 2019; 7: 7314]. In forensics, large reference datasets are important for accurately estimating the frequency of a questioned haplotype. EMPOP (EDNAP mtDNA Population Database, https://empop.online; [3: Parson W., Dür A. EMPOP - a forensic mtDNA database. Forensic Sci. Int. Genet. 2007; 1: 88-92]) has become the reference repository of high-quality mtDNA sequences. The latest available version (v4/release 13) features sequences from about 50,000 individuals. However, the majority of these sequences (>95%) cover only the control region (CR). Only about 5% of the entries come from African individuals, whereas ~17% of the world's population is African, so Africans are under-represented. Moreover, only a small number of these entries correspond to full mitogenomes. This under-representation is of particular concern from population-genetic and human-evolutionary points of view, since Africa harbours the highest mtDNA diversity in the world [4: Cerezo M., Gusmão L., Černý V., Uddin N., Syndercombe-Court D., Gómez-Carballa A., Göbel T., Schneider P.M., Salas A. Comprehensive analysis of pan-African mitochondrial DNA variation provides new insights into continental variation and demography. J. Genet. Genom. 2016; 43: 133-143]. To address this gap, we present 100 mitogenomes from Cameroonian individuals.

Cameroon, a name derived from the expression Rio dos Camarões used by Portuguese colonial settlers, is often referred to as "Africa in miniature" because of its high ecological, linguistic and ethnic diversity. Cameroonian people practise a varied set of subsistence strategies, depending on their habitat. Around 250 languages are spoken in Cameroon, belonging mainly to three large language families: Niger-Congo, Nilo-Saharan and Afro-Asiatic. These local languages are, in general, confined to specific geographic areas of the country [5: Kouega J.P. The language situation in Cameroon. Curr. Issues Lang. Plan. 2007; 8: 3-93].

Buccal swabs from 100 Cameroonian individuals belonging to 30 ethnic groups were analyzed (Fig. 1). Samples were collected anonymously, and all study participants gave their informed consent. Local permissions were obtained in Cameroon from the Ministry of Higher Education and Scientific Research (permits 0188/MINREST/B00/D00/D10/D12 and 317/MINREST/B00/D00/D10) and from the University of Yaounde I. Sample collection was approved by the UK ethics committee London Bentham REC (formerly the Joint UCL/UCLH Committees on the Ethics of Human Research: Committee A and Alpha; REC reference number 99/0196; Chief Investigator Mark G. Thomas). Genomic DNA was extracted by a standard phenol-chloroform separation and isopropanol precipitation procedure. All experiments were conducted in accordance with quality control measures following the ISFG recommendations on good laboratory practice for databasing.
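The dependence of frequency estimates on database size can be made concrete. The sketch below computes the upper limit of a two-sided 95% interval for a haplotype that has never been observed among n individuals. It assumes a uniform-prior Beta(x + 1, n - x + 1) interval. This is our reading of the qbeta-based intervals reported in this study, not a documented choice. With x = 0 the quantile has a closed form, so only the Python standard library is needed. The function name is hypothetical.

```python
def upper_limit_unobserved(n: int, level: float = 0.95) -> float:
    """Upper limit of a two-sided interval for a haplotype seen 0 times in n samples.

    Assumes a Beta(x + 1, n - x + 1) interval; for x = 0 this is Beta(1, n + 1),
    whose CDF is 1 - (1 - p) ** (n + 1), so the quantile has a closed form.
    """
    upper_q = 1.0 - (1.0 - level) / 2.0      # e.g. 0.975 for a 95% interval
    return 1.0 - (1.0 - upper_q) ** (1.0 / (n + 1))


# Hypothetical database sizes, from a single study up to roughly the size of EMPOP.
for n in (100, 1_000, 50_000):
    print(f"n = {n:>6}: 95% upper limit = {upper_limit_unobserved(n):.4%}")
```

Under this assumption, the upper limit for an unseen haplotype is roughly 3.6% with n = 100. It falls below 0.01% with n = 50,000.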
FASTQ files of mtDNA reads were generated from BAM files obtained by whole-exome sequencing (WES), which had previously been mapped to the hg19 human reference genome (manuscript in preparation). Briefly, WES was performed using Illumina DNA Prep with Enrichment, following the manufacturer's recommendations. Library preparation started from 50 ng of genomic DNA and generated dual-indexed libraries with mean sizes between 300 and 400 bp. Each library pool was clustered in two lanes of the flow cell and sequenced on the Illumina HiSeq 4000 platform using 2 × 75 bp paired-end reads. The sequencing experiments were carried out at the Instituto Tecnológico y de Energías Renovables (Santa Cruz de Tenerife, Spain).

To remap mitochondrial reads against the rCRS (revised Cambridge Reference Sequence, accession number NC_012920), three tools were integrated into an in-house bioinformatics pipeline:
- the Genome Analysis Toolkit (GATK) v4 [6: McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K., Altshuler D., Gabriel S., Daly M., DePristo M.A. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010; 20: 1297-1303];
- SAMtools and HTSlib v1.3.1 [7: Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R., 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map (SAM) format and SAMtools. Bioinformatics. 2009; 25: 2078-2079];
- BEDTools bamtofastq v2.26.0 [8: Quinlan A.R., Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010; 26: 841-842].

The rCRS FASTA file was downloaded from MITOMAP (https://www.mitomap.org/foswiki/bin/view/MITOMAP/MitoSeqs). Mapping was performed with the Burrows-Wheeler Aligner (BWA-MEM v0.7.12-r1039) [9: Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2 (2013)]. A FASTA file was also generated from the rCRS-aligned BAM files using samtools mpileup and bcftools v1.9 (SAMtools package). The commands used are available in Supplementary File 1. To look for unexpected or missing mutations, we additionally inspected the corresponding BAM files manually with the Integrative Genomics Viewer (IGV v2.8.13) [10: Thorvaldsdóttir H., Robinson J.T., Mesirov J.P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform. 2013; 14: 178-192].

Mitochondrial haplogroups were assigned using HaploGrep 2 v2.1.21 [11: Weissensteiner H., Pacher D., Kloss-Brandstätter A., Forer L., Specht G., Bandelt H.J., Kronenberg F., Salas A., Schönherr S. HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing. Nucleic Acids Res. 2016; 44: W58-W63] according to PhyloTree Build 17 (www.phylotree.org; [12: van Oven M., Kayser M. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 2009; 30: e386-e394]). Assignments were then confirmed and/or adapted using SAM2 [13: Huber N., Parson W., Dür A. Next generation database search algorithm for forensic mitogenome analyses. Forensic Sci. Int. Genet. 2018; 37: 204-214], provided by EMPOP (https://empop.online; [3]). Confidence intervals were calculated from the Beta distribution with the qbeta function in base R [14: R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/]. The Cameroonian mitogenome sequences are available on EMPOP in phylogenetic alignment [13; 15: Parson W., Gusmão L., Hares D.L., Irwin J.A., Mayr W.R., Morling N., Pokorak E., Prinz M., Salas A., Schneider P.M., Parsons T.J. DNA Commission of the International Society for Forensic Genetics: revised and extended guidelines for mitochondrial DNA typing. Forensic Sci. Int. Genet. 2014; 13: 134-142] under accession number EMP00844. The entire dataset underwent EMPOP quality control according to [3].

Random match probability (RMP), power of discrimination and haplotype diversity were calculated as in [16: Stoneking M., Hedgecock D., Higuchi R.G., Vigilant L., Erlich H.A. Population variation of human mtDNA control region sequences detected by enzymatic amplification and sequence-specific oligonucleotide probes. Am. J. Hum. Genet. 1991; 48: 370-382], disregarding indels in the poly-C and AC tracts. Mean pairwise differences (the average number of nucleotide differences between pairs of sequences) were calculated using an in-house Excel worksheet. To visualize how our Cameroonian sample relates to other African population samples, a multidimensional scaling (MDS) plot was generated from haplogroup frequencies using the FinePop package [14; 17: Kitada S., Kitakado T., Kishino H. The empirical Bayes inference of pairwise FST and its distribution in the genome. Genetics. 2007; 177: 861-873]. For this, pairwise FST values were obtained with the empirical Bayes estimator of FST.

The following publicly available mtDNA haplogroup frequencies were used in this comparison:
- Rwanda (n = 153), Ivory Coast (n = 100) and a mixed West African dataset (n = 145) [18: Göbel T.M.K., Bodner M., Robino C., Augustin C., Huber G.E., Marra M., Mutesa L., Pasino S., Santovito A., Zimmermann B., Schneider P.M., Parson W. Mitochondrial DNA variation in Sub-Saharan Africa: forensic data from a mixed West African sample, Côte d'Ivoire (Ivory Coast), and Rwanda. Forensic Sci. Int. Genet. 2020; 44: 102202];
- Ghana (n = 192) [19: Fendt L., Röck A., Zimmermann B., Bodner M., Thye T., Tschentscher F., Owusu-Dabo E., Göbel T.M., Schneider P.M., Parson W. MtDNA diversity of Ghana: a forensic and phylogeographic view. Forensic Sci. Int. Genet. 2012; 6: 244-249];
- Kenya (n = 84) [20: Brandstätter A., Peterson C.T., Irwin J.A., Mpoke S., Koech D.K., Parson W., Parsons T.J. Mitochondrial DNA control region sequences from Nairobi (Kenya): inferring phylogenetic parameters for the establishment of a forensic database. Int. J. Leg. Med. 2004; 118: 294-306];
- Somalia (n = 190) [21: Mikkelsen M., Fendt L., Röck A.W., Zimmermann B., Rockenbauer E., Hansen A.J., Parson W., Morling N. Forensic and phylogeographic characterisation of mtDNA lineages from Somalia. Int. J. Leg. Med. 2012; 126: 573-579];
- Egypt (n = 276) [22: Saunier J.L., Irwin J.A., Strouss K.M., Ragab H., Sturk K.A., Parsons T.J. Mitochondrial control region sequences from an Egyptian population sample. Forensic Sci. Int. Genet. 2009; 3: e97-e103].
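The remapping steps described above can be outlined as a dry-run command builder. This is a minimal sketch, not the authors' pipeline; their exact commands are in Supplementary File 1. Several details are assumptions:
- the file names;
- the mitochondrial contig name (chrM in UCSC hg19, MT in GRCh37-style references);
- the option choices.

The GATK steps are omitted. bcftools mpileup stands in for the samtools mpileup step mentioned in the text. Note that the UCSC hg19 chrM contig is not the rCRS (it is the older NC_001807 sequence), which is one reason to realign mitochondrial reads to the rCRS before calling variants.

```python
import shlex
from typing import List


def remap_commands(sample: str, wes_bam: str, rcrs_fasta: str,
                   mt_contig: str = "chrM") -> List[str]:
    """Dry-run shell commands to re-align one sample's mtDNA reads to the rCRS.

    All file names are hypothetical; the input BAM must be coordinate-sorted and indexed.
    """
    q = shlex.quote
    ref = q(rcrs_fasta)
    mt_bam = q(f"{sample}.mt.namesorted.bam")
    r1, r2 = q(f"{sample}_R1.fq"), q(f"{sample}_R2.fq")
    aligned = q(f"{sample}.rCRS.bam")
    vcf = q(f"{sample}.rCRS.vcf.gz")
    fasta = q(f"{sample}.rCRS.fasta")
    return [
        # 1. Extract reads on the mitochondrial contig and name-sort them so that
        #    bedtools bamtofastq sees mates next to each other.
        f"samtools view -b {q(wes_bam)} {q(mt_contig)} | samtools sort -n -o {mt_bam} -",
        # 2. Convert back to paired FASTQ files.
        f"bedtools bamtofastq -i {mt_bam} -fq {r1} -fq2 {r2}",
        # 3. Index the rCRS (once), realign with BWA-MEM and coordinate-sort.
        f"bwa index {ref}",
        f"bwa mem {ref} {r1} {r2} | samtools sort -o {aligned} -",
        f"samtools index {aligned}",
        # 4. Haploid variant calling against the rCRS, then a consensus FASTA.
        f"bcftools mpileup -f {ref} {aligned} | bcftools call -m -v --ploidy 1 -Oz -o {vcf}",
        f"bcftools index {vcf}",
        f"bcftools consensus -f {ref} {vcf} > {fasta}",
    ]


for cmd in remap_commands("CAM-002", "CAM-002.hg19.bam", "rCRS.fasta"):
    print(cmd)
```

Each returned string is meant to be run by a shell, for example with subprocess.run(cmd, shell=True, check=True), once samtools, bedtools, bwa and bcftools are installed. Returning strings rather than executing them keeps the sketch inspectable.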
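The text states that confidence intervals came from the Beta distribution via R's qbeta, but does not give the shape parameters. The sketch below assumes the uniform-prior form qbeta(c(0.025, 0.975), x + 1, n - x + 1). As far as we can check by hand, this reproduces the intervals reported later for 2/100 and 3/100 observations. Because both shape parameters are then integers, the Beta CDF can be evaluated exactly through its binomial identity and inverted by bisection, so neither SciPy nor R is needed. Function names are hypothetical.

```python
from math import comb
from typing import Tuple


def beta_cdf_int(p: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) for integer a, b via the order-statistic identity
    P(Beta(a, b) <= p) = P(Binomial(a + b - 1, p) >= a)."""
    m = a + b - 1
    return sum(comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(a, m + 1))


def qbeta_int(q: float, a: int, b: int, tol: float = 1e-12) -> float:
    """Quantile of Beta(a, b) by bisection (plays the role of R's qbeta here)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def frequency_ci(x: int, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Assumed interval: Beta(x + 1, n - x + 1) quantiles at alpha/2 and 1 - alpha/2."""
    alpha = 1 - level
    a, b = x + 1, n - x + 1
    return qbeta_int(alpha / 2, a, b), qbeta_int(1 - alpha / 2, a, b)


for x in (2, 3, 45):
    lo, hi = frequency_ci(x, 100)
    print(f"{x}/100: {lo:.3%} to {hi:.3%}")
```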
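The mean number of pairwise differences was computed in an in-house Excel worksheet. A Python equivalent is sketched below under two assumptions:
- Sequences are already aligned to the same length, with length variants in the poly-C and AC tracts removed beforehand.
- The value after ± in Table 1 is the standard deviation of the pairwise differences. The text does not state this.

The sequences in the usage example are hypothetical toy data.

```python
from itertools import combinations
from statistics import mean, stdev
from typing import Dict, List, Tuple


def pairwise_differences(seqs: Dict[str, str]) -> List[int]:
    """Number of differing aligned sites for every unordered pair of sequences."""
    diffs = []
    for (id1, s1), (id2, s2) in combinations(seqs.items(), 2):
        if len(s1) != len(s2):
            raise ValueError(f"{id1} and {id2} are not aligned to the same length")
        diffs.append(sum(a != b for a, b in zip(s1, s2)))
    return diffs


def mean_pairwise_difference(seqs: Dict[str, str]) -> Tuple[float, float]:
    """Mean and (assumed) standard deviation of pairwise differences; needs >= 3 sequences."""
    d = pairwise_differences(seqs)
    return mean(d), stdev(d)


# Hypothetical toy alignment, not study data.
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TCGTACGA"}
m, sd = mean_pairwise_difference(toy)
print(f"MNPD = {m:.2f} ± {sd:.2f}")
```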
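FinePop's empirical Bayes FST estimator [17] is not reproduced here. As a much simpler stand-in, which will generally give different values, the sketch below computes Nei's G_ST between two populations from haplogroup counts: G_ST = (H_T - H_S) / H_T. Here H = 1 - sum of squared frequencies, H_S is the mean within-population diversity, and H_T is the diversity of the pooled, equally weighted frequencies. Function names and the example counts are hypothetical.

```python
from typing import Dict


def to_frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert haplogroup counts into relative frequencies."""
    total = sum(counts.values())
    return {h: c / total for h, c in counts.items()}


def diversity(freqs: Dict[str, float]) -> float:
    """Haplogroup diversity H = 1 - sum(p^2), without sample-size correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def nei_gst(f1: Dict[str, float], f2: Dict[str, float]) -> float:
    """Nei's G_ST for two populations; a simple stand-in, NOT the empirical Bayes F_ST."""
    h_s = (diversity(f1) + diversity(f2)) / 2.0
    pooled = {h: (f1.get(h, 0.0) + f2.get(h, 0.0)) / 2.0 for h in set(f1) | set(f2)}
    h_t = diversity(pooled)
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


# Hypothetical collapsed haplogroup counts for two populations.
pop_a = to_frequencies({"hg1": 40, "hg2": 35, "hg3": 25})
pop_b = to_frequencies({"hg1": 10, "hg2": 30, "hg4": 60})
print(f"G_ST = {nei_gst(pop_a, pop_b):.4f}")
```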
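The MDS procedure is not specified beyond the use of FinePop. A common way to project a pairwise FST matrix into two dimensions is classical (Torgerson) MDS, which is what R's cmdscale computes. The standard-library sketch below implements it; this is an assumption about the method rather than a description of the authors' code. It double-centres the squared distances and extracts the leading eigenvectors by shifted power iteration with deflation. Negative eigenvalues, which can occur when FST values are not Euclidean distances, are clipped to zero.

```python
import math
from typing import List

Matrix = List[List[float]]


def classical_mds(d: Matrix, k: int = 2, iters: int = 1000) -> Matrix:
    """Classical (Torgerson) MDS of a symmetric distance matrix d into k dimensions."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]           # row (= column) means of D^2
    grand = sum(row) / n                      # grand mean of D^2
    # B = -1/2 * J D^2 J (double centring)
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + grand) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        # Gershgorin shift makes every eigenvalue of B + shift*I non-negative, so
        # power iteration converges to the largest *positive* eigenvalue of B.
        shift = max(sum(abs(x) for x in r) for r in b)
        v = [1.0 / (i + 1) for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))   # Rayleigh quotient = eigenvalue
        scale = math.sqrt(max(lam, 0.0))            # clip negative eigenvalues
        for i in range(n):
            coords[i][c] = v[i] * scale
        # Deflate so the next pass finds the next eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Toy check: four points on a line at 0, 1, 2, 3 are recovered up to sign and shift.
toy = [[float(abs(i - j)) for j in range(4)] for i in range(4)]
coords = classical_mds(toy)
print([round(x, 3) for x, _ in coords])
```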
While our study is based on complete mitogenomes, those datasets include only the control region (CR). For this reason, our Cameroonian haplogroups were collapsed to the closest haplogroup level shared with the other datasets.

The hierarchical haplogroup frequencies of the 100 Cameroonian mitogenomes are presented in Table 2, according to PhyloTree Build 17 (www.phylotree.org; [12]). The sample contained 95 different mitogenomes and 87 different CR sequences, of which 91 and 76, respectively, were singletons (Supplementary Table 1). For the mitogenome and the CR, the random match probability (RMP) was estimated at 1.12% and 1.30%, respectively. The power of discrimination was 98.88% and 98.70%, and the mean number of pairwise differences (MNPD) was 27.87 ± 21.16 and 6.59 ± 4.85, respectively. These results show only a marginal gain in discriminatory capacity for mitogenomes compared with the standard CR in this population (Table 1).

Table 1. Forensic parameters in mitogenomes from Cameroon (disregarding length variants at positions 309, 16093 and point heteroplasmies).
Parameter | Mitogenome (1–16569) | Control region (16024–576)
No. of samples | 100 | 100
No. of haplotypes | 95 | 87
No. of unique haplotypes | 91 | 76
No. of haplogroups | 62 | 54
Power of discrimination (%) | 98.88 | 98.70
Haplotype diversity | 0.9988 | 0.9970
Mean pairwise differences | 27.87 ± 21.16 | 6.59 ± 4.85

Table 2. Haplogroup frequencies of 100 mitogenomes from Cameroon according to PhyloTree Build 17 [12].
Haplogroup | n
HV0 (T195C) | 1
L0a1a2 | 1
L0a1b1a | 1
L0a1b2 | 1
L0a1e | 3
L0a2 | 1
L0a2a1 | 3
L0a2a1b | 1
L0a2a2a | 2
L1b1a | 3
L1b1a15 | 1
L1b1a3b | 1
L1b1a6 | 2
L1c1'2'4'5'6 | 1
L1c1d1 | 2
L1c2a2 | 1
L1c2a3a | 1
L1c2b1a | 1
L1c2b1a'b | 1
L1c2b1b | 3
L1c3a1b | 1
L1c3b1a | 2
L1c3b1b | 2
L2a1(G143A T16189C (C16192T)) | 2
L2a1a1 | 2
L2a1a2 | 1
L2a1a2a1a | 1
L2a1b | 2
L2a1c5 | 1
L2a1d1 | 1
L2a1f | 1
L2a1q | 1
L2a2a1 | 2
L2c2b1b | 1
L2d1 | 1
L2d1a | 2
L3b1a | 2
L3b1a1 | 1
L3b1a1a | 2
L3b1a7a | 1
L3b2a | 1
L3d1a1a | 1
L3d1b | 1
L3d1b3 | 1
L3d1b3a | 1
L3d1d | 2
L3e1 | 2
L3e1b2 | 3
L3e1e | 1
L3e2a1b1 | 1
L3e2a1b2 | 4
L3e2b | 8
L3e2b1a2 | 1
L3e3b1 | 1
L3e3b2 | 2
L3e5c | 1
L3f1b1a1 | 2
L3f1b4a | 1
L3f2b | 1
L3f3 | 2
L3h1b1a | 2
L4b2a2 | 1
Total | 100

The most common mitogenome, with three observations (3%; population 95% CI 1.089–8.436%), belonged to haplogroup L1b1a (PhyloTree Build 17; [13]). It corresponds to samples CAM-002, CAM-005 and CAM-017 in Supplementary Table 1, with haplotypes given relative to the rCRS [23: Andrews R.M., Kubacka I., Chinnery P.F., Lightowlers R.N., Turnbull D.M., Howell N. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat. Genet. 1999; 23: 147]. It was followed by haplotypes belonging to haplogroups L1b1a6, L3e1b2 and L3f1b1a1, with two observations each (2%; population 95% CI 0.617–6.971%). The two most common CR mitotypes, with three observations each (3%; population 95% CI 1.089–8.436%), belonged to haplogroups L1b and L3e2b. These are samples CAM-002, CAM-005 and CAM-017, and CAM-093, CAM-094 and CAM-098, respectively, in Supplementary Table 1. With the heteroplasmy threshold set at 20% of total coverage, twelve heteroplasmic positions were found (6581R, 7163Y, 7347R, 8746Y, 9890R, 10410Y, 10685R, 10846Y, 12366R, 16093Y, 16172Y, 16189Y) in a total of eleven samples (one sample showed two heteroplasmic positions).
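The forensic parameters in Table 1 follow directly from haplotype counts:
- RMP = sum of squared haplotype frequencies;
- power of discrimination = 1 - RMP;
- haplotype diversity = n/(n - 1) × (1 - RMP).

The sketch below applies these formulas to count vectors reconstructed from the text. For the mitogenomes, one haplotype is seen three times and three are seen twice. For the CR, two haplotypes are seen three times. The remaining 18 non-singleton samples are inferred to form nine haplotypes seen twice, the only breakdown consistent with 87 haplotypes and 76 singletons. These counts are our inference, not data taken from Supplementary Table 1. The computed values match Table 1 after rounding.

```python
from typing import Dict, List


def forensic_parameters(counts: List[int]) -> Dict[str, float]:
    """RMP, power of discrimination and haplotype diversity from haplotype counts."""
    n = sum(counts)
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return {
        "samples": n,
        "haplotypes": len(counts),
        "unique": sum(1 for c in counts if c == 1),
        "RMP": sum_p2,                          # random match probability
        "PD": 1.0 - sum_p2,                     # power of discrimination
        "HD": n / (n - 1) * (1.0 - sum_p2),     # haplotype diversity (unbiased)
    }


# Count vectors reconstructed from the text (an inference, see above).
mitogenome = [3, 2, 2, 2] + [1] * 91
control_region = [3, 3] + [2] * 9 + [1] * 76

for name, counts in (("Mitogenome", mitogenome), ("Control region", control_region)):
    p = forensic_parameters(counts)
    print(f"{name}: RMP {p['RMP']:.2%}, PD {p['PD']:.2%}, HD {p['HD']:.4f}")
```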
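Collapsing detailed mitogenome haplogroups to the level resolvable in CR-only datasets can be approximated by prefix matching on PhyloTree labels. This is a simplification assumed for illustration. PhyloTree names largely nest by prefix, alternating digits and letters (L3e2a1b2 lies within L3e2), so the sketch only accepts a prefix that ends at a digit/letter boundary. It cannot handle lineages whose names do not nest: M and N descend from L3, for example, but their names do not start with "L3". Such cases would need the actual tree. The target list and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional


def _nests_under(target: str, label: str) -> bool:
    """True if `target` is a prefix of `label` that ends at a naming-level boundary."""
    if not target or not label.startswith(target):
        return False
    if len(label) == len(target):
        return True
    last, nxt = target[-1], label[len(target)]
    # Accept only a switch between digit and non-digit (e.g. "L3e2" + "a1b2");
    # this rejects "L2a1" as a parent of "L2a10".
    return last.isdigit() != nxt.isdigit()


def collapse(label: str, targets: Iterable[str]) -> Optional[str]:
    """Deepest target haplogroup whose name nests `label`, or None if none does."""
    matches = [t for t in targets if _nests_under(t, label)]
    return max(matches, key=len) if matches else None


def collapsed_counts(labels: List[str], targets: Iterable[str]) -> Counter:
    """Count labels after collapsing; unmatched labels are counted as 'unassigned'."""
    targets = list(targets)
    return Counter(collapse(lab, targets) or "unassigned" for lab in labels)


# A few labels from Table 2 and a hypothetical CR-level target list.
labels = ["L3e2a1b2", "L3e2b", "L3e1b2", "L1b1a6",
          "L2a1(G143A T16189C (C16192T))", "HV0 (T195C)"]
targets = ["L3e", "L3e2", "L1b", "L2a", "HV"]
print(collapsed_counts(labels, targets))
```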
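The 20% heteroplasmy threshold can be applied per position to base counts from the rCRS-aligned reads, for example taken from a pileup. The sketch assumes two things. First, the threshold is inclusive: the second most frequent base must reach at least 20% of total coverage. Second, only the two most frequent bases are considered. Mixtures are reported with IUPAC codes, as in 16093Y (C/T) or 6581R (A/G). The function name and input format are hypothetical.

```python
from typing import Dict

# IUPAC ambiguity codes for two-base mixtures.
IUPAC = {frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
         frozenset("GT"): "K", frozenset("CG"): "S", frozenset("AT"): "W"}


def call_position(counts: Dict[str, int], threshold: float = 0.20) -> str:
    """Major base, or an IUPAC code if the second base reaches `threshold` of coverage."""
    total = sum(counts.values())
    if total == 0:
        return "N"
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    major = ranked[0][0]
    if len(ranked) > 1 and ranked[1][1] / total >= threshold:
        return IUPAC[frozenset((major, ranked[1][0]))]
    return major


print(call_position({"C": 70, "T": 30}))   # point heteroplasmy above the threshold
print(call_position({"A": 95, "G": 5}))    # minor allele below the threshold
```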
The HVI positions (16093, 16172 and 16189) had previously been described as heteroplasmy hotspots [24: Irwin J.A., Saunier J.L., Niederstätter H., Strouss K.M., Sturk K.A., Diegoli T.M., Brandstätter A., Parson W., Parsons T.J. Investigation of heteroplasmy in the human mitochondrial DNA control region: a synthesis of observations from more than 5000 global population samples. J. Mol. Evol. 2009; 68: 516-527].

The 95 haplotypes from Cameroon were assigned to 62 distinct haplogroups (Table 2). The majority (99%) belonged to the main African haplogroups (L0–L4). The sample composition was as follows:
- 45% (population 95% CI 35.600–54.778%) were L3 haplotypes (in descending frequency L3e, L3b, L3d, L3f, L3h);
- 22% (population 95% CI 15.017–31.101%) were L1 haplotypes (mainly L1c and L1b);
- 18% (population 95% CI 11.723–26.696%) were L2 haplotypes (predominantly L2a, but also L2d and L2c);
- 13% (population 95% CI 7.790–21.004%) were L0 haplotypes (all L0a);
- two singleton haplotypes belonged to L4b and HV0 (T195C), at 1% each.

The most frequent of the 62 haplogroups were L3e2b (8%) and L3e2a1b2 (4%). The presence of one HV0 haplotype stands out in our study. This haplogroup is typically found in West Eurasia, particularly in Northwest Europe, and has been inferred to have originated ~20 kya in Anatolia (present-day Turkey) [25: Behar D.M., van Oven M., Rosset S., Metspalu M., Loogväli E.L., Silva N.M., Kivisild T., Torroni A., Villems R. A "Copernican" reassessment of the human mitochondrial DNA tree from its root. Am. J. Hum. Genet. 2012; 90: 675-684]. However, it has also been reported in Tcheboua Fulani (northern Cameroon) and Kanuri (Nigeria) individuals [26: Cerezo M., Černý V., Carracedo A., Salas A. New insights into the Lake Chad Basin population structure revealed by high-throughput genotyping of mitochondrial DNA coding SNPs. PLoS One. 2011; 6: 18682], consistent with gene flow from western Eurasia into Africa. The HV0 individual reported in this study is also Kanuri, which could indicate a recent migratory event from the Near East to Cameroon.

To visualize haplogroup sharing between our Cameroonian sample and other African population samples, an MDS analysis was performed (Supplementary Figure 1). Pairwise FST distances (Supplementary Table 2) are highest for Egypt. Egypt lies at the nexus of Africa and Eurasia, and its haplogroup composition reflects this: it shares many haplogroups with other North African populations as well as with those from Southwest Asia [22]. Accordingly, the West African populations (Ghana, the mixed West African sample and Ivory Coast) map closely together at one end of the X axis, whereas Egypt maps at the opposite end. Our Cameroon sample tends to cluster with this West African set along the X axis, although it occupies an intermediate position between the West African and East African (Kenya, Rwanda) samples along the Y axis.

This is the first study presenting the complete mitogenomes of an appreciably sized Cameroonian sample. The mtDNA haplogroup distribution of the Cameroonian sample shows similarities to those of other West African populations. In addition, this study helps to reduce the current under-representation of African populations in mtDNA databases, particularly concerning whole mitogenomes.
References