The Dual Origin of the Malagasy in Island Southeast Asia and East Africa: Evidence from Maternal and Paternal Lineages
2005; Elsevier BV; Volume: 76; Issue: 5; Language: English
10.1086/430051
ISSN: 1537-6605
Authors: Matthew E. Hurles, Bryan Sykes, Mark A. Jobling, Peter Forster
Topic(s): Rangeland Management and Livestock Ecology
Abstract: Linguistic and archaeological evidence about the origins of the Malagasy, the indigenous peoples of Madagascar, points to mixed African and Indonesian ancestry. By contrast, genetic evidence about the origins of the Malagasy has hitherto remained partial and imprecise. We defined 26 Y-chromosomal lineages by typing 44 Y-chromosomal polymorphisms in 362 males from four different ethnic groups from Madagascar and 10 potential ancestral populations in Island Southeast Asia and the Pacific. We also compared mitochondrial sequence diversity in the Malagasy with a manually curated database of 19,371 hypervariable segment I sequences, incorporating both published and unpublished data. We could attribute every maternal and paternal lineage found in the Malagasy to a likely geographic origin. Here, we demonstrate approximately equal African and Indonesian contributions to both paternal and maternal Malagasy lineages. The most likely origin of the Asia-derived paternal lineages found in the Malagasy is Borneo. This agrees strikingly with the linguistic evidence that the languages spoken around the Barito River in southern Borneo are the closest extant relatives of Malagasy languages. As a result of their equally balanced admixed ancestry, the Malagasy may represent an ideal population in which to identify loci underlying complex traits of both anthropological and medical interest.

The island of Madagascar lies in the Indian Ocean, ∼250 miles from the African coast and ∼4,000 miles from Indonesia. Paleoecological and archaeological evidence suggests that, by 1,500–2,000 years ago, Madagascar had become the last great island landmass to be settled (Dewar and Wright 1993; Burney et al. 2004). The Malagasy language shares 90% of its basic vocabulary with Maanyan, a language spoken in the Barito River region of southern Borneo, which indicates that the predominant ancestry of the Malagasy language most likely derives from Borneo (Dahl 1951; Adelaar 1995). Malagasy also contains linguistic borrowings from the Bantu languages spoken in East Africa (Dahl 1988). Furthermore, substantial components of Malagasy material culture (e.g., cattle pastoralism) could be derived only from African sources. At the time of the first Madagascan settlement, the entire Indian Ocean was a vast trading network connecting China with the Mediterranean and all societies in between (Vérin and Wright 1999). There is substantial evidence of Islamic influence and limited evidence of Indian influence on the Malagasy, in both language and culture.

In contrast to these cultural and linguistic traces of Malagasy ancestry, the genetic origins of the Malagasy are relatively poorly understood, and conflicting signals of African, Asian, and Pacific origin have appeared from studies of different loci (Migot et al. 1995; Soodyall et al. 1995; Hewitt et al. 1996). These contradictions result, in part, from being able to identify the likely origins of only a subset of lineages present at any single locus. In the present study, we employed the detailed phylogenetic and geographic resolution of paternally inherited Y-chromosomal lineages and maternally inherited mtDNA lineages to apportion Malagasy lineages to ancestral populations. In this way, the contributions of the different ancestral populations to the modern Malagasy gene pool can be estimated directly, and likely geographic origins can be pinpointed with precision.

We assayed mtDNA and Y-chromosomal diversity in a Malagasy population sample comprising four different ethnic populations: Bezanozano (n=6), Betsileo (n=18), Merina (n=10), and Sihanaka (n=3). Ten potential ancestral populations (n=327) representing major population groups within Island Southeast Asia and Oceania were also analyzed with Y-chromosomal markers. To type all these samples for the required number of Y-chromosomal and mitochondrial (mt) markers, it was necessary to perform whole-genome amplification. Degenerate oligonucleotide-primed PCR (Nrich [Genetix]) (Telenius et al. 1992) performed better than multiple displacement amplification (Molecular Staging) (Dean et al. 2002) in early trials and, consequently, was used throughout.
We selected 44 binary markers in the present Y-haplogroup phylogeny (Y Chromosome Consortium [YCC] 2002; Jobling and Tyler-Smith 2003) that were predicted to be particularly informative in this study. These markers were typed using a combination of single-plex PCRs described elsewhere (Hurles et al. 2002; YCC 2002) and nine novel PCR multiplexes, each analyzing between three and seven SNPs. These multiplexes were designed to facilitate hierarchical typing, which minimizes the amount of genomic DNA required to define lineages at high resolution. These multiplexes use locus-specific primers tagged with universal primers to enable a two-step amplification protocol that equalizes the simultaneous amplification of multiple loci (Belgrader et al. 1996; Paracchini et al. 2002). SNPs lying within these PCR products were subsequently genotyped by single-base extension (SNaPshot [Applied Biosystems]) and capillary electrophoresis. Primer extension reactions were performed in half the recommended reaction volume but were otherwise processed in accordance with the manufacturer’s instructions. The amplification and extension primers used in the present study are detailed in table 1.

Table 1. Primers for Y Binary and mt Variant Marker Multiplexes

Multiplex and Marker (a) | Amplification Primer, Forward (b) | Amplification Primer, Reverse (b) | Extension Primer (c)
Y binary A multiplex:
M130 | ggagcacgctatcccgttagacTTGTGTTTTGGTGGGATGTTG | cgctgccaactaccgcacatgTACTCTGCCCACAGAGATGGT | tgactgaGCCCTTTCCCCTGGGCAG
M145 | ggagcacgctatcccgttagacTATTCAGCAAGAGTAAGCAAGAGG | cgctgccaactaccgcacatgATCCTTTTTGGATCATGGTTCTT | actgactgactTTAGGCTAAGGCTGGCTCT
M89 | ggagcacgctatcccgttagacTCCTATGAGGTGCCATGAAA | cgctgccaactaccgcacatgGGATCACCAGCAAAGGTAGC | ctgactgactgactgactgactgactCTCAGGCAAAGTGAGAGAT
M9 | ggagcacgctatcccgttagacTCTGCAAAGAAACGGCCTAAG | cgctgccaactaccgcacatgACCGATTAAAAAGAGGCATTTTGACGGCCTAAGATGGTTGAAT | [not separable from reverse primer in source]
M45 | ggagcacgctatcccgttagacAGCTGGCAAGACACTTCTGAG | cgctgccaactaccgcacatgTAATATGTTCCTGACACCTTCC | gactgactgactgactgactCCTCAGAAGGAGCTTTTTGC
M96 | ggagcacgctatcccgttagacAGTTGCCCTCTCACAGAGCAC | cgctgccaactaccgcacatgAAAGGTCACTGGAAGGATTGC | ctgactgactgactGAAAACAGGTCTCTCATAATA
M168 | ggagcacgctatcccgttagacAGGATTCATGATGAAATCTGCTT | cgctgccaactaccgcacatgAAATCTCATAGGTCTCTGACTGTTC | ctgactgactgactgactgactgactgactGTATGTGTTGGAGGTGAGT
Y binary B multiplex:
M60 | ggagcacgctatcccgttagacAGAGCCCTGATGTGGACTCAA | cgctgccaactaccgcacatgACGCCAGTGCATTGAACACTA | gactgactgactgactgactgTAACCACTGTGTGCCTGAT
M91 | ggagcacgctatcccgttagacCACCCGTTAAGCAAAAATCC | cgctgccaactaccgcacatgTGCAGTGCCCTTCCAAATAAA | gactgactgactgGTAGTGAACTGATTAAAAAAAA
Y binary C multiplex:
LLy22g | ggagcacgctatcccgttagacTGATGTTGGCCTTTACAGCTC | cgctgccaactaccgcacatgTTTGGCTGAGAGACTGCGGG | gactgactgactgactgaAATTATTGTTTAAGCCACTAAG
M5 | ggagcacgctatcccgttagacGGGGTCCTATCAGGGGTTTA | cgctgccaactaccgcacatgTTTGTCTATTACCAAAGGTTTGTG | gactgaCTTGCACTCTTCTCCTTCT
M122 | ggagcacgctatcccgttagacTGGTAAACTCTACTTAGTTGCCTTT | cgctgccaactaccgcacatgATCAGCGAATTAGATTTTCTTGC | gactgactgacTTTTTTTTCCCCTGAGAGC
M134 | ggagcacgctatcccgttagacTAGAATCATCAAACCCAGAAGG | cgctgccaactaccgcacatgTCTTTGGCTTCTCTTTGAACAG | gactgactgactgactgactgactgTACTTTTGATCCCCACCAAT
M175 | ggagcacgctatcccgttagacTATCAGGCACATGCCTTCTCAC | cgctgccaactaccgcacatgATGGTCGAGTGTAGTGCATTGGCACATGCCTTCTCACTTCTC | [not separable from reverse primer in source]
PN31 | ggagcacgctatcccgttagacTTAAGGCTGCGTGTTCCCTAT | cgctgccaactaccgcacatgTTGCACCTGACCTGTTCTTAC | gactgactgacATAAATAAGGTTTTTTTTTGGTTG
Y binary D multiplex:
M3 | ggagcacgctatcccgttagacTAATCAGTCTCCTCCCAGCA | cgctgccaactaccgcacatgAAATTGTGAATCTGAAATTTAAGG | gactgactgactgactgGGTCACCTCTGGGACTGA
MEH2 | ggagcacgctatcccgttagacTCGTTTTCTGATAGAAGATATAAATG | cgctgccaactaccgcacatgATACCATGAAAATTCATAATCCACA | gactgacTTTATGTAATTTAAAGCATAGTG
PN25 | ggagcacgctatcccgttagacAGCTATGCCTACAAAATGACAC | cgctgccaactaccgcacatgTAAAGGCTAAAGCAAAAAAGAAAC | gactgacCTGCCTGAAACCTGCCTG
M207 | ggagcacgctatcccgttagacAAGGAAAAATCAGAAGTATCCCTG | cgctgccaactaccgcacatgTTGGGATCTAATTTCTTCATTAG | gactgactgactgactgaTGTAAGTCAAGCAAGAAATTTA
Y binary E multiplex:
M35 | ggagcacgctatcccgttagacTATAAGCCTAAAGAGCAGTCAGAG | cgctgccaactaccgcacatgAGGTGAATGAACAACTAATCCAT | gactgactgactgactCGGAGTCTCTGCCTGTGTC
M123 | ggagcacgctatcccgttagacTACACAGAGCAAGTGACTCTCAAA | cgctgccaactaccgcacatgAAGTTGCCCAGGAATTTGCATGTATCTGAACTAGCATATCA | [not separable from reverse primer in source]
PN2 | ggagcacgctatcccgttagacTCTTGATGCAAATGAGAAAGAACT | cgctgccaactaccgcacatgACTCTAAAAACTGGAGGGAGAAA | gactgactgactgTGCCCCTAGGAGGAGAA
M2 | ggagcacgctatcccgttagacTCCCAGGAAGGTCCAGTAACA | cgctgccaactaccgcacatgAAAATGGAAAATACAGCTCCCC | gactgaTTATCCTCCACAGATCTCA
Y binary F multiplex:
M27 | ggagcacgctatcccgttagacTCATGCCCAGCTGAAACAATA | cgctgccaactaccgcacatgTCATGATTTGTCTTCTATTTCG | gGGAATCGAGGTTCAGGACA
M61 | ggagcacgctatcccgttagacTATTGGATTGATTTCAGCCTTC | cgctgccaactaccgcacatgTATTTTATTTTCTGTGTTCCTTGC | gactgacAGCTTCTCCTCTGGAGTC
M70 | ggagcacgctatcccgttagacTGGCACCATCTGTGAAAACAC | cgctgccaactaccgcacatgTTATCTTTATTCCCTTTGTCTTGCT | gactgactgacTCTGTTGTGGTAGTCTTAG
M147 | ggagcacgctatcccgttagacTCACTCTGGAGGCCAAGGTAG | cgctgccaactaccgcacatgTTATTCTGGGGCAATTTTAGGG | gactgactgactgGTCTCTGAAAGAAAAAAACAAA
SRY9138 | ggagcacgctatcccgttagacTGTTGATATGATATTATAGAGGC | cgctgccaactaccgcacatgTCCCAGATGCATATATTACAGG | gactgactgactgactgactgGCAAATTTAATGCTCTCGG
M214 | ggagcacgctatcccgttagacTTAGGCTGATTTTGCTGCTGA | cgctgccaactaccgcacatgTGAAATGCCACTTCACTCCAG | gactgactgactgactgactgactGAGACACTGTCTGAAAACAAC
Y binary G multiplex:
SRY465 | ggagcacgctatcccgttagacTGCCGAAGAATTGCAGTTTGC | cgctgccaactaccgcacatgTGTTGATGGGCGGTAAGTGGC | gaGTTGTCCAGTTGCACTTC
47Z | ggagcacgctatcccgttagacTTCACCGTCTTAGCCAGGATG | cgctgccaactaccgcacatgTTAGTTACGCCTTGCATAAC | gactgactCTGGACTTGGTGGCTCA
M88 | ggagcacgctatcccgttagacATTCTAGGGTCAGGCAACTAGG | cgctgccaactaccgcacatgTTGTTTGTTCTATTCTATGGTCTTCC | gactgactgacTTATTCCTGCTTCTTCTGC
M95 | ggagcacgctatcccgttagacGAGTGGAAATCAAGATGCCAAG | cgctgccaactaccgcacatgTGCACCTGTTTTGTGTAAGAG | gactgactgactgacGAAAGACTACCATATTAGTG
Y binary H multiplex:
M50 | ggagcacgctatcccgttagacCGGCAACAGTGAGGACAGT | cgctgccaactaccgcacatgTGGTCCAAGGGCTGCTGGAG | gaAAAGGGCTCTGGTAAGAC
M101 | ggagcacgctatcccgttagacTGCCTCTTGCTTACTCTTGCT | cgctgccaactaccgcacatgTTGCAATCGGAAGCCTCAATCT | gactgGGAGATTTACTGAATCAGTG
M119 | ggagcacgctatcccgttagacGGGAAATGCCAAGGTAAATG | cgctgccaactaccgcacatgTTATGGGTTATTCCAATTCAG | gactgactgactCCAATTCAGCATACAGGC
Y binary I multiplex:
M33 | ggagcacgctatcccgttagacTTTGAGATAAGCCGCTAAACTTATTG | cgctgccaactaccgcacatgTTAGCCCCCAAGAGAGACAACT | gacTTATCTCATAAGTTACTAGTTA
M41 | ggagcacgctatcccgttagacTAGTATAATAGGCTGGGTGCTG | cgctgccaactaccgcacatgACATGAGTTCAAATGATTCTTC | gaGCCAACATGGTGAAACTG
M44 | ggagcacgctatcccgttagacTGCAGGAATCCCTGAGCATAA | cgctgccaactaccgcacatgCATGGCTGACAGCTAGGAAA | gactgactgacCTAACCTTCTAGTACACTG
M54 | ggagcacgctatcccgttagacAAGACTGAGGCCTCCTCTGGT | cgctgccaactaccgcacatgACCATCTCCTCACCTCTCCAA | gactgactgactgactgaCCCTCAGGCAGCCGCAC
M75 | ggagcacgctatcccgttagacTGCTAACAGGAGAAATAAATTACAGAC | cgctgccaactaccgcacatgATATTGAACAGAGGCATTTGTGA | gactgactgactgactgacGACAATTATCAAACCACATCC
mt Variant multiplex:
10400 | ggagcacgctatcccgttagacTTGATCTAGAAATTGCCCTCCT | cgctgccaactaccgcacatgTCATAATTTAATGAGTCGAAATCAT | gactgactTGTTTAAACTATATACCAATTC
15043 | ggagcacgctatcccgttagacTTCATCCGCTACCTTCACGC | cgctgccaactaccgcacatgTGTTGTTTGATCCCGTTTCGTG | gaCCTCTTCCTACACATCGG
10398 | ggagcacgctatcccgttagacTTGATCTAGAAATTGCCCTCCT | cgctgccaactaccgcacatgTCATAATTTAATGAGTCGAAATCAT | gactgactgactgactgacCTACAAAAAGGATTAGACTGA
15301 | ggagcacgctatcccgttagacTTCATCCGCTACCTTCACGC | cgctgccaactaccgcacatgTGTTGTTTGATCCCGTTTCGTG | gactgactgactgactgactgactCTTTACCTTTCACTTCATCTT
10310 | ggagcacgctatcccgttagacTTGATCTAGAAATTGCCCTCCT | cgctgccaactaccgcacatgTCATAATTTAATGAGTCGAAATCAT | gactgactgactgactgactgactgactgaGCCCTACAAACAACTAACCT
6455 | ggagcacgctatcccgttagacTAGGAACAGGTTGAACAGTCTA | cgctgccaactaccgcacatgTGAAAAATCAGAATAGGTGTTGG | gactgaAATACCAAACGCCCCTCTT
9824 | ggagcacgctatcccgttagacTCCATTTCCGACGGCATCTAC | cgctgccaactaccgcacatgTATTAAGGCGAAGTTTATTACTC | gactgactgactgactgCACAGGCTTCCACGGACT

(a) All but “Variant” markers are Y-binary markers.
(b) Lowercase letters indicate universal (ZIP) primers; letters in italics indicate spacer primers; bold uppercase letters indicate locus-specific primers.
(c) Lowercase letters indicate variable-length tag primers; uppercase letters indicate locus-specific primers.

Together, these markers define 41 Y-chromosomal lineages, of which 10 are found in the Malagasy, 16 are found within Island Southeast Asia and Oceania, and 8 are found in East African populations (Luis et al. 2004). The Y-chromosomal lineages in East Africa are nonoverlapping with those found in Island Southeast Asia and Oceania (see fig. 1). As a consequence of this population differentiation, it is simple to apportion lineages found in the Malagasy to either an African or an Asian origin. All but two Malagasy lineages can be found in either East African or Southeast Asian populations. The two unaccounted-for lineages are single chromosomes belonging to haplogroups L* and R1b. Haplogroup L* is found at appreciable frequencies only in populations bordering the northern Indian Ocean, and haplogroup R1b reaches highest frequencies in northwestern Europe (Jobling and Tyler-Smith 2003). We believe these two lineages most likely reflect recent admixture events as a result of Indian Ocean trading links (Duplantier et al. 2002) and European colonization, respectively.

To identify which Island Southeast Asian or Oceanic population represents the most likely source population for the Asian lineages found in the Malagasy, we computed pairwise FST distances, using the Arlequin software, to determine the closest populations, in terms of genetic distance, to the Malagasy. This analysis indicates that, among the populations we sampled, the two populations from Borneo are the best candidates for the likely source of these lineages (table 2).
This genetic proximity between the Malagasy and Borneo populations reflects the presence of appreciable frequencies of lineages O1b and O2a* in both populations, as well as a relative lack of chromosomes belonging to O3 lineages. The closest single Island Southeast Asian or Oceanic population to the Malagasy is that from Banjarmasin.

Table 2. Pairwise Genetic Distances (FST) to the Malagasy on the Basis of Y-Haplogroup Frequencies

Population Location | Distance (FST)
East Africa (a) | .083
Banjarmasin | .094
Kota Kinabalu | .102
Taiwan | .170
Majuro | .237
Philippines | .249
Vanuatu | .276
Western Samoa | .283
Papua New Guinea | .313
Kapingamarangi | .316
Cook Islands | .386

(a) From the publication by Luis et al. (2004).

To explore the statistical significance of these observations, we devised a permutation test to assess whether the genetic distance between the Malagasy (A) and one population (B) is significantly smaller than that between the Malagasy and another population (C). In this test, the individual haplotypes observed in populations B and C are pooled and are randomly reassigned 10,000 times into two simulated populations (B′ and C′) with the same sample sizes as B and C. The P value of the difference in genetic distance, FST(A:B) - FST(A:C), is then calculated as the fraction of simulated population pairs in which the difference in genetic distance between each of the populations and the Malagasy is greater than that observed in the real data, that is, (FST[A:B′] - FST[A:C′]) > (FST[A:B] - FST[A:C]).
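The permutation scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `fst` function is a simplified haplotype-frequency estimate, (H_T - H_S) / H_T, standing in for the Arlequin calculation used in the study, and the function names, the seed, and the haplogroup labels in the usage example are hypothetical. The comparison follows the inequality exactly as it is written in the text.

```python
import random
from collections import Counter


def heterozygosity(sample):
    """Return 1 - sum(p_i^2) over the haplotype labels in `sample`."""
    n = len(sample)
    return 1.0 - sum((count / n) ** 2 for count in Counter(sample).values())


def fst(pop_x, pop_y):
    """Simplified FST = (H_T - H_S) / H_T from haplotype frequencies.

    Assumption: a stand-in for the Arlequin estimate, not a reimplementation.
    """
    h_s = (heterozygosity(pop_x) + heterozygosity(pop_y)) / 2.0
    freq_x, freq_y = Counter(pop_x), Counter(pop_y)
    mean_freqs = [
        (freq_x[h] / len(pop_x) + freq_y[h] / len(pop_y)) / 2.0
        for h in set(freq_x) | set(freq_y)
    ]
    h_t = 1.0 - sum(p * p for p in mean_freqs)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def permutation_p_value(pop_a, pop_b, pop_c, n_perm=10_000, seed=0):
    """Fraction of reshuffled (B', C') pairs whose FST difference to A
    exceeds the observed FST(A:B) - FST(A:C)."""
    rng = random.Random(seed)
    observed = fst(pop_a, pop_b) - fst(pop_a, pop_c)
    pooled = list(pop_b) + list(pop_c)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Keep the simulated populations the same sizes as B and C.
        sim_b, sim_c = pooled[:len(pop_b)], pooled[len(pop_b):]
        if fst(pop_a, sim_b) - fst(pop_a, sim_c) > observed:
            exceed += 1
    return exceed / n_perm


if __name__ == "__main__":
    # Hypothetical haplogroup labels, not data from the study.
    pop_a = ["O1b"] * 6 + ["O2a*"] * 4 + ["E3a"] * 8
    pop_b = ["O1b"] * 5 + ["O2a*"] * 5 + ["O3"] * 2
    pop_c = ["O3"] * 9 + ["O1b"] * 2 + ["C"] * 4
    print(permutation_p_value(pop_a, pop_b, pop_c))
```

Fixing the seed makes a run reproducible; the argument order of B and C determines the direction of the comparison.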
This test showed no significant difference between the two Borneo populations (P=.8374) but showed that the resultant pooled Borneo population is significantly closer to the Malagasy than is any other Island Southeast Asian population (P<.001).

The phylogeny of mtDNA variation present in modern humans can be crudely characterized as comprising L lineages, present almost exclusively in Africa, and M and N lineages, present almost exclusively outside of Africa. Thus, classifying mt genomes into these major clades has significant power for discriminating between African and Asian origins. We devised a novel multiplex using the single base–extension method described above to type seven coding-region base substitutions (transitions at positions 15043, 10400, 10398, 15301, 6455, 9824, and 10310) that define the M and N lineages, as well as the R9 sublineage within haplogroup N and the M7 sublineage within haplogroup M (Kivisild et al. 2002), both of which are known to be present in Island Southeast Asian populations (primers used in this multiplex assay are detailed in table 1). Among 37 Malagasy mt genomes, we found 23 that belong to M and N lineages and 14 that belong to L lineages (fig. 2 and table 3).

Table 3. mt Haplotypes Found in Study Populations

Malagasy:
Haplotype | Frequency | Allele at Position 15043 | 6455 | 10400 | 9824 | 10398 | 15301 | 10310 | Lineage | Variant Sites in HVSI (source subcolumns: 16085–16362, 16085–16350) | Mutation Distance to Best Match | Weighted Average Deviation of Best Matches from CoG (miles)
1 | 1 | G | C | G | T | G | A | G | L | 16223, 16265T | 0 | 3,423
2 | 1 | G | C | G | T | G | A | G | L | 16209, 16223, 16311 | 0 | 1,551
3 | 1 | G | C | G | T | G | A | G | L | 16182C, 11683C, 16189, 16223, 16278, 16290, 16294, 16309 | 0 | ,819
4 | 1 | G | C | G | T | G | A | G | L | 16223, 16278, 16362 | 0 | 1,217
5 | 9 | G | C | G | T | G | A | G | L | 16093, 16223, 16278, 16362 | 0 | 1,238
6 | 1 | G | C | G | T | A | A | G | L | 16185, 16223, 16327 | 0 | 2,349
7 | 1 | A | C | A | T | G | A | G | M(xM7) | 16086, 16148, 16223, 16259, 16278, 16319 | 0 | 2,800
8 | 8 | A | C | A | T | G | A | G | M(xM7) | 16223, 16263, 16311 | 1 | 2,200
9 | 4 | A | C | A | T | G | A | G | M(xM7) | 16221, 16223, 16291, 16362 | 1 | 2,500
10 | 1 | A | C | A | T | G | A | G | M(xM7) | 16221, 16223, 16291, 16311, 16362 | 1 | 0
11 | 3 | A | T | A | C | G | A | G | M7 | 16223, 16295, 16362 | 0 | 1,200
12 | 3 | G | C | G | T | A | G | G | N(xR9) | 16189, 16217, 16247, 16261 | 0 | 2,700
13 | 1 | G | C | G | T | A | G | A | R9 | 16218, 16241, 16255, 16304, 16311 | 2 | 40
14 | 2 | G | C | G | T | A | G | A | R9 | 16220C, 16265, 16298, 16362 | 1 | 360

Banjarmasin:
Haplotype | Frequency | Variant Sites in HVSI
1 | 1 | 16093, 16319
2 | 1 | 16311
3 | 2 | 16093, 16311
4 | 1 | 16129, 16263
5 | 1 | 16129, 16185, 16260, 16298
6 | 1 | 16129, 16234, 16290, 16311
7 | 1 | 16172, 16173, 16278, 16311
8 | 1 | 16093, 16220C, 16223, 16298
9 | 2 | 16108, 16111, 16129, 16162, 16172, 16183C, 16189, 16223, 16304
10 | 1 | 16111, 16168, 16172, 16183C, 16189, 16311
11 | 1 | 16129, 16172, 16223, 16294, 16304
12 | 2 | 16136, 16183C, 16189, 16217, 16223
13 | 1 | 16182C, 16183C, 16189, 16217, 16223, 16261
14 | 1 | 16192, 16223, 16234, 16288, 16304, 16309
15 | 1 | 16223, 16249, 16288, 16295, 16304
16 | 1 | 16086, 16147, 16183C, 16184A, 16189, 16217, 16223
17 | 1 | 16086, 16148, 16259, 16278, 16319
18 | 1 | 16093, 16184A, 16278

Kota Kinabalu:
Haplotype | Frequency | Variant Sites in HVSI
1 | 1 | 16111, 16129, 16223, 16266, 16304
2 | 1 | 16111, 16129, 16235, 16300
3 | 1 | 16111, 16168, 16172, 16183C, 16189, 16263, 16286, 16311
4 | 1 | 16126, 16129, 16183C, 16189, 16223, 16278
5 | 1 | 16126, 16129, 16297
6 | 1 | 16129, 16172, 16192A, 16223, 16294, 16304
7 | 1 | 16129, 16172, 16223, 16304
8 | 1 | 16129, 16209, 16272
9 | 1 | 16140, 16182C, 16183C, 16189, 16217, 16223, 16274, 16335
10 | 1 | 16140, 16182C, 16183C, 16189, 16223, 16266A
11 | 1 | 16140, 16183C, 16189, 16223, 16243, 16294
12 | 1 | 16140, 16183C, 16189, 16223, 16266A
13 | 1 | 16157, 16223, 16256, 16304, 16311, 16335
14 | 4 | 16185, 16291
15 | 1 | 16189, 16192, 16294G, 16297
16 | 1 | 16220C heteroplasmy
17 | 1 | 16220C, 16223, 16258C, 16265, 16298
18 | 2 | 16223
19 | 1 | 16223, 16304
20 | 2 | 16278, 16295
21 | 2 | 16291
22 | 2 | 16295
23 | 1 | 16295, 16346C
24 | 1 | 16311
25 | 1 | 16093, 16129, 16209, 16272
26 | 1 | 16093, 16136, 16295, 16337
27 | 1 | 16093, 16148, 16182C, 16183C, 16189
28 | 1 | 16093, 16295
29 | 1 |

Philippines:
Haplotype | Frequency | Variant Sites in HVSI
1 | 1 | 16129, 16172, 16223, 16304, 16311
2 | 2 | 16111, 16129, 16140, 16183C, 16189, 16223, 16234, 16243
3 | 1 | 16126, 16223, 16231, 16284, 16311
4 | 2 | 16126, 16223, 16231, 16311
5 | 1 | 16129, 16172, 16223, 16243, 16304, 16311
6 | 2 | 16129, 16172, 16223, 16304, 16311
7 | 1 | 16140, 16183C, 16189, 16223, 16243
8 | 1 | 16145, 16176, 16223, 16224, 16233, 16311
9 | 1 | 16182C, 16183C, 16189, 16217, 16223, 16261, 16293
10 | 1 | 16192, 16223, 16278, 16325
11 | 1 | 16220C, 16223, 16240, 16265, 16298, 16335
12 | 3 | 16220C, 16223, 16265, 16298, 16335
13 | 1 | 16269, 16271
14 | 1 | 16291, 16311
15 | 1 | 16291
16 | 1 | 16295
17 | 2 |
18 | 2 | 16093, 16182C, 16183C, 16189, 16217, 16223, 16261, 16293

Note: Most variant sites are transitions; transversions are indicated by a letter given after the variant site, which indicates the derived state.

To further localize the geographical origins of Asian mtDNA lineages found in the Malagasy, we studied the hypervariable segment I (HVSI) sequence of the mt genome, for which a large volume of comparative data is available. We amplified and sequenced HVSI (between positions 16093 and 16362), using primers TTAACTCCACCATTAGCACC and GAGGATGGTGGTCAAGGGAC (Forster et al. 2002a), in mtDNA from these 37 Malagasy individuals, and, by combining these data with the coding SNP haplotypes described above, we defined 14 distinct maternal lineages in the Malagasy (fig. 2).

A recently developed method for identifying the likely ancestry of a set of mt sequences is to perform a “center of gravity” (CoG) analysis of individual sequence types observed within a population (Röhl et al. 2001; Forster et al. 2002b). In our CoG analysis, the best matches to an HVSI sequence type were identified within a manually curated database of HVSI sequences associated with a precise geographical location. A CoG was then calculated by weighted interpolation of all best-match locations (see fig. 2). The relative lack of published Island Southeast Asian HVSI data could hamper a CoG analysis. To counteract this sampling bias, we added 82 HVSI sequences from Banjarmasin (n=21), Kota Kinabalu (n=36), and the Philippines (n=25) to the analysis. These sequence types are given in table 3. Exact matches within our database of 19,371 HVSI sequences can be found for all six maternal lineages in the Malagasy that appear to be Africa derived. By contrast, exact matches can be found for only three of eight Asia-derived maternal lineages. The CoGs observed in the Malagasy fall within either Island Southeast Asia or sub-Saharan Africa.
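As a rough illustration of the CoG idea, the sketch below computes a weighted centroid of best-match locations on a sphere and the weighted average great-circle deviation of those locations from it, the quantity reported in miles in table 3. The text does not spell out how the weighted interpolation was carried out, so the spherical-centroid approach, the weighting by a per-location number, the Earth radius constant, and all function names are assumptions made for this example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def center_of_gravity(matches):
    """Weighted spherical centroid of (lat_deg, lon_deg, weight) tuples.

    Returns (lat_deg, lon_deg). Averaging unit vectors is an assumption
    made here, not necessarily the interpolation used in the study.
    """
    x = y = z = 0.0
    for lat, lon, weight in matches:
        phi, lam = math.radians(lat), math.radians(lon)
        x += weight * math.cos(phi) * math.cos(lam)
        y += weight * math.cos(phi) * math.sin(lam)
        z += weight * math.sin(phi)
    cog_lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    cog_lon = math.degrees(math.atan2(y, x))
    return cog_lat, cog_lon


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lam = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def weighted_average_deviation(matches):
    """Weighted mean distance (miles) of the match locations from their CoG."""
    cog_lat, cog_lon = center_of_gravity(matches)
    total_weight = sum(weight for _, _, weight in matches)
    return sum(
        weight * great_circle_miles(lat, lon, cog_lat, cog_lon)
        for lat, lon, weight in matches
    ) / total_weight


if __name__ == "__main__":
    # Hypothetical best-match locations (lat, lon, number of matches).
    best_matches = [(-3.3, 114.6, 3), (6.0, 116.1, 1), (14.6, 121.0, 1)]
    print(center_of_gravity(best_matches))
    print(weighted_average_deviation(best_matches))
```

A single best-match location gives a deviation of zero, so small deviations indicate geographically concentrated matches and large ones indicate matches scattered over a wide area.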
These CoGs accord exactly with the lineage classifications: all sequence types that belong to L haplogroups are found in Africa, and all sequence types that belong to M and N haplogroups are found in Island Southeast Asia. The relatively broad distribution of the Asian CoGs suggests that the present level of geographical resolution afforded by a CoG analysis is not sufficient to enable us to identify a single likely source population in Island Southeast Asia. It does, however, allow us to exclude the possibility that a Pacific Island population was the sole source of these mt lineages.

We calculated Nei’s gene diversity (using the Arlequin software) in HVSI sequences from the Malagasy and compared it with diversity apparent in the three Island Southeast Asian populations described above, as well as in published data on Mozambique (Pereira et al. 2001) and Oceanic populations (Hurles et al. 2003b). The Malagasy appear to have diversity that is significantly lower than that seen in Island Southeast
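References
Adelaar A (1995) Asian roots of the Malagasy: a linguistic perspective. Bijdragen tot de Taal-Land en Volkenkunde 151:325-356
Belgrader P, Marino MM, Lubin M, Barany F (1996) A multiplex PCR-ligase detection reaction assay for human identity testing. Genome Sci Technol 1:77-87
Burney DA, Burney LP, Godfrey LR, Jungers WL, Goodman SM, Wright HT, Jull AJ (2004) A chronology for late prehistoric Madagascar. J Hum Evol 47:25-63
Dahl OC (1951) Malgache et Maanyan: une comparaison linguistique. Egede Instituttet, Oslo
Dahl OC (1988) Bantu substratum in Malagasy. Études Océan Indien 9:91-132
Dean FB, Hosono S, Fang L, Wu X, Faruqi AF, Bray-Ward P, Sun Z, Zong Q, Du Y, Du J, Driscoll M, Song W, Kingsmore SF, Egholm M, Lasken RS (2002) Comprehensive human genome amplification using multiple displacement amplification. Proc Natl Acad Sci USA 99:5261-5266
Dewar RE, Wright HT (1993) The culture history of Madagascar. J World Prehistory 7:417-466
Duplantier JM, Orth A, Catalan J, Bonhomme F (2002) Evidence for a mitochondrial lineage originating from the Arabian peninsula in the Madagascar house mouse (Mus musculus). Heredity 89:154-158
Forster L, Forster P, Lutz-Bonengel S, Willkomm H, Brinkmann B (2002a) Natural radioactivity and human mitochondrial DNA mutations. Proc Natl Acad Sci USA 99:13950-13954
Forster P, Cali F, Röhl A, Metspalu E, D’Anna R, Mirisola M, De Leo G, Flugy A, Salerno A, Ayala G, Kouvatsi A, Villems R, Romano V (2002b) Continental and subcontinental distributions of mtDNA control region types. Int J Legal Med 116:99-108
Hewitt R, Krause A, Goldman A, Campbell G, Jenkins T (1996) β-globin haplotype analysis suggests that a major source of Malagasy ancestry is derived from Bantu-speaking Negroids. Am J Hum Genet 58:1303-1308
Hurles ME, Nicholson J, Bosch E, Renfrew C, Sykes BC, Jobling MA (2002) Y chromosomal evidence for the origins of Oceanic-speaking peoples. Genetics 160:289-303
Hurles ME, Maund E, Nicholson J, Bosch E, Renfrew C, Sykes BC, Jobling MA (2003b) Native American Y chromosomes in Polynesia: the genetic impact of the Polynesian slave trade. Am J Hum Genet 72:1282-1287
Jobling MA, Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet 4:598-612
Kivisild T, Tolk HV, Parik J, Wang Y, Papiha SS, Bandelt HJ, Villems R (2002) The emerging limbs and twigs of the East Asian mtDNA tree. Mol Biol Evol 19:1737-1751
Luis JR, Rowold DJ, Regueiro M, Caeiro B, Cinnioğlu C, Roseman C, Underhill PA, Cavalli-Sforza LL, Herrera RJ (2004) The Levant versus the Horn of Africa: evidence for bidirectional corridors of human migrations. Am J Hum Genet 74:532-544
Migot F, Perichon B, Danze PM, Raharimalala L, Lepers JP, Deloron P, Krishnamoorthy R (1995) HLA class II haplotype studies bring molecular evidence for population affinity between Madagascans and Javanese. Tissue Antigens 46:131-135
Paracchini S, Arredi B, Chalk R, Tyler-Smith C (2002) Hierarchical high-throughput SNP genotyping of the human Y chromosome using MALDI-TOF mass spectrometry. Nucleic Acids Res 30:e27
Pereira L, Macaulay V, Torroni A, Scozzari R, Prata MJ, Amorim A (2001) Prehistoric and historic traces in the mtDNA of Mozambique: insights into the Bantu expansions and the slave trade. Ann Hum Genet 65:439-458
Röhl A, Brinkmann B, Forster L, Forster P (2001) An annotated mtDNA database. Int J Legal Med 115:29-39
Soodyall H, Jenkins T, Stoneking M (1995) “Polynesian” mtDNA in the Malagasy. Nat Genet 10:377-378
Telenius H, Carter NP, Bebb CE, Nordenskjold M, Ponder BA, Tunnacliffe A (1992) Degenerate oligonucleotide-primed PCR: general amplification of target DNA by a single degenerate primer. Genomics 13:718-725
Vérin P, Wright H (1999) Madagascar and Indonesia: new evidence from archaeology and linguistics. Indo Pac Prehist Assoc Bull 18:35-42
Y Chromosome Consortium (YCC) (2002) A nomenclature system for the tree of human Y-chromosomal binary haplogroups. Genome Res 12:339-348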