mtDNA Data Mining in GenBank Needs Surveying
2009; Elsevier BV; Volume: 85; Issue: 6; Language: English
10.1016/j.ajhg.2009.10.023
ISSN 1537-6605
Authors: Yong-Gang Yao, Antonio Salas, Ian Logan, Hans-Jürgen Bandelt
Abstract
To the Editor: Since the first sequencing of the complete human mtDNA genome,[1] both sequencing techniques and the quality of commercial kits have improved greatly. This has led to a growing number of reports of complete mtDNA sequences from the fields of molecular anthropology, medical genetics, and forensic science, and there are now over 6700 complete or near-complete mtDNA sequences available for study.[2] However, compared with the pioneering manual-sequencing efforts of the early 1990s, the overall quality of mtDNA data, especially in the medical field, is still far from satisfactory.[3] Sequencing errors and inadvertent mistakes in reported mtDNA data are not infrequent.[4–10] Deficient data sets of complete mtDNA genomes can have important consequences for the conclusions reached in many studies and may also pose problems for any subsequent reanalysis. Most recently, Pereira and colleagues[11] discussed the overall picture of mtDNA genome diversity in human populations worldwide, based on a comprehensive reanalysis of 5140 published complete or near-complete (lacking some control-region information) mtDNA sequences.
Their study represents an important advance in defining the effects of gene structures on limiting mtDNA diversity and may have valuable implications for mtDNA studies in the medical field.[11] However, all of the data used by Pereira et al.[11] were retrieved directly from GenBank without any scrutiny for problematic or flawed data that should have been excluded. Many of the mtDNA sequences analyzed in their study[11] have in fact already been questioned in the literature or even corrected by their authors, but unfortunately, in several instances the corrected versions of the sequences have not been made generally available or updated in GenBank.

In Table 1, we list some of the problematic data sets and single sequences used by Pereira et al. in their study.[11] Among them is the original data set of Herrnstadt et al.,[12] which the authors announced[13] as having been corrected, although the corrected sequences have never been entered into GenBank. Portions of those coding-region data (in either corrected or uncorrected form) were augmented by the associated control-region data and published in several papers; none of these expanded data can be downloaded from GenBank, so they have to be retrieved from the figures in the corresponding articles.

To cite a more recent example, the African mtDNA data set published by Gonder et al.[14] is of particularly poor quality. These sequences are incompletely recorded (as already mentioned by Behar et al.[15]); the most extreme instance is the haplogroup L0k1 sequence EF184609, which lacks as many as 25 expected variants scattered along the whole pathway from the haplogroup root to the revised Cambridge reference sequence (rCRS).[16] Several different phantom mutations also appear throughout the data set; in particular, five sequences are affected by phantom base changes to G within the short 9949–9978 stretch. We have annotated problems in 14 sequences by way of example, but nearly all sequences of Gonder et al.[14] may suffer from overlooked variants, except for the three sequences from the well-described West Eurasian haplogroups J1 and N1. Additional details are given in the Supplemental Data, available online.

Table 1. List of Some Flawed Data and Uncorrected Sequences Employed in the Study by Pereira et al.[11]

| GenBank Data | Cause of Error | Reference | Errors Detected or Corrected |
|---|---|---|---|
| DQ156212, DQ156214 | NUMT contamination | Montiel-Sosa et al.[27] | Yao et al.[28] |
| DQ112878 | NUMT contamination | Kivisild et al.[29] | Yao et al.[28] |
| DQ112952 | Missed mutation | Kivisild et al.[29] | This study |
| DQ341068.1 | Artefactual recombination | Torroni et al.[30] | Behar et al.[15]; DQ341068.2 (updated May 5, 2009) |
| AP008259, AP008269, AP008278, AP008306, AP008552, AP008776, AP008777, AP008798, AP008799, AP008801, AP008803 | Artefactual recombination | Tanaka et al.[23] | Kong et al.[21] |
| Various | Missed mutations | Maca-Meyer et al.[31] | Palanichamy et al.[32] |
| Various | Phantom mutations and documentation errors | Herrnstadt et al.[12] | Herrnstadt et al.[13]; Bandelt et al.[19] |
| Various | Missed mutations | Rajkumar et al.[33] | Sun et al.[34] |
| Various | Various | Gonder et al.[14] | Behar et al.[15]; this study |
| AY963586.1 | Editing error in GenBank submission | Bandelt et al.[4] | AY963586.3 (updated June 29, 2009) |
| EF660912–EF661013 | Phantom mutations and missed mutations | Gasparre et al.[17] | This study |
| AM260596–AM260597 | Missed mutations | Annunen-Rasila et al.[35] | This study |
| AY289073 | Missed mutations and recombination | Ingman and Gyllensten[36] | This study |
| AY195745, AY195756, AY195767, AY195775 | Phantom mutations and missed mutations | Mishmar et al.[37] | Brandstätter et al.[38]; this study |
| EU095205, EU095208, EU095250 | Phantom mutations and missed mutations | Fagundes et al.[39] | Perego et al.[40]; this study |
| AY339437, AY339463.2, AY339546, AY339549, AY339581.2, AY339582 | Phantom mutations and missed mutations | Finnilä et al.[41] | This study |
| AF346968, AF346973, AF347006 | Missed mutations, phantom mutations, and recombination | Ingman et al.[42] | Kong et al.[21]; this study |
| Various | Phantom indels and missed mutations | Kumar et al.[43] | This study |
| EU597580 | Missed mutation | Hartmann et al.[44] | This study |
| DQ826448, DQ834253–DQ834261 | Various | Phan et al. (unpubl. data)[a] | This study |
| DQ418488, DQ437577, DQ462232–DQ462234, DQ519035 | Various | The State Key Laboratory of Forensic Sciences (unpubl. data)[a] | This study |
| DQ358973–DQ358977 | Documentation errors (position 750) | Detjen et al. (unpubl. data)[a] | This study |
| EF446784, EF488201 | Poor sequencing quality (artefactual heteroplasmy) | Noer et al. (unpubl. data)[a] | This study |

[a] Unpublished data were released by GenBank, and detailed annotation of the potential errors is given in the Supplemental Data.

Likewise, the ten Vietnamese complete mtDNA sequences that were submitted to GenBank by Phan et al. and used in the Pereira et al. study[11] show errors of many kinds. First, all of the sequences lack three expected variants (A263G, 315+C [also written 315insC], and C14766T). Second, there are many phantom mutations that are not observed elsewhere. Third, several sequences are incomplete; e.g., the haplogroup M7b1 sequence DQ826448 lacks nine further expected variants, through either oversight or artefactual recombination. This sequence also has a base-shift error and harbors six suspicious transversions. Finally, the haplogroup N9a sequence DQ834258 shows artefactual recombination. Detailed annotations for these Vietnamese mitochondrial genomes and for a few more GenBank complete mtDNA sequences with similar problems are listed in the Supplemental Data. It is likely that most conclusions of the Pereira et al. study[11] would essentially remain unaltered after the flawed data sets or single problematic sequences were filtered out.
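The first problem noted for the Vietnamese sequences, expected variants that are absent from every record, lends itself to an automated screen before any data mining. The Python sketch below is a minimal illustration under stated assumptions, not the procedure used in this letter or by Pereira et al.: it assumes each GenBank record has already been reduced to a list of variant strings relative to the rCRS, that the analyst supplies the variants expected for the relevant part of the phylogeny (here only the three named above), and the helper names `canonical` and `missing_expected` are hypothetical. Because the same insertion appears as 315+C, 315insC, or 315.1C in different sources, the sketch first maps these notations onto one form.

```python
import re

# Hypothetical helpers for illustration; not the procedure used in the letter.
# Canonical notation assumed here: substitutions as "<position><base>"
# (e.g. "263G") and insertions as "<position>.1<base>" (e.g. "315.1C").
# Only single-base substitutions and single-base insertions are handled.
_INSERTION = re.compile(r"^(\d+)(?:\+|INS|\.1)([ACGT])$")
_SUBSTITUTION = re.compile(r"^[ACGT]?(\d+)([ACGT])$")


def canonical(variant: str) -> str:
    """Map equivalent notations (A263G/263G, 315+C/315insC/315.1C) to one string."""
    v = variant.strip().upper()
    m = _INSERTION.match(v)
    if m:
        return f"{m.group(1)}.1{m.group(2)}"
    m = _SUBSTITUTION.match(v)
    if m:
        return f"{m.group(1)}{m.group(2)}"
    raise ValueError(f"unrecognized variant notation: {variant!r}")


def missing_expected(reported: list[str], expected: list[str]) -> list[str]:
    """Return the expected variants (canonical form, input order) absent from a record."""
    present = {canonical(v) for v in reported}
    return [e for e in map(canonical, expected) if e not in present]


# The three variants that, according to the letter, all ten Vietnamese
# sequences lack.
EXPECTED = ["A263G", "315+C", "C14766T"]

# Hypothetical toy record, not a real GenBank entry.
record = ["A73G", "263G", "315insC", "A750G"]
print(missing_expected(record, EXPECTED))  # prints ['14766T']
```

In practice the expected list would have to come from a curated phylogeny for each haplogroup; the sketch only shows the comparison step, and a non-empty result flags a record for manual inspection rather than proving an error.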
Even if the main conclusions hold, the results reported in their tables would benefit from a reanalysis using an improved version of the complete-genome database. Whether a small residue of errors matters depends on the particular aspect under study. A good example of where it would cause problems is the estimation of the transition:transversion ratio, because transversions are relatively rare and flawed data are often enriched in transversions (see the phantom mutations in the Supplemental Data). Some of the data sets do appear to contain an elevated number of artefactual transversions, in particular the sequences from Gasparre et al.[17] (Table 1 and Supplemental Data). In addition, misalignment of seven sequences (DQ341085–DQ341090 and EU600343) in the Pereira et al. study[11] has produced at least another 21 artefactual transversions at positions 292, 296–299, 300, 302, and 303. Similarly, the insertion 5436insG in DQ246818 has been shifted by four base pairs and scored as C5437G plus 5440insC, so that a transversion is created artificially. Suboptimal alignment induced further artificial transversions: e.g., the two sequences AY922293 and AY922275 are identical in the 54–60 region (GTTATT versus GTATTTT in the rCRS), yet Pereira et al.[11] interpreted the former as 55insT-59delTT and the latter as 56T-57A-60delT in that region. Inconsistent alignment is also seen in the long C stretches at 16184–16193 and 303–315 in the Pereira et al. study.[11]

Another instance in which a small amount of error could have a significant influence involves the estimation of the positional rate spectrum along the molecule. For instance, the change C12705T (characteristic of non-R status) is a rare mutation but was overlooked by Gonder et al.[14] half a dozen times, and the mutation T10810C (characteristic of non-L2′6 status) was overlooked an additional eight times.[14] Similarly, the estimated rate of any mutation scored between the roots of frequent haplogroups in the mtDNA phylogeny is inflated by the use of incomplete or recombinant sequences. Thus, the incorporation of flawed data considerably distorts the estimation of rates for a number of positions.
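The 54–60 example shows why variant lists from different studies cannot be compared naively: two notations describe the same haplotype, but only one of them contains transversions. The Python sketch below is a minimal illustration, not the alignment procedure of any cited study. It rebuilds the short stretch from the rCRS bases at 54–60 quoted in the text (GTATTTT) under each notation, confirms that both yield GTTATT, and counts the transversions each notation would contribute to a transition:transversion tally. The function names and the dictionary encoding of variants are illustrative assumptions.

```python
# Illustrative helpers (hypothetical names), not from any cited study.
# rCRS bases at positions 54-60, as quoted in the text.
RCRS_54_60 = dict(zip(range(54, 61), "GTATTTT"))

PURINES = {"A", "G"}


def apply_variants(ref, subs=None, dels=frozenset(), ins=None):
    """Rebuild a sequence from reference bases plus variant calls.

    ref:  {position: reference base}
    subs: {position: substituted base}
    dels: set of deleted positions
    ins:  {position: bases inserted directly after that position}
    """
    subs = subs or {}
    ins = ins or {}
    out = []
    for pos in sorted(ref):
        if pos not in dels:
            out.append(subs.get(pos, ref[pos]))
        out.append(ins.get(pos, ""))
    return "".join(out)


def is_transversion(a, b):
    """Purine <-> pyrimidine change; A<->G and C<->T are transitions."""
    return (a in PURINES) != (b in PURINES)


def count_transversions(ref, subs):
    """Number of substitutions in `subs` that are transversions."""
    return sum(is_transversion(ref[pos], base) for pos, base in subs.items())


# Notation 1: 55insT plus 59delTT (deletion of positions 59 and 60).
seq1 = apply_variants(RCRS_54_60, ins={55: "T"}, dels={59, 60})
# Notation 2: 56T, 57A, 60delT (A56T and T57A are substitutions).
subs2 = {56: "T", 57: "A"}
seq2 = apply_variants(RCRS_54_60, subs=subs2, dels={60})

print(seq1, seq2, seq1 == seq2)  # GTTATT GTTATT True
print(count_transversions(RCRS_54_60, {}), count_transversions(RCRS_54_60, subs2))  # 0 2
```

The second notation adds two transversions that do not exist in the underlying sequence. Comparing reconstructed sequences rather than variant strings, as done here, is one way (not prescribed by the letter) to catch such alignment-induced artefacts before computing a transition:transversion ratio.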
Rate estimates can be distorted in the same way by systematic documentation errors, as in the case of the 14766 transition, which has often been misrecorded because of the discrepancy at position 14766 between the rCRS and a partly corrected CRS that was in use for a long time.[3,10] Moreover, for parts of the mtDNA phylogeny in which numerous mutations are missed in the data used, estimates of haplogroup coalescence times become distorted. The consequences of using wrong data can be dramatic under particular circumstances, as we have discussed before.[3–10,18–21]

Fortunately, the standard and quality of sequencing in the large laboratories (with long-standing experience) have improved over the years, and the results from these laboratories now set the standard against which all smaller institutions should compare themselves. This does not preclude the possibility that single sequences in data sets released by large laboratories may have minor problems. Bioinformatics-based projects that draw conclusions from whatever can be retrieved from GenBank are increasingly popular (e.g., the data of Gonder et al.[14] were also employed by Atkinson et al.[22]). However, the common practice of mining mtDNA data from GenBank or other genomic resources should be carried out with the necessary caution in order to avoid erroneous claims in future studies. For instance, one could foresee that use of the original, incorrect sequences of Tanaka et al.[23] would easily lead to erroneous signals of mtDNA recombination.[21]

To eliminate errors in published mtDNA data, or at least to exclude suspicious GenBank entries from any subsequent reanalysis, we call for stringent scrutiny of reported data and for bookkeeping annotation of errors in public databases, such as PhyloTree.org (maintained by Mannis van Oven)[2] and some personally owned websites (e.g., Ian Logan's website). For the benefit of science, submissions to GenBank should be revised as promptly as possible by the authors responsible for the data in question. Importantly, when submitting a new paper for publication, authors should provide evidence that their data have been checked for the more common errors that come from poor sequencing technique and data handling, as well as for discrepancies between the actual GenBank submissions and what has been shown or inferred in the paper. But there will remain instances in which authors either do not react or claim that they did everything right (as in the prominent case analyzed by Bandelt and Kivisild[24] and by Parson[25]). Therefore, anyone planning a cumulative reanalysis of mtDNA data cannot avoid making a substantiated, though partly subjective, decision as to which data are to be included and which are to be excluded, as exemplified in a recent paper by Soares et al.[26]

Acknowledgments
This work was supported by Yunnan Province () and the Chinese Academy of Sciences (), as well as by grants from the National Natural Science Foundation of China (30925021), the Ministerio de Ciencia e Innovación (SAF2008-02971), and the Fundación de Investigación Médica Mutua Madrileña (2008/CL444). We thank two anonymous reviewers for their helpful comments on an earlier version of the manuscript.

Supplemental Data
Document S1. One appendix.

Web Resources
The URLs for data presented herein are as follows:
GenBank, http://www.ncbi.nlm.nih.gov/Genbank/
Ian Logan's website, http://www.ianlogan.co.uk
PhyloTree.org, http://www.phylotree.org/

Response to Yao et al. (Pereira et al., The American Journal of Human Genetics, December 11, 2009), in brief: To the Editor: We are also concerned about errors in GenBank sequences, and that is why we took precautions to evaluate the effects of potential sequence errors. But many of the potential errors reported by Yao et al. are highly subjective. They defined "phantom mutations" as (with exceptions) the exclusive presence of rare transversions in a specific data set. Although it is reasonable to be skeptical of such variations, surely such rare variations do actually occur without being errors. To deal with potential sequence errors, we took the step of doing the analysis twice; once for all reported variations and once for only variations present in more than 0.1% of the sequences.
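References
1. Anderson S., Bankier A.T., Barrell B.G., de Bruijn M.H., Coulson A.R., Drouin J., Eperon I.C., Nierlich D.P., Roe B.A., Sanger F., et al. Sequence and organization of the human mitochondrial genome. Nature. 1981; 290: 457-465.
2. van Oven M., Kayser M. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 2009; 30: E386-E394.
3. Bandelt H.-J., Yao Y.-G., Bravi C.M., Salas A., Kivisild T. Median network analysis of defectively sequenced entire mitochondrial genomes from early and contemporary disease studies. J. Hum. Genet. 2009; 54: 174-181.
4. Bandelt H.-J., Achilli A., Kong Q.-P., Salas A., Lutz-Bonengel S., Sun C., Zhang Y.-P., Torroni A., Yao Y.-G. Low "penetrance" of phylogenetic knowledge in mitochondrial disease studies. Biochem. Biophys. Res. Commun. 2005; 333: 122-130.
5. Bandelt H.-J., Olivieri A., Bravi C., Yao Y.-G., Torroni A., Salas A. 'Distorted' mitochondrial DNA sequences in schizophrenic patients. Eur. J. Hum. Genet. 2007; 15: 400-402.
6. Bandelt H.-J., Yao Y.-G., Salas A., Kivisild T., Bravi C.M. High penetrance of sequencing errors and interpretative shortcomings in mtDNA sequence analysis of LHON patients. Biochem. Biophys. Res. Commun. 2007; 352: 283-291.
7. Salas A., Carracedo Á., Macaulay V., Richards M., Bandelt H.-J. A practical guide to mitochondrial DNA error prevention in clinical, forensic, and population genetics. Biochem. Biophys. Res. Commun. 2005; 335: 891-899.
8. Salas A., Yao Y.-G., Macaulay V., Vega A., Carracedo Á., Bandelt H.-J. A critical reassessment of the role of mitochondria in tumorigenesis. PLoS Med. 2005; 2: e296.
9. Yao Y.-G., Macaulay V., Kivisild T., Zhang Y.-P., Bandelt H.-J. To trust or not to trust an idiosyncratic mitochondrial data set. Am. J. Hum. Genet. 2003; 72: 1341-1346.
10. Yao Y.-G., Salas A., Bravi C.M., Bandelt H.-J. A reappraisal of complete mtDNA variation in East Asian families with hearing impairment. Hum. Genet. 2006; 119: 505-515.
11. Pereira L., Freitas F., Fernandes V., Pereira J.B., Costa M.D., Costa S., Máximo V., Macaulay V., Rocha R., Samuels D.C. The diversity present in 5140 human mitochondrial genomes. Am. J. Hum. Genet. 2009; 84: 628-640.
12. Herrnstadt C., Elson J.L., Fahy E., Preston G., Turnbull D.M., Anderson C., Ghosh S.S., Olefsky J.M., Beal M.F., Davis R.E., et al. Reduced-median-network analysis of complete mitochondrial DNA coding-region sequences for the major African, Asian, and European haplogroups. Am. J. Hum. Genet. 2002; 70: 1152-1171.
13. Herrnstadt C., Preston G., Howell N. Errors, phantoms and otherwise, in human mtDNA sequences. Am. J. Hum. Genet. 2003; 72: 1585-1586.
14. Gonder M.K., Mortensen H.M., Reed F.A., de Sousa A., Tishkoff S.A. Whole-mtDNA genome sequence analysis of ancient African lineages. Mol. Biol. Evol. 2007; 24: 757-768.
15. Behar D.M., Villems R., Soodyall H., Blue-Smith J., Pereira L., Metspalu E., Scozzari R., Makkan H., Tzur S., Comas D., et al. The dawn of human matrilineal diversity. Am. J. Hum. Genet. 2008; 82: 1130-1140.
16. Andrews R.M., Kubacka I., Chinnery P.F., Lightowlers R.N., Turnbull D.M., Howell N. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat. Genet. 1999; 23: 147.
17. Gasparre G., Porcelli A.M., Bonora E., Pennisi L.F., Toller M., Iommarini L., Ghelli A., Moretti M., Betts C.M., Martinelli G.N., et al. Disruptive mitochondrial DNA mutations in complex I subunits are markers of oncocytic phenotype in thyroid tumors. Proc. Natl. Acad. Sci. USA. 2007; 104: 9001-9006.
18. Bandelt H.-J., Kong Q.-P., Parson W., Salas A. More evidence for non-maternal inheritance of mitochondrial DNA? J. Med. Genet. 2005; 42: 957-960.
19. Bandelt H.-J., Kong Q.-P., Richards M., Macaulay V. Estimation of mutation rates and coalescence times: some caveats. In: Bandelt H.-J., Macaulay V., Richards M. Human Mitochondrial DNA and the Evolution of Homo sapiens. Springer-Verlag, Berlin, Heidelberg; 2006: 47-90.
20. Bandelt H.-J., Salas A. Contamination and sample mix-up can best explain some patterns of mtDNA instabilities in buccal cells and oral squamous cell carcinoma. BMC Cancer. 2009; 9: 113.
21. Kong Q.-P., Salas A., Sun C., Fuku N., Tanaka M., Zhong L., Wang C.-Y., Yao Y.-G., Bandelt H.-J. Distilling artificial recombinants from large sets of complete mtDNA genomes. PLoS ONE. 2008; 3: e3016.
22. Atkinson Q.D., Gray R.D., Drummond A.J. mtDNA variation predicts population size in humans and reveals a major Southern Asian chapter in human prehistory. Mol. Biol. Evol. 2008; 25: 468-474.
23. Tanaka M., Cabrera V.M., González A.M., Larruga J.M., Takeyasu T., Fuku N., Guo L.J., Hirose R., Fujita Y., Kurata M., et al. Mitochondrial genome variation in eastern Asia and the peopling of Japan. Genome Res. 2004; 14: 1832-1850.
24. Bandelt H.-J., Kivisild T. Quality assessment of DNA sequence data: autopsy of a mis-sequenced mtDNA population sample. Ann. Hum. Genet. 2006; 70: 314-326.
25. Parson W. The art of reading sequence electropherograms. Ann. Hum. Genet. 2007; 71: 276-278.
26. Soares P., Ermini L., Thomson N., Mormina M., Rito T., Röhl A., Salas A., Oppenheimer S., Macaulay V., Richards M.B. Correcting for purifying selection: an improved human mitochondrial molecular clock. Am. J. Hum. Genet. 2009; 84: 740-759.
27. Montiel-Sosa F., Ruiz-Pesini E., Enríquez J.A., Marcuello A., Díez-Sánchez C., Montoya J., Wallace D.C., López-Pérez M.J. Differences of sperm motility in mitochondrial DNA haplogroup U sublineages. Gene. 2006; 368: 21-27.
28. Yao Y.-G., Kong Q.-P., Salas A., Bandelt H.-J. Pseudomitochondrial genome haunts disease studies. J. Med. Genet. 2008; 45: 769-772.
29. Kivisild T., Shen P., Wall D.P., Do B., Sung R., Davis K., Passarino G., Underhill P.A., Scharfe C., Torroni A., et al. The role of selection in the evolution of human mitochondrial genomes. Genetics. 2006; 172: 373-387.
30. Torroni A., Achilli A., Macaulay V., Richards M., Bandelt H.-J. Harvesting the fruit of the human mtDNA tree. Trends Genet. 2006; 22: 339-345.
31. Maca-Meyer N., González A.M., Larruga J.M., Flores C., Cabrera V.M. Major genomic mitochondrial lineages delineate early human expansions. BMC Genet. 2001; 2: 13.
32. Palanichamy M.G., Sun C., Agrawal S., Bandelt H.-J., Kong Q.-P., Khan F., Wang C.-Y., Chaudhuri T.K., Palla V., Zhang Y.-P. Phylogeny of mitochondrial DNA macrohaplogroup N in India, based on complete sequencing: implications for the peopling of South Asia. Am. J. Hum. Genet. 2004; 75: 966-978.
33. Rajkumar R., Banerjee J., Gunturi H.B., Trivedi R., Kashyap V.K. Phylogeny and antiquity of M macrohaplogroup inferred from complete mt DNA sequence of Indian specific lineages. BMC Evol. Biol. 2005; 5: 26.
34. Sun C., Kong Q.-P., Palanichamy M.G., Agrawal S., Bandelt H.-J., Yao Y.-G., Khan F., Zhu C.-L., Chaudhuri T.K., Zhang Y.-P. The dazzling array of basal branches in the mtDNA macrohaplogroup M from India as inferred from complete genomes. Mol. Biol. Evol. 2006; 23: 683-690.
35. Annunen-Rasila J., Finnilä S., Mykkänen K., Pöyhönen J.S., Veijola J., Poyhonen M., Viitanen M., Kalimo H., Majamaa K. Mitochondrial DNA sequence variation and mutation rate in patients with CADASIL. Neurogenetics. 2006; 7: 185-194.
36. Ingman M., Gyllensten U. Mitochondrial genome variation and evolutionary history of Australian and New Guinean aborigines. Genome Res. 2003; 13: 1600-1606.
37. Mishmar D., Ruiz-Pesini E., Golik P., Macaulay V., Clark A.G., Hosseini S., Brandon M., Easley K., Chen E., Brown M.D., et al. Natural selection shaped regional mtDNA variation in humans. Proc. Natl. Acad. Sci. USA. 2003; 100: 171-176.
38. Brandstätter A., Sänger T., Lutz-Bonengel S., Parson W., Béraud-Colomb E., Wen B., Kong Q.-P., Bravi C.M., Bandelt H.-J. Phantom mutation hotspots in human mitochondrial DNA. Electrophoresis. 2005; 26: 3414-3429.
39. Fagundes N.J., Kanitz R., Eckert R., Valls A.C., Bogo M.R., Salzano F.M., Smith D.G., Silva W.A. Jr., Zago M.A., Ribeiro-dos-Santos A.K., et al. Mitochondrial population genomics supports a single pre-Clovis origin with a coastal route for the peopling of the Americas. Am. J. Hum. Genet. 2008; 82: 583-592.
40. Perego U.A., Achilli A., Angerhofer N., Accetturo M., Pala M., Olivieri A., Kashani B.H., Ritchie K.H., Scozzari R., Kong Q.-P., et al. Distinctive Paleo-Indian migration routes from Beringia marked by two rare mtDNA haplogroups. Curr. Biol. 2009; 19: 1-8.
41. Finnilä S., Lehtonen M.S., Majamaa K. Phylogenetic network for European mtDNA. Am. J. Hum. Genet. 2001; 68: 1475-1484.
42. Ingman M., Kaessmann H., Pääbo S., Gyllensten U. Mitochondrial genome variation and the origin of modern humans. Nature. 2000; 408: 708-713.
43. Kumar S., Padmanabham P.B., Ravuri R.R., Uttaravalli K., Koneru P., Mukherjee P.A., Das B., Kotal M., Xaviour D., Saheb S.Y., et al. The earliest settlers' antiquity and evolutionary history of Indian populations: evidence from M2 mtDNA lineage. BMC Evol. Biol. 2008; 8: 230.
44. Hartmann A., Thieme M., Nanduri L.K., Stempfl T., Moehle C., Kivisild T., Oefner P.J. Validation of microarray-based resequencing of 93 worldwide mitochondrial genomes. Hum. Mutat. 2009; 30: 115-122.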