Erroneous Claims about the Impact of Mitochondrial DNA Sequence Database Errors
2003; Elsevier BV; Volume: 73; Issue: 4; Language: English
DOI: 10.1086/378780
ISSN: 1537-6605
Authors: Agnar Helgason, Kári Stefánsson
Topic(s): Metabolomics and Mass Spectrometry Studies
Abstract:
To the Editor: In a recent letter to the Journal, Herrnstadt et al. (2003, p. 1585) assert that sequence errors in an mtDNA study of Icelanders contributed to an “erroneous conclusion about the genetic diversity of these people.” Herrnstadt et al. (2003) do not explicitly refer to the study in question, nor do they provide any evidence to support their allegation. Rather, they cite a recent article by Arnason (2003) that states that errors in sequences obtained from a database had a material effect on the results and, hence, on conclusions of an article published in the Journal by Helgason et al. (2000). We demonstrate here, contrary to this claim and its reiteration by Forster (2003) and Herrnstadt et al. (2003), that sequence errors have no impact on the conclusions of Helgason et al.
(2000) about the genetic diversity of Icelanders.

Helgason et al. (2000) analyzed a total of 4,064 mtDNA control-region sequences: 2,969 from hypervariable segment 1 (HVS1) in 26 populations and 1,095 from hypervariable segment 2 (HVS2) in 10 populations. A subset of these sequences was obtained from HVRBASE (Handt et al. 1998), among which were 140 HVS1 sequences from Denmark and Germany (Richards et al. 1996). Arnason (2003) points out that some of these latter sequences were incorrectly recorded in HVRBASE. After correcting the erroneous sequences, Arnason recalculates genetic diversity statistics presented in table 1 of Helgason et al. (2000) and concludes that “[t]he estimation of statistics and relative rank of countries with respect to diversity and effective population size is materially affected by the errors” (Arnason 2003, p. 9). He moreover concludes that “[c]laims about a special genetic homogeneity of Icelanders relative to European populations would be suspect to the extent that they depended on anomalous data instead of the primary data” (Arnason 2003, p. 14). In an accompanying editorial comment, Forster (2003, p. 2) elaborates: “Arnason demonstrates [that HVRBASE] is riddled with copying errors in the case of the Danes and Germans, resulting in a qualitatively different ranking of Icelandic genetic diversity” (emphasis added).

Our comparison of the German and Danish sequences stored in HVRBASE with the original sequences submitted to GenBank has revealed that 14 of 33 Danish and 29 of 107 German HVS1 sequences contained database transcription errors. The impact of these errors on the summary statistics (gene diversity, mean pairwise differences, θk, and θS) calculated by Helgason et al. (2000) for Icelanders, Danes, and Germans can be evaluated through a comparison with Arnason’s (2003) recalculations (see table 1).

Table 1. Rank of Icelanders, Danes, and Germans for HVS1 Summary Statistics among 26 European Populations

                                              Rank
Population  Results of              N    k    Gene Diversity  Pairwise Difference  θk  θS
Icelanders  Helgason et al. (2000)  447  125  9               8                    14  13
Icelanders  Arnason (2003)          520  128  7–9             9                    13  11
Danes       Helgason et al. (2000)  31   25   5               2                    13  24
Danes       Arnason (2003)          33   19   23              24                   25  25
Germans     Helgason et al. (2000)  418  219  6               14                   2   1
Germans     Arnason (2003)          423  217  10–15           21                   2   1

Note: The population with the highest value has a rank of 1, and the population with the lowest value has a rank of 26. N is sample size, and k is the number of distinct haplotypes. θk and θS are population-mutation rate parameters that are based on the number of haplotypes and the number of variable sites, respectively.

Table 1 shows that the rank of the Danish sample changes for all summary statistics, a finding that is not surprising, given that almost half of the Danish sequences contained errors. In the case of the Germans, we observe noticeable changes with respect to gene diversity and mean pairwise differences but no changes in rank for θk and θS.

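The four summary statistics compared in table 1 can be computed from a set of aligned sequences roughly as follows. The letter does not give formulas or name the software used, so this sketch assumes the commonly used definitions: Nei's unbiased gene diversity, the average number of pairwise differences, Watterson's estimator for θS, and Ewens's sampling formula solved numerically for θk. All function names and the input format (a list of equal-length strings) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def expected_haplotypes(theta, n):
    """Expected number of distinct haplotypes in a sample of n (Ewens's formula)."""
    return sum(theta / (theta + i) for i in range(n))


def solve_theta_k(n, k, tol=1e-9):
    """Find the theta whose expected haplotype count equals the observed k."""
    if not 1 <= k <= n:
        raise ValueError("k must lie between 1 and n")
    if k == 1:
        return 0.0            # a single haplotype: no detectable variation
    if k == n:
        return float("inf")   # every sequence distinct: estimate is unbounded
    lo, hi = 0.0, 1.0
    while expected_haplotypes(hi, n) < k:   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):     # plain bisection on a monotone function
        mid = (lo + hi) / 2.0
        if expected_haplotypes(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def summary_statistics(seqs):
    """Compute the four statistics for a list of aligned, equal-length sequences."""
    n = len(seqs)
    if n < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")

    counts = Counter(seqs)    # haplotype -> number of copies in the sample
    k = len(counts)

    # Gene diversity (assumed: Nei's unbiased estimator).
    gene_diversity = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

    # Mean number of differences over all pairs of sequences.
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2))
    mean_pairwise = total / (n * (n - 1) / 2)

    # theta_S (assumed: Watterson's estimator) from the number of variable sites.
    variable_sites = sum(len(set(column)) > 1 for column in zip(*seqs))
    theta_s = variable_sites / sum(1 / i for i in range(1, n))

    return {
        "N": n,
        "k": k,
        "gene_diversity": gene_diversity,
        "mean_pairwise_differences": mean_pairwise,
        "theta_k": solve_theta_k(n, k),
        "theta_S": theta_s,
    }
```

For instance, `summary_statistics(["AAAA", "AAAT", "AATT", "AAAA"])` describes a toy sample of four sequences with three haplotypes and two variable sites. Note that θk depends only on N and k, and θS only on N and the number of variable sites, whereas the two pairwise statistics depend on haplotype frequencies and on how different the haplotypes are.
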
What is most important in the present context is that table 1 shows that the erroneous Danish and German sequences have almost no effect on the relative position of Icelanders among the 26 European populations for all four summary statistics. This clearly contradicts the aforementioned claims, made by Arnason (2003) and repeated by Forster (2003) and Herrnstadt et al. (2003), that sequence errors in the study by Helgason et al. (2000) have an impact on conclusions about the relative genetic diversity of the Icelanders.

Arnason’s (2003) conclusion that Icelanders are among the most genetically heterogeneous populations in Europe was, in fact, entirely based on his choice to interpret gene diversity and mean pairwise differences as the most informative statistics for detecting differences between the mtDNA heterogeneity of Icelanders and other European populations. However, as Helgason et al. (2003) have shown by means of simulation, such pairwise statistics are largely insensitive to the impact of different effective population sizes during periods such as the past 1,100 years (the time since the settlement of Iceland). Population genetic simulations demonstrate that these pairwise statistics show little change on average but considerable increase in variance as effective population size decreases (Helgason et al. 2003). Hence, it is not surprising to observe increased values of gene diversity and mean pairwise differences in populations experiencing high levels of drift due to small effective size. In contrast, Helgason et al. (2003) show that summary statistics that are based on the number of haplotypes or the number of segregating sequence sites (such as θk and θS) are highly sensitive to differences in effective population size during recent population history. On the basis of simulations and analyses of autosomal, Y-chromosome, and mtDNA data sets, the results of Helgason et al. (2003) strongly indicate that Icelanders are among the most genetically homogeneous European populations and that this homogeneity is the result of a relatively small effective population size.

We agree with Herrnstadt et al. (2003) that more effort should be given to quality control in the sequencing process and database construction. Indeed, as an additional quality-control measure in the Helgason et al. (2000) study, we blindly sequenced and proofread control-region sequences from maternal relatives for >25% of the 402 Icelanders included in the study. Consequently, our Icelandic mtDNA sequences are likely to be less affected by errors than most data sets in the literature. However, in today’s world of large-scale sequencing projects, errors, whether introduced by the sequencing process itself or by transcription into databases, are likely to be an inescapable fact of life. Bearing this in mind, it is important and productive to carefully assess the impact of sequence errors on scientific results and conclusions. It is less productive to make (Arnason 2003) or uncritically repeat (Forster 2003; Herrnstadt et al. 2003) incorrect claims about the impact of such sequence errors.
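
An assessment of this kind can be sketched in two steps: count how many database records differ from their original submissions, then check whether a population's rank changes once corrected values are used. This is a minimal sketch and not the procedure used in the letter; the function names, the ID-to-sequence dictionaries, and the example values are all made up for illustration.

```python
def count_discrepancies(database, original):
    """Count records whose database copy differs from the original submission.

    Both arguments are hypothetical dicts mapping a sequence ID to its
    aligned sequence string; only IDs present in both are compared.
    """
    shared = sorted(set(database) & set(original))
    differing = [sid for sid in shared if database[sid] != original[sid]]
    return len(differing), len(shared)


def rank_ranges(values):
    """Rank populations from the highest value (rank 1) to the lowest.

    Populations with identical values share a range such as "3-4".
    """
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    start = 0
    while start < len(ordered):
        end = start
        # Extend the group while the next population has the same value.
        while end + 1 < len(ordered) and ordered[end + 1][1] == ordered[start][1]:
            end += 1
        label = str(start + 1) if end == start else f"{start + 1}-{end + 1}"
        for name, _ in ordered[start:end + 1]:
            ranks[name] = label
        start = end + 1
    return ranks


# Made-up values for four hypothetical populations (not data from the letter).
before = {"A": 0.97, "B": 0.99, "C": 0.95, "D": 0.95}
after = {"A": 0.97, "B": 0.94, "C": 0.95, "D": 0.95}  # only B was corrected
print(rank_ranges(before))
print(rank_ranges(after))
```

In this toy case the corrected population B falls from rank 1 to rank 4, while A, C, and D each move by a single position. That is the distinction at issue: errors can change the rank of the population that contains them without materially changing the rank of another population.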
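References
Arnason E (2003) Genetic heterogeneity of Icelanders. Ann Hum Genet 67:5-16
Forster P (2003) To err is human. Ann Hum Genet 67:2-4
Handt O, Meyer S, von Haeseler A (1998) Compilation of human mtDNA control region sequences. Nucleic Acids Res 26:126-129
Helgason A, Nicholson G, Stefánsson K, Donnelly P (2003) A reassessment of genetic diversity in Icelanders: strong evidence from multiple loci for relative homogeneity caused by genetic drift. Ann Hum Genet 67:281-297
Helgason A, Sigurðardóttir S, Gulcher JR, Ward R, Stefánsson K (2000) mtDNA and the origin of the Icelanders: deciphering signals of recent population history. Am J Hum Genet 66:999-1016
Herrnstadt C, Preston G, Howell N (2003) Errors, phantom and otherwise, in human mtDNA sequences. Am J Hum Genet 72:1585-1586
Richards M, Côrte-Real H, Forster P, Macaulay V, Wilkinson-Herbots H, Demaine A, Papiha S, Hedges R, Bandelt H-J, Sykes B (1996) Paleolithic and neolithic lineages in the European mitochondrial gene pool. Am J Hum Genet 59:185-203 (erratum 59:747)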