Surnames as neutral alleles: observations in Sardinia.

1983; National Institutes of Health; Volume: 55; Issue: 2; Language: English

Authors

G. Zei, C. R. Guglielmino, E. Siri, Antonio Moroni, L. L. Cavalli-Sforza

Topic(s)

Fractal and DNA sequence analysis

Abstract

Surnames can be considered as alleles of a locus that are usually transmitted patrilineally. The great abundance of surnames makes them very useful for evaluating kinship between populations, even if such kinship estimates are of limited time depth. A set of data from the island of Sardinia shows very good agreement with the Karlin-McGregor distribution of neutral alleles and also with Fisher's logarithmic distribution. The latter can be derived from the former and is practically indistinguishable from it when N is large. It is also easier to compute, and its only parameter, v, can be obtained by a simple formula; v measures mutation plus immigration. Of the 11 dioceses, eight showed a very good fit and three showed an excess of surnames in the class of surnames represented only once. This is most probably due to the recent incorporation of the unique surnames in industrialized areas, as the correlations with statistics of development show. The logarithms of surname kinship (isonymy) show a nonlinear decrease with geographic distance between dioceses, at least in part due to heterogeneity of the slopes for surnames of different frequency. Tree analysis and principal components of isonymy are in good agreement; they show a major north-south differentiation, corresponding to the major axis of the island, and in agreement with data on genetic differentiation. East-west divergence is somewhat less important, as shown by the smaller amount of variation associated with the second principal component, and the third principal component is correlated with degree of industrial development.

Methods of study of surnames can gain substantially from the recognition that they usually have patrilineal transmission, but otherwise behave like a haploid genetic trait. At least for most surnames there is not likely to be any serious bias in reproduction or mortality, so that they can be considered as selectively neutral.
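Treating each surname as an allele of a haploid locus, kinship by isonymy between two populations can be sketched as the probability that one individual drawn at random from each population carries the same surname. The snippet below is a minimal illustration under that assumption; the function names and the sample surnames are hypothetical, and the paper's own estimator may include additional factors or corrections.

```python
from collections import Counter


def surname_frequencies(surnames):
    """Return the relative frequency of each surname in a list."""
    counts = Counter(surnames)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def isonymy(pop_a, pop_b):
    """Probability that a random member of pop_a and a random member
    of pop_b share a surname (sum over surnames of p_a * p_b).

    Assumption: this is one common definition of surname kinship,
    not necessarily the exact estimator used in the paper.
    """
    freq_a = surname_frequencies(pop_a)
    freq_b = surname_frequencies(pop_b)
    return sum(p * freq_b.get(name, 0.0) for name, p in freq_a.items())


# Illustrative data only, not taken from the Sardinian census.
diocese_a = ["Sanna", "Sanna", "Piras", "Melis"]
diocese_b = ["Sanna", "Piras", "Piras", "Serra"]
between = isonymy(diocese_a, diocese_b)
within = isonymy(diocese_a, diocese_a)
```

With many rare surnames, most terms of the sum are zero, so the value is driven by the surnames that the two populations have in common.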
If surnames are indeed selectively neutral, they can be analyzed with the methods derived for the study of neutral alleles in genetics. Thus, formally, every surname can be considered as an allele of a genetic locus; the greater the number of alleles, the greater the information of the locus, and therefore surnames can be, in principle, extremely informative on origins, admixture, and migration patterns of people. In comparison with true genetic loci, of course, they are limited by the fact that they inform us only about male migration. The time depth of information on origins of people is determined by the time of origin of the markers themselves, which in the case of genes is usually thousands or tens of thousands of years, but in the case of surnames is rarely earlier than the late Middle Ages, at least in Europeans.

In the present paper we analyzed a set of data obtained in an unpublished census of consanguineous marriages in Sardinia. The surnames studied belong to the husbands in consanguineous marriages; because of the usually high isonymy of consanguineous mates, wives were excluded. Consanguineous marriages (N = 48,470), mostly between first or second cousins, were used. Dispensations were requested from bishops in all 11 dioceses of Sardinia (Figure 1) for the years 1930-1959. To the extent that consanguineous matings are not a random sample of the population, this selection may impose some bias on our results.

Author affiliations: 1 Instituto di Genetica Biochimica ed Evoluzionistica, C.N.R., Pavia, Italy; 2 Instituto di Ecologia, Parma, Italy; 3 Department of Genetics, Stanford University, Stanford, CA 94305. Human Biology, May 1983, Vol. 55, No. 2, pp. 357-365. © Wayne State University Press, 1983.
However, tests in another part of Italy where both consanguineous and total matings were available showed no important discrepancies in surname distributions, so that our results can be considered as reasonably representative of all surnames. We first studied the fit of theoretical distributions of neutral alleles to the observed frequency distributions of surnames represented in a given number of individuals. We then compared the immigration frequencies obtained by fitting such theoretical distributions with those obtained from other sources. Next, in the light of current theories regarding the relationship between genetic kinship and geographic distance, we studied kinship estimates obtained from surnames as a function of geographic distance. Finally, we applied other standard methods of population genetic analysis, such as principal component and phylogenetic analysis, to the kinship data.

Distribution of Neutral Alleles

Karlin and McGregor (1967) have given the distribution of neutral alleles that is expected in a population of N (haploid) individuals each carrying one of k different alleles (surnames, in our case), subject to a process of death at random. Dead individuals are replaced by new ones carrying the same allele (the same surname) unless a mutation occurs with probability v. It was shown by Yasuda et al. (1974) that this distribution fits very well the observed surname data of the Parma Valley, where it was possible to obtain immigration data. As v represents the sum of surname mutation (which is rare) plus immigration (which is frequent), we will assume in what follows that v estimates immigration. In the present paper we have also employed the logarithmic distribution first given by Fisher et al. (1943) for species abundance in insect samples instead of the Karlin-McGregor distribution, which is more cumbersome to compute.
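As an illustration of why the logarithmic distribution is easy to compute, the sketch below uses the standard textbook form of Fisher's log-series, in which the expected number of classes (here, surnames) represented by k individuals is α·x^k / k, with x = N / (N + α) and the total number of surnames S = α·ln(1 + N/α). The function names, the bisection solver, and the sample figures are assumptions made for this example; the paper's own notation is in terms of v, and how α maps to v is not shown here.

```python
import math


def fit_alpha(num_surnames, num_individuals, iterations=200):
    """Solve S = alpha * ln(1 + N / alpha) for alpha by bisection."""
    s, n = num_surnames, num_individuals
    if not 0 < s < n:
        raise ValueError("need 0 < S < N for a log-series fit")

    def excess(alpha):
        # Positive when alpha predicts more than S surnames.
        return alpha * math.log1p(n / alpha) - s

    low, high = 1e-12, 1.0
    while excess(high) < 0:  # expand the bracket until it contains the root
        high *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if excess(mid) < 0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def expected_counts(alpha, num_individuals, k_max):
    """Expected number of surnames carried by exactly k individuals,
    for k = 1..k_max, under the log-series alpha * x**k / k."""
    x = num_individuals / (num_individuals + alpha)
    return [alpha * x**k / k for k in range(1, k_max + 1)]


# Hypothetical sample: 100 distinct surnames among 1000 individuals.
alpha = fit_alpha(100, 1000)
first_classes = expected_counts(alpha, 1000, 5)
```

Comparing `first_classes[0]` with the observed number of surnames that occur only once is the kind of check that would reveal an excess of singleton surnames.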
Fig. 1. Dioceses of the island of Sardinia

One can easily show that the Fisher distribution can be derived from the Karlin-McGregor distribution; for N » 100 the two distributions are already in very good agreement. Using Fisher's distribution, the number of surnames represented by k individuals in a sample of N is expected to be

Reference(s)