Article Open access Peer-reviewed

Mapping Trait Loci by Use of Inferred Ancestral Recombination Graphs

2006; Elsevier BV; Volume: 79; Issue: 5; Language: English

10.1086/508901

ISSN

1537-6605

Authors

Mark J Minichiello, Richard Durbin

Topic(s)

Genetic Associations and Epidemiology

Abstract

Large-scale association studies are being undertaken with the hope of uncovering the genetic determinants of complex disease. We describe a computationally efficient method for inferring genealogies from population genotype data and show how these genealogies can be used to fine map disease loci and interpret association signals. These genealogies take the form of the ancestral recombination graph (ARG). The ARG defines a genealogical tree for each locus, and, as one moves along the chromosome, the topologies of consecutive trees shift according to the impact of historical recombination events. There are two stages to our analysis. First, we infer plausible ARGs, using a heuristic algorithm, which can handle unphased and missing data and is fast enough to be applied to large-scale studies. Second, we test the genealogical tree at each locus for a clustering of the disease cases beneath a branch, suggesting that a causative mutation occurred on that branch. Since the true ARG is unknown, we average this analysis over an ensemble of inferred ARGs. We have characterized the performance of our method across a wide range of simulated disease models. Compared with simpler tests, our method gives increased accuracy in positioning untyped causative loci and can also be used to estimate the frequencies of untyped causative alleles. We have applied our method to Ueda et al.’s association study of CTLA4 and Graves disease, showing how it can be used to dissect the association signal, giving potentially interesting results of allelic heterogeneity and interaction. Similar approaches analyzing an ensemble of ARGs inferred using our method may be applicable to many other problems of inference from population genotype data.

Unraveling the genetic basis of complex disease is one of the main goals of human genetics. In the case-control association study design [1: Cordell HJ, Clayton DG. Genetic association studies. Lancet. 2005; 366: 1121-1131] [2: Palmer LJ, Cardon LR. Shaking the tree: mapping complex disease genes with linkage disequilibrium. Lancet. 2005; 366: 1223-1234], nonfamilial individuals are genotyped for a panel of SNPs that capture most but not all of the genetic variation in a population. Each individual is labeled as either a “case” (affected by the disease) or as a “control” (unaffected), and, by analyzing the segregation of SNP alleles between cases and controls, it is possible to identify loci with statistical association with the disease. One of the simplest analyses for case-control data is Pearson’s χ² test applied to each marker. This tests for nonindependence between genotype and phenotype, and, in certain circumstances, it will successfully identify disease associations, such as when causative polymorphisms are typed or are in strong linkage disequilibrium (LD) with typed markers [3: Devlin B, Risch N. A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics. 1995; 29: 311-322] [4: Pritchard JK, Przeworski M. Linkage disequilibrium in humans. Am J Hum Genet. 2001; 69: 1-14]. But testing each marker independently discards information about the population history (in particular, information about the coinheritance of markers) that, if exploited, can yield a substantial increase in power. A potentially more powerful approach is to interpret the pattern of variation by considering the evolutionary processes that produced it [5: Nordborg M, Tavaré S. Linkage disequilibrium: what history has to tell us. Trends Genet. 2002; 18: 83-90] [6: McVean GA. A genealogical interpretation of linkage disequilibrium. Genetics. 2002; 162: 987-991].

In this article, we present an algorithm for reconstructing the genealogical history of a population sample and show how these genealogies can be used to fine map disease loci. Additionally, we use the genealogies to dissect the association signal, estimating the frequencies of untyped polymorphisms and searching for allelic heterogeneity and epistasis.

The formalism we use for representing these genealogies is the ancestral recombination graph (ARG) [7: Griffiths RC, Marjoram P. An ancestral recombination graph. In: Donnelly P, Tavaré S. Progress in population genetics and human evolution. Springer Verlag, New York, 1997: 257-270]. For a population of chromosome sequences, the ARG describes how they are related to each other (through mutation, recombination, and coalescence) back to a common ancestor (fig. 1A). Note that we are using the term “ARG” to mean the data structure for representing genealogical histories. The distribution of these under the Wright-Fisher model with recombination is described by a stochastic process called the “coalescent-with-recombination” model [7] [8: Hudson RR. Properties of a neutral allele model with intragenic recombination. Theor Popul Biol. 1983; 23: 183-201] [9: Nordborg M. Coalescent theory. In: Balding DJ, Bishop MJ, Cannings C. Handbook of statistical genetics. John Wiley & Sons, Chichester, 2001] [10: Stephens M. Inference under the coalescent. In: Balding DJ, Bishop MJ, Cannings C. Handbook of statistical genetics. John Wiley & Sons, Chichester, 2001]. For each position on the chromosome, there is a genealogical tree, called a “marginal tree,” embedded in the ARG.
As one moves along the chromosome, the topologies of consecutive marginal trees shift according to the impact of historical recombination events (fig. 1B and 1C). In this way, historical recombination events define the chromosomal region that each marginal tree spans, and, since many recombination events have occurred in population history, the resolution is very fine.

If there is a disease-predisposing mutation at a particular chromosomal location, it would have occurred on some internal branch of the marginal tree at that location. So, one way to find disease associations is to scan across the marginal trees, looking for those with branches that discriminate well between cases and controls; that is, branches that have a large number of cases beneath them and significantly fewer controls. Such a clustering of the cases underneath a branch suggests that a causative mutation arose on that branch.

If the true ARG were known, it would provide the optimal amount of information for mapping; no extra information would be available from the genotypes. Not only would disease-associated regions be identified, but the ARG would give the ages of the causative mutations, would specify the haplotypic background of those mutations, and so forth. It would also be possible to optimally impute missing data. But, unfortunately, the true ARG is unknowable, and inference under the coalescent-with-recombination model has proven computationally prohibitive. This is in part because there are infinitely many ARGs compatible with any set of genotype data, and very many of these are of comparable likelihood [11: McVean GA, Cardin NJ. Approximating the coalescent with recombination. Philos Trans R Soc Lond B Biol Sci. 2005; 360: 1387-1393] [12: Song YS, Lyngsø R, Hein J. Counting all possible ancestral configurations of sample sequences in population genetics. IEEE/ACM Trans Comput Biol Bioinform. 2006; 3: 239-251].

The difficulties involved in coalescent-based inference have partly motivated the development of faster haplotype-clustering methods [13: Templeton AR, Boerwinkle E, Sing CF. A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease mapping. I. Basic theory and an analysis of alcohol dehydrogenase activity in Drosophila. Genetics. 1987; 117: 343-351] [14: Molitor J, Marjoram P, Thomas D. Fine-scale mapping of disease genes with multiple mutations via spatial clustering techniques. Am J Hum Genet. 2003; 73: 1368-1384] [15: Durrant C, Zondervan KT, Cardon LR, Hunt S, Deloukas P, Morris AP. Linkage disequilibrium mapping via cladistic analysis of single-nucleotide polymorphism haplotypes. Am J Hum Genet. 2004; 75: 35-43] [16: Templeton AR, Maxwell T, Posada D, Stengard JH, Boerwinkle E, Sing CF. Tree scanning: a method for using haplotype trees in phenotype/genotype association studies. Genetics. 2005; 169: 441-453] [17: Waldron ER, Whittaker JC, Balding DJ. Fine mapping of disease genes via haplotype clustering. Genet Epidemiol. 2006; 30: 170-179]. These cluster the haplotype sequences (for small nonrecombining regions) and perform statistical tests on these clusters. The clustering hierarchy is often organized as a cladogram, which is assumed to approximate the marginal tree for that region. However, compared with the ARG, cladograms are a coarse approximation of population evolution, and there is often difficulty in modeling the relationships between similar haplotypes and in handling rare haplotypes. Additionally, it is often assumed that haplotypes are observed directly and that one can define nonrecombining haplotype blocks, which, in general, is not the case.
We have developed an ARG-based mapping method that has computational efficiency nearing that of haplotype-clustering methods. We achieve this by using a heuristic approach for ARG inference and are thereby able to construct ARGs for thousands of individuals typed for hundreds of SNPs; this is sufficiently fast that the analysis may be windowed over the whole genome, fitting the scale of proposed large-scale case-control studies. However, in this article, we focus our attention on fine mapping and interpretation of a signal at a potentially associated locus, in part because there are currently no publicly available genomewide-association-study data sets, experimental or simulated. Because the algorithm is heuristic, we do not claim to sample ARGs from the coalescent-with-recombination model; instead, we suggest that we infer plausible ARGs, a claim that can be tested by seeing how well these ARGs infer properties of causative polymorphisms. In this way, our method fills the gap between methods that are based on more-sophisticated coalescent models [18: Larribe F, Lessard S, Schork NJ. Gene mapping via the ancestral recombination graph. Theor Popul Biol. 2002; 62: 215-229] [19: Morris AP, Whittaker JC, Balding DJ. Fine-scale mapping of disease loci via shattered coalescent modeling of genealogies. Am J Hum Genet. 2002; 70: 686-707] [20: Zollner S, Pritchard JK. Coalescent-based association mapping and fine mapping of complex trait loci. Genetics. 2005; 169: 1071-1092] but require prohibitive computation and haplotype-based methods that model less precisely the structure and evolution of a disease locus.

A related problem to constructing plausible ARGs is that of constructing minimal ARGs [21: Myers SR, Griffiths RC. Bounds on the minimum number of recombination events in a sample history. Genetics. 2003; 163: 375-394] [22: Gusfield D, Eddhu S, Langley C. Optimal, efficient reconstruction of phylogenetic networks with constrained recombination. J Bioinform Comput Biol. 2004; 2: 173-213] [23: Song YS, Hein J. Constructing minimal ancestral recombination graphs. J Comput Biol. 2005; 12: 147-169] (that is, those with the smallest number of recombination events required to derive a sample of sequences). An algorithm that is similar to our ARG inference method has been developed independently for this problem [24: Lyngsø R, Song YS, Hein J. Minimum recombination histories by branch and bound. Proceedings of Workshop on Algorithms in Bioinformatics 2005, Lecture Notes in Computer Science. 2005; 3692: 239-250]. Our emphasis is, however, on inference of plausible ARGs rather than minimal ones.

To develop an intuition for how our ARG inference algorithm works, we will give an informal description in two stages: first, by describing a way to construct genealogical trees for nonrecombining chromosome sequences and then by extending this to include recombination, so that ARGs can be constructed for any set of sequences. When the sequences are nonrecombining, we only need to use coalescences and mutations to describe their genealogy, and there are efficient algorithms for this [25: Gusfield D. Efficient algorithms for inferring evolutionary trees. Networks. 1991; 21: 19-28] [26: Griffiths RC, Tavaré S. Unrooted genealogical tree probabilities in the infinitely-many-sites model. Math Biosci. 1995; 127: 77-98]. Working backward in time, two haplotype sequences can coalesce into a parent sequence (that is, their lineages merge into one) only if they are identical. Since the goal of our algorithm is to coalesce back to a single common ancestor, we perform coalescences whenever possible.
Unless all the sequences are identical, we will also need to infer mutation events and remove mutant alleles from ancestral sequences. We will assume the infinite-sites model throughout, which stipulates that there are no back or recurrent mutations. Consequently, a mutant allele can be removed from a set of ancestral sequences only if it occurs on exactly one of those sequences. By performance of mutations and coalescences as described, ancestral populations are defined, and, if there were no recombinations, it will be possible to coalesce back to a single common ancestor. If recombination did occur, it may not be possible to construct a tree for the sequenced region, and, instead, an ARG must be inferred. To infer recombination events, our algorithm looks for pairs of sequences that are identical over a contiguous region (fig. 1D–1G). We assume that such a shared tract is inherited intact from an ancestor and that the sequence mismatches at either end of the tract were caused by historical mutation or recombination events. If recombination events are added at both ends of a shared tract, the tract becomes decoupled from the genetic material to the left and right of it and is then free to coalesce. To understand this, consider working backward in time, putting a recombination event on a sequence. This results in two parental sequences, a left parent and a right parent, that are only defined to the left and right of the recombination breakpoint, respectively—they failed to pass on the rest of their genetic material to the next generation. Since undefined regions have no constraint on what they can coalesce with, the number of mismatching alleles preventing a coalescence is reduced, possibly to zero. By incorporating recombination into genealogy construction, it is always possible to construct an ARG that coalesces back to a single common ancestor. What follows is a more detailed description of the algorithm. 
It can infer ARGs from population data with missing genotypes and unknown haplotype phase, although, for ease of exposition, we initially describe the simpler case of perfect phase-known data. The algorithm works backward in time from the contemporary, typed population of chromosome sequences to a single ancestor sequence. Each step back in time, accomplished with a recombination, mutation, or coalescence, defines an ancestral population of sequences. We denote the set of sequences at time $T$ as $S_T$, and the sequences are, in the phase-known case, strings of length $m$ from the alphabet $\{0, 1, \texttt{.}\}$, where $m$ is the number of markers, 0 is one of the SNP alleles, 1 is another allele, and “.” denotes an undefined allele (undefined because it was not inherited by any sequences in the contemporary, typed population). The allelic state of a SNP on sequence $C$ is denoted $C[i]$, where $i$ is the marker position, numbered from 1, so $1 \le i \le m$. We define $C_1[i] \sim C_2[i]$ if and only if $C_1[i] = C_2[i]$, or $C_1[i] = \texttt{.}$, or $C_2[i] = \texttt{.}$. We define a complement operator, $\neg$, such that, if $C[i] = 0$, then $\neg C[i] = 1$ and vice versa, and “.” is its own complement. There is a shared tract between sequences $C_1$ and $C_2$, over the contiguous set of markers $a, \ldots, b$, if (1) $C_1[i] \sim C_2[i]$ for all $a \le i \le b$; (2) there is at least one $i$ in that range for which $C_1[i] = C_2[i] \ne \texttt{.}$; (3) if $a > 1$, then $C_1[a-1] \ne C_2[a-1]$ and neither is “.”; and (4) if $b < m$, then $C_1[b+1] \ne C_2[b+1]$ and neither is “.”. Item (1) requires that the two sequences have compatible allelic states over the shared tract; item (2) requires that, for at least one position in the tract, both sequences are defined; and items (3) and (4) require that the shared tract is maximal. We denote such a shared tract as $\{C_1, C_2\}[a, b]$. The algorithm is initialized at time $T = 1$ ($T$ is incremented as we move back in time) by setting $S_1$ to be the set of contemporary, typed sequences.
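The compatibility relation and the shared-tract definition above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes sequences are stored as Python strings over `0`, `1`, and `.`, and the names `compatible` and `shared_tracts` are hypothetical. It relies on the observation that a maximal run of compatible positions is always bounded by a sequence end or by a position where both alleles are defined and differ, so conditions (3) and (4) hold automatically.

```python
UNDEFINED = "."  # assumed encoding of an undefined allele


def compatible(x, y):
    """The relation C1[i] ~ C2[i]: equal, or at least one side undefined."""
    return x == y or x == UNDEFINED or y == UNDEFINED


def shared_tracts(c1, c2):
    """Return the maximal shared tracts of c1 and c2 as 1-based (a, b) pairs."""
    if len(c1) != len(c2):
        raise ValueError("sequences must have the same number of markers")
    m = len(c1)
    tracts = []
    i = 0
    while i < m:
        if not compatible(c1[i], c2[i]):
            i += 1
            continue
        start = i
        has_defined_match = False
        # Extend the run while positions stay compatible (condition 1).
        while i < m and compatible(c1[i], c2[i]):
            if c1[i] != UNDEFINED and c2[i] != UNDEFINED:
                has_defined_match = True  # condition 2: both defined and equal
            i += 1
        if has_defined_match:
            tracts.append((start + 1, i))  # convert to 1-based inclusive
    return tracts
```

For example, `shared_tracts("0.1", "1.1")` yields the single tract `(2, 3)`: marker 1 is a defined mismatch, marker 2 is undefined on both sequences, and marker 3 is a defined match.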
The algorithm proceeds by finding which coalescences, mutations, and recombinations can be performed, determining this according to the rules below. Applying one of these operations defines an ancestral population $S_{T+1}$, which is constructed from $S_T$ by use of the transitions also described below. The algorithm continues in this way until it arrives at a population with only one sequence.

Coalescence. Rule. If there exist two sequences, $C_1$ and $C_2$, in $S_T$ such that, for all $i$, $C_1[i] \sim C_2[i]$, then $C_1$ and $C_2$ can be coalesced into an ancestor. Transition. $S_{T+1} = S_T \setminus \{C_1, C_2\} \cup \{C'\}$, where $C'[i] = C_1[i]$ when $C_1[i] \ne \texttt{.}$ and $C'[i] = C_2[i]$ otherwise. (By $S_T \setminus \{C_1, C_2\} \cup \{C'\}$, we mean $S_T$ with the sequences $C_1$ and $C_2$ removed and the sequence $C'$ added in.)

Mutation. Rule. If there exists a sequence $C_1$ in $S_T$ and a marker $i$, where, for all $C_2$ in $S_T \setminus \{C_1\}$, we have $C_2[i] = \neg C_1[i]$ or $C_2[i] = \texttt{.}$, then we can remove the derived allele ($C_1[i]$) from the population. Transition. $S_{T+1} = S_T \setminus \{C_1\} \cup \{C'\}$, where $C'[i] = \neg C_1[i]$ and $C'[j] = C_1[j]$ for all $j \ne i$.

Recombination. Rule. When the rules for coalescence and mutation are not satisfied, we must perform a recombination (or a pair of recombinations) instead. We denote a recombination breakpoint as $(\alpha, \beta)$, meaning that it occurs between markers $\alpha$ and $\beta$. Picking a shared tract $\{C_1, C_2\}[a, b]$ from those available in $S_T$, we aim to put recombinations on the lineages of $C_1$ and $C_2$ such that one recombination parent of $C_1$ and one recombination parent of $C_2$ satisfy the rule for coalescence. To do this, we must put a breakpoint at $(a-1, a)$ if $a \ne 1$ and put a breakpoint at $(b, b+1)$ if $b \ne m$. Transition. From the tract $\{C_1, C_2\}[a, b]$, pick (1) a valid breakpoint $(\alpha, \beta)$, where either $(\alpha, \beta) = (a-1, a)$ or $(\alpha, \beta) = (b, b+1)$, and (2) a recombinant sequence $C_R$, where either $C_R = C_1$ or $C_R = C_2$. Then, $S_{T+1} = S_T \setminus \{C_R\} \cup \{C'_1, C'_2\}$, where $C'_1[i] = C_R[i]$ for $i \le \alpha$ and $C'_1[i] = \texttt{.}$ otherwise, and $C'_2[i] = C_R[i]$ for all $i \ge \beta$ and $C'_2[i] = \texttt{.}$ otherwise.
If both $(a-1, a)$ and $(b, b+1)$ are valid breakpoints (i.e., $a \ne 1$ and $b \ne m$), we must put the second recombination (taking us to state $S_{T+2}$) on an appropriate ancestor of $C_1$ or $C_2$. See figure 1D–1G for an example.

These rules define the constraints on the algorithm that must be enforced if it is to produce legal ARGs. However, at any stage of the algorithm, there may be several different coalescences, mutations, or recombinations that satisfy the rules. We choose between these, using the heuristics below, and the stochastic elements mean that different ARGs are generated each time the algorithm is run.

Heuristics. (1) Perform a recombination only if no mutations or coalescences are possible. (2) If it is possible to add multiple mutations and/or multiple coalescences at the same time, the order in which these are done is chosen arbitrarily. (3) Coalesce sequences only if they have an overlapping region of defined material; that is, the two sequences must match for at least one position that is not “.”. This restriction reflects ideas in the sequentially Markovian coalescent-with-recombination model [11: McVean GA, Cardin NJ. Approximating the coalescent with recombination. Philos Trans R Soc Lond B Biol Sci. 2005; 360: 1387-1393]. (4) Recombinations are added at the ends of longer shared tracts first. During the recombination step, we choose a shared tract $\{C_1, C_2\}[a, b]$ such that the base-pair distance between markers $a$ and $b$ is maximized, reflecting that longer shared tracts tend to arise from more-recent recombination events. However, because this is only a tendency, not absolute, we break this heuristic with a certain probability (which, throughout this article, is 0.1), and, in these cases, a randomly selected tract is used to position recombination breakpoint(s). (5) The first coalescence after a recombination is based on the shared segment that was used to decide the location of that recombination.
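The three transitions and the tract-selection heuristic can be sketched as small functions on strings. This is an illustrative sketch under stated assumptions rather than the published implementation: sequences are strings over `0`, `1`, and `.`; all function names are hypothetical; `find_mutation` applies the mutation rule literally as stated above; and `choose_tract` takes a hypothetical list of base-pair marker positions. A full driver loop, bookkeeping of the ARG itself, and heuristic (5) are deliberately left out.

```python
import random

UNDEFINED = "."
COMPLEMENT = {"0": "1", "1": "0", ".": "."}


def compatible(x, y):
    return x == y or UNDEFINED in (x, y)


def can_coalesce(c1, c2):
    """Coalescence rule plus heuristic 3 (some overlapping defined material)."""
    all_compatible = all(compatible(x, y) for x, y in zip(c1, c2))
    overlap = any(x == y and x != UNDEFINED for x, y in zip(c1, c2))
    return all_compatible and overlap


def coalesce(c1, c2):
    """The ancestor takes c1's allele where defined, otherwise c2's."""
    return "".join(x if x != UNDEFINED else y for x, y in zip(c1, c2))


def find_mutation(population):
    """Return (sequence index, 0-based marker) of a removable allele, or None."""
    for i in range(len(population[0])):
        for k, seq in enumerate(population):
            allele = seq[i]
            if allele == UNDEFINED:
                continue
            others = [s[i] for j, s in enumerate(population) if j != k]
            if all(o in (COMPLEMENT[allele], UNDEFINED) for o in others):
                return k, i
    return None


def mutate(seq, i):
    """Flip the allele at 0-based marker i, removing the derived allele."""
    return seq[:i] + COMPLEMENT[seq[i]] + seq[i + 1:]


def recombine(seq, alpha):
    """Split seq at breakpoint (alpha, alpha + 1), 1-based, into two parents."""
    left = seq[:alpha] + UNDEFINED * (len(seq) - alpha)
    right = UNDEFINED * alpha + seq[alpha:]
    return left, right


def choose_tract(tracts, positions, rng, p_random=0.1):
    """Heuristic 4: prefer the longest tract in base pairs, with random breaks."""
    if rng.random() < p_random:
        return rng.choice(tracts)
    return max(tracts, key=lambda t: positions[t[1] - 1] - positions[t[0] - 1])
```

As a usage example, `recombine("01101", 2)` gives the left parent `"01..."` and the right parent `"..101"`, each defined only on its own side of the breakpoint, which is what frees a shared tract to coalesce.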
By extending our algorithm, it is possible to resolve haplotype phase and impute missing data while constructing an ARG. Handling the missing data is the simpler of the two cases. A missing character is allowed to coalesce with any other character (0,1,., or another missing character), and, when it coalesces with a state-known character (0 or 1), the missing character becomes fixed to that state and this assignment is propagated down the ARG to the leaves. Phasing the data is similar, except that a record of the diploid pairings of chromosomes is kept. A phase-unknown character may not coalesce with the corresponding phase-unknown character on its sister chromosome (because the individual is heterozygous at that position). When a phase-unknown character coalesces with a state-known character, its phase becomes fixed, as does the character on its sister chromosome, although to the complement state. When phase-unknown characters from two chromosomes coalesce, these chromosomes and their sisters become dependent on each other; neither of those chromosomes may coalesce with the sister of the other one, and, when one of the chromosomes has a character phase resolved, that character is also resolved on the other chromosome and to the complement state on the two sister chromosomes. Of course, many more than four chromosomes can become involved in such interdependencies. An ARG generated as described above defines a marginal tree for each chromosome position (fig. 1A–1C). For a given position, the marginal tree can be extracted from the ARG by tracing the genealogy of that position back in time from the leaves. When a recombination is encountered, the genealogy follows the path of the left recombination parent if the breakpoint is to the right of the position in question; otherwise, it follows the right recombination parent. 
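Marginal-tree extraction, as just described, amounts to following parent pointers upward and choosing a side at each recombination. The sketch below assumes a hypothetical ARG encoding that is not from the article: a dict mapping each node to `None` (the root), to `("coal", parent)`, or to `("recomb", left_parent, right_parent, alpha)` for a breakpoint between markers `alpha` and `alpha + 1`.

```python
def marginal_parent(arg, node, position):
    """Parent of node in the marginal tree at a 1-based marker position."""
    event = arg[node]
    if event is None:
        return None
    if event[0] == "recomb":
        _, left_parent, right_parent, alpha = event
        # Breakpoint to the right of the position: follow the left parent.
        return left_parent if position <= alpha else right_parent
    return event[1]


def lineage(arg, leaf, position):
    """Trace one leaf back to the root at the given position."""
    path = [leaf]
    parent = marginal_parent(arg, leaf, position)
    while parent is not None:
        path.append(parent)
        parent = marginal_parent(arg, parent, position)
    return path


def marginal_tree(arg, leaves, position):
    """Child -> parent map of the marginal tree at the given position."""
    parent_of = {}
    for leaf in leaves:
        path = lineage(arg, leaf, position)
        for child, parent in zip(path, path[1:]):
            parent_of[child] = parent
    return parent_of


# A toy ARG: leaf "a" is a recombinant with a breakpoint between markers 2 and 3.
toy_arg = {
    "a": ("recomb", "x", "y", 2),
    "b": ("coal", "x"),
    "c": ("coal", "y"),
    "x": ("coal", "root"),
    "y": ("coal", "root"),
    "root": None,
}
```

In the toy ARG, leaf `a` groups with `b` at markers 1 and 2 and with `c` from marker 3 onward, so the tree topology changes at the breakpoint. The extracted map may contain nodes with a single child, which would need collapsing before counting branches.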
We can test a position for association by seeing whether its marginal tree has a branch on which we can place a hypothetical causative mutation that suitably explains the observed disease states of the genotyped individuals, such as mutation 2 in figure 1. (Note that, although such a branch extends over an interval of markers in the ARG, localization is refined by recombination events lower down the ARG; these change the number of case and control chromosomes under the branch at each position. Therefore, our method gives a different score at each marker.) Our test is as follows: since the true ARG is unknown, we infer an ensemble of 100 plausible ARGs. These are generated by running the ARG inference algorithm 100 times, and stochastic choices made during ARG construction (such as which pairs of sequences to coalesce first) mean that these ARGs are all different. For each marker, the 100 marginal trees are extracted from the ARGs. For each marginal tree, hypothetical disease-predisposing mutations are put on each branch in turn. These cause the case-control individuals (the leaves of the tree) to be bipartitioned into those with the mutant allele and those with the ancestral allele. A χ² test can then be used to detect nonindependence between inferred allelic state and disease state. If there are n leaves, then there are n − 3 nonequivalent, nonunary bipartitions of a tree, and, hence, n − 3 χ² test statistics for a tree. Under the assumption that the region spanned by one tree harbors, at most, one causative mutation, we take the maximum of these n − 3 test statistics, calling this the “best-cut score.” After finding the best-cut score for each of the 100 trees, we take the mean, giving an association score for the marker (this assumes that all the inferred ARGs are equally likely). Although we test for nonindependence between alleles and disease, the test could easily be modified to test for association between genotype and disease.
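The best-cut score and its average over trees can be sketched as follows. Several details here are assumptions, not taken from the article: a tree is a dict from each internal node to its list of children, leaves are labeled `True` for cases and `False` for controls, the χ² statistic is the uncorrected 2×2 Pearson form, and “nonunary” is read as requiring at least two leaves on each side of the cut. The function names are hypothetical.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def best_cut_score(children, root, is_case):
    """Maximum chi-square over branches that split off at least two leaves per side."""
    total = len(is_case)
    total_cases = sum(is_case.values())
    total_controls = total - total_cases
    best = 0.0

    def count(node):
        # Returns (cases, controls) beneath node; recursion suits small trees only.
        nonlocal best
        if node not in children:
            return (1, 0) if is_case[node] else (0, 1)
        cases = controls = 0
        for child in children[node]:
            sub_cases, sub_controls = count(child)
            cases += sub_cases
            controls += sub_controls
        size = cases + controls
        if node != root and 2 <= size <= total - 2:
            stat = chi2_2x2(cases, controls,
                            total_cases - cases, total_controls - controls)
            best = max(best, stat)
        return cases, controls

    count(root)
    return best


def association_score(trees, is_case):
    """Mean best-cut score over an ensemble of (children, root) marginal trees."""
    return sum(best_cut_score(ch, root, is_case) for ch, root in trees) / len(trees)


# Toy tree: the three cases cluster under branch "A".
toy_children = {
    "root": ["A", "B"],
    "A": ["a1", "X"], "X": ["a2", "a3"],
    "B": ["b1", "Y"], "Y": ["b2", "b3"],
}
toy_labels = {"a1": True, "a2": True, "a3": True,
              "b1": False, "b2": False, "b3": False}
```

In the toy tree, the branch above `A` separates all three cases from all three controls, so it determines the best-cut score; the branch above `X` captures only two of the cases and scores lower.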
Similarly, a regression could be performed, rather than a χ² test, allowing our method to be applied to quantitative phenotype data. Or we could calculate the likelihood of the data given the tree, although this would require an explicit disease and mutation model. Also, we need not assume that there is only one causative mutation on a tree [20: Zollner S, Pritchard JK. Coalescent-based association mapping and fine mapping of complex trait loci. Genetics. 2005; 169: 1071-1092].

We calculate the statistical significance of the mapping score at each marker (the markerwise P value) by permuting the assignments of case and control labels of the individuals and repeating the test above. By performance of multiple permutations, an empirical null distribution is generated from which the P value can be calculated [27: Churchill GA, Doerge RW. Empirical threshold values for quantitative trait mapping. Genetics. 1994; 138: 963-971]. For P values exceeding the precision of the permutations, we fit an extreme value distribution to the empirical distribution [28: Dudbridge F, Koeleman BP. Efficient computation of significance levels for multiple associations in large studies of correlated data, including genomewide association studies. Am J Hum Genet. 2004; 75: 424-435]. Since multiple markers are being tested for association, there is a multiple-testing issue, which we can correct for by calculating, for each marker, an experimentwise P value: the probability that any of the typed markers show such a strong association signal by chance. Again, this is done by permutation; after shuffling the case and control labels, the maximum association score of all the markers is recorded, thus defining an empirical experimentwise null distribution. Once again, an extreme value distribution can be fitted, to estimate small P values.
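The two permutation schemes can be sketched generically. This is a minimal sketch with several assumptions: `score_markers` is a hypothetical callable that maps a list of case-control labels to one association score per marker, the add-one estimator for empirical P values is a common convention rather than something stated in the article, and the extreme-value fit for very small P values is omitted.

```python
import random


def permutation_p_values(score_markers, labels, n_perm=1000, seed=0):
    """Return (markerwise, experimentwise) empirical P values per marker."""
    rng = random.Random(seed)
    observed = score_markers(labels)
    markerwise_hits = [0] * len(observed)
    experimentwise_hits = [0] * len(observed)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute case and control assignments
        null_scores = score_markers(shuffled)
        null_max = max(null_scores)  # strongest signal anywhere under the null
        for j, obs in enumerate(observed):
            markerwise_hits[j] += null_scores[j] >= obs
            experimentwise_hits[j] += null_max >= obs
    # Add-one estimator (an assumption here) keeps P values strictly positive.
    markerwise = [(h + 1) / (n_perm + 1) for h in markerwise_hits]
    experimentwise = [(h + 1) / (n_perm + 1) for h in experimentwise_hits]
    return markerwise, experimentwise


# Toy data: marker 0 tracks disease status perfectly, marker 1 is unrelated.
toy_labels = [1] * 10 + [0] * 10
toy_genotypes = [
    [1] * 10 + [0] * 10,
    [0, 1] * 10,
]


def toy_score(labels):
    """Absolute difference in allele counts between cases and controls."""
    scores = []
    for marker in toy_genotypes:
        cases = sum(g for g, lab in zip(marker, labels) if lab == 1)
        controls = sum(g for g, lab in zip(marker, labels) if lab == 0)
        scores.append(abs(cases - controls))
    return scores
```

Because the null maximum is never smaller than the null score at any single marker, the experimentwise P value is always at least as large as the markerwise one, which is the sense in which it corrects for testing many markers.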
To evaluate the performance of our method under a variety of disease models, we simulated suites of case-control studies. Each suite contains 50 studies simulated under the same model, which was parameterized according to (1) the recombination model of the population from which the cases and controls were sampled, (2) the tagging SNP (tSNP) ascertainment scheme, (3) whether the sequences are phased or unphased and the amount of missing data, and (4) the disease model parameters: genotype relative risk, disease-allele frequency, and size of study. The case-control studies were sampled from one of two populations, which we call “constant” and “hot.” Both populations contain 20,000 1-Mb chromosome sequences, which were simulated using the FREGENE forward simulator [29: Hoggart CJ, Clark T, Lampariello R, De Iorio M, Whittaker J, Balding D. FREGENE: software for simulating large genomic regions. Department of Epidemiology and Public Health, Imperial College, University of London, London, 2005] (BARGEN Web site) and are available from the Margarita Web site. The constant population was simulated using the simple (i

Reference(s)