Article Open access Peer reviewed

Inferring regulatory mechanisms from patterns of evolutionary divergence

2011; Springer Nature; Volume: 7; Issue: 1; Language: English

10.1038/msb.2011.60

ISSN

1744-4292

Authors

Itay Tirosh, Naama Barkai

Topic(s)

Genetic Mapping and Diversity in Plants and Animals

Abstract

Perspective. 13 September 2011. Open Access.

Itay Tirosh and Naama Barkai (corresponding author), Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel.

Corresponding author: Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel. Tel.: +972 8934 4429; Fax: +972 8934 4108; E-mail: [email protected]

Molecular Systems Biology (2011) 7:530. https://doi.org/10.1038/msb.2011.60

The number of sequenced species is increasing at a staggering rate, calling for new approaches for incorporating evolutionary information in the study of biological mechanisms. Evolutionary conservation is widely used for assigning a function to new proteins and for predicting functional coding or non-coding sequences. Here, we argue for a complementary approach that focuses on the divergence of regulatory programs. Regulatory mechanisms can be learned from patterns of evolutionary divergence in regulatory properties such as gene expression, transcription factor binding or nucleosome positioning.
We review examples of this concept using yeast as a model system, and highlight a hybrid-based approach that is highly instrumental in this analysis.

The basic approach: why comparing related species could help in identifying regulatory mechanisms

The ability to rapidly and cheaply sequence full genomes is revolutionizing biological research. However, the static genomic sequences conceal a highly dynamic program of DNA-associated processes such as transcription, binding of transcription factors or positioning of nucleosomes. This 'regulatory program' is a defining property of an organism. Understanding this regulation and how it can be predicted from the genomic sequence are key challenges of genomic research. Comparative genome analysis of related species is highly instrumental in this regard, as it is used routinely for identifying conserved sequences and thereby predicting cis-regulatory elements and other functional non-coding sequences (Kellis et al, 2003; Xie et al, 2005). Here, we will discuss the possibility of extending this comparative approach in two ways: first, by also comparing the regulatory programs of different species or strains (rather than just their genomic sequences) and second, by focusing on evolutionary divergence (rather than conservation).

The question of how gene expression evolves between closely related species has been studied extensively (King and Wilson, 1975; Carroll, 2005). Widespread inter-species differences were identified in high-throughput comparisons (Rifkin et al, 2003; Gilad et al, 2006; Khaitovich et al, 2006; Tirosh et al, 2006; Yanai and Hunter, 2009), and expression profiles vary greatly even between strains (or individuals) of the same species (Brem et al, 2002; Oleksiak et al, 2002; Denver et al, 2005; Kliebenstein et al, 2006; Landry et al, 2006).
Evolutionary differences were also identified in other genomic properties, including transcription factor (TF) binding sites (Borneman et al, 2007; Kasowski et al, 2010; Schmidt et al, 2010; Zheng et al, 2010), nucleosome positioning (Field et al, 2009; Tsankov et al, 2010; Tirosh et al, 2010b), histone modifications (Nagarajan et al, 2010) and protein phosphorylation (Beltrao et al, 2009), leading to the conclusion that regulatory programs change quite rapidly.

Divergence of regulatory programs is encoded in the divergence of the genomic sequences. Inter-species comparison, therefore, provides a way to connect sequence and regulatory divergence. In this view, analyzing how the regulatory program diverged is analogous to studying how the regulatory program responds to multiple genetic perturbations. However, while genetic perturbations are normally studied one at a time (e.g., single-gene knockout), inter-species analysis uncovers the combined effects of numerous genetic perturbations. The disadvantage of this approach is that it becomes difficult to associate a regulatory change with a specific genetic (sequence) perturbation. The advantage is that by combining many genetic perturbations, one can uncover general trends connecting the genetic perturbations with the regulatory programs.

In the sections below, we provide the following examples which employed this idea:

1. Patterns of expression divergence revealed a promoter architecture that governs gene expression variability.
2. Patterns of expression divergence in strains deleted for chromatin regulators revealed that chromatin regulators function as genetic capacitors of gene expression.
3. Patterns of divergence of antisense-containing genes suggested that antisense transcription induces a threshold-dependent transcriptional switch.
4. Patterns of divergence in nucleosome positioning described both cis and trans influences on nucleosome positioning.
5. Patterns of divergence of mRNA degradation rate suggested a mechanistic coupling between transcription and mRNA degradation.
6. Patterns of allelic expression in a cross between diverged mouse strains indicated instances of genomic imprinting.

The first five examples used related budding yeast strains (#3), or species (#1, 2, 4, 5) which share approximately the same set of genes, are almost completely syntenic and have high sequence similarity (∼80–90% identical nucleotide sequences; Kellis et al, 2003). The last example used different mouse strains.

Before describing these examples in detail, we discuss two general issues. First, we consider the relative contribution of natural selection (versus random drift) in shaping the observed patterns of divergence. Second, we discuss the general 'hybrid approach' for dissecting regulatory changes into those that are generated in cis versus those generated in trans.

The contributions of positive selection versus neutral drift in shaping the regulatory divergence

As described above, we view the regulatory divergence between species as the result of a large collection of mutations. This enables us to deduce general trends connecting changes in genomic sequence to changes in the regulatory program. An underlying assumption here is that mutations accumulate largely at random. Yet, at least in some cases, recurrent patterns of regulatory divergence may reflect the signature of natural selection rather than the outcome of a random sampling of genetic mutations. For example, if certain aspects of regulation undergo frequent adaptive evolution (or conversely are kept constant by purifying selection) then these may appear as a recurrent pattern. The contribution of positive selection to the evolution of gene regulation has received much attention (Khaitovich et al, 2004; Yanai et al, 2004; Jordan et al, 2005).
The emerging conclusion is that adaptive regulatory changes are an exception rather than the rule, and that most of the observed regulatory changes are neutral. First, regulatory changes among closely related species are much more widespread than expected from the apparent differences in physiology between the species (often encompassing half of all genes). Second, most regulatory changes are small in magnitude and therefore are unlikely to carry phenotypic consequences: typical yeast inter-species differences in gene expression are on the order of 1.5-fold (Tirosh et al, 2009b), while a 2-fold reduction (heterozygote deletion) in most yeast genes has no effect on growth rates (Giaever et al, 1999). These and other considerations (Khaitovich et al, 2006) suggest that most of the changes observed in the regulatory program are neutral and therefore that the recurrent patterns may serve as good proxies for identifying regulatory mechanisms. Nevertheless, a 'selective' explanation should still be considered when inferring a regulatory mechanism, as described in each of the sections below.

The 'hybrid approach' for dissecting regulatory changes

Another key challenge is to associate regulatory changes with the causal genetic mutations. One way of doing that is using the genetical-genomics approach (Rockman and Kruglyak, 2006). The idea is to perform linkage analysis, comparing the segregation of regulatory changes with that of sequence polymorphisms in a panel of dozens of segregants from a cross between two strains. This approach is highly informative, but is also labor intensive. Furthermore, it can only be used for analyzing differences between strains of the same species but not for analyzing inter-species differences, as, by definition, inter-specific F1 hybrids are sterile and do not produce F2 segregants. An alternative general approach which was used extensively in the studies we describe below is to 'mix' the two genomes within one organism.
This is done by generating a hybrid, which contains full copies of both genomes (Wittkopp et al, 2004; Figure 1). In the context of the hybrid, orthologous genes appear as different alleles of the same gene, and are subject to regulation by the same trans environment (e.g., same regulatory proteins). Thus, allele-specific differences in regulation must reflect mutations that are linked in cis to the gene itself and distinguish between its two orthologous alleles (i.e., mutations in the gene or its flanking regulatory sequences). The additional differences that are observed between orthologs in the two species, but not within the hybrid, correspond to the effects of trans mutations.

Figure 1. Hybrid analysis distinguishes between the effects of mutations in cis or trans. Comparison of gene expression between two species reveals differences due to the combined effects of cis and trans mutations (top). Analysis of a hybrid formed by mating the two species allows classification of the differences into those due to cis-acting and trans-acting mutations: two hybrid alleles (that correspond to orthologous genes of the parental species) reside in the same nucleus and are regulated by the same set of trans factors, thus avoiding any differential expression due to trans-acting mutations (bottom right). However, cis-acting mutations discriminate between the two alleles and thus maintain the inter-species differences (bottom left).

Genome-wide analysis of allele-specific expression was initially performed in humans (Yan et al, 2002; Verlaan et al, 2009), followed by extensive analysis of intra- and inter-specific F1 hybrids in flies (Wittkopp et al, 2004, 2008; Landry et al, 2005; McManus et al, 2010), yeasts (Sung et al, 2009; Tirosh et al, 2009b; Emerson et al, 2010a) and mice (Wang et al, 2010; Gregg et al, 2010a, 2010b).
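The cis/trans classification described for the hybrid approach can be illustrated with a minimal sketch. It assumes, for illustration only, that expression differences are expressed as log2 ratios and that the trans contribution is simply the inter-species difference minus the allele-specific (cis) difference measured in the hybrid. The function name, argument names and expression values are hypothetical and are not taken from the studies cited here.

```python
import math

def decompose_cis_trans(parent_a, parent_b, hybrid_a, hybrid_b):
    """Split an inter-species expression difference into cis and trans parts.

    Assumption: all quantities are compared on a log2 scale, and the trans
    component is the part of the parental difference that is NOT seen
    between the two alleles inside the hybrid.
    """
    total = math.log2(parent_a / parent_b)  # difference between the two species
    cis = math.log2(hybrid_a / hybrid_b)    # allele-specific difference in the hybrid
    trans = total - cis                     # remainder, attributed to trans mutations
    return {"total": total, "cis": cis, "trans": trans}

# Hypothetical gene: 4-fold higher in species A than in species B,
# but only a 2-fold difference between the two alleles in the hybrid.
result = decompose_cis_trans(parent_a=400.0, parent_b=100.0,
                             hybrid_a=200.0, hybrid_b=100.0)
print(result)  # {'total': 2.0, 'cis': 1.0, 'trans': 1.0}
```

In this toy case half of the log2 difference persists between the hybrid alleles (cis) and half disappears in the shared trans environment. Real analyses would also need replicate measurements and a statistical test before calling a gene cis- or trans-affected, which this sketch omits.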
A related approach involves the substitution of individual chromosomes between flies (Lemos et al, 2008; Wang et al, 2008). Similarly, a mouse model of Down syndrome that contains an extra human chromosome 21 was used to study the role of cis- and trans-regulatory mutations in mammals (Wilson et al, 2008). Collectively, these studies demonstrate that inter-species expression divergence is dominated by cis mutations, in contrast to expression divergence between strains, which is generated primarily by trans mutations (Wittkopp et al, 2008). At individual genes, the distinction between cis and trans effects is not sufficient to identify the causal mutations, but provides a useful classification of regulatory changes that hints where to look for the specific mutations. Importantly, cis mutations affecting different genes reflect independent evolutionary events and this can increase the significance of observed recurrent patterns (Bullard et al, 2010). A more complete discussion on the properties and roles of cis- and trans-acting mutations can be found in recent reviews (Thompson and Regev, 2009; Emerson and Li, 2010b).

'Expression variability' and how it is linked to promoter architecture

Genes diverge at different rates. This was first observed in analysis of protein sequences: some proteins diverge quickly, whereas others remain largely the same between organisms. There are multiple reasons for that (Pal et al, 2006). For example, essential proteins are relatively conserved, likely due to the need to maintain their function. Highly expressed proteins are especially well conserved, possibly because they are subject to more stringent constraints for proper folding (Drummond and Wilke, 2008). One of the first questions asked when comparing regulatory programs was whether these differences in evolutionary rates also generalize to regulatory properties. Not surprisingly, the rate at which gene expression diverges also differs between genes (Figure 2A).
What was surprising, though, was the low correlation between this rate and the evolutionary rate of the associated proteins. Although a weak correlation was observed among mammals (Khaitovich et al, 2005), no correlation was identified among yeasts (Tirosh and Barkai, 2008a). Thus, the fact that proteins are conserved in sequence did not imply their conservation in expression. Similarly, genes that diverged rapidly in expression often encoded proteins with conserved sequence. The two modes of evolution, therefore, reflect distinct constraints. In yeast, only essentiality correlated with conservation both in expression and in sequence, while all other determinants were specific to one mode of evolution (Tirosh and Barkai, 2008a). Additional determinants that were either common or specific to the two modes of evolution were found in flies (Lemos et al, 2005) and in mammals (Liao and Zhang, 2006).

Figure 2. Expression divergence reflects an intrinsic tendency for expression variability that is encoded in promoter structure. (A) Comparison of genome-wide expression patterns among multiple species identifies genes whose expression patterns remain well conserved (low divergence) and genes whose expression diverged extensively (high divergence). (B) Expression divergence is correlated with responsiveness to environmental changes (data taken from Tirosh et al, 2006), and both measures are higher among OPN genes that contain a TATA-box (purple) than among DPN genes that lack a TATA-box (cyan) (Tirosh and Barkai, 2008b); empty black circles represent genes which do not fit with these two gene classes (e.g., intermediate nucleosome pattern between DPN and OPN). (C) Schemes depicting the typical promoter structure of OPN genes with a TATA-box (top, purple) and DPN genes without a TATA-box (bottom, cyan). OPN promoters lack NFR, contain multiple TF binding sites (squares), a TATA-box and fuzzy nucleosomes (i.e., nucleosome positions vary across time and within a population; marked with a double-headed arrow). DPN promoters contain NFR, fewer TF binding sites and no TATA-box, and well-positioned nucleosomes.

One of the notable findings was that divergence of gene expression strongly correlates with other measures of expression variability on completely different time scales (Tirosh et al, 2006; Figure 2B). Genes that diverged rapidly in expression tended also to vary more when environmental conditions were modified. Moreover, these genes were more 'noisy', displaying a wider variance in expression between identical cells subject to the same conditions (Newman et al, 2006). These observations suggested that evolutionary divergence is one facet of a more general property of expression variability: some genes are capable of broad changes in expression, whereas in others this capacity is limited (Tirosh et al, 2009a). Importantly, a 'selective' explanation for the increased expression divergence of these genes was ruled out as these results were reproduced in analysis of mutation accumulation strains: strains that evolved in the laboratory while maintaining very low effective population sizes, thereby allowing non-lethal mutations to accumulate randomly with minimal effects of natural selection (Denver et al, 2005; Rifkin et al, 2005; Landry et al, 2007).

If expression variability is indeed a general property that differs between genes, it might be encoded in the promoter sequence. Indeed, bioinformatics analysis revealed a strong association between expression variability (on all time scales) and particular promoter structures (Figure 2B and C). First, promoters of variable genes have a TATA-box at much higher frequency than less-variable genes (Tirosh et al, 2006).
Second, the organization of nucleosomes in variable genes typically lacks the Nucleosome Free Region (NFR) immediately upstream of the transcription start site, which is a characteristic of most genes. We, therefore, denoted this promoter structure as 'OPN' for Occupied Proximal Nucleosome, in contrast to Depleted Proximal Nucleosome (DPN) genes (Tirosh and Barkai, 2008b). Notably, the association of this promoter organization with expression variability was not specific to yeast as similar observations were made in other organisms (Tirosh and Barkai, 2008b; Gilchrist et al, 2010).

A key challenge will be to understand how TATA and OPN promoters support higher expression variability. A hint to this question comes from the hybrid approach for distinguishing cis and trans effects. Although most of the expression divergence between S. cerevisiae and S. paradoxus reflects cis effects, the increased divergence associated with TATA and OPN was instead due to trans effects (Tirosh et al, 2009b). Coupled with the observation that these promoters are typically bound by (and affected by deletion of) a larger number of regulators compared with other promoters (Landry et al, 2007; Choi and Kim, 2008; Tirosh and Barkai, 2008b; Venters et al, 2011), these results suggest that there are simply more regulators (and thus possible trans-mutations) affecting these promoters. For example, in the OPN organization binding sites are more often covered by nucleosomes, an organization that may facilitate competition between the binding of TFs and nucleosomes and could thus increase the influence of various chromatin regulators.

Differences in expression variability between genes may be related to the general interplay between 'robustness' and 'evolvability'. An organism needs to be robust, namely maintain a reliable function under different conditions or when subjected to mutations.
At the same time, it needs to maintain the ability to evolve in order to adapt to new environments, but this requires sensitivity to genetic mutations. The ability to control the plasticity of gene expression through its promoter structure may help in this interplay. Accordingly, genes that are required for the robustness of the organism will be maintained as lowly variable and their expression program will evolve slowly, whereas those that facilitate adaptation to new environments will be maintained as highly variable. In support of that, we observed that in yeast, genes of high expression variability and those with TATA and OPN promoters preferentially encoded proteins that interact with the environment, such as transporters, and as such may mediate the response and adaptation to environmental changes (Tirosh et al, 2006, 2009a; Zhang et al, 2009b).

Chromatin regulators as 'capacitors' of gene expression variations

Another idea that arose from thinking about the interplay between robustness and evolvability is that of 'genetic capacitors'. Robustness of the wild-type organism may be facilitated by proteins that act as 'genetic capacitors' to reduce the effect of mutations. These capacitors enable the accumulation of mutations, or polymorphisms, that have no phenotypic consequences in normal conditions. The thought is that these mutations can support evolvability if capacitors are repressed under harsh conditions, so that the phenotypic effect of the accumulated mutations suddenly emerges. This model was proposed many years ago (Waddington, 1942), but only recently was a striking example of a candidate 'genetic capacitor', the heat-shock protein Hsp90, identified (Rutherford and Lindquist, 1998; Queitsch et al, 2002). A central question is whether chaperones, such as Hsp90, are unique in their capacity to buffer mutations or whether other protein capacitors can be identified.
One line of thought suggested that, in fact, any regulatory protein with large-scale effects may serve in this capacity (Bergman and Siegal, 2003; Hermisson and Wagner, 2004; Levy and Siegal, 2008). The reason is that in the complex regulatory networks of cells, many epistatic effects are expected, which means that mutations which do not have an effect in the wild-type organism might still have a phenotypic consequence in the background of additional mutations. Large-scale regulators are particularly expected to be involved in such epistatic effects, and will therefore behave as effective capacitors: mutations that are neutral in the wild-type background will accumulate, but these mutations may have an effect when one of these regulators is deleted or its function compromised.

This idea was tested using an inter-species comparative approach (Tirosh et al, 2010a). The two yeast species, S. cerevisiae and S. paradoxus, differ in sequence and expression. The buffering hypothesis predicts, however, that many of the sequence differences are in fact buffered and therefore do not affect expression in wild-type cells. An expression effect will be observed when the capacitor protein is deleted, revealing the impact of hidden genetic changes. Comparing the expression profiles of S. cerevisiae and S. paradoxus, both for wild-type strains and for strains where specific chromatin regulators have been deleted, confirmed that this is the case (Figure 3). Deletion of each of the chromatin regulators that were examined increased the amount of inter-species expression differences, consistent with the regulators acting as capacitors of gene expression variations. Furthermore, the hybrid analysis confirmed that these regulators buffer gene expression by acting in trans, as expected if they act primarily by influencing upstream regulatory signals.

Figure 3. Chromatin regulators buffer gene expression variability. Comparison of inter-species expression differences between wild-type and (chromatin regulators) deletion strains shows increased expression differences among the deletion strains, indicating that chromatin regulators normally buffer the effects of hidden inter-species genetic variability.

Note that in this example we support a 'selective' explanation whereby natural selection generated a bias toward mutations that affect gene expression in a mutant background, but not in the wild-type. Nonetheless, these results demonstrate that chromatin regulators effectively buffer gene expression and therefore that compromising their activity exposes hidden genetic variability.

Function of antisense transcription

Antisense transcription occurs frequently throughout the genomes of various organisms (He et al, 2008; Guell et al, 2009; Xu et al, 2009; Yassour et al, 2011). Several studies have shown that antisense transcription can, in certain cases, repress transcription of the sense genes by several mechanisms (Hongay et al, 2006; Camblong et al, 2007; Berretta et al, 2008), yet the frequency and mode of such repression remain poorly understood. Notably, the repressive effect of antisense on the sense transcript cannot be determined directly from steady-state expression levels but requires analysis of sense and antisense expression levels upon perturbations. Steinmetz and colleagues compared sense and antisense expression among two diverged S. cerevisiae strains and 48 of their segregants, thus revealing the effects of numerous genetic changes (Xu et al, 2011). This analysis showed that genes associated with antisense transcription show an increased variability in expression among segregants, suggesting an effect similar to that of the TATA-box and occupied nucleosome patterns described above.
Interestingly, increased variability of antisense-containing genes was due to similar induction but more efficient repression: antisense-containing genes were often completely 'switched-off', while repression of other genes was more limited. In contrast, the induction (i.e., maximal levels) of antisense-containing genes did not differ from that of other genes, and thus the dynamic range and the variability of antisense-containing genes were typically larger than those of other genes. A 'selective' explanation for this effect (i.e., that natural selection favors mutations that repress antisense-containing genes) seems unlikely, and instead these results suggest a model in which antisense transcription induces a threshold-dependent switch of sense transcription. According to this model, low sense transcription (e.g., in the absence of activation) is easily inhibited by the antisense transcription, but higher sense transcription (e.g., upon activation) 'overcomes' this inhibition and eliminates the antisense effect. This model was further supported by direct experiments which demonstrated an inhibitory antisense effect that is abolished upon induction of the sense gene (Xu et al, 2011). In future work, it might also be useful to compare sense and antisense expression among different species and their hybrid.

Characterizing determinants of nucleosome positioning

Most studies on regulatory evolution focused on gene expression but recent work began to extend these studies to other properties such as TF binding, histone modifications and nucleosome occupancy. We will focus here on comparative studies of nucleosome occupancy, which provided insights into mechanisms that determine the positioning of nucleosomes along the genome. Nucleosomes, the basic building block of chromatin, decrease DNA accessibility and thus the ability of regulatory proteins to bind specific DNA regions and exert their regulatory function (Li et al, 2007).
A key issue is what determines the positioning of nucleosomes along the DNA. In particular, there is a contemporary debate about the relative importance of the local DNA sequence (the 'genomic code'; Segal et al, 2006), compared with the contribution of trans factors such as chromatin remodelers, modifiers and TFs (Kaplan et al, 2009; Zhang et al, 2009a).

Comparative analysis provided insights into these questions (Tirosh et al, 2010b; Figure 4). Nucleosome patterns of S. cerevisiae differ in many sites from the orthologous S. paradoxus patterns. Notably, the hybrid analysis mapped ∼70% of the differences as cis (i.e., resulting from changes in the local DNA sequence) and ∼30% as trans (i.e., resulting from mutations that affect regulatory proteins). These results provide a general estimation for the relative roles of local DNA sequence versus regulatory proteins in controlling nucleosome positioning, although this analysis might be biased by natural selection. The ability to compare genomic sequences at positions of cis-dependent differences was further informative for evaluating the role of sequence patterns in controlling nucleosome positions: among the various patterns that were proposed to control nucleosome positioning, only the presence of AT-rich elements had a significant effect, suggesting that AT-rich elements are the dominant feature of local DNA sequence with respect to nucleosome positioning. A 'selective' explanation is highly unlikely in this case, as variation in other sequence patterns was observed but was not associated with changes in nucleosome positioning. Recent analysis of sequence-derived models of nucleosome positioning has independently reached a similar conclusion (Tillo and Hughes, 2009).

Figure 4. Inter-species and hybrid analysis uncovers determinants of nucleosome positioning. Inter-species comparison (blue and red correspond to S. cerevisiae and S. paradoxus, respectively) and hybrid analysis (black curves) of nucleosome positioning characterizes changes due to cis and trans mutations (Tirosh et al, 2010b). Nucleosomes found in only one of the species and in the corresponding hybrid allele reflect the effect of local (cis) mutations (right). Nucleosomes found in only one of the species but in both hybrid alleles reflect the trans effect of distal mutations through the activity of a chromatin-related protein or RNA (left). Sequence analysis at positions of cis changes can suggest which sequence patterns influence nucleosome positioning. For example, the inset shows sequences of the two species within a region that is bound by a nucleosome only in S. paradoxus, demonstrating that inter-species substitutions changed the frequency of AT bases between 22/25 (S. cerevisiae) and 18/25 (S. paradoxus). This is consistent with a nucleosome-disfavoring effect of AT-rich sequences, as observed systematically for inter-species nucleosome differences, and is also observed in other studies (Tillo and Hughes, 2009).

Interestingly, local divergence of AT-rich sequences not only affected the positioning of the closest nucleosomes but also

Reference(s)