Article · Open access · Peer-reviewed

On the relation between promoter divergence and gene expression evolution

2008; Springer Nature; Volume: 4; Issue: 1; Language: English

10.1038/msb4100198

ISSN

1744-4292

Authors

Itay Tirosh, Adina Weinberger, Dana Bezalel, Mark Kaganovich, Naama Barkai

Topic(s)

RNA Research and Splicing

Abstract

Article · 15 January 2008 · Open Access

Itay Tirosh¹, Adina Weinberger¹, Dana Bezalel¹, Mark Kaganovich¹ and Naama Barkai¹,²

¹Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
²Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel

*Corresponding author. Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel. Tel.: +972 8 934 4429; Fax: +972 8 934 4108

Molecular Systems Biology (2008) 4:159, https://doi.org/10.1038/msb4100198

Recent studies have characterized significant differences in the cis-regulatory sequences of related organisms, but the impact of these differences on gene expression remains largely unexplored. Here, we show that most previously identified differences in transcription factor (TF)-binding sequences of yeasts and mammals have no detectable effect on gene expression, suggesting that compensatory mechanisms allow promoters to evolve rapidly while maintaining a stable expression pattern. To examine the impact of changes in cis-regulatory elements in a more controlled setting, we compared the genes induced during mating in three yeast species. This response is governed by a single TF (STE12), and variations in its predicted binding sites can indeed account for about half of the observed expression differences. The remaining unexplained differences are correlated with increased divergence of the sequences that flank the binding sites and with an apparent modulation of chromatin structure. Our analysis emphasizes the flexibility of promoter structure and highlights the interplay between specific binding sites and general chromatin structure in the control of gene expression.

Synopsis

It is widely accepted that phenotypic differences between closely related species often arise from differences in gene expression, but the principles of gene expression evolution are largely unknown.
Recent studies have used two complementary approaches to characterize the process of gene expression evolution. First, the genome-wide expression programs of related species were measured using microarray technology. Second, instances of loss (or gain) of cis-regulatory elements were characterized by analyzing the promoter sequences of orthologous genes in related species. In this work, we sought to understand the connection between these two approaches. Specifically, we asked to what extent the loss of an apparently functional binding site from a gene promoter can predict a change in gene expression. As a first step, we used publicly available data. Surprisingly, we found that genes predicted to have lost an apparently functional binding site from their promoters still maintained their expression pattern (Figure 1). This was the case for both yeast and mammals, and also when using different expression data sets (e.g. tissue-specific expression in human versus chimpanzee or in human versus mouse). We reasoned that the apparent lack of influence of binding-site divergence on gene expression could result from interactions between the multiple regulators that control the expression of typical genes. To examine this result in a more controlled setting, we focused on the yeast mating response, which is governed by a single transcription factor (TF) (STE12), and measured the transcriptional response to pheromone in three closely related yeast species. This analysis identified ∼400 genes that are differentially expressed between the species. Since the promoter sequence to which STE12 binds is well characterized, we could also identify the instances in which this sequence was lost from a gene promoter in only one of the species. Notably, in this case, changes in STE12-binding sequences could explain ∼50% of the expression differences between the species (Figure 4).
This fraction is much larger than in the more complex scenario described above, but is still far from complete. To understand the origin of the remaining gene expression differences that were not explained by divergence of STE12-binding sites, we analyzed the promoter regions flanking the binding sites. Interestingly, we found that these flanking sequences are highly diverged among genes with conserved STE12-binding sites but diverged expression patterns. These flanking sequences could thus influence the chromatin structure and the accessibility of STE12-binding sites. Using a computational model that predicts nucleosome positions from the underlying DNA sequence, we showed that divergence of these flanking sequences could indeed alter the accessibility of several STE12-binding sites and thus generate the observed expression changes. Taken together, our results suggest a complex interplay between the evolutionary divergence of cis-regulatory elements in gene promoters and the divergence of the associated gene expression. First, the loss of an apparently functional binding site only rarely leads to the loss of gene expression. Promoters appear to be highly flexible and can tolerate such changes, perhaps through interactions with adjacent binding sites for other TFs. Second, expression can change even when the binding sites are conserved. This may reflect other changes in promoter sequence, and is likely to be mediated by changes in chromatin structure. This and other studies thus emphasize the influence of chromatin structure in driving gene expression evolution, but further studies are required to describe its role more rigorously.

Introduction

The unique phenotype of each organism is defined by a combination of its gene content and the regulation of these genes.
Evolution of protein sequence and its contribution to phenotypic adaptation has been studied extensively (Pal et al, 2006), whereas most of what we know about regulatory evolution comes from the study of individual genes (Wray, 2007). Regulatory evolution reflects changes in genomic sequences that influence gene expression, either directly or indirectly. Among the multiple mechanisms that control gene expression, the binding of transcription factors (TFs) to sequence-specific binding sites within upstream promoters is arguably the best-characterized regulatory scheme. The short length of TF-binding sites, and their sensitivity to even a small number of mutations, make them ideal candidates for driving gene expression divergence (ED) in cis. Recent studies have begun to compare the promoters of related organisms in search of such regulatory differences. These analyses are hindered by the difficulty of distinguishing functional promoter elements (e.g. TF-binding sites) from non-functional ones. Nonetheless, leveraging prior knowledge of TF-binding motifs, several studies predicted the gain and loss of thousands of TF-binding sites in both yeast and mammalian species (Donaldson and Gottgens, 2006; Doniger and Fay, 2007). Furthermore, two very recent studies used chromatin immunoprecipitation to directly identify differences in TF binding among related species (Borneman et al, 2007; Odom et al, 2007). While these studies rely on the premise that observed differences in TF-binding sequences translate into gene ED and ultimately into phenotypic evolution, this premise was not directly examined. In fact, several lines of evidence indicate that extensive promoter divergence, including differences in TF-binding sequences, may evolve neutrally, with no influence on gene expression.
First, several studies found that extensively diverged promoters from different species still drive the same reporter expression patterns (Ludwig et al, 1998; Takahashi et al, 1999; Romano and Wray, 2003; Ruvinsky and Ruvkun, 2003; Oda-Ishii et al, 2005; Fisher et al, 2006; Wang et al, 2007). Second, changes in TF-binding sequences were shown to be poorly correlated with divergence of gene expression among yeast paralogs (Zhang et al, 2004). Third, Doniger and Fay (2007) introduced the binding-sequence mutations found in Saccharomyces paradoxus and S. mikatae into the orthologous S. cerevisiae promoters and examined their effect on gene expression using reporter assays. The expected effects on gene expression were found in only 3 of the 11 cases examined. Finally, a comprehensive analysis of ∼1% of the human genome (The ENCODE Project Consortium, 2007) showed that a large percentage of functional regulatory elements are not conserved among mammals, suggesting neutral evolution of these sequences. Thus, promoters appear to be highly flexible and able to maintain a stable gene expression pattern through many different sequence realizations, even when binding motifs are affected. Here, we examine the impact of changes in TF-binding sequences on the associated gene expression on a genome-wide scale, using comparative expression data sets of related organisms (Ranz et al, 2003; Rifkin et al, 2003; Su et al, 2004; Khaitovich et al, 2005; Gilad et al, 2006; Tirosh et al, 2006) combined with comparative data sets of predicted TF-binding sites. We find that most predicted changes in TF-binding sequences, in both yeast and mammals, have little effect on gene expression. To examine the connection between changes in TF-binding sequences and ED in a more controlled setting, we measured the transcriptional response of three closely related yeast species to mating pheromone.
Analysis of this response allows us to assess the relative contribution of specific cis-regulatory elements and of general promoter structure to the divergence of gene expression.

Results

Reported changes in TF-binding sequences have little detectable impact on gene expression

A recent study characterized thousands of matches to binding-site motifs that are conserved in the promoters of both chimpanzee and mouse (and are thus likely to be functional in both organisms) but are mutated, and no longer match the binding-site motif, in human (Donaldson and Gottgens, 2006). To examine the impact of these mutations on gene expression, we assembled two data sets comparing gene expression between human and either chimpanzee or mouse. The first data set compares the expression levels of human and chimpanzee genes across five tissues (Khaitovich et al, 2005), and the second compares the expression patterns of human and mouse genes across 30 orthologous tissues (Su et al, 2004). In both cases, the ED of genes with diverged sequence motifs in their proximal promoters (1 kb) was indistinguishable from the ED of genes with conserved motifs (Supplementary Figure 1).

Figure 1. Expression divergence of yeast genes with diverged TF sequence motifs. (A) The percentage of genes with conserved, intermediate or diverged expression among those with conserved or diverged motifs, as predicted by Doniger and Fay (2007) and by a similar analysis (see Materials and methods and Supplementary Figure 2). The difference between any pair of the three sets is not statistically significant (P>0.05). (B) Average expression divergence for genes with conserved or diverged motifs for various TFs. Some stress-related TFs (e.g. GCN4, DAL82) show relatively high ED for genes with diverged motifs, but in none of these cases is it significantly higher than the respective ED of genes with conserved motifs. (C) Percentage of S. cerevisiae-bound promoters at two different binding P-values (Harbison et al, 2004) among promoters with different patterns of motif conservation and divergence. The difference between each pair of patterns is significant (P<0.05). (D) Expression divergence between human and mouse liver cells of genes with conserved binding, diverged binding or no binding by four liver-related TFs.

Analysis of mammalian regulatory divergence is hindered by several limitations, including the inherent complexity of mammalian promoters, the presence of multiple (often distant) enhancers, and limited knowledge of TFs and their binding specificities. To circumvent these problems, we turned to yeast, whose promoters are considerably shorter (∼600 bp) and well defined. Doniger and Fay (2007) recently analyzed the conservation of sequence motifs in the promoters of closely related yeast species and predicted the loss of TF-binding sites. We examined the impact of these predicted changes on ED using a comparative data set that we recently reported, in which we measured the genome-wide expression programs of the same species under several environmental stresses (Tirosh et al, 2006). Here too, genes predicted to lose TF-binding sites (i.e. whose promoters contained diverged sequence motifs) had the same level of ED as genes with conserved sequence motifs (Figure 1A). To further verify these results, we also analyzed the promoters of these species ourselves and generated another set of predicted changes in TF-binding sites (see Materials and methods). Notably, both our analysis and that of Doniger and Fay (2007) predicted the loss of a TF-binding site only if there were no other matches to that sequence motif in the same promoter, thus effectively excluding cases of 'binding site turnover' (Dermitzakis and Clark, 2002).
However, these predictions were also associated with similar levels of ED (Figure 1A; see Supplementary Figure 2 for results with different parameters). Since the expression data were collected under stress conditions, one might expect that only changes in sequence motifs for TFs that participate in the stress response would have an impact in this data set. We therefore analyzed the impact of changes in sequence motifs separately for each TF (Figure 1B). Although changes in sequence motifs for stress-related TFs were, in general, associated with higher ED than those for other TFs, none of these TFs was significantly associated with high ED. One possible explanation for the lack of correlation between the divergence of sequence motifs and that of gene expression is that the reported sequence variations do not affect the binding of the respective TFs to promoters. To explore this possibility, we examined the binding of multiple TFs to S. cerevisiae promoters with conserved or diverged motifs (Harbison et al, 2004) (Figure 1C). We found that S. cerevisiae promoters with diverged motifs are bound by the respective TF less often than promoters with conserved motifs, but more often than promoters without sequence motifs for that TF in any of the species (Figure 1C). Thus, in certain cases, TFs retain their binding to promoters despite divergence of the respective sequence motifs, although this may also reflect differences between the promoter regions examined by sequence analysis and those tested experimentally. In other cases, however, promoters with diverged motifs are not bound by the respective TF, suggesting that TF binding has also diverged. Notably, even for these promoters, the percentage of genes with diverged expression is not higher than average (Supplementary Figure 2). Thus, despite the apparent loss of TF binding, gene expression remained conserved, perhaps through compensation by other regulatory elements.
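
The motif-based prediction of binding-site loss used above can be made concrete with a small sketch. It encodes the rule stated for both analyses: a site is counted as lost only if the promoter contains no other match to the motif, so a site that merely moved within the promoter (binding-site turnover) still counts as conserved. This is a minimal illustration under simplifying assumptions, not the pipeline of Doniger and Fay (2007) or of this study: it scans for an exact consensus string on both strands, whereas real analyses typically allow degenerate positions or use scoring matrices and restrict the search to defined promoter windows. The consensus TGAAACA (commonly cited as the STE12-binding motif), the function names and the toy sequences are all illustrative assumptions.

```python
# Hypothetical sketch: classify orthologous promoters by motif conservation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (non-ACGT characters are kept)."""
    return seq.translate(COMPLEMENT)[::-1]


def has_motif(promoter: str, motif: str) -> bool:
    """True if the exact motif occurs anywhere in the promoter, on either strand."""
    promoter = promoter.upper()
    return motif in promoter or revcomp(motif) in promoter


def classify(promoters: dict[str, str], motif: str) -> str:
    """Classify one gene from its orthologous promoters (species -> sequence).

    'conserved': a match exists somewhere in every promoter, even if its
                 position differs between species (turnover is not loss);
    'diverged':  a match exists in some species but nowhere in the promoter
                 of at least one other species;
    'absent':    no match in any species.
    """
    present = [has_motif(seq, motif) for seq in promoters.values()]
    if all(present):
        return "conserved"
    if any(present):
        return "diverged"
    return "absent"


# Toy promoters (hypothetical): the S. paradoxus match lies on the reverse
# strand, while the S. mikatae copy carries a single A->G substitution.
orthologs = {
    "S. cerevisiae": "ATTGAAACAGGC",
    "S. paradoxus": "CCTGTTTCAATT",
    "S. mikatae": "ATTGAAGCAGGC",
}
print(classify(orthologs, "TGAAACA"))  # diverged
```

Scanning the whole promoter for any remaining match, rather than comparing aligned positions, is what makes turnover count as conservation in this sketch; a position-aware variant would instead compare the aligned window around each site.
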
Since divergence of sequence motifs corresponds only partially to divergence of TF binding, interspecies differences in TF binding should ideally be determined experimentally. The binding of four TFs (FOXA2, HNF1A, HNF4A and HNF6) to 4000 orthologous gene pairs in human and mouse liver cells was recently analyzed by chromatin immunoprecipitation (Odom et al, 2007), revealing extensive differences in the binding of these TFs to orthologous genes. To examine the impact of these differences, we compared the expression levels of human and mouse liver cells (Xing et al, 2007). Nevertheless, we found no connection between differences in TF binding and ED (Figure 1D).

Genome-wide analysis of the mating response in three yeast species

The difficulty of predicting expression changes from analysis of sequence motifs could stem from the coordinated activity of multiple TFs that affect gene expression through combinatorial regulation. The binding of multiple TFs to multiple promoter elements may conceal the influence of specific differences, but at the same time it provides more raw material for regulatory changes and is, in general, correlated with increased ED (Tirosh et al, 2006; Landry et al, 2007). We therefore sought to analyze a simpler situation, in which gene expression is defined primarily by a single TF. The yeast mating response appears well suited for this purpose: it is relatively isolated and is induced by the activation of the TF STE12, whose consensus binding motif, as well as its target promoters, are well defined (Roberts et al, 2000; Zeitlinger et al, 2003; Bardwell, 2005). Mating in yeast occurs when two haploid cells of opposite mating types (a and α) fuse to form a single diploid cell. Each cell secretes a unique pheromone (a-factor or α-factor, respectively) that is sensed by the other cell, triggering it to initiate the mating response.
Mating involves extensive changes in the gene expression program (Roberts et al, 2000), with more than 200 genes upregulated and additional genes downregulated. Many of the upregulated genes have specific roles in mating and are regulated by STE12, whereas the downregulated genes are associated with cell-cycle arrest. The α-factor peptide secreted by S. cerevisiae has been isolated and synthesized in vitro, and most studies of the mating response are currently performed by exposing haploid a-type yeast cells to this synthetic α-factor (e.g. Roberts et al, 2000). Since the gene coding for the α-factor peptide is highly conserved within the sensu stricto complex, we used synthetic S. cerevisiae α-factor to elicit the mating response in three closely related species: S. cerevisiae, S. paradoxus and S. mikatae. As expected, all three species responded to this α-factor by arresting their cell cycle and forming visible shmoos (Supplementary Figure 3).

We measured the gene expression response to α-factor using microarrays containing complete-ORF probes for the ∼6000 S. cerevisiae genes. To control for technical variation, we performed biological repeats (three in S. cerevisiae and S. paradoxus and four in S. mikatae), with all experiments (for all species and repeats) executed in parallel. The sequences of S. paradoxus and S. mikatae genes are highly similar to those of S. cerevisiae (∼90 and ∼85% on average, respectively) and accordingly produced significant and reproducible hybridization. Notably, while absolute hybridization intensities are affected by sequence mismatches, our analysis is based solely on the ratios of hybridization intensities in samples taken with and without pheromone. Indeed, this cross-species hybridization platform has been validated in both yeast and other organisms by us and others (Sartor et al, 2006; Tirosh et al, 2006; Oshlack et al, 2007).
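
Why ratios are robust to cross-species hybridization can be seen with a toy calculation. Assuming, as a simplification, that sequence mismatches reduce a probe's signal by a constant, gene-specific factor that is the same in the untreated and pheromone-treated samples, and that signals stay within the linear range of the array, that factor cancels in the log2 ratio even though absolute intensities change. The abundances, efficiencies and function name below are hypothetical.

```python
import math


def log2_ratio(treated: float, untreated: float) -> float:
    """log2 expression ratio of one probe between treated and untreated samples."""
    return math.log2(treated / untreated)


# Hypothetical true abundances (arbitrary units): a 4-fold induction by pheromone.
true_untreated, true_treated = 200.0, 800.0

# Assumed hybridization efficiencies: 1.0 for a perfectly matching probe, lower
# values for orthologs whose sequence mismatches weaken hybridization.
for efficiency in (1.0, 0.5, 0.1):
    signal_untreated = efficiency * true_untreated
    signal_treated = efficiency * true_treated
    print(
        f"efficiency={efficiency:.1f}  treated signal={signal_treated:.0f}  "
        f"log2 ratio={log2_ratio(signal_treated, signal_untreated):.2f}"
    )
```

The absolute treated signal drops from 800 to 80 as the efficiency falls, while the log2 ratio stays at 2.00 in every case. The cancellation breaks down if mismatches push signals toward background, where this multiplicative model no longer holds.
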
In each of the three species, α-factor induced significant (P<0.05) expression changes in more than 1000 genes. As expected, about 100 genes were upregulated by at least two-fold, and these genes were enriched for previously known mating-related genes (P<10⁻⁵ in each species). The response of each species was highly reproducible, with a genome-wide correlation of ∼0.9 among biological repeats (Figure 2). The correlations between the responses of different species were significantly lower (r∼0.6–0.7), although clearly far from random. Thus, the overall transcriptional response is conserved but also includes substantial species-specific differences. As additional controls, we compared our results with those of a previous study of the mating response in S. cerevisiae (Roberts et al, 2000) and with expression measurements of S. cerevisiae cells undergoing natural mating. As expected, both data sets were highly correlated with the response of all three species to α-factor, and especially with that of S. cerevisiae (Supplementary Figure 4).

Figure 2. Correlations between the mating expression programs of different species. We isolated a-type cells from S. cerevisiae, S. paradoxus and S. mikatae, exposed them to S. cerevisiae α-factor and measured their genome-wide expression profiles using S. cerevisiae arrays. Each species was measured with three or four biological repeats. The correlations among these genomic responses were calculated over 3248 genes with a significant response in at least one experiment.

We identified 408 genes that are differentially expressed between at least one pair of yeast species (see Materials and methods and Supplementary Table 1). Interestingly, these diverged genes also had high ED in the stress-related comparative data (Tirosh et al, 2006) (P=5 × 10⁻²⁶) and were enriched for TATA-containing genes (35% compared with 22%, P=1.5 × 10⁻⁸).
This suggests that genes vary in their tendency for ED, such that the same genes diverge in expression in different processes. Furthermore, this tendency may be related to the presence of TATA boxes, as we and others have previously suggested (Tirosh et al, 2006; Landry et al, 2007). In contrast, these diverged genes do not have higher than average protein sequence divergence (Wall et al, 2005) (P=0.46), suggesting a decoupling between the evolution of protein sequence and that of expression. To better understand the pattern of differential expression, we classified the differentially expressed genes into 12 classes and analyzed the genes within each class separately (Figure 3). Each class corresponds to genes that are up- or downregulated only in a specific subset of the three species (Materials and methods). Interestingly, the largest class comprises 82 genes with S. paradoxus-specific upregulation (Figure 3B), and the smallest class comprises 12 genes with the opposite pattern (Figure 3E; upregulation in the other two species but not in S. paradoxus).

Figure 3. Differential expression pattern in the mating response. Differentially expressed genes were classified into 12 patterns of up- or downregulation in a subset of the species (see Materials and methods). Each subfigure shows the log2 expression ratios of genes from two classes, corresponding to up- and downregulation in a specific subset of species. (A) S. cerevisiae, (B) S. paradoxus, (C) S. mikatae, (D) S. paradoxus+S. mikatae, (E) S. cerevisiae+S. mikatae, (F) S. cerevisiae+S. paradoxus. Cer, par and mik indicate the columns corresponding to expression in S. cerevisiae, S. paradoxus and S. mikatae, respectively. The corresponding subset of species, the number of genes within each class, enriched GO annotations and selected genes are indicated at the top of each subfigure. Red and green correspond to up- and downregulation, respectively.
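
The 12 classes follow from the three species: a gene can respond in one species only or in two of the three (six subsets), in either direction. A minimal sketch of the assignment is shown below, assuming each gene has already received a per-species call of +1 (upregulated), -1 (downregulated) or 0 (no response). How those calls are made (fold-change or significance thresholds) and how genes with mixed up/down calls are handled are assumptions of this sketch, not details taken from Materials and methods; the names SPECIES, CLASSES and assign_class are hypothetical.

```python
from itertools import combinations

SPECIES = ("cer", "par", "mik")

# Proper, non-empty subsets of the three species: 3 single species + 3 pairs.
SUBSETS = [frozenset(c) for k in (1, 2) for c in combinations(SPECIES, k)]
# Each subset combined with a direction gives the 12 classes.
CLASSES = [(direction, subset) for subset in SUBSETS for direction in ("up", "down")]


def assign_class(calls: dict[str, int]):
    """Return (direction, species subset) for one gene, or None.

    calls maps each species to +1 (up), -1 (down) or 0 (no response).
    None is returned when the gene responds the same way in all three
    species (not differential), does not respond at all, or is up in
    some species and down in others (an assumed exclusion).
    """
    up = frozenset(s for s in SPECIES if calls[s] == 1)
    down = frozenset(s for s in SPECIES if calls[s] == -1)
    if up and down:
        return None
    for direction, subset in (("up", up), ("down", down)):
        if subset and len(subset) < len(SPECIES):
            return (direction, subset)
    return None


# A gene induced only in S. paradoxus (the Figure 3B pattern) and one induced
# in S. cerevisiae and S. mikatae but not in S. paradoxus (the Figure 3E pattern).
print(assign_class({"cer": 0, "par": 1, "mik": 0}))
print(assign_class({"cer": 1, "par": 0, "mik": 1}))
print(len(CLASSES))  # 12
```

Representing each class as a direction plus a frozenset of species keeps the labels order-independent, so the same gene pattern always maps to the same class regardless of how the species are listed.
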
We analyzed the enrichment of gene ontology (GO) annotations and the presence of mating-related genes within each class (Figure 3). The most notable class was composed of genes that were upregulated in S. cerevisiae and S. paradoxus but not in S. mikatae (Figure 3F). This class comprises 29 genes and includes six mating-related genes (FIG2, AGA1, PRM2, PRM4, PRM6 and FUS2). This class, as well as the entire set of differentially expressed genes, is also enriched for cell wall genes (P […] 0.5, yet other thresholds gave similar results. (B) Four examples of genes in which divergence of STE12 sequence motifs was associated with a reduced response to α-factor. Conserved and mutated STE12 sequence motifs are shown beneath each gene; mutated positions are shown in black and in lowercase. (C) Four examples of genes in which the presence of STE12 sequence motifs is not correlated with the response to α-factor.

A significant number of exceptions were still observed, however (Figure 4C). For example, RAM1 (β subunit of the farnesyltransferase complex that prenylates the a-factor) was upregulated in S. mikatae despite a mutation in its STE12 sequence motif, and was not upregulated in S. cerevisiae despite conservation of its STE12 sequence motif. In yet other cases, differential expression was found despite the presence of conserved sequence motifs (e.g. YSY6). We next asked how much of the observed interspecies differential expression can be accounted for by differences in STE12 sequence motifs (Figure 5). Since STE12 controls the upregulation, but not the downregulation, in response to α-factor, we examined the presence and divergence of STE12 sequence motifs in the promoters of genes that are upregulated in only a subset of the three yeast species. We found that only 11% of this differential expression can be accounted for by divergence of STE12-binding sequences.
If we restrict this analysis to genes with STE12 sequence motifs in at least one species, which are thus more likely to be directly regulated

References