Article; Open access; Peer-reviewed

Phosphorylation network rewiring by gene duplication

2011; Springer Nature; Volume: 7; Issue: 1; Language: English

10.1038/msb.2011.43

ISSN

1744-4292

Authors

Luca Freschi, Mathieu Courcelles, Pierre Thibault, Stephen W. Michnick, Christian R. Landry

Topic(s)

Genomics and Phylogenetic Studies

Abstract

Report, 5 July 2011, Open Access

Author information: Luca Freschi (1, 2, 3), Mathieu Courcelles (4, 5), Pierre Thibault (4, 5), Stephen W Michnick (5) and Christian R Landry (1, 2, 3, *)

1 Département de Biologie, Université Laval, Québec, Canada
2 Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
3 PROTEO, The Quebec Research Network on Protein Function, Structure and Engineering, Université Laval, Québec, Canada
4 Département de Chimie, Institut de Recherche en Immunologie et Cancérologie (IRIC), Université de Montréal, Québec, Canada
5 Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
* Corresponding author. Département de Biologie, Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, 1030 Avenue de la Médecine, Québec, Canada G1V 0A6. Tel.: +1 418 656 3954; Fax: +1 418 656 7176

Molecular Systems Biology (2011) 7:504. https://doi.org/10.1038/msb.2011.43
Elucidating how complex regulatory networks have assembled during evolution requires a detailed understanding of the evolutionary dynamics that follow gene duplication events, including changes in post-translational modifications. We compared the phosphorylation profiles of paralogous proteins in the budding yeast Saccharomyces cerevisiae with those of a species that diverged from the budding yeast before the duplication of these genes. We found that 100 million years of post-duplication divergence are sufficient for the majority of phosphorylation sites to be lost or gained in one paralog or the other, with a strong bias toward losses. However, some losses may be partly compensated for by the evolution of other phosphosites, as paralogous proteins tend to preserve similar numbers of phosphosites over time. We also found that up to 50% of kinase–substrate relationships may have been rewired during this period. Our results suggest that, after gene duplication, proteins tend to subfunctionalize at the level of post-translational regulation and that, even when phosphosites are preserved, the kinases that phosphorylate them turn over.

Introduction

Genomes and organisms gain in complexity during evolution through gene duplication followed by functional divergence of the duplicates (Hurles, 2004). Signaling and regulatory proteins are thought to play a particularly important role in the evolution of organismal complexity (Gough and Wong, 2010). Yet we know very little about the early evolutionary steps that follow the duplication of regulatory proteins and of the substrates they regulate. Studies on short time scales and in well-characterized organisms are needed to estimate the contribution of the different evolutionary forces to the assembly of novel regulatory pathways and networks.
Here, we address the evolution of phosphoregulatory networks by directly studying phosphoproteins and their associated protein kinases. Protein phosphorylation regulates many, if not most, protein functions by affecting protein stability, localization, activity and ability to interact (Moses and Landry, 2010). When both copies are maintained, paralogous proteins may diverge in function along two evolutionary paths, which are not mutually exclusive. First, one paralog may evolve new functions (neofunctionalization) (Conant and Wolfe, 2008). Second, degenerative mutations may accumulate in one or both paralogs, leading to the loss of redundant functions (subfunctionalization) (Force et al, 1999; Lynch and Force, 2000). If we assume a model under which each phosphosite in a protein has a function (Holmberg et al, 2002), neofunctionalization would correspond to sites acquired after the duplication event and subfunctionalization to sites lost in one of the two paralogs. In the first case, new connections are created in the kinase–substrate network; in the second case, no new function has evolved and regulatory links are lost rather than created. We (Landry et al, 2009) and others (Lienhard, 2008) have recently suggested that a fraction of phosphorylation sites may have no specific function and may result from kinase–substrate interactions that evolved neutrally or nearly neutrally. Accordingly, a fraction of the links created or lost in these networks after gene duplication would represent gains and losses of phosphosites without subfunctionalization or neofunctionalization of the proteins. In this study, we used the phosphorylation network of the budding yeast Saccharomyces cerevisiae as a model.
The lineage leading to the budding yeast underwent a whole-genome duplication (WGD) 100 million years (My) ago (Wolfe and Shields, 1997) that significantly affected its signaling networks: whereas only 10% of all genes (∼500 pairs) were maintained as duplicates, 30% of protein kinases and 33% of protein phosphatases were retained as duplicates (Seoighe and Wolfe, 1999). Furthermore, phosphoproteins were significantly more likely than nonphosphorylated proteins to be retained as paralogs (Amoutzias et al, 2010). Finally, duplicated kinases and their regulatory proteins differ in sequence and function (Musso et al, 2008), and many of them show accelerated amino acid changes after the WGD (Kellis et al, 2004). Using computational and experimental analyses, we examined the extent to which phosphosites diverged after gene duplication, whether phosphosites were gained and lost at accelerated rates among these phosphoproteins, and whether kinase–substrate relationships have been modified since the WGD.

Results and discussion

Paralogous phosphoproteins substantially diverged after WGD

Our data set consists of 2726 phosphosites (serines (S), 82%; threonines (T), 16%; tyrosines (Y), 2%) belonging to one or the other member of the 352 pairs of yeast WGD paralogs in which at least one of the two proteins is a phosphoprotein. We focused on S/T phosphosites, as they make up 98% of all phosphosites. Among these sites, 2445 are unique to one paralog and 118 homologous site pairs (236 phosphosites) occur at homologous positions, 7.4 times more than expected by chance (P≪0.001; Supplementary Figure S1). Phosphosites diverge in two ways. First, an S/T residue may be phosphorylated in one protein while its paralog carries a residue that cannot be phosphorylated at the homologous position (site-divergence). Site-divergence accounts for 69% of the sites that are unique to one paralog.
Second, an S/T residue may be phosphorylated in one paralog while the homologous position in the other paralog is conserved (S/T) but not observed to be phosphorylated (state-divergence). Of the phosphorylated sites whose homologous position is also S/T in the paralog, 86% are state-diverged. This estimate of state-divergence is strongly inflated by false-negative (FN) and false-positive identifications, and also by the exclusion from this data set of phosphopeptides that match more than one protein. We accounted for these issues by comparing cross-study conservation with cross-study reproducibility. We found that state-conservation between paralogs is around 36% for filtered peptides (considering only phosphopeptides that match a single position in the proteome) and 54% for unfiltered peptides (considering all phosphopeptides) (Figure 1A). Protein sequence, function, localization and/or recognition by protein kinases have thus diverged to such an extent in 100 My that, even where the residues themselves are conserved, only 36–54% of their regulation by phosphorylation appears to be conserved.

Figure 1. Conservation and divergence of phosphoregulation among WGD paralogs. (A) State-conservation between paralogous proteins, estimated by regressing cross-study conservation on cross-study reproducibility. A 1:1 relationship would be expected if all phosphosites were state-conserved; deviation from this relationship provides an estimate of state-divergence. Filtered data: phosphopeptides that match a single protein; unfiltered data: all phosphopeptides. (B) Positive correlation between the numbers of phosphosites of WGD paralogous proteins. Red dots indicate averages of binned data and green dots the actual data; green intensity indicates the number of points at each position. (C) Proportion of paralogous pairs with significant conservation as a function of the window size considered.
A site is considered conserved if the other paralog has a phosphorylated site within the window (excluding the exact position). (D) A case of putative local compensation: the fraction of conserved sites as a function of window size. Blue: observed value; grey: 95th quantile (100 permutations); red: average of the expected distribution. (E) Fraction of paralogous phosphosites or phosphoproteins assigned to the same protein kinase. Assignments are based on PWMs from Mok et al (2010). The observed fraction is calculated from these assignments, whereas the expected fraction is estimated after shuffling the assigned kinases among the pairs of paralogous sites. Ptacek: large-scale in vitro kinase–substrate interactions on microarrays (Ptacek et al, 2005). Ubersax: in vitro Cdc28–substrate interactions (Ubersax et al, 2003). (F) Distributions of the PWM scores for different classes of sites.

Conservation and compensation of phosphosite loss by site-position turnover

Surprisingly, despite the low level of site-conservation between paralogous proteins, the numbers of phosphorylation sites of paralogs are highly significantly correlated (ρ=0.35, P-value<2.2 × 10⁻¹⁶; Figure 1B). This correlation remains significant when the number of phosphosites is normalized by protein length (ρ=0.32, P-value<6.9 × 10⁻¹⁴) or by the length of disordered regions (ρ=0.27, P-value<3.8 × 10⁻¹⁰), both of which tend to be preserved between paralogs. The correlation is also significant when only site-diverged phosphosites are considered (ρ=0.28, P-value=2.0 × 10⁻¹¹). This correlation suggests that stabilizing selection acts to maintain the overall number of phosphosites, in agreement with a recent study (Beltrao et al, 2009) reporting that the phosphorylation levels of orthologous protein complexes or pathways tend to be conserved between Candida albicans and Saccharomyces cerevisiae.
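The ρ values above measure how strongly the phosphosite counts of the two members of each paralog pair co-vary. The text does not name the test. The sketch below assumes that ρ is Spearman's rank correlation and computes only the statistic, not its P-value. It ranks each list and gives tied counts their average rank, because small integer counts tie often. It then takes the Pearson correlation of the ranks. The counts are made up for illustration and are not the study's data.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sx * sy)


# Hypothetical phosphosite counts for five paralog pairs (not the study's data).
para1_counts = [1, 3, 3, 8, 12]
para2_counts = [0, 2, 5, 4, 9]
print(round(spearman_rho(para1_counts, para2_counts), 3))  # prints 0.821
```

The function also accepts floats. Normalized counts, such as sites per residue or per disordered residue as in the text, can therefore be passed in the same way.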
Phosphosite positions could turn over through time because a site that appears near a lost site can compensate for the loss (Serber and Ferrell, 2007), particularly when the charge of a region, rather than that of a specific residue, is what matters. Redundancy in the positions of phosphosites has previously been proposed to explain their weak site-conservation among species (Landry et al, 2009), but so far there has been limited evidence for this (Ba and Moses, 2010; Moses and Landry, 2010). If this local turnover model explains the overall conservation of the number of phosphosites, the proportion of conservation between paralogs should increase significantly when we consider regions of proteins rather than exact positions. We found this to be the case for a significant but limited number of paralogous pairs. Specifically, we redefined the proportion of state-conserved sites as the proportion of sites in a protein that have a phosphosite within the homologous region of a given window size in its paralog. We first found that the window size that maximizes the signal is about 33 amino acids (Figure 1C). Among the 167 pairs of paralogous proteins in which both paralogs have at least one phosphosite, 11 (6.6%) showed a significant level of conservation at that window length (an example is shown in Figure 1D). This result may suggest either that compensation by nearby sites is relatively uncommon and specific to some types of proteins, or that the limited coverage of the yeast phosphoproteome leaves us with little power to detect significant compensation. Another possibility is that such compensation takes place only in highly phosphorylated proteins.
Indeed, compared with all pairs, paralogous pairs with significant local compensation have significantly more phosphosites (mean: 9.28 versus 3.87; Wilcoxon test: P-value<9.5 × 10⁻¹¹) and also tend to contain a larger proportion of disordered residues (mean: 53 versus 42%; P=0.01).

Life after WGD: rewiring the cellular regulatory networks

Phosphosites are phosphorylated by a variety of kinases that recognize specific motifs surrounding the phosphorylatable residue. As in many eukaryotes, about 2% (120 in total) of yeast protein-coding genes encode protein kinases (Zhu et al, 2000). We examined the conservation of the relationships between our set of phosphosites and yeast kinases by assigning each phosphosite to a kinase using empirically derived position weight matrices (PWMs) for 61 yeast kinases (Supplementary Data set S1 from Mok et al, 2010). We first found that WGD paralogs are generally not biased in terms of the protein kinases that regulate them (ρ=0.99, P-value<2 × 10⁻¹⁶; Supplementary Figure S2). Second, we found that state-conserved sites are assigned to the same kinase 44% of the time, a 20-fold increase over what is expected if phosphosites were randomly matched between paralogs (P-value 0.02 and not being acetylated at the amino or carboxy terminus; then for all data sets we selected the peptides that matched exactly one S. cerevisiae protein in Blat searches (Kent, 2002). Peptides that matched more than one protein were eliminated because they could not be assigned unambiguously to a single protein. We used these data to assemble a first data set (Dataset 1). We then compiled a second data set from the same phosphosite data, this time without the Blat filtering step (Dataset 2).
Finally, we compiled a third data set of manually curated phosphosites that have been shown to be phosphorylated in small-scale experiments and whose function has been determined (Ba and Moses, 2010) (Dataset 3). The compiled data and all the other data described below are available at: http://www.bio.ulaval.ca/landrylab/download/.

We estimated the state-divergence of phosphosites between paralogous proteins by comparing cross-study conservation and reproducibility. Our data set comes from eight distinct studies, giving 28 possible pairwise comparisons. We only considered sites that were S/T in both paralogs. For each pair of studies, we considered two sets of concatenated paralogous proteins, para.1 and para.2. We counted the sites found in para.1 in study 1 and examined how many were also found in para.1 in study 2 (cross-study reproducibility) and in para.2 in study 2 (cross-study conservation) (Supplementary Figure S7). We made the reciprocal comparison for the same two studies: among the sites identified in para.2 in study 1, we counted how many were also found in para.2 in study 2 (cross-study reproducibility) and in para.1 in study 2 (cross-study conservation). Each pair of studies therefore yields two ratios of cross-study conservation to cross-study reproducibility, and each ratio measures the extent of conservation between paralogs while taking into account the reproducibility of the two studies. A regression of cross-study conservation on cross-study reproducibility provides a rough estimate of state-conservation between paralogs that accounts for reproducibility (Figure 1A).

Local phosphosite turnover was tested as follows. We took all the pairs of WGD phosphoproteins in which both paralogs had one or more phosphosites. For each phosphosite in the first paralog, we examined a window of length l centered on the site, thus defining a range of positions along the sequence.
Excluding all state-conserved sites (phosphorylated at the exact same position in both paralogs), we counted the phosphosites of the aligned second paralog that fall inside the corresponding range of positions. A phosphosite in the first paralog was considered conserved if at least one phosphosite of the second paralog fell inside that range. We then computed, for each window size, the ratio of conserved sites to all sites. The random expectation was estimated using 100 randomizations of phosphosites, as described below.

The PWMs used to predict the protein kinase associated with each phosphosite were derived empirically by Mok et al (2010) through in vitro peptide screening of 61 of the 122 S. cerevisiae kinases. Although these data are incomplete, they are the best currently available, as they rely on empirically derived consensus motifs rather than purely in silico predictions. To assign each phosphosite to its most likely kinase, we extracted from the yeast proteome the 15-mer centered on each phosphosite, that is, the phosphosite and its 14 flanking residues (±7). Each phosphosite was then scored against each kinase PWM by summing the logarithms of the matrix values corresponding to the amino acid at each position of the 15-mer, and was assigned to the kinase with the highest score (Supplementary Figure S8).

Data on kinase–substrate interactions were obtained from Ptacek et al (2005) and Ubersax et al (2003). The first data set consists of microarray interactions between 87 different kinases and >4000 potential substrates; for these data, we estimated the fraction of paralog pairs phosphorylated by the same kinase, considering only pairs in which both paralogs were phosphorylated by at least one kinase. The second data set comes from an in vitro experiment testing for interactions between Cdc28 and the yeast proteome.
For these data, we counted how often both paralogs were phosphorylated by Cdc28 among all pairs in which at least one of the two was phosphorylated.

Gains and losses of phosphosites were inferred as described in Figure 2A. We estimated the expected numbers of gains and losses by randomly sampling S/T sites. We divided the phosphosites into four classes according to the type of the residue (S or T) and the typ
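The window test for local phosphosite turnover described in the methods can be sketched in Python. This illustration rests on several assumptions that the text does not settle:

- Both paralogs are already aligned, and each phosphosite is given as a 0-based alignment column.
- A window of length l covers l // 2 columns on each side of the site.
- The null model is a simplified stand-in for the study's 100 randomizations, whose stratification by residue class is only partly described. It redraws the second paralog's sites uniformly from its S/T columns that are not first-paralog sites.

The function names `conserved_fraction` and `window_scan` and the example coordinates are hypothetical.

```python
import random

# Hypothetical sketch: phosphosites are 0-based columns of a pairwise alignment.


def conserved_fraction(sites_1: set[int], sites_2: set[int], window: int) -> float:
    """Fraction of paralog-1 sites with at least one paralog-2 site in the window.

    Columns phosphorylated in both paralogs (state-conserved) are removed
    from both sets first, so exact-position matches never count.
    """
    shared = sites_1 & sites_2
    query, target = sites_1 - shared, sites_2 - shared
    if not query:
        return float("nan")  # no non-conserved paralog-1 sites to test
    half = window // 2  # assumption: l // 2 columns on each side of the site
    hits = sum(1 for s in query if any(abs(t - s) <= half for t in target))
    return hits / len(query)


def window_scan(
    sites_1: set[int],
    sites_2: set[int],
    st_columns_2: set[int],
    windows: list[int],
    n_perm: int = 100,
    seed: int = 0,
) -> dict[int, tuple[float, float]]:
    """Observed fraction and permutation 95th quantile for each window size.

    Simplified, hypothetical null: paralog-2 sites are redrawn uniformly from
    its S/T columns that are not paralog-1 sites. `st_columns_2` must contain
    every paralog-2 phosphosite so that enough columns are available to sample.
    """
    rng = random.Random(seed)
    shared = sites_1 & sites_2
    query, target = sites_1 - shared, sites_2 - shared
    pool = sorted(st_columns_2 - sites_1)
    results = {}
    for w in windows:
        observed = conserved_fraction(query, target, w)
        null = sorted(
            conserved_fraction(query, set(rng.sample(pool, len(target))), w)
            for _ in range(n_perm)
        )
        results[w] = (observed, null[int(0.95 * (n_perm - 1))])
    return results


# Made-up alignment columns for one paralog pair.
sites_1 = {10, 40, 80}
sites_2 = {12, 80, 150}
st_columns_2 = set(range(0, 200, 4)) | sites_2
print([conserved_fraction(sites_1, sites_2, w) for w in (1, 5, 33, 61)])
print(window_scan(sites_1, sites_2, st_columns_2, [33]))
```

With these made-up coordinates, the shared column 80 is dropped. The first line prints [0.0, 0.5, 0.5, 1.0]: the site at column 10 finds column 12 once the window reaches 5, and the site at column 40 only does so at a window of 61. Scanning l and comparing each observed value with its permutation quantile mirrors the layout of Figure 1D.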

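The kinase-assignment step described in the methods proceeds in two stages. First, the 15-mer centred on each phosphosite is scored against every kinase PWM by summing log matrix values. Second, the site is assigned to the top-scoring kinase. The sketch below illustrates this procedure under stated assumptions; it does not reproduce the Mok et al (2010) matrices. The two toy matrices, the kinase names `KIN_A` and `KIN_B`, the pseudocount that guards against log(0) and the choice to skip positions that fall beyond a protein terminus are all hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
FLANK = 7  # residues on each side of the site, giving a 15-mer
PSEUDOCOUNT = 1e-3  # hypothetical floor so log() never receives zero

# Hypothetical PWM layout: 15 columns (offsets -7..+7), each residue -> value.
PWM = list[dict[str, float]]


def extract_window(sequence: str, site: int) -> str:
    """Return the 15-mer centred on 0-based index `site`, padding ends with '-'."""
    padded = "-" * FLANK + sequence + "-" * FLANK
    return padded[site : site + 2 * FLANK + 1]


def pwm_score(window: str, pwm: PWM) -> float:
    """Sum of log matrix values over the window; padding and unknown residues are skipped."""
    score = 0.0
    for column, residue in zip(pwm, window):
        if residue in column:
            score += math.log(max(column[residue], PSEUDOCOUNT))
    return score


def assign_kinase(sequence: str, site: int, pwms: dict[str, PWM]) -> str:
    """Assign the site to the kinase whose PWM gives the highest score."""
    window = extract_window(sequence, site)
    return max(pwms, key=lambda kinase: pwm_score(window, pwms[kinase]))


def toy_pwm(preferred: dict[int, str]) -> PWM:
    """Uniform matrix with one favoured residue at selected offsets (made up)."""
    pwm = [{aa: 1 / 20 for aa in AMINO_ACIDS} for _ in range(2 * FLANK + 1)]
    for offset, residue in preferred.items():
        column = pwm[offset + FLANK]
        for aa in AMINO_ACIDS:
            column[aa] = 0.01
        column[residue] = 1 - 0.01 * 19
    return pwm


pwms = {
    "KIN_A": toy_pwm({-3: "R"}),  # hypothetical basophilic preference
    "KIN_B": toy_pwm({+1: "P"}),  # hypothetical proline-directed preference
}
protein = "MSTGRKASLEEKSPVAAN"  # made-up sequence
print(assign_kinase(protein, 7, pwms))   # S7 has R at -3; prints KIN_A
print(assign_kinase(protein, 12, pwms))  # S12 has P at +1; prints KIN_B
```

Skipping padded positions gives sites near protein termini fewer informative columns. This is only one of several reasonable choices, since the text does not say how such sites were handled.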
References