Article · Open access · Peer-reviewed

Physical and Functional Modularity of the Protein Network in Yeast

2003; Elsevier BV; Volume: 2; Issue: 5; Language: English

10.1074/mcp.m300005-mcp200

ISSN

1535-9484

Authors

Thomas Wilhelm, Heinz‐Peter Nasheuer, Sui Huang

Topic(s)

Protein Structure and Dynamics

Abstract

While protein-protein interactions have been studied largely as a network graph without physicality, here we analyze two protein complex data sets of Saccharomyces cerevisiae to relate physical and functional modularity to the network topology. We study for the first time the number of different protein complexes as a function of the protein complex size and find that it follows an exponential decay with a characteristic number of about 7. This reflects the dynamics of complex formation and dissociation in the cell. The analysis of the protein usage by complexes shows an extensive sharing of subunits that is due to the particular organization of the proteome into physical complexes and functional modules. This promiscuity accounts for the high clustering in the protein network graph. Our results underscore the need to include the information contained in observed protein complexes into protein network analyses.

Metabolic and signaling functions as well as global cell behavior arise from the collective action of proteins that engage in physical interactions.
Thus, a first step in the functional characterization of the proteome is the identification of protein-protein interactions. This has been achieved most exhaustively for the budding yeast (Saccharomyces cerevisiae) proteome, resulting in large lists of interaction pairs (1: von Mering C. Krause R. Snel B. Cornell M. Oliver S.G. Fields S. Bork P. Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 2002; 417: 399-403; 2: Deane C.M. Salwiński L. Xenarios I. Eisenberg D. Protein interactions. Two methods for assessment of the reliability of high throughput observations. Mol. Cell. Proteomics. 2002; 1: 349-356). This information has allowed the reconstruction of a crude map of the protein interaction network (Fig. 1A). Although such network maps still lack any information on dynamics that would allow the simulation of cell behavior (3: Huang S. Genomics, complexity and drug discovery: insights from Boolean network models of cellular regulation. Pharmacogenomics. 2001; 2: 203-222; 4: Smith A.E. Slepchenko B.M. Schaff J.C. Loew L.M. Macara I.G. Systems analysis of ran transport. Science. 2002; 295: 488-491), they have paved the way for the study of global topological properties of molecular networks, which have shed light on basic evolutionary and organizational principles (5: Jeong H. Mason S.P. Barabási A.-L. Oltvai Z.N. Lethality and centrality in protein networks. Nature. 2001; 411: 41-42; 6: Ravasz E. Somera A.L. Mongru D.A. Oltvai Z.N. Barabási A.-L. Hierarchical organization of modularity in metabolic networks. Science. 2002; 297: 1551-1555; 7: Wagner A. Fell D. The small world inside large metabolic networks. Proc. Roy. Soc. London, B. 2001; 268: 1803-1810). For instance, on the basis of yeast two-hybrid data it has been suggested that the connectivity distribution P(k), i.e. the probability that a protein interacts with k other proteins, follows a power law, so that the network belongs to the topology class of scale-free networks (5). These topology studies relied on abstract network graphs constructed from individual pairs of interactions identified separately; as such, these graphs do not represent a physical entity. They do not consider the fact that many protein-protein interactions in the cell take place in dynamic, multi-protein complexes (Fig. 1B). Thus, when placing the topology of the protein interaction graph into a physical context, questions automatically arise as to whether a highly connected protein (a "hub" in the scale-free network model) would simultaneously interact with all of its partners as denoted in the network graph and, in doing so, form one stable, observable protein complex. Furthermore, one would also like to know how the protein complexes (physical modules) relate to the clusters of highly connected nodes in the network graphs (topological modules). The recent systematic survey of stable protein complexes using high-throughput mass spectrometry of purified tagged yeast proteins now allows us to examine generic aspects of the large-scale properties of complex-mediated networks and to address questions of this type (8: Gavin A.-C. Bösche M. Krause R. Grandi P. Marzioch M. Bauer A. Schultz J. Rick J.M. Michon A.-M. Cruciat C.-M. Remor M. Höfert C. Schelder M. Brajenovic M. Ruffner H. Merino A. Klein K. Hudak M. Dickson D. Rudi T. Gnau V. Bauch A. Bastuck S. Huhse B. Leutwein C. Heurtier M.-A. Copley R.R. Edelmann A. Querfurth E. Rybin V. Drewes G. Raida M. Bouwmeester T. Bork P. Seraphin B. Kuster B. Neubauer G. Superti-Furga G. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002; 415: 141-147; 9: Ho Y. Gruhler A. Heilbut A. Bader G.D. Moore L. Adams S.-L. Millar A. Taylor P. Bennett K. Boutilier K. Yang L. Wolting C. Donaldson I. Schandorff S. Shewnarane J. Vo M. Taggart J. Goudreault M. Muskat B. Alfarano C. Dewar D. Lin Z. Michalickova K. Willems A.R. Sassi H. Nielsen P.A. Rasmussen K.J. Andersen J.R. Johansen L.E. Hansen L.H. Jespersen H. Podtelejnikov A. Nielsen E. Crawford J. Poulsen V. Sørensen B.D. Matthiesen J. Hendrickson R.C. Gleeson F. Pawson T. Moran M.F. Durocher D. Mann M. Hogue C.W.V. Figeys D. Tyers M. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002; 415: 180-183). Following the authors, the data sets are denoted i) the tandem affinity purification (TAP)¹ data set (8) and ii) the high-throughput mass-spectrometric protein complex identification (HMS-PCI) data set (9). Their accuracy has been compared with that of other methods of protein-protein interaction detection (1). They have also been used to validate and complement existing yeast interaction data sets and to infer the function of individual proteins (10: Bader G.D. Hogue W.V. Analyzing yeast protein-protein interaction data obtained from different sources. Nat. Biotechnol. 2002; 20: 991-997). Thus, while the protein complex data have been used to improve the functional annotation of individual proteins, the additional, generic information in these data sets, notably the complex size distribution and the pattern of usage of subunits in various complexes, has not been studied explicitly. Here we investigate these two generic aspects of the population of yeast protein complexes, identify some characteristic features, and propose models to explain them.

¹ The abbreviations used are: TAP, tandem affinity purification; HMS-PCI, high-throughput mass-spectrometric protein complex identification; PN, protein-protein interaction network; CN, protein complex interaction network.

The number $N_s$ of possible dissociations of a complex consisting of $s$ proteins, i.e. the number of ways to split it into two non-empty parts, is

$$N_s = \sum_{i=1}^{s/2} \binom{s}{i} - \frac{1}{2}\binom{s}{s/2} \quad (s \text{ even}), \qquad N_s = \sum_{i=1}^{(s-1)/2} \binom{s}{i} \quad (s \text{ odd}).$$

This yields the simple exact result $N_s = 2N_{s-1} + 1 = 2^{s-1} - 1$. If all possible dissociations of a given complex occur on average with equal probability, the average lifetime $\langle\tau_s\rangle$ of a complex with $s$ proteins decays exponentially with $s$: $\langle\tau_s\rangle \propto N_s^{-1}$.
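As a quick arithmetic check of this counting argument, the following Python sketch (function names are illustrative, not from the paper) evaluates the case-wise sum for $N_s$, compares it with the closed form $2^{s-1}-1$ and the recurrence, and computes the lifetime ratio $\langle\tau_s\rangle/\langle\tau_{s-1}\rangle = N_{s-1}/N_s$ implied by $\langle\tau_s\rangle \propto N_s^{-1}$.

```python
from math import comb


def dissociations_by_sum(s: int) -> int:
    """N_s from the case-wise sum: ways to split s distinct proteins into
    two non-empty, unordered parts."""
    if s % 2 == 0:
        # For even s, the i = s/2 term counts every balanced split twice.
        return sum(comb(s, i) for i in range(1, s // 2 + 1)) - comb(s, s // 2) // 2
    return sum(comb(s, i) for i in range(1, (s - 1) // 2 + 1))


def dissociations_closed_form(s: int) -> int:
    """Closed form N_s = 2**(s - 1) - 1."""
    return 2 ** (s - 1) - 1


def lifetime_ratio(s: int) -> float:
    """<tau_s> / <tau_{s-1}> = N_{s-1} / N_s, assuming <tau_s> is proportional to 1/N_s."""
    return dissociations_closed_form(s - 1) / dissociations_closed_form(s)


if __name__ == "__main__":
    for s in range(2, 21):
        assert dissociations_by_sum(s) == dissociations_closed_form(s)
        if s > 2:
            # Recurrence N_s = 2 N_{s-1} + 1
            assert dissociations_closed_form(s) == 2 * dissociations_closed_form(s - 1) + 1
    for s in (3, 5, 10, 20):
        print(s, round(lifetime_ratio(s), 4))
```

The ratio equals 1/3 for s = 3 and 7/15 for s = 5 and approaches 0.5 from below as s grows, which is where the factor 0.5 used in the following derivation comes from.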
Let $S_s = \sum_{i=1}^{n_s} x_{s,i}$ be the number of all complexes of size $s$, where $n_s = n_0 \exp(-as)$ is the number of different complexes of size $s$ (Fig. 2) and $x_{s,i}$ is the number of complexes of species $i$ (consisting of exactly the same set of $s$ proteins). If $S_s$ is proportional to $\langle\tau_s\rangle$, the above equations for $N_s$ and $\langle\tau_s\rangle$ give $S_s/S_{s-1} = N_{s-1}/N_s \approx 0.5$. If, for each complex size, $x_{s,i}$ is normally distributed around $\langle x_{s,i}\rangle$, then $S_s \approx n_s \langle x_{s,i}\rangle$, and the equations for $S_s$ and $n_s$ give

$$\frac{\langle x_{s,i}\rangle\, e^{-as}}{\langle x_{s-1,i}\rangle\, e^{-a(s-1)}} \approx 0.5, \qquad\text{i.e.}\qquad \langle x_{s,i}\rangle \approx 0.5\, e^{a}\, \langle x_{s-1,i}\rangle.$$

With the experimentally determined characteristic number of different protein complexes, $a = a_{\text{compl.size}} = 1/s^* \approx 1/7$ (Fig. 2), this gives $\langle x_{s,i}\rangle \approx 0.6\,\langle x_{s-1,i}\rangle$. This finding suggests that the mean number of complexes of one type and a given size decreases by 40% when the size is increased by one.

A "small PN" variant counts only the interactions A-B where protein A, as tagged bait, catches protein B and protein B, as bait, catches protein A in the mass spectrometric protein complex analysis. This yields PNs with only 193 proteins and 191 interactions for TAP and 99 proteins and 67 interactions for HMS-PCI. A less stringent "medium PN" variant counts as interactions all pairs formed by the tagged bait protein and each member of the complex it pulls down. This leads to 1,365 proteins and 3,230 interactions for TAP and 1,544 proteins and 3,481 interactions for HMS-PCI. The "large PN" definition assumes that all proteins participating in a complex interact with each other (directly or indirectly). The intersection of the TAP and HMS-PCI data sets yields 0 interactions for the "small" definition, 217 proteins and 191 interactions for the "medium" definition, and 452 proteins and 1,773 interactions for the "large" definition. The set union yields 279 proteins and 258 interactions for the "small," 2,280 proteins and 6,520 interactions for the "medium," and 2,280 proteins and 52,334 interactions for the "large" definition.
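The three interaction definitions can be made concrete with a short Python sketch. It assumes a hypothetical input format that is not part of the original data description: one `(bait, co-purified proteins)` pair per purification. Edges are stored as unordered pairs, so the intersection and union of two data sets are plain set operations (`&` and `|`) on the returned edge sets.

```python
from itertools import combinations

# Hypothetical input format: one (bait, co-purified proteins) pair per purification.
Purification = tuple[str, set[str]]


def _edge(a: str, b: str) -> frozenset[str]:
    """Undirected interaction represented as an unordered pair."""
    return frozenset((a, b))


def small_pn(purifs: list[Purification]) -> set[frozenset[str]]:
    """Keep A-B only if bait A caught B and bait B also caught A."""
    caught = {(bait, p) for bait, members in purifs for p in members if p != bait}
    return {_edge(a, b) for a, b in caught if (b, a) in caught}


def medium_pn(purifs: list[Purification]) -> set[frozenset[str]]:
    """'Spoke' model: link the bait to every protein it pulled down."""
    return {_edge(bait, p) for bait, members in purifs for p in members if p != bait}


def large_pn(purifs: list[Purification]) -> set[frozenset[str]]:
    """'Matrix' model: link every pair of proteins in the same complex."""
    edges: set[frozenset[str]] = set()
    for bait, members in purifs:
        complex_members = sorted(set(members) | {bait})
        edges.update(_edge(a, b) for a, b in combinations(complex_members, 2))
    return edges


if __name__ == "__main__":
    data = [("A", {"B", "C"}), ("B", {"A", "D"}), ("E", {"C", "D"})]
    for name, build in (("small", small_pn), ("medium", medium_pn), ("large", large_pn)):
        print(name, len(build(data)))
```

For this toy input the three definitions give 1, 5, and 8 interactions, and each network is contained in the next, mirroring how the "small," "medium," and "large" PNs grow in the counts above.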
In the protein complex interaction network (CN), one node represents a whole protein complex, and two complexes are connected if they share one (or more) proteins, as proposed (8). The following network measures were used: i) connectance $C = 2I/(n(n-1))$, where $I$ is the number of actual links in the network and $n$ is the number of nodes (in the PN, for instance, $n$ is the number of proteins); ii) the diameter $D$ of the largest cluster, i.e. the number of links in the shortest path between two nodes, averaged over all pairs of nodes; iii) the clustering index $cc = \sum_i c_i/n$ with $c_i = 2k_i/(n_i(n_i - 1))$, where $k_i$ is the number of connections among the $n_i$ neighbors of node $i$ (11: Watts D.J. Strogatz S.H. Collective dynamics of small-world networks. Nature. 1998; 393: 440-442). Small-world networks are highly clustered (like regular networks) but nevertheless have a small network diameter (like random networks) (11).
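The three measures and the CN construction can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: graphs are dictionaries of neighbor sets, nodes with fewer than two neighbors are assigned $c_i = 0$ (the text does not say how such nodes were treated), and "diameter" follows the definition above, i.e. the mean shortest-path length within the largest cluster, computed by breadth-first search. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def adjacency(edges, nodes=()):
    """Neighbor sets from 2-element frozensets; `nodes` adds isolated nodes."""
    adj = {v: set() for v in nodes}
    for e in edges:
        a, b = tuple(e)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def connectance(adj):
    """C = 2I / (n(n - 1)) with I links and n nodes."""
    n = len(adj)
    links = sum(len(nb) for nb in adj.values()) // 2
    return 2 * links / (n * (n - 1))


def clustering_index(adj):
    """cc = sum_i c_i / n, c_i = 2 k_i / (n_i (n_i - 1)); c_i = 0 if n_i < 2 (assumed)."""
    total = 0.0
    for nb in adj.values():
        n_i = len(nb)
        if n_i >= 2:
            k_i = sum(1 for u, v in combinations(nb, 2) if v in adj[u])
            total += 2 * k_i / (n_i * (n_i - 1))
    return total / len(adj)


def _bfs(adj, src):
    """Shortest-path lengths (in links) from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Mean shortest-path length over all node pairs of the largest cluster."""
    seen, largest = set(), set()
    for v in adj:
        if v not in seen:
            component = set(_bfs(adj, v))
            seen |= component
            if len(component) > len(largest):
                largest = component
    total = pairs = 0
    for src in largest:
        dist = _bfs(adj, src)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def complex_network(complexes):
    """CN: node i is complex i; i-j linked if the two complexes share a protein."""
    return {frozenset((i, j)) for i, j in combinations(range(len(complexes)), 2)
            if complexes[i] & complexes[j]}
```

For a CN, passing `nodes=range(len(complexes))` to `adjacency` keeps complexes that share no protein as isolated nodes, so that $n$ equals the number of complexes.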
Both requirements of a small-world network, high clustering and a small diameter, are fulfilled by the PN and CN, as can be seen in Tables I and II.

Table I. Statistics of experimental and simulated networks

| Measure | PN | PN (LCL) | PN "Null" | PN "One" | CN | CN (LCL) | CN "Null" | CN "One" |
|---|---|---|---|---|---|---|---|---|
| TAP | | | | | | | | |
| Nodes n | 1,365 | 1,250 | 1,360 | 1,360 | 455 | 412 | 455 | 455 |
| Interactions I | 19,995 | 19,815 | 34,000 | 30,000 | 4,312 | 4,306 | 5,200 | 3,700 |
| Connectance C | 0.02 | 0.03 | 0.04 | 0.03 | 0.04 | 0.05 | 0.05 | 0.04 |
| Clustering cc | 0.73 | 0.74 | 0.49 | 0.54 | 0.52 | 0.57 | 0.30 | 0.34 |
| Diameter D | | 2.85 | | | | 2.63 | | |
| Longest path | | 7 | | | | 7 | | |
| HMS-PCI | | | | | | | | |
| Nodes n | 1,544 | 1,501 | 1,540 | 1,530 | 487 | 469 | 487 | 487 |
| Interactions I | 34,112 | 34,076 | 29,000 | 26,000 | 7,368 | 7,368 | 4,800 | 3,300 |
| Connectance C | 0.03 | 0.03 | 0.02 | 0.02 | 0.06 | 0.07 | 0.04 | 0.03 |
| Clustering cc | 0.70 | 0.71 | 0.54 | 0.58 | 0.54 | 0.56 | 0.28 | 0.31 |
| Diameter D | | 2.57 | | | | 2.34 | | |
| Longest path | | 6 | | | | 6 | | |

PN, protein network ("large" definition); CN, complex network; (LCL), largest connected cluster; "Null" and "One", simulated null model and stage one model.

Table II. Statistics of protein interaction networks (PN)

| Measure | TAP "Small" | TAP "Medium" | HMS-PCI "Small" | HMS-PCI "Medium" | Y2H | DIP | TP |
|---|---|---|---|---|---|---|---|
| Nodes n | 193 (15) | 1,365 (1,250) | 99 (7) | 1,544 (1,501) | 1,870 | 1,788 | 434 |
| Interactions I | 191 (38) | 3,230 (3,150) | 67 (7) | 3,481 (3,456) | 2,240 | 3,003 | 868 |
| Connectance C | 0.01 (0.36) | 0.003 (0.004) | 0.01 (0.33) | 0.003 (0.003) | 0.001 | 0.002 | 0.009 |
| Clustering cc | 0.248 (0.66) | 0.216 (0.233) | 0.071 (0) | 0.048 (0.049) | 0.068 | 0.188 | 0.054 |
| Diameter D | (1.94) | (4.93) | (1.81) | (4.41) | | | |
| Longest path | (4) | (12) | (3) | (11) | | | |
| Stretch parameter b | 0.78 | 0.48 | 0.65 | 0.34 | 0.34 | 0.53 | 0.55 |

Values in parentheses refer to the largest connected cluster. Y2H, DIP, and TP denote the yeast two-hybrid, DIP, and TRANSPATH data sets described in the text.

The diameters D of the PN and CN should be compared with those of the corresponding random and regular matrices. A corresponding random matrix is a constructed network that has the same connectance C as the experimental network, but in which all links are set by chance: each possible link between two nodes is present with probability C. Similarly, a corresponding regular matrix is a constructed network that also has the same connectance as the experimental network, but in which the approximately Cn links of each node are drawn only to its nearest neighbors, such that a ring-like network structure with a high clustering index appears (11).
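A minimal sketch of the two reference networks, assuming a target connectance C: the random matrix (an Erdős–Rényi-type graph in which each possible link is present with probability C) and a regular ring lattice in the spirit of Watts and Strogatz (11). The helper names are illustrative. The sketch shows numerically why the clustering index of a random matrix is approximately C, and checks the ring lattice against the known closed form 3(k−2)/(4(k−1)) for k neighbors per node, which approaches 3/4 for large k. Diameters of either reference network could be obtained with any breadth-first mean shortest-path routine.

```python
import random
from itertools import combinations


def random_matrix(n, c, rng):
    """Each of the n(n-1)/2 possible links is present independently with probability c."""
    adj = {v: set() for v in range(n)}
    for a, b in combinations(range(n), 2):
        if rng.random() < c:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def regular_matrix(n, c):
    """Ring lattice: each node linked to its k nearest neighbours (k/2 per side),
    with k close to c(n - 1) and rounded to an even number."""
    k = max(2, 2 * round(c * (n - 1) / 2))
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def clustering_index(adj):
    """Watts-Strogatz clustering index; nodes with < 2 neighbours contribute 0."""
    total = 0.0
    for nb in adj.values():
        d = len(nb)
        if d >= 2:
            links = sum(1 for u, v in combinations(nb, 2) if v in adj[u])
            total += 2 * links / (d * (d - 1))
    return total / len(adj)


def ring_lattice_clustering(k):
    """Closed form for a ring lattice with k neighbours per node."""
    return 3 * (k - 2) / (4 * (k - 1))


if __name__ == "__main__":
    rng = random.Random(0)
    print("random :", round(clustering_index(random_matrix(300, 0.05, rng)), 3))
    print("regular:", round(clustering_index(regular_matrix(200, 0.1)), 3),
          "closed form:", round(ring_lattice_clustering(20), 3))
```

With C = 0.05 the random matrix gives a clustering index close to 0.05, while the ring lattice with n = 200 and C = 0.1 (k = 20) matches the closed form 0.71.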
For the experimental PNs (according to the "large" definition), the corresponding random matrices have the following diameters D of the largest cluster: 2.50 (TAP) and 2.24 (HMS-PCI). The diameters of the corresponding regular matrices are 24.3 (TAP) and 18.4 (HMS-PCI). Likewise, for the "medium" definition of PNs one obtains 4.40 (TAP) and 4.82 (HMS-PCI) for the random matrices and 125.1 (TAP) and 164.5 (HMS-PCI) for the regular matrices. Note that for our "small" definition of PNs no large connected clusters appear. (For the very small connected subgraphs, we cannot reasonably attribute statistical features such as "small world.") For the CN, which does not rely on any definition of protein interactions, one obtains diameters of 2.26 (TAP) and 2.04 (HMS-PCI) for the corresponding random matrices and 10.4 (TAP) and 8.0 (HMS-PCI) for the corresponding regular matrices. All values are averages over 20 simulation runs. The clustering index of random matrices equals their connectance C, whereas that of regular matrices is high (for a ring lattice in which each node has many neighbors, it approaches 3/4 (11)).

The analysis of the TAP and HMS-PCI data shows that the number of different protein complexes decays exponentially with the size s of the protein complex, f(s) ∝ exp(−as), where a is a constant (Fig. 2). The TAP data follow this function remarkably closely, whereas the HMS-PCI complexes follow it up to s = 15 but include somewhat more distinct large complexes. For TAP and HMS-PCI we find a characteristic number s* = 1/a = 7.3 and 6.4, respectively. Despite a similar characteristic number, the HMS-PCI data thus contain more distinct large complexes (step in the cumulative graph in Fig. 2, around s = 15). The exponential decay of the number of different protein complexes with size s may have implications for the underlying dynamics of complex formation and dissociation. We propose here a simple model that considers a "destabilizing effect" when a given complex grows by one subunit.
With increasing size s, a complex (i.e. one containing s proteins) has $N_s$ possible ways of dissociation, where $N_s = 2^{s-1} - 1$ (for details, see "Experimental Procedures"). Assuming that $S_s$, the number of all complexes (of all species) of size s, is proportional to the average lifetime, which in turn is inversely proportional to $N_s$, it follows that $S_s/S_{s-1} \approx 0.5$. Because $S_s$ is related to the observed exponential decay f(s) with the characteristic number s*, we can estimate that the average number of complexes with a given composition decreases by 40% when the complex size increases by one additional subunit (see "Experimental Procedures"). It should now be possible to test this quantitative prediction experimentally in order to validate the physical interpretation of the complex size distribution.

To map the physical protein complexes onto a protein-protein interaction graph that represents the topology of the protein network (PN), we have to extract the interaction information from the complex data. In contrast to yeast two-hybrid data, where the elementary experimental finding is a pair that translates directly into a link in the interaction graph, the protein complex data allow various definitions of "interaction" to build a PN (see "Experimental Procedures"). Moreover, the context of a given complex might enable inherently weak, direct physical interactions to take place that would not be found in isolation or in other complexes, e.g. due to the presence of a scaffolding protein in that complex. For instance, while bait A might not be able to pull out protein B, bait C might pull out a complex that includes A and B, which may or may not be in direct physical contact. To embrace these scenarios of indirect and scaffold-protein-mediated interactions, we use here a "large PN" definition that counts "functional interactions" between all proteins participating in a complex (Fig. 1C), as has also been suggested (1). Our "medium PN" corresponds to the "spoke" model, while the "large PN" corresponds to the "matrix" model of a previous study (10). We used the latter, most encompassing PN definition for further analysis, because we are interested in the observed complexes as entities rather than in the individual interactions (Table I, Figs. 2 and 3). However, similar results with respect to the major network topology characteristics were obtained with the "small" and "medium PN" definitions (Table II). The number of protein interactions I is an order of magnitude higher than that of the network determined by the combined two-hybrid experiments (5) (I = 19,995 and 34,112 versus 2,240 interactions), although the number of proteins involved is smaller (1,365 and 1,544 versus 1,870 proteins) (cf. Tables I and II). Interestingly, we find that the distribution of connectivity k (number of interaction partners per protein) in the PN follows an exponential decay, i.e. the PN is not scale-free as reported for pairwise interaction data (Fig. 2). We obtain characteristic numbers of interactions per protein of k* = 30 for TAP and k* = 47 for HMS-PCI; the average characteristic connectivity of the PN is thus about 38. The larger number of interaction partners in the HMS-PCI data set is consistent with the finding, discussed above, that this data set contains more large complexes.
Thus, the number of simultaneous (direct or indirect) physical interaction partners, as defined by coexistence in a protein complex (under a given culture condition), behaves differently from the number of interaction partners defined by isolated, pairwise characterization, which appears to exhibit a power-law distribution (5). To measure the extent of modular organization in the large PN graph, we calculated the clustering coefficient cc, which quantifies the extent to which a network forms subnetworks (clusters) that are highly interconnected among themselves. Column 2 of Table I shows that the cc of the PN is much higher than the clustering coefficient of corresponding random networks with the same connectance $C = 2I/(n(n-1))$, because $cc_{\text{random}} \approx C_{\text{random}} = C_{\text{PN}}$. This quantifies the high modular organization of the cellular protein interaction network. Because in the large PN definition all proteins in a complex are considered to interact with each other, each complex, as a physical module, necessarily gives rise to a maximally connected cluster (a clique) in the network graph. We thus asked whether the partitioning of the proteome into complexes with the observed size distribution alone explains the high clustering. To answer this question, we simulated the simplest model, called the "null model." We generated 455 and 487 complexes with an exponential size distribution corresponding to the observed complex size distribution in the TAP and HMS-PCI data, respectively. "Proteins" were randomly drawn (without removing them from the pool) from a pool of $n_g$ proteins, where $n_g$ was chosen to obtain the same number of "proteins" as in the experimental PN ($n_g^{\text{TAP}} = 1{,}450$, $n_g^{\text{HMS-PCI}} = 1{,}700$); a given protein can be assigned to more than one complex, but the same protein cannot occur twice in one complex.
An interaction graph is then extracted according to the "large PN" scheme defined above, and its topology is analyzed. For TAP, this null model yields an exponential distribution of connectivity k that is similar to the observed one (Fig. 2), although the total number of interactions I in the simulation is higher than in the TAP data (Table I). In contrast, for HMS-PCI the null model yields a distribution of k that is clearly steeper than that of the observed data, and, consistently, the number of interactions I in the data is higher than predicted by the model (Table I). This is in line with the notion that the HMS-PCI data contain more large complexes than a pure exponential size distribution (as assumed in the null model, and as closely followed by the TAP data) would allow. Interestingly, in both cases the simulated cc was significantly smaller than that of the corresponding experimental PN ($cc_{\text{null}} = 0.49$ versus $cc_{\text{exp}} = 0.73$ for TAP and $cc_{\text{null}} = 0.54$ versus $cc_{\text{exp}} = 0.70$ for HMS-PCI). Thus, the PNs are clustered to an extent that cannot be accounted for by the physical arrangement of proteins into complexes that form cliques in the interaction graph. In other words, the high cc value of the PN must be due to "higher-level" interactions between the physical complexes. Because the complexes detected by mass spectrometry are by definition independent entities, such an apparent link between complexes in the network graph must correspond to the sharing of the same protein by different complexes (Fig. 1B). We thus analyzed the topology of the complex-complex interaction network (CN). Fig. 3 shows that the connectivity distribution of the CN again follows an exponential decay. A comparison of columns 6 and 8 of Table I shows that the simulation gave rise to results analogous to those for the PN: the null model yielded more interactions than observed for TAP and fewer for HMS-PCI.
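The null model lends itself to a short simulation. The following is a minimal sketch under stated assumptions, not a reproduction of the authors' code: complex sizes are drawn from P(s) ∝ exp(−s/s*) for 2 ≤ s ≤ 60 (both cutoffs are assumptions), proteins are drawn from a pool of $n_g$ proteins without being removed, and duplicates within one complex are rejected. The optional parameters `n_groups` and `p_same` are a hypothetical way to express the "stage one" extension discussed further on in the text, interpreting $p_r$ as a per-subunit probability, which the text does not specify. The example run uses TAP-like parameters (455 complexes, s* = 7.3, $n_g$ = 1,450); its output depends on the random seed and is not expected to match Table I exactly.

```python
import math
import random
from itertools import combinations


def simulate_complexes(n_complexes, s_star, n_pool, rng,
                       n_groups=1, p_same=0.0, s_max=60):
    """Null model: sizes s >= 2 with P(s) proportional to exp(-s / s_star);
    members drawn uniformly from a pool of n_pool proteins, never twice in one
    complex. With n_groups > 1, each member comes from the complex's own group
    with probability p_same (hypothetical reading of the stage one model)."""
    if not 0.0 <= p_same < 1.0:
        raise ValueError("p_same must lie in [0, 1) so that sampling always terminates")
    sizes = list(range(2, s_max + 1))
    weights = [math.exp(-s / s_star) for s in sizes]
    group_size = n_pool // n_groups
    complexes = []
    for s in rng.choices(sizes, weights=weights, k=n_complexes):
        s = min(s, n_pool)
        home = rng.randrange(n_groups)  # functional group of this complex
        members = set()  # a set rejects duplicate proteins within one complex
        while len(members) < s:
            if rng.random() < p_same:
                members.add(home * group_size + rng.randrange(group_size))
            else:
                members.add(rng.randrange(n_pool))
        complexes.append(members)
    return complexes


def large_pn(complexes):
    """'Large PN': every pair of proteins within a complex is linked (a clique)."""
    adj = {}
    for members in complexes:
        for a, b in combinations(members, 2):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def clustering_index(adj):
    """Watts-Strogatz clustering index; nodes with < 2 neighbours contribute 0."""
    total = 0.0
    for nb in adj.values():
        d = len(nb)
        if d >= 2:
            links = sum(1 for u, v in combinations(nb, 2) if v in adj[u])
            total += 2 * links / (d * (d - 1))
    return total / len(adj)


if __name__ == "__main__":
    rng = random.Random(42)
    pn = large_pn(simulate_complexes(455, 7.3, 1450, rng))  # TAP-like parameters
    n_links = sum(len(nb) for nb in pn.values()) // 2
    print(len(pn), n_links, round(clustering_index(pn), 2))
```

Only proteins that end up in at least one complex become nodes, which is why the pool size is tuned to reproduce the experimental node count.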
Both $cc_{\text{exp}}$ and $cc_{\text{null}}$ are much higher than the cc values of the corresponding random networks (Table I). Again, as for the PN, in the CN the clustering coefficients of the experimentally determined networks, $cc_{\text{exp}}$ (0.52 and 0.54 for TAP and HMS-PCI, respectively), are still considerably higher than the simulated ones, $cc_{\text{null}}$ (0.30 and 0.28, respectively). The difference $cc_{\text{exp}} - cc_{\text{null}}$ is roughly the same for the two network types, PN and CN. The higher clustering of both the experimental PN and CN compared with the null model indicates that the null model does not fully determine the PN and CN topology. The finding that even at the higher level of the CN the experimental clustering coefficient, $cc_{\text{exp}}$, is considerably higher than the simulated one, $cc_{\text{null}}$, appears to point to a kind of "super-clustering." In fact, protein complexes are not random aggregates of subunits but represent functional entities that perform specific cellular functions. Moreover, as recently suggested, complexes that perform similar cellular roles and belong to the same functional group (such as cell cycle, mRNA metabolism, transcription, etc.) extensively share proteins (8, 9, 12: Mewes H.W. Albermann K. Bähr M. Frishman D. Gleissner A. Hani J. Heumann K. Kleine K. Maierl A. Oliver S.G. Pfeiffer F. Zollner A. Overview of the yeast genome. Nature. 1997; 387: 7-65; 13: Schwikowski B. Uetz P. Fields S. A network of protein-protein interactions in yeast. Nat. Biotechnol. 2000; 18: 1257-1261). Gavin et al. (8) proposed 9 and Mewes et al. (12) 11 such functional groups. To account for the bias introduced by the sharing of proteins between functionally related complexes, we extended the null model to a "stage one model." Here, the pool of $n_g$ proteins from which the complex subunits are drawn ($n_{g1}^{\text{TAP}} = 1{,}650$, $n_{g1}^{\text{HMS-PCI}} = 1{,}800$) is divided into $g_r$ functional groups of equal size.
With a high probability $p_r$, we took the "proteins" for a given complex from the same group, to capture the finding that a complex with a certain cellular function contains mostly proteins that have been assigned to the same functional category. The connectivity distribution of the PN extracted from the stage one model with $g_r = 10$, $p_r = 0.9$ is shown in Fig. 2 (crosses). For the PN, compared with the null model, the new model changes the distribution of k only slightly, shifting weight to the tail. For the CN, the stage one model steepens the exponential decay of the connectivity distribution compared with the null model (Fig. 3) and strongly decreases I (Table I). This finding suggests that the increased promiscuity of complexes is not associated with an increase of new links between previously unconnected complexes, but instead results from an increase in the number of links between already connected complexes, reflecting the sharing of multiple proteins. With $g_r = 10$, $p_r = 0.9$, the clustering coefficient of the stage one model, $cc_{\text{one}}$, is only slightly (but significantly) higher than $cc_{\text{null}}$ and still fails to reproduce the observed high values of $cc_{\text{exp}}$ for the PN and CN. However, with higher values of $p_r$ and $g_r$ the clustering coefficient increases; for $p_r = 0.99$ and $g_r = 100$ it reaches the experimental values, $cc_{\text{one}} \approx cc_{\text{exp}}$. Taking into account the combination of functional and spatial cellular compartmentalization, $g_r = 100$ may not be an overestimate, because Ho et al. discriminated 34 functional and 15 spatial groups (9) and Schwikowski et al. discussed 42 functional and 9 spatial groups (13).

As mentioned above, in contrast to the CN, the PN depends on the assumed definition of protein interactions (see "Experimental Procedures"). Table II shows that the PNs corresponding to our "small" and "medium" definitions also belong to the class of small-world networks. The clustering coefficients cc of these PNs are much higher, in most cases by more than an order of magnitude, than those of the corresponding random networks, $cc_{\text{crn}}$. Note that the clustering coefficient of random networks equals their connectance: $cc_{\text{crn}} = C$. It is remarkable, however, that the TAP networks have a higher cc than the HMS-PCI networks. For comparison, we also analyzed three other protein interaction data sets: i) Y2H, yeast two-hybrid protein interaction data as analyzed in (5) (data available at www.nd.edu/∼networks/cell); ii) DIP, the carefully curated yeast protein interaction data set discussed in Ref. 2 (data available at dip.doe-mbi.ucla.edu); and iii) TP, protein interactions of the human signal transduction network of the TRANSPATH database (data available at www.transpath.de).
Again, these PNs have much higher clustering coefficients than their corresponding random networks (Table II). It has been claimed that protein networks are scale-free (Ref. 5: Jeong et al., Nature 2001; 411: 41-42), i.e. that the distribution of the number of connections per protein follows a power law. In contrast, we have shown that the distributions for both the PN (“large” definition, Fig. 2) and the CN (Fig. 3) clearly follow an exponential law, $p(k) \propto \exp(-ak)$. For all the networks analyzed in Table II, the distributions lie between a power law and a pure exponential law: all of these connectivity distributions follow a stretched exponential, $p(k) \propto \exp(-ak^{b})$ with $b < 1$.
Using the yeast protein complex data obtained with the TAP and HMS-PCI techniques (Refs. 8 and 9: Gavin et al., Nature 2002; 415: 141-147; Ho et al., Nature 2002; 415: 180-183), we derived different protein-protein interaction networks. The interaction definitions that yield our “medium” and “large” protein networks correspond to the recently published “spoke” and “matrix” models, respectively (Ref. 10: Bader and Hogue, Nat. Biotechnol. 2002; 20: 991-997). We favor the “matrix” model, because each protein in a given complex interacts physically and/or functionally with every other protein in that complex. Accordingly, we studied our “large” protein interaction network in more detail. In agreement with other protein network studies (Refs. 14 and 15: Wagner, Mol. Biol. Evol. 2001; 18: 1283-1292; Snel et al., Proc. Natl. Acad. Sci. U. S. A. 2002; 99: 5890-5895), we also find the small-world property. As we have shown, this result does not depend on the definition assumed for protein interactions. In contrast to others (Refs. 5, 14, and 15), we cannot confirm that protein networks are scale-free: the distributions of the number of connections per protein clearly follow an exponential or a stretched exponential law.
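The distinction between an exponential law (b = 1) and a stretched exponential (b < 1) can be tested by fitting $\ln p(k) = c - a k^{b}$ directly. The sketch below is one illustrative way to do this, not necessarily the procedure used by the authors. It scans b on a grid and, for each b, solves the linear least-squares problem for c and a. Bins with p(k) = 0 are skipped. The function name and the grid are hypothetical choices.

```python
import math


def fit_stretched_exponential(ks, pk, b_grid=None):
    """Fit ln p(k) = c - a * k**b by scanning b and solving least squares for (c, a).

    b = 1 is the pure exponential law; b < 1 is a stretched exponential.
    Bins with p(k) = 0 are skipped because ln(0) is undefined.
    Returns (a, b, c, sse) for the grid value of b with the smallest squared error.
    """
    if b_grid is None:
        b_grid = [i / 100 for i in range(5, 101)]  # b = 0.05, 0.06, ..., 1.00
    data = [(k, math.log(p)) for k, p in zip(ks, pk) if p > 0]
    ys = [y for _, y in data]
    n = len(data)
    my = sum(ys) / n
    best = None
    for b in b_grid:
        xs = [k ** b for k, _ in data]
        mx = sum(xs) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        slope = sxy / sxx  # equals -a
        intercept = my - slope * mx  # equals c
        sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[3]:
            best = (-slope, b, intercept, sse)
    return best


if __name__ == "__main__":
    ks = list(range(1, 31))
    # Hypothetical noise-free test data generated with a = 0.5, b = 0.6.
    pk = [math.exp(-0.5 * k ** 0.6) for k in ks]
    a, b, c, sse = fit_stretched_exponential(ks, pk)
    print(f"a = {a:.3f}, b = {b:.2f}, sse = {sse:.1e}")
```

For real connectivity histograms, p(k) would be the normalized count of proteins (or complexes) with k links. A best-fitting b close to 1 then points to an exponential law, and a clearly smaller b points to a stretched exponential.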
This may have implications for the evolution of protein networks, because scale-free networks require some form of preferential attachment to arise; without preferential attachment, exponential networks emerge (Ref. 16: Barabási and Albert, Science 1999; 286: 509-512). The protein complex networks, which do not rely on any particular interaction definition, also show an exponential connectivity distribution. Our null model reveals that this is mainly due to the exponential distribution of the number of different protein complexes of a given size. However, to explain the high clustering in both PNs and CNs, we had to extend the null model. The pr and gr values are somewhat arbitrary, as is the ontological classification of proteins into functional groups. Even so, the stage one model reveals an interesting property of protein complexes. The ingredient that must be added to the minimal null model to reproduce the high clustering coefficients observed in the PN and the CN is the massive overlap in protein subunit usage among complexes. This overlap is caused by the use of highly similar combinations of proteins in complexes with similar cellular roles. This extensive promiscuity among complexes is what gives rise to the high clustering coefficients in the network topology, and thus to the impression of modularity.
We have shown that the statistical properties of the TAP and HMS-PCI data differ slightly in some details; in particular, the HMS-PCI data contain more large complexes. This may be because the HMS-PCI data were obtained by overexpressing the tagged protein, which could have increased the chance of pulling out weakly interacting proteins. Furthermore, the TAP complexes were purified in a two-step procedure, in contrast to the one-step procedure used for HMS-PCI, which could also contribute to the detection of weaker interactions by HMS-PCI. Interestingly, these putatively weaker interactions occur mainly in larger protein complexes.
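The proposed mechanism, in which complexes with similar cellular roles reuse similar combinations of proteins, can be explored with a small simulation. The following is a hypothetical reconstruction under stated assumptions, not the authors' implementation. Proteins are split into gr equal groups. Each complex picks one group and draws each subunit from it with probability pr, and otherwise from the whole proteome. Setting pr = 0 gives a purely random, null-model-like draw. Complex sizes follow an assumed law, 2 + floor(Exp(7)), capped at the group size; the mean of 7 only loosely echoes the characteristic number of about 7 reported for the complex-size distribution. The proteome size (1000), number of complexes (300), and all function names are illustrative choices. The code builds the CN and reports how many proteins linked complexes share.

```python
import random
from itertools import combinations


def stage_one_complexes(n_proteins, n_complexes, n_groups, p_same, mean_size=7.0, seed=0):
    """Hypothetical reconstruction of a stage-one-style model (not the authors' code).

    Proteins 0..n_proteins-1 are split into n_groups equal blocks (playing the role of gr).
    Each complex picks one block and draws each subunit from it with probability p_same
    (playing the role of pr), otherwise from the whole proteome; p_same = 0 gives a
    purely random, null-model-like draw. Complex sizes are an assumption:
    2 + floor(Exp(mean_size)), capped at the block size.
    """
    rng = random.Random(seed)
    block = n_proteins // n_groups
    complexes = []
    for _ in range(n_complexes):
        g = rng.randrange(n_groups)
        group = range(g * block, (g + 1) * block)
        size = min(block, 2 + int(rng.expovariate(1.0 / mean_size)))
        members = set()
        while len(members) < size:  # sets ignore repeated draws, so keep drawing
            if rng.random() < p_same:
                members.add(rng.choice(group))
            else:
                members.add(rng.randrange(n_proteins))
        complexes.append(members)
    return complexes


def complex_network(complexes):
    """CN: complexes i and j are linked if they share proteins; weight = number shared."""
    weights = {}
    for i, j in combinations(range(len(complexes)), 2):
        shared = len(complexes[i] & complexes[j])
        if shared:
            weights[(i, j)] = shared
    return weights


def mean_shared(complexes):
    """Average number of shared proteins per CN link."""
    weights = complex_network(complexes)
    return sum(weights.values()) / len(weights)


if __name__ == "__main__":
    for p in (0.0, 0.9, 0.99):
        cs = stage_one_complexes(1000, 300, 100, p, seed=2)
        print(f"pr = {p}: {len(complex_network(cs))} CN links, "
              f"{mean_shared(cs):.2f} shared proteins per link")
```

Under these assumptions, pr = 0.99 should yield fewer CN links than pr = 0 but a markedly higher mean number of shared proteins per link. This mirrors the observation that promiscuity adds links between already connected complexes rather than between previously unconnected ones.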
A knowledge-based analysis of the two data sets, TAP and HMS-PCI, showed that in HMS-PCI the bait often copurified complexes of independent origin, resulting in larger complexes. In contrast, TAP yielded single complexes in most cases. These often consisted only of core complexes, lacking some subunits or auxiliary proteins that had already been characterized biochemically and immunologically (data not shown). Consider, for example, replication factor C (RFC), a complex purified and characterized by biochemical and immunological methods, which has five subunits, RFC1 through RFC5. In TAP, the bait protein RFC2 pulled down RFC3, RFC4, and one additional protein, EFD1 (supplemental data of Ref. 8: Gavin et al., Nature 2002; 415: 141-147). In HMS-PCI, the same bait protein also copurified RFC3 and RFC4, but pulled down 13 additional proteins as well (supplemental data of Ref. 9: Ho et al., Nature 2002; 415: 180-183). These questions must be examined more carefully in future studies. Nevertheless, our simple null and stage one models reproduce the main features of the underlying protein networks, and they can be refined further as more consistent data become available.
Our results show that graph-theoretical analysis of clustering and modularity in the topology of protein interaction networks needs to take into account the observed physical modules (complexes). It must also account for their organization into functional modules and higher-order (complex-complex) networks through the shared usage of proteins. These aspects are lost in the usual graph representation of protein networks. The importance of analyzing physical protein complexes is also underscored by the exponential distribution of the number of different complexes of a given size, which reflects the physicochemical dynamics of complex formation and dissociation. A similar exponential distribution has already been shown for the number of domains in proteins (Ref. 17: Koonin et al., Nature 2002; 420: 218-223). Higher-quality, exhaustive protein complex data in the near future will allow the topological maps of biochemical networks, which contain potential interactions, to be translated into complexes defined by actual physical interactions.
We thank A. Beyer, F. Grosse, and J. Sühnel for critical reading of the manuscript and an anonymous referee for valuable comments.

References