Comparative Proteomic Analysis of Eleven Common Cell Lines Reveals Ubiquitous but Varying Expression of Most Proteins
2012; Elsevier BV; Volume: 11; Issue: 3; Language: English
10.1074/mcp.m111.014050
ISSN 1535-9484
Authors: Tamar Geiger, Anja Wehner, Christoph Schaab, Jürgen Cox, Matthias Mann
Topic(s): Molecular Biology Techniques and Applications
Abstract: Deep proteomic analysis of mammalian cell lines would yield an inventory of the building blocks of the most commonly used systems in biological research. Mass spectrometry-based proteomics can identify and quantify proteins in a global and unbiased manner and can highlight the cellular processes that are altered between such systems. We analyzed 11 human cell lines using an LTQ-Orbitrap family mass spectrometer with a “high field” Orbitrap mass analyzer with improved resolution and sequencing speed. We identified a total of 11,731 proteins, and on average 10,361 ± 120 proteins in each cell line. This very high proteome coverage enabled analysis of a broad range of processes and functions. Despite the distinct origins of the cell lines, our quantitative results showed surprisingly high similarity in terms of expressed proteins. Nevertheless, this global similarity of the proteomes did not imply equal expression levels of individual proteins across the 11 cell lines, as we found significant differences in expression levels for an estimated two-thirds of them. The variability in cellular expression levels was similar for low and high abundance proteins, and even many of the most highly expressed proteins with household roles showed significant differences between cells. Metabolic pathways, which have high redundancy, exhibited variable expression, whereas basic cellular functions such as the basal transcription machinery varied much less. We harness knowledge of these cell line proteomes for the construction of a broad coverage “super-SILAC” quantification standard. Together with the accompanying paper (Schaab, C. MCP 2012, PMID: 22301388) (17. Schaab C. Geiger T. Stoehr G. Cox J. Mann M. Analysis of high-accuracy, quantitative proteomics data in the MaxQB database. Mol. Cell. Proteomics. 2012; 10.1074/mcp.M111.014068), these data can be used to obtain reference expression profiles for proteins of interest both within and across cell line proteomes.

Mammalian cell lines are the basis of much of the biological work that examines protein function and cell response to perturbations, and they have been indispensable for many of the biological insights obtained in the last decades. In the majority of cases these cell lines were extracted from tumors of different origins, and were then adapted to growth in vitro. These cell lines serve as proxies not only of the original tumors or tissues but also for fundamental biological processes. A system-wide and comparative view of the proteomes of such cell lines can reveal commonalities and discrepancies between cell lines in general and highlight the biological processes and their variations across the cells.

So far only very few proteomic studies have attempted to determine shared and distinct features of different cell lines. Burkard et al. defined a “central proteome” in a comparison of seven cell lines (1. Burkard T.R. Planyavsky M. Kaupe I. Breitwieser F.P. Bürckstummer T. Bennett K.L. Superti-Furga G. Colinge J. Initial characterization of the human central proteome. BMC Syst. Biol. 2011; 5: 17). It consisted of the 1124 proteins that were identified in all these cell systems and that were preferentially involved in protein expression, metabolism and proliferation. This study identified 2000–4000 proteins per cell line, and was therefore limited to the more abundant proteins in the cell. It also did not attempt to quantify expression differences between the proteomes.
With Uhlen and coworkers we recently analyzed gene expression in three distinct human cell lines by next generation sequencing, quantitative proteomics and the antibodies provided by the Human Protein Atlas. RNA-seq, stable isotope labeling with amino acid in cell culture (SILAC)-based proteomics and antibody-based confocal microscopy all found a high degree of similarity in expressed genes (2. Lundberg E. Fagerberg L. Klevebring D. Matic I. Geiger T. Cox J. Algenäs C. Lundeberg J. Mann M. Uhlen M. Defining the transcriptome and proteome in three functionally different human cell lines. Mol. Syst. Biol. 2010; 6: 450). In that study, the depth of our proteomic analysis was limited to about 5000 proteins, raising the question whether this limitation contributed to the high resemblance of the cell lines at the protein level. This issue could be addressed by performing more comprehensive mass spectrometric analysis of cell lines, and by increasing the number of analyzed cell lines to examine the generality of the large overlap of proteomes.

The abbreviations used are: SILAC, stable isotope labeling with amino acid in cell culture; FDR, false discovery rate; MS/MS, tandem mass spectrometry; HCD, higher energy collisional dissociation; LTQ, linear trap quadrupole.

Rapid developments in MS-based proteomics have enabled identification of increasing proportions of analyzed proteomes, aiding in the attempt to reach a comprehensive view of the system (3. Domon B. Aebersold R. Mass spectrometry and protein analysis. Science. 2006; 312: 212-217; 4. Swaney D.L. Wenger C.D. Coon J.J. Value of using multiple proteases for large-scale mass spectrometry-based proteomics. J. Proteome Res. 2010; 9: 1323-1329; 5. Mallick P. Kuster B. Proteomics: a pragmatic perspective. Nat. Biotechnol. 2010; 28: 695-709; 6. Beck M. Claassen M. Aebersold R. Comprehensive proteomics. Current Opin. Biotechnol. 2011; 22: 3-8). In the yeast model, which has a genome of 6000 genes, such a comprehensive proteomic analysis identified 4400 proteins (7. de Godoy L.M. Olsen J.V. Cox J. Nielsen M.L. Hubner N.C. Fröhlich F. Walther T.C. Mann M. Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast. Nature. 2008; 455: 1251-1254). The same degree of coverage has not yet been reached for human cells, whose genome consists of about 20,000 genes and whose proteomes are much more complex. Routine analyses of mammalian systems currently can lead to the identification of 4000–6000 proteins in a few days of analysis (8. Wiśniewski J.R. Zougman A. Nagaraj N. Mann M. Universal sample preparation method for proteome analysis. Nat. Methods. 2009; 6: 359-362; 9. Schwanhäusser B. Busse D. Li N. Dittmar G. Schuchhardt J. Wolf J. Chen W. Selbach M. Global quantification of mammalian gene expression control. Nature. 2011; 473: 337-342; 10. Rigbolt K.T. Prokhorova T.A. Akimov V. Henningsen J. Johansen P.T. Kratchmarova I. Kassem M. Mann M. Olsen J.V. Blagoev B. System-wide temporal characterization of the proteome and phosphoproteome of human embryonic stem cell differentiation. Sci. Signal. 2011; 4: rs3), which corresponds to about 50% of the expressed proteome based on the common estimate that a single cell type expresses 10,000 proteins.
Significantly higher numbers of identified proteins were so far only achieved by combining multiple diverse cell lines or tissues in one analysis (11. Huttlin E.L. Jedrychowski M.P. Elias J.E. Goswami T. Rad R. Beausoleil S.A. Villén J. Haas W. Sowa M.E. Gygi S.P. A tissue-specific atlas of mouse protein phosphorylation and expression. Cell. 2010; 143: 1174-1189), or by investing weeks of measurement for single samples (12. Nagaraj N. Wisniewski J.R. Geiger T. Cox J. Kircher M. Kelso J. Pääbo S. Mann M. Deep proteome and transcriptome mapping of a human cancer cell line. Mol. Syst. Biol. 2011; 7: 548; 13. Beck M. Schmidt A. Malmstroem J. Claassen M. Ori A. Szymborska A. Herzog F. Rinner O. Ellenberg J. Aebersold R. The quantitative proteome of a human cell line. Mol. Syst. Biol. 2011; 7: 549).

Here we employ the latest proteomics technology in order to achieve a very extensive proteomic coverage of multiple human cell lines. The linear trap quadrupole (LTQ)-Orbitrap Velos mass spectrometer has improved higher-energy collisional dissociation (HCD) capabilities, and therefore enables acquisition of high resolution tandem MS (MS/MS) spectra without compromising the depth of analysis (14. Olsen J.V. Schwartz J.C. Griep-Raming J. Nielsen M.L. Damoc E. Denisov E. Lange O. Remes P. Taylor D. Splendore M. Wouters E.R. Senko M. Makarov A. Mann M. Horning S. A dual pressure linear ion trap orbitrap instrument with very high sequencing speed. Mol. Cell. Proteomics. 2009; 8: 2759-2769). Here, we additionally make use of a novel “high field” Orbitrap analyzer with higher resolution and higher sequencing speed (15. Makarov A. Denisov E. Lange O. Performance evaluation of a high-field Orbitrap mass analyzer. J. Am. Soc. Mass Spectrom. 2009; 20: 1391-1396). This Orbitrap mass spectrometer is described in detail in another manuscript in this issue (16. Michalski A. Damoc E. Lange O. Denisov E. Nolting D. Mueller M. Viner R. Schwartz J. Remes P. Belford M. Dunyach J.J. Cox J. Horning S. Mann M. Makarov A. Ultra high resolution linear ion trap Orbitrap mass spectrometer (Orbitrap Elite) facilitates top down LC MS/MS and versatile peptide fragmentation modes. Mol. Cell. Proteomics. 2011; 10.1074/mcp.O111.013698). We performed deep analysis of 11 cell lines in relatively short analysis time and obtained very extensive characterization of their proteomes. The data are deposited in the MaxQB database, which is the subject of an accompanying manuscript and which allows sophisticated analysis and visualization of these reference proteomes [www.biochem.mpg.de/maxqb] (17. Schaab C. Geiger T. Stoehr G. Cox J. Mann M. Analysis of high-accuracy, quantitative proteomics data in the MaxQB database. Mol. Cell. Proteomics. 2012; 10.1074/mcp.M111.014068).

DISCUSSION AND OUTLOOK

Here we employed state of the art mass spectrometric technology and characterized the proteome of eleven common cell lines to a depth of over 10,000 proteins in each case. Remarkably, this depth of coverage was achievable without extensive fractionation and with a relatively straightforward proteomic workflow. Total measuring time at one proteome per day added up to only 33 days for triplicates of all eleven cell lines, comparing favorably even to transcriptome measurements using next generation sequencing technology. Although the depth of analysis shown here is currently limited to specialized groups, there are no obstacles in principle to its application by a broad range of scientists. An interesting corollary of our study is that MS-based proteomics is not limited by the rate of protein identification per se.
If diverse samples could be found in which all of the proteins were present among the 10,000 most abundant proteins, then proteins for all genes in the human genome could be sequenced rather quickly. Therefore, the major difficulty in obtaining an identified protein for every gene, as called for by the Human Proteome Organization (35. Legrain P. Aebersold R. Archakov A. Bairoch A. Bala K. Beretta L. Bergeron J. Borchers C.H. Corthals G.L. Costello C.E. Deutsch E.W. Domon B. Hancock W. He F. Hochstrasser D. Marko-Varga G. Salekdeh G.H. Sechi S. Snyder M. Srivastava S. Uhlen M. Wu C.H. Yamamoto T. Paik Y.K. Omenn G.S. The human proteome project: current state and future direction. Mol. Cell. Proteomics. 2011; 10 (M111 009993)), lies in a suitable supply of proteomes. Already the single project described here can serve as a resource to determine expression of proteins of interest in specific cell lines for proteins from half of the human genome. It also provides reference peptides and their high resolution HCD fragmentation spectra for these proteins (see accompanying paper Schaab et al.). Furthermore, we expect our data to have many practical applications, such as in creating proteome standards as already demonstrated here.

In biological research specific cell lines are chosen to investigate cellular processes that occur in their tissue of origin. In view of that fact, an unexpected finding of this study was the high degree of overall similarity of the proteomes of the diverse cell lines. For instance, the depth of the proteome detectable by our technology was very similar in all cases and label-free proteome correlations ranged from 0.68 to 0.83. Our findings do agree, however, with recent studies at the transcriptome and proteome levels that also found a large overlap of expressed genes in different cell types (2. Lundberg E. Fagerberg L. Klevebring D. Matic I. Geiger T. Cox J. Algenäs C. Lundeberg J. Mann M. Uhlen M. Defining the transcriptome and proteome in three functionally different human cell lines. Mol. Syst. Biol. 2010; 6: 450; 35. Legrain P. Aebersold R. Archakov A. Bairoch A. Bala K. Beretta L. Bergeron J. Borchers C.H. Corthals G.L. Costello C.E. Deutsch E.W. Domon B. Hancock W. He F. Hochstrasser D. Marko-Varga G. Salekdeh G.H. Sechi S. Snyder M. Srivastava S. Uhlen M. Wu C.H. Yamamoto T. Paik Y.K. Omenn G.S. The human proteome project: current state and future direction. Mol. Cell. Proteomics. 2011; 10 (M111 009993); 36. Pontén F. Gry M. Fagerberg L. Lundberg E. Asplund A. Berglund L. Oksvold P. Björling E. Hober S. Kampf C. Navani S. Nilsson P. Ottosson J. Persson A. Wernérus H. Wester K. Uhlén M. A global view of protein expression in human cells, tissues, and organs. Mol. Syst. Biol. 2009; 5: 337). This high commonality of the proteome presumably results in part from the adaptation of cell lines to in vitro growth. In this situation, cellular clones that proliferate rapidly are selected for, whereas many cell type- and tissue-specific functions that are not crucial to their growth and survival may be lost. We have previously addressed this question directly by quantifying the proteome of a liver cell line against primary hepatocytes and our results support the above conclusions (38. Pan C. Kumar C. Bohl S. Klingmueller U. Mann M. Comparative proteomic phenotyping of cell lines and primary cells to assess preservation of cell type-specific functions. Mol. Cell. Proteomics. 2009; 8: 443-450).

Despite the overall proteomic similarities, cell type-specific clusters of protein expression are clearly present in the cell lines (Fig. 3A).
Furthermore, statistical analysis of the expression profiles showed that a large proportion of the proteins changes significantly in at least one of the cell lines. Interestingly, when filtering for a robust number of quantification events per protein, we found that more than two thirds of the proteome is likely to change significantly and that this is not affected by protein abundance. Bioinformatic analysis of protein function revealed higher variability in redundant pathways whereas basal functions such as gene and protein expression tended to be more uniformly represented across the cell lines. These analyses can guide researchers in the choice of the optimal cell line for the biological interests. Our data shows that even functions carried out by abundant and ubiquitous proteins do not necessarily imply that these proteins need to be expressed at the same levels in all cell lines. Instead they often vary several fold.What do these results mean for the common notion of a “core” or “household” proteome composed of proteins that are needed by every cell type and that are highly abundant? At a minimum, deep proteomics reveals that a household proteome, is not as straightforward a concept as frequently believed. For instance, at least in cell lines, proteins tend to be present in very diverse cell types, not only for very common but also for more specialized functions. Furthermore, the household proteins themselves are not necessarily uniformly expressed. For a biologically desirable definition of the household proteome it may be necessary to study the proteomes of cell types in vivo, an undertaking that we expect to become technological possible within the next few years. Mammalian cell lines are the basis of much of the biological work that examines protein function and cell response to perturbations and they have been indispensable for many of the biological insights obtained in the last decades. 
In the majority of cases these cell lines were extracted from tumors of different origins, and were then adapted to growth in vitro. These cell lines serve as proxies not only of the original tumors or tissues but also for fundamental biological processes. A system-wide and comparative view of the proteomes of such cell lines can reveal commonalities and discrepancies between cell lines in general and highlight the biological processes and their variations across the cells. So far only very few proteomic studies have attempted to determine shared and distinct features of different cell lines. Burkard et al. defined a “central proteome” in a comparison of seven cell lines (1Burkard T.R. Planyavsky M. Kaupe I. Breitwieser F.P. Bürckstummer T. Bennett K.L. Superti-Furga G. Colinge J. Initial characterization of the human central proteome.BMC Syst. Biol. 2011; 5: 17Crossref PubMed Scopus (65) Google Scholar). It consisted of the 1124 proteins that were identified in all these cell systems and that were preferentially involved in protein expression, metabolism and proliferation. This study identified 2000–4000 proteins per cell line, and was therefore limited to the more abundant proteins in the cell. It also did not attempt to quantify expression differences between the proteomes. With Uhlen and coworkers we recently analyzed gene expression in three distinct human cell lines by next generation sequencing, quantitative proteomics and the antibodies provided by the Human Protein Atlas. RNA-seq, stable isotope labeling with amino acid in cell culture (SILAC)-based 1The abbreviations used are:SILACstable isotope labeling with amino acid in cell cultureFDRfalse discovery rateMS/MStandem mass spectrometryHCDHigher energy Collisional DissociationLTQLinear trap quadrupole. 
1The abbreviations used are:SILACstable isotope labeling with amino acid in cell cultureFDRfalse discovery rateMS/MStandem mass spectrometryHCDHigher energy Collisional DissociationLTQLinear trap quadrupole. proteomics and antibody-based confocal microscopy all found a high degree of similarity in expressed genes (2Lundberg E. Fagerberg L. Klevebring D. Matic I. Geiger T. Cox J. Algenäs C. Lundeberg J. Mann M. Uhlen M. Defining the transcriptome and proteome in three functionally different human cell lines.Mol. Syst. Biol. 2010; 6: 450Crossref PubMed Scopus (269) Google Scholar). In that study, the depth of our proteomic analysis was limited to about 5000 proteins raising the question whether this limitation contributed to the high resemblance of the cell lines at the protein level. This issue could be addressed by performing more comprehensive mass spectrometric analysis of cell lines, and by increasing the number of analyzed cell lines to examine the generality of the large overlap of proteomes. stable isotope labeling with amino acid in cell culture false discovery rate tandem mass spectrometry Higher energy Collisional Dissociation Linear trap quadrupole. stable isotope labeling with amino acid in cell culture false discovery rate tandem mass spectrometry Higher energy Collisional Dissociation Linear trap quadrupole. Rapid developments in MS-based proteomics have enabled identification of increasing proportions of analyzed proteomes, aiding in the attempt to reach a comprehensive view of the system (3Domon B. Aebersold R. Mass spectrometry and protein analysis.Science. 2006; 312: 212-217Crossref PubMed Scopus (1584) Google Scholar, 4Swaney D.L. Wenger C.D. Coon J.J. Value of using multiple proteases for large-scale mass spectrometry-based proteomics.J. Proteome Res. 2010; 9: 1323-1329Crossref PubMed Scopus (322) Google Scholar, 5Mallick P. Kuster B. Proteomics: a pragmatic perspective.Nat. Biotechnol. 
2010; 28: 695-709Crossref PubMed Scopus (316) Google Scholar, 6Beck M. Claassen M. Aebersold R. Comprehensive proteomics.Current Opin. Biotechnol. 2011; 22: 3-8Crossref PubMed Scopus (71) Google Scholar). In the yeast model, which has a genome of 6000 genes, such a comprehensive proteomic analysis identified 4400 proteins (7de Godoy L.M. Olsen J.V. Cox J. Nielsen M.L. Hubner N.C. Fröhlich F. Walther T.C. Mann M. Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast.Nature. 2008; 455: 1251-1254Crossref PubMed Scopus (733) Google Scholar). The same degree of coverage has not yet been reached for human cells, whose genome consists of about 20,000 genes and whose proteomes are much more complex. Routine analyses of mammalian systems currently can lead to the identification of 4000–6000 proteins in a few days of analysis (8Wiśniewski J.R. Zougman A. Nagaraj N. Mann M. Universal sample preparation method for proteome analysis.Nat. Methods. 2009; 6: 359-362Crossref PubMed Scopus (4862) Google Scholar, 9Schwanhäusser B. Busse D. Li N. Dittmar G. Schuchhardt J. Wolf J. Chen W. Selbach M. Global quantification of mammalian gene expression control.Nature. 2011; 473: 337-342Crossref PubMed Scopus (3946) Google Scholar, 10Rigbolt K.T. Prokhorova T.A. Akimov V. Henningsen J. Johansen P.T. Kratchmarova I. Kassem M. Mann M. Olsen J.V. Blagoev B. System-wide temporal characterization of the proteome and phosphoproteome of human embryonic stem cell differentiation.Sci. Signal. 2011; 4: rs3Crossref PubMed Scopus (358) Google Scholar), which corresponds to about 50% of the expressed proteome based on the common estimate that a single cell type expresses 10,000 proteins. Significantly higher numbers of identified proteins were so far only achieved by combining multiple diverse cell lines or tissues in one analysis (11Huttlin E.L. Jedrychowski M.P. Elias J.E. Goswami T. Rad R. Beausoleil S.A. Villén J. Haas W. Sowa M.E. Gygi S.P. 
A tissue-specific atlas of mouse protein phosphorylation and expression.Cell. 2010; 143: 1174-1189Abstract Full Text Full Text PDF PubMed Scopus (1188) Google Scholar), or by investing weeks of measurement for single samples (12Nagaraj N. Wisniewski J.R. Geiger T. Cox J. Kircher M. Kelso J. Pääbo S. Mann M. Deep proteome and transcriptome mapping of a human cancer cell line.Mol. Syst. Biol. 2011; 7: 548Crossref PubMed Scopus (738) Google Scholar, 13Beck M. Schmidt A. Malmstroem J. Claassen M. Ori A. Szymborska A. Herzog F. Rinner O. Ellenberg J. Aebersold R. The quantitative proteome of a human cell line.Mol. Syst. Biol. 2011; 7: 549Crossref PubMed Scopus (576) Google Scholar). Here we employ the latest proteomics technology in order to achieve a very extensive proteomic coverage of multiple human cell lines. The linear trap quadrupole (LTQ)-Orbitrap Velos mass spectrometer has improved higher-energy collisional dissociation (HCD) capabilities, and therefore enables acquisition of high resolution tandem MS (MS/MS) spectra without compromising the depth of analysis (14Olsen J.V. Schwartz J.C. Griep-Raming J. Nielsen M.L. Damoc E. Denisov E. Lange O. Remes P. Taylor D. Splendore M. Wouters E.R. Senko M. Makarov A. Mann M. Horning S. A dual pressure linear ion trap orbitrap instrument with very high sequencing speed.Mol. Cell. Proteomics. 2009; 8: 2759-2769Abstract Full Text Full Text PDF PubMed Scopus (375) Google Scholar). Here, we additionally make use of a novel “high field” Orbitrap analyzer with higher resolution and higher sequencing speed (15Makarov A. Denisov E. Lange O. Performance evaluation of a high-field Orbitrap mass analyzer.J. Am. Soc. Mass Spectrom. 2009; 20: 1391-1396Crossref PubMed Scopus (133) Google Scholar). This Orbitrap mass spectrometer is described in detail in another manuscript in this issue (16Michalski A. Damoc E. Lange O. Denisov E. Nolting D. Mueller M. Viner R. Schwartz J. Remes P. Belford M. Dunyach J.J. Cox J. Horning S. Mann M. 
Makarov A. Ultra high resolution linear ion trap Orbitrap mass spectrometer (Orbitrap Elite) facilitates top down LC MS/MS and versatile peptide fragmentation modes.Mol. Cell. Proteomics. 2011; 10.1074/mcp.O111.013698Google Scholar). We performed deep analysis of 11 cell lines in relatively short analysis time and obtained very extensive characterization of their proteomes. The data is deposited in the MaxQB database, which is the subject of an accompanying manuscript and which allows sophisticated analysis and visualization of these reference proteomes [www.biochem.mpg.de/maxqb] (17Schaab C. Geiger T. Stoehr G. Cox J. Mann M. Analysis of high-accuracy, quantitative proteomics data in the MaxQB database.Mol. Cell. Proteomics. 2012; 10.1074/mcp.M111.014068Abstract Full Text Full Text PDF Scopus (117) Google Scholar). DISCUSSION AND OUTLOOKHere we employed state of the art mass spectrometric technology and characterized the proteome of eleven common cell lines to a depth of over 10,000 proteins in each case. Remarkably, this depth of coverage was achievable without extensive fractionation and with a relatively straightforward proteomic workflow. Total measuring time at one proteome per day added up to only 33 days for triplicates of all eleven cell lines, comparing favorably even to transcriptome measurements using next generation sequencing technology. Although the depth of analysis shown here is currently limited to specialized groups, there are no principal obstacles to its application to a broad range of scientists. An interesting corollary of our study is that MS-based proteomics is not limited by the rate of protein identification per se. If diverse samples could be found in which all of the proteins were present in the 10,000 most abundant proteins, then proteins for all genes in the human genome could be sequenced rather quickly. 
Therefore, the major difficulty in obtaining an identified protein for every gene, as called for by the Human Proteome Organization (35Legrain P. Aebersold R. Archakov A. Bairoch A. Bala K. Beretta L. Bergeron J. Borchers C.H. Corthals G.L. Costello C.E. Deutsch E.W. Domon B. Hancock W. He F. Hochstrasser D. Marko-Varga G. Salekdeh G.H. Sechi S. Snyder M. Srivastava S. Uhlen M. Wu C.H. Yamamoto T. Paik Y.K. Omenn G.S. The human proteome project: current state and future direction.Mol. Cell. Proteomics. 2011; 10 (M111 009993)Abstract Full Text Full Text PDF PubMed Scopus (301) Google Scholar), is in a suitable supply of proteomes. Already the single project described here can serve as a resource to determine expression of proteins of interest in specific cell lines for proteins from half of the human genome. It also provides reference peptides and their high resolution HCD fragmentation spectra for these proteins (see accompanying paper Schaab et al.). Furthermore, we expect our data to have many practical applications, such as in creating proteome standards as already demonstrated here.In biological research specific cell lines are chosen to investigate cellular processes that occur in their tissue of origin. In view of that fact, an unexpected finding of this study was the high degree of overall similarity of the proteomes of the diverse cell lines. For instance, the depth of the proteome detectable by our technology was very similar in all cases and label-free proteome correlations ranged from 0.68 to 0.83. Our findings do agree, however, with recent studies at the transcriptome and proteome levels that also found a large overlap of expressed genes in different cell types (2Lundberg E. Fagerberg L. Klevebring D. Matic I. Geiger T. Cox J. Algenäs C. Lundeberg J. Mann M. Uhlen M. Defining the transcriptome and proteome in three functionally different human cell lines.Mol. Syst. Biol. 2010; 6: 450Crossref PubMed Scopus (269) Google Scholar, 35Legrain P. Aebersold R. 
Archakov A. Bairoch A. Bala K. Beretta L. Bergeron J. Borchers C.H. Corthals G.L. Costello C.E. Deutsch E.W. Domon B. Hancock W. He F. Hochstrasser D. Marko-Varga G. Salekdeh G.H. Sechi S. Snyder M. Srivastava S. Uhlen M. Wu C.H. Yamamoto T. Paik Y.K. Omenn G.S. The human proteome project: current state and future direction.Mol. Cell. Proteomics. 2011; 10 (M111 009993)Abstract Full Text Full Text PDF PubMed Scopus (301) Google Scholar, 36Pontén F. Gry M. Fagerberg L. Lundberg E. Asplund A. Berglund L. Oksvold P. Björling E. Hober S. Kampf C. Navani S. Nilsson P. Ottosson J. Persson A. Wernérus H. Wester K. Uhlén M. A global view of protein expression in human cells, tissues, and organs.Mol. Syst. Biol. 2009; 5: 337Crossref PubMed Scopus (147) Google Scholar). This high commonality of the proteome presumably results in part from the adaptation of cell lines to the in vitro growth. In this situation, cellular clones that proliferate rapidly are selected for whereas many cell type and tissue specific functions that are not crucial to their growth and survival may be lost. We have previously addressed this question directly by quantifying the proteome of a liver cell line against primary hepatocytes and our results support the above conclusions (38Pan C. Kumar C. Bohl S. Klingmueller U. Mann M. Comparative proteomic phenotyping of cell lines and primary cells to assess preservation of cell type-specific functions.Mol. Cell. Proteomics. 2009; 8: 443-450Abstract Full Text Full Text PDF PubMed Scopus (348) Google Scholar).Despite the overall proteomics similarities, cell type specific clusters of protein expression are clearly present in the cell lines (Fig. 3A). Furthermore, statistical analysis of the expression profiles showed that a large proportion of the proteins changes significantly in at least one of the cell lines. 
Interestingly, when filtering for a robust number of quantification events per protein, we found that more than two thirds of the proteome is likely to change significantly and that this is not affected by protein abundance. Bioinformatic analysis of protein function revealed higher variability in redundant pathways, whereas basal functions such as gene and protein expression tended to be more uniformly represented across the cell lines. These analyses can guide researchers in choosing the optimal cell line for their biological interests. Our data show that even functions carried out by abundant and ubiquitous proteins do not necessarily imply that these proteins need to be expressed at the same levels in all cell lines. Instead, they often vary severalfold.

What do these results mean for the common notion of a “core” or “household” proteome composed of proteins that are needed by every cell type and that are highly abundant? At a minimum, deep proteomics reveals that a household proteome is not as straightforward a concept as frequently believed. For instance, at least in cell lines, proteins tend to be present in very diverse cell types, not only for very common but also for more specialized functions. Furthermore, the household proteins themselves are not necessarily uniformly expressed. For a biologically desirable definition of the household proteome it may be necessary to study the proteomes of cell types in vivo, an undertaking that we expect to become technologically possible within the next few years.

We thank the members of the Department of Proteomics and Signal Transduction for fruitful discussions. We especially thank Annette Michalski for MS expertise, Bianca Splettstößer for help with the cell culture, and Nadin Neuhauser for computational support.

Supplementary Material: .zip archive (67.76 MB)
Reference(s)