GProX, a User-Friendly Platform for Bioinformatics Analysis and Visualization of Quantitative Proteomics Data

Article, open access, peer reviewed


2011; Elsevier BV; Volume: 10; Issue: 8; Language: English

10.1074/mcp.o110.007450

ISSN

1535-9484

Authors

Kristoffer T. G. Rigbolt, Jens T. Vanselow, Blagoy Blagoev

Topic(s)

Gene expression and cancer classification

Abstract

Recent technological advances have made it possible to identify and quantify thousands of proteins in a single proteomics experiment. As a result of these developments, the analysis of data has become the bottleneck of proteomics experiments. To provide the proteomics community with a user-friendly platform for comprehensive analysis, inspection, and visualization of quantitative proteomics data, we developed the Graphical Proteomics Data Explorer (GProX). The program requires no special bioinformatics training, as all functions of GProX are accessible within its user-friendly graphical interface, which will be intuitive to most users. Basic features facilitate the uncomplicated management and organization of large data sets and complex experimental setups as well as the inspection and graphical plotting of quantitative data. These are complemented by readily available high-level analysis options such as database querying, clustering based on abundance ratios, feature enrichment tests (e.g. for GO terms), and pathway analysis tools. A number of plotting options for visualization of quantitative proteomics data are available, and most analysis functions in GProX create customizable high-quality graphical displays in both vector and bitmap formats. The generic import requirements allow data originating from essentially all mass spectrometry platforms, quantitation strategies, and software to be analyzed in the program. GProX represents a powerful approach to proteomics data analysis, providing proteomics experimenters with a toolbox for bioinformatics analysis of quantitative proteomics data. The program is released as open source and can be freely downloaded from the project webpage at http://gprox.sourceforge.net.

Abbreviations used: GProX, Graphical Proteomics Data Explorer; EGF, epidermal growth factor; GO, Gene Ontology; HGF, hepatocyte growth factor; KEGG, Kyoto Encyclopedia of Genes and Genomes; OS, operating system; SPIA, Signaling Pathway Impact Analysis.
During the last decade, identification and quantitation of proteomes have been facilitated by constant developments in mass spectrometry instrumentation, fractionation techniques, quantitation strategies, and data analysis software. Using state-of-the-art technology it has become possible to quantify several thousands of proteins ([1] Rigbolt K.T. Prokhorova T.A. Akimov V. Henningsen J. Johansen P.T. Kratchmarova I. Kassem M. Mann M. Olsen J.V. Blagoev B. System-wide temporal characterization of the proteome and phosphoproteome of human embryonic stem cell differentiation. Sci. Signal. 2011; 4: rs3. [2] Dengjel J. Kratchmarova I. Blagoev B. Receptor tyrosine kinase signaling: a view from quantitative proteomics. Mol. Biosyst. 2009; 5: 1112-1121. [3] Swaney D.L. Wenger C.D. Coon J.J. Value of using multiple proteases for large-scale mass spectrometry-based proteomics. J. Proteome Res. 2010; 9: 1323-1329. [4] Peng J. Schwartz D. Elias J.E. Thoreen C.C. Cheng D. Marsischky G. Roelofs J. Finley D. Gygi S.P. A proteomics approach to understanding protein ubiquitination. Nat. Biotechnol. 2003; 21: 921-926. [5] Usaite R. Wohlschlegel J. Venable J.D. Park S.K. Nielsen J. Olsson L. Yates III J.R. Characterization of global yeast quantitative proteome data generated from the wild-type and glucose repression Saccharomyces cerevisiae strains: the comparison of two quantitative methods. J. Proteome Res. 2008; 7: 266-275. [6] Aye T.T. Scholten A. Taouatas N. Varro A. Van Veen T.A. Vos M.A. Heck A.J. Proteome-wide protein concentrations in the human heart. Mol. Biosyst.
2010; 6: 1917-1927. [7] Prokhorova T.A. Rigbolt K.T. Johansen P.T. Henningsen J. Kratchmarova I. Kassem M. Blagoev B. Stable isotope labeling by amino acids in cell culture (SILAC) and quantitative comparison of the membrane proteomes of self-renewing and differentiating human embryonic stem cells. Mol. Cell. Proteomics. 2009; 8: 959-970. [8] Kristensen A.R. Schandorff S. Høyer-Hansen M. Nielsen M.O. Jäättelä M. Dengjel J. Andersen J.S. Ordered organelle degradation during starvation-induced autophagy. Mol. Cell. Proteomics. 2008; 7: 2419-2428. [9] Iwasaki M. Miwa S. Ikegami T. Tomita M. Tanaka N. Ishihama Y. One-dimensional capillary liquid chromatographic separation coupled with tandem mass spectrometry unveils the Escherichia coli proteome on a microarray scale. Anal. Chem. 2010; 82: 2616-2620. [10] Rigbolt K.T. Blagoev B. Proteome-wide quantitation by SILAC. Methods Mol. Biol. 2010; 658: 187-204), and even complete proteomes within a single proteomics experiment ([11] de Godoy L.M. Olsen J.V. Cox J. Nielsen M.L. Hubner N.C. Fröhlich F. Walther T.C. Mann M. Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast. Nature. 2008; 455: 1251-1254. [12] Picotti P. Bodenmiller B. Mueller L.N. Domon B. Aebersold R. Full dynamic range proteome analysis of S. cerevisiae by targeted proteomics. Cell. 2009; 138: 795-806). Powerful software solutions for protein identification and quantitation have been developed that allow users to process the information stored in the raw mass spectrometry data. These software solutions have been developed by both the scientific community ([13] Cox J. Mann M.
MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008; 26: 1367-1372. [14] Matthiesen R. Trelle M.B. Højrup P. Bunkenborg J. Jensen O.N. VEMS 3.0: algorithms and computational tools for tandem mass spectrometry based identification of post-translational modifications in proteins. J. Proteome Res. 2005; 4: 2338-2347. [15] Mortensen P. Gouw J.W. Olsen J.V. Ong S.E. Rigbolt K.T. Bunkenborg J. Cox J. Foster L.J. Heck A.J. Blagoev B. Andersen J.S. Mann M. MSQuant, an open source platform for mass spectrometry-based quantitative proteomics. J. Proteome Res. 2010; 9: 393-403. [16] Deutsch E.W. Mendoza L. Shteynberg D. Farrah T. Lam H. Tasman N. Sun Z. Nilsson E. Pratt B. Prazen B. Eng J.K. Martin D.B. Nesvizhskii A.I. Aebersold R. A guided tour of the Trans-Proteomic Pipeline. Proteomics. 2010; 10: 1150-1159) and by instrument vendors, exemplified by PEAKS (Bioinformatics Solutions) and Proteome Discoverer (Thermo Scientific).

In the face of these advances in the field, we find that data analysis is currently the bottleneck of proteomics experiments. Familiarity with several advanced bioinformatics tools, and preferably programming skills, are nowadays essential to perform a comprehensive analysis of large proteomics data sets ([17] Kumar C. Mann M. Bioinformatics analysis of mass spectrometry-based proteomics data sets. FEBS Lett. 2009; 583: 1703-1712). So far, experimenters unfamiliar with computer programming have typically had to use spreadsheet applications, which were not developed for the analysis of biological data and are therefore of limited use for working with the large amounts of data produced by modern proteomics experiments.
Alternatively, a number of software solutions for analyzing "omics" data have been developed. Notable examples are the MultiExperiment Viewer ([18] Saeed A.I. Bhagabati N.K. Braisted J.C. Liang W. Sharov V. Howe E.A. Li J. Thiagarajan M. White J.A. Quackenbush J. TM4 microarray software suite. Methods Enzymol. 2006; 411: 134-193), which makes available algorithms for clustering and statistical analysis, and the GSEA-P ([19] Subramanian A. Kuehn H. Gould J. Tamayo P. Mesirov J.P. GSEA-P: a desktop application for Gene Set Enrichment Analysis. Bioinformatics. 2007; 23: 3251-3253), FatiGO+ ([20] Al-Shahrour F. Minguez P. Tárraga J. Medina I. Alloza E. Montaner D. Dopazo J. FatiGO +: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments. Nucleic Acids Res. 2007; 35: W91-96), and DAVID ([21] Dennis Jr. G. Sherman B.T. Hosack D.A. Yang J. Gao W. Lane H.C. Lempicki R.A. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol. 2003; 4: P3) resources, which focus on annotation and enrichment analysis, particularly of Gene Ontology (GO) ([22] Ashburner M. Ball C.A. Blake J.A. Botstein D. Butler H. Cherry J.M. Davis A.P. Dolinski K. Dwight S.S. Eppig J.T. Harris M.A. Hill D.P. Issel-Tarver L. Kasarskis A. Lewis S. Matese J.C. Richardson J.E. Ringwald M. Rubin G.M. Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000; 25: 25-29) terms. Tools such as QuPE ([23] Albaum S.P. Neuweger H. Fränzel B. Lange S. Mertens D. Trötschel C. Wolters D. Kalinowski J. Nattkemper T.W. Goesmann A.
Qupe–a Rich Internet Application to take a step forward in the analysis of mass spectrometry-based quantitative proteomics experiments. Bioinformatics. 2009; 25: 3128-3134), DAnTE ([24] Polpitiya A.D. Qian W.J. Jaitly N. Petyuk V.A. Adkins J.N. Camp 2nd D.G. Anderson G.A. Smith R.D. DAnTE: a statistical tool for quantitative analysis of -omics data. Bioinformatics. 2008; 24: 1556-1558), and StatQuant ([25] van Breukelen B. van den Toorn H.W. Drugan M.M. Heck A.J. StatQuant: a post-quantification analysis toolbox for improving quantitative mass spectrometry. Bioinformatics. 2009; 25: 1472-1473) provide a range of advanced statistical procedures for postquantitation analysis of protein abundance ratios. Finally, the Cytoscape ([26] Shannon P. Markiel A. Ozier O. Baliga N.S. Wang J.T. Ramage D. Amin N. Schwikowski B. Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003; 13: 2498-2504) development team has made a remarkable contribution to the analysis of, in particular, protein-protein interaction data and protein network visualization.

Although these and other standalone tools are very useful for their specialized purposes, they do not support complex experimental setups, and their divergent requirements for data input and output formats complicate interoperability and obstruct the integration of several analysis steps. To allow experimenters to combine several individual tools, programs such as the Bioinformatics Resource Manager ([27] Shah A.R. Singhal M. Klicker K.R. Stephan E.G. Wiley H.S. Waters K.M. Enabling high-throughput data management for systems biology: the Bioinformatics Resource Manager. Bioinformatics. 2007; 23: 906-909) and Prequips ([28] Gehlenborg N. Yan W. Lee I.Y. Yoo H.
Nieselt K. Hwang D. Aebersold R. Hood L. Prequips–an extensible software platform for integration, visualization and analysis of LC-MS/MS proteomics data. Bioinformatics. 2009; 25: 682-683), which both use the program Gaggle ([29] Shannon P.T. Reiss D.J. Bonneau R. Baliga N.S. The Gaggle: an open-source software system for integrating bioinformatics software and data sources. BMC Bioinformatics. 2006; 7: 176) for data transfer, provide multifunctional platforms for data analysis. The Gaggle-based integrated solutions are powerful, but the divergent interfaces that users are confronted with might be challenging for nonspecialists. Finally, several commercial solutions are available, e.g. Ingenuity Pathway Analysis (Ingenuity Systems) and ProteinCenter (Thermo Scientific/Proxeon). However, the high cost of these programs and the nontransparent nature of commercial software might pose a significant obstacle to their adoption.

These issues led us to develop the Graphical Proteomics Data Explorer (GProX), a software package for comprehensive and integrated bioinformatics analysis and visualization of large proteomics data sets. The basic concept of GProX is to provide a data browsing environment similar to common spreadsheet applications and, from this interface, make available an array of functionalities for analyzing proteomics data. The major goal of GProX is thus to allow experimenters without specialized skills in bioinformatics to analyze their data and produce graphical representations for use in scientific publications or presentations. GProX makes a wide array of useful analysis functions available within a single platform, with particular emphasis on a user-friendly interface and the production of high-quality graphical objects.
The software, as well as the complete source code, is freely available for download from http://gprox.sourceforge.net.

The overall structure and context of GProX are illustrated schematically in Fig. 1. The main program and the user interface are written in the Visual Basic programming language under the Microsoft .NET environment. The object-oriented architecture and the large selection of graphical objects available in the .NET environment allow the creation of user-friendly graphical interfaces that resemble common Microsoft Windows applications. Furthermore, the large repository of high-level functionalities implemented in .NET makes it an efficient platform for interfacing data and communicating with the operating system (OS). One drawback of the .NET environment is that it demands a Microsoft Windows OS. However, because the proprietary software of most, if not all, mass spectrometer vendors is only available for Windows, most proteomics labs require Windows systems for data generation and analysis anyway.

Most of the features in GProX for data processing and generation of graphical objects are implemented as scripts written for R, the free software environment for statistical computing and graphics ([30] R Development Core Team. R: A Language and Environment for Statistical Computing. http://www.R-project.org. 2010); see supplemental Fig. S1A. In recent years R has gained increasing popularity for processing omics data, promoted especially by the rapidly growing number of add-in libraries available from the Bioconductor consortium ([31] Gentleman R.C. Carey V.J. Bates D.M. Bolstad B. Dettling M. Dudoit S. Ellis B. Gautier L. Ge Y. Gentry J. Hornik K. Hothorn T. Huber W. Iacus S. Irizarry R. Leisch F. Li C. Maechler M. Rossini A.J. Sawitzki G. Smith C. Smyth G. Tierney L. Yang J.Y. Zhang J. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004; 5: R80).
In addition, R is well suited for processing the large amounts of numerical data produced by quantitative proteomics experiments and contains a range of well-developed functions for generating simple as well as advanced graphical outputs in a number of formats. The interfacing between the .NET-based user interface and R is achieved via tab-delimited files that are used as input for external R instances. After completion of the R task, tab-delimited output files are read back into the main program, and graphical objects are saved locally and displayed in the main program. During normal operation the user is not confronted with the R tasks, which are executed as external processes in the background. Not least for debugging purposes, both standard and error output from the R process are fed back into the main program and saved locally. All R functions implemented in GProX are collected in documented R packages (GProXutils, GProXplot, and GProXanalysis), which are distributed together with the program. Experimenters familiar with R can also use these packages and their functions directly, or as a source of inspiration for modifying or expanding the functionalities currently implemented in GProX (see supplemental Methods and supplemental Fig. S1).

The GProX installer can be downloaded freely and without registration from the project website (http://gprox.sourceforge.net) as a self-installing executable file. The program requires a standard desktop computer with a contemporary Windows OS (we have tested the program on English versions of Windows XP and Windows 7, 32/64 bit) and version 3.5 or later of the .NET environment. Because the program depends critically on a working installation of R and several add-in libraries, these components must also be installed on the user's computer. During the first startup of GProX the user is prompted to download and install these components.
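The file-based hand-off between the user interface and an external worker process described above can be sketched as follows. This is only an illustration in Python, not GProX code: GProX itself is written in Visual Basic .NET and calls R scripts, whereas here a small Python worker stands in for the R task so that the sketch runs without R. The names `run_task` and `WORKER_SOURCE`, the file names, and the column names are all hypothetical.

```python
import csv
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for an external analysis script. In GProX this role is played by
# an R script; a Python worker is used here only so the sketch runs anywhere.
WORKER_SOURCE = r'''
import csv, math, sys
src, dst = sys.argv[1], sys.argv[2]
count = 0
with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
    reader = csv.DictReader(fin, delimiter="\t")
    writer = csv.writer(fout, delimiter="\t")
    writer.writerow(["accession", "log2_ratio"])
    for row in reader:
        writer.writerow([row["accession"], math.log2(float(row["ratio"]))])
        count += 1
print(f"processed {count} rows")
'''


def run_task(rows, workdir):
    """Write rows to a tab-delimited file, run the worker, read its output."""
    workdir = Path(workdir)
    in_path = workdir / "input.tsv"
    out_path = workdir / "output.tsv"
    script = workdir / "worker.py"
    script.write_text(WORKER_SOURCE)

    with open(in_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["accession", "ratio"],
                                delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)

    # Run the task as a separate process and capture both output streams.
    proc = subprocess.run(
        [sys.executable, str(script), str(in_path), str(out_path)],
        capture_output=True, text=True,
    )
    # Keep standard and error output on disk for later debugging.
    (workdir / "stdout.log").write_text(proc.stdout)
    (workdir / "stderr.log").write_text(proc.stderr)
    if proc.returncode != 0:
        raise RuntimeError(f"worker failed: {proc.stderr.strip()}")

    with open(out_path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t")), proc.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table, log = run_task([{"accession": "P1", "ratio": "4.0"}], tmp)
        print(table, log.strip())
```

Exchanging plain tab-delimited files keeps the two sides loosely coupled: either side can be inspected or rerun by hand, at the cost of some file I/O per task.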
To assist the user with installing R and the required add-in libraries, we have included an automatic setup procedure. Detailed information about manual installation of R and the add-in libraries is also given in the GProX help. Furthermore, the whole installation procedure is described in detail in the tutorial distributed along with the program. For further support we have created a GProX Help Google group where users can post questions and comments. This group can be accessed from the project website or directly at http://groups.google.com/group/gprox.

The input format required by GProX is a tab- or character-delimited file with column headers in the first row and each protein entry in a separate row. The minimum information required for each protein in the input file is the database accession key(s) and quantitative information. If needed, any additional information available from preceding data processing can also be imported from this input file for subsequent use within GProX. This additional information might be, e.g., peptide information or database annotations such as Gene Ontology ([22]) or Pfam ([32] Finn R.D. Mistry J. Tate J. Coggill P. Heger A. Pollington J.E. Gavin O.L. Gunasekaran P. Ceric G. Forslund K. Holm L. Sonnhammer E.L. Eddy S.R. Bateman A. The Pfam protein families database. Nucleic Acids Res. 2010; 38: D211-222). As a consequence of this generic input format, the application of GProX is not restricted to a particular mass spectrometry instrumentation platform, quantitation technique, or data processing and quantitation software.
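To make the generic input format concrete, the following sketch reads a delimited table with headers in the first row and one protein per row. It is an illustration in Python rather than GProX's own import code; the function name `read_input_table`, the column names, and the use of `;` to separate multiple accession keys in one cell are assumptions made for the example.

```python
import csv
import io


def read_input_table(handle, accession_col, ratio_cols, delimiter="\t"):
    """Parse a delimited table: headers in the first row, one protein per row."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [c for c in [accession_col, *ratio_cols] if c not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    proteins = []
    for row in reader:
        # Assumption: several accession keys in one cell are separated by ';'.
        cell = row[accession_col] or ""
        accessions = [a.strip() for a in cell.split(";") if a.strip()]
        ratios = {}
        for col in ratio_cols:
            text = (row[col] or "").strip()
            ratios[col] = float(text) if text else None  # empty = not quantified
        # All other columns (annotations, peptide information) are kept as text.
        extra = {k: v for k, v in row.items()
                 if k != accession_col and k not in ratio_cols}
        proteins.append({"accessions": accessions, "ratios": ratios, "extra": extra})
    return proteins


if __name__ == "__main__":
    sample = ("Accession\tRatio_5min\tRatio_30min\tGO\n"
              "IPI001;IPI002\t1.8\t\tkinase activity\n")
    for protein in read_input_table(io.StringIO(sample), "Accession",
                                    ["Ratio_5min", "Ratio_30min"]):
        print(protein)
```

Only the accession and ratio columns are validated, which mirrors the minimum requirements stated above; everything else is carried along untouched.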
Import of data into GProX is done via an input wizard (see supplemental Fig. S2A), in which the user is requested to select the input file, specify the columns containing accession keys and quantitation ratios and finally, if required, specify the experimental setup. To specify the experimental setup, data columns containing quantitative data can be allocated to separate experiments. This arrangement facilitates the analysis of more sophisticated quantitative proteomics experiments, in which, e.g., the temporal regulation patterns after different treatments are compared. In this case, a single experiment would include quantitation data from different time points for one treatment condition. Multiple independent experiments can then be analyzed either separately or together and compared within GProX. Upon creation of a session, all information required to recreate it is saved inside the session folder as a flat file (.gpx file), from which users can reload the session to continue an analysis.

GProX employs a data management setup in which the input data file is regarded as a database, from which only the specified columns are placed in a session data table containing only the information relevant for data analysis. Other data columns from the input table can be imported on demand during analysis and are appended to the session table(s). Because all data processing is performed only on the active session table, processing efficiency is improved and, in addition, the original input file is left unchanged.

We have attempted to bring the graphics produced by GProX as close to a final state as possible, but users might want to fine-tune or lay out their figures in an external graphics editor such as Adobe Illustrator or Corel Draw before using them for presentations or in publications.
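The experimental setup and session-table handling described above can be illustrated with a small sketch. It is written in Python under simplifying assumptions (rows as dictionaries, unique accession keys) and does not reproduce GProX's internal data structures; `build_session_table`, `import_column`, and the column names are hypothetical.

```python
def build_session_table(input_rows, accession_col, experiments):
    """Copy only the accession column and the experiment columns into a session table."""
    wanted = [accession_col] + [col for cols in experiments.values() for col in cols]
    return [{col: row[col] for col in wanted} for row in input_rows]


def import_column(session_rows, input_rows, accession_col, column):
    """Append one further input column to the session table on demand."""
    # The input table is only read, never modified (it acts as the database).
    lookup = {row[accession_col]: row[column] for row in input_rows}
    for row in session_rows:
        row[column] = lookup[row[accession_col]]


if __name__ == "__main__":
    input_rows = [
        {"Accession": "IPI001", "EGF_5min": 2.1, "EGF_30min": 1.4,
         "HGF_5min": 0.6, "HGF_30min": 0.9, "GO": "kinase activity"},
    ]
    # Each experiment groups the ratio columns of one treatment condition.
    experiments = {"EGF": ["EGF_5min", "EGF_30min"],
                   "HGF": ["HGF_5min", "HGF_30min"]}
    session = build_session_table(input_rows, "Accession", experiments)
    import_column(session, input_rows, "Accession", "GO")
    print(session)
```

Working on a copy that holds only the selected columns keeps each processing step small and guarantees that the original input stays unchanged.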
To support such external editing, figures can be exported in several formats, including vector (eps, pdf) and bitmap (png, bmp, jpg, tif) graphics, which the user can open and freely modify in external applications.

The main user interface of GProX is similar in appearance to that of spreadsheet programs such as OpenOffice Calc or Microsoft Excel. All operations within the program are accessible via a ribbon control, menus, and dedicated dialog boxes (Fig. 2). The main user environment is a multiple-document interface containing up to five windows (supplemental Fig. S2B). The Session Info window contains all information about the current session, including a list of all produced tabular and graphical objects as well as an overview of the specified experimental setup. The Data Tables window contains the session tables as a collection of tab pages. Upon starting a new session, a single session table is created, but during the course of an analysis session the user can move subsets of this table to new session tables. Data analysis steps are performed only on the active tab page, allowing the user to process and analyze subsets of the complete data collection. During the course of an analysis session a large collection of graphical objects can be created, and these are displayed in a dedicated Graphics window. To navigate through the graphical objects, the Graphics window contains an explorer panel for selecting displayable items and renaming, deleting, or moving files. All tabular output from analysis steps is contained in the Analysis Tables window as a collection of tab pages, similar to the Data Tables window. Finally, the input table can be displayed inside GProX; this, however, serves mainly as a reference, because the input table is used solely as a database and its content is never changed.

We have strived to make the software as intuitive and user-friendly as possible, but the more advanced analysis steps in particular allow several associated parameters to be changed.
To assist the user in selecting analysis parameters and to offer support in the basic use of the program, GProX provides a compiled HTML help (chm) file and tooltip help boxes in individual dialogs. Furthermore, a step-by-step tutorial describing an example workflow in GProX is distributed along with the program to help users get started. Details about processing algorithms and data analysis strategies are described in the supplemental Methods and outlined in supplemental Figs. S4 to S6.

To demonstrate the features of GProX, we used a previously published quantitative proteomics data set comparing phosphotyrosine-dependent signaling 5 and 30 min after epidermal growth factor (EGF) or hepatocyte growth factor (HGF) stimulation ([33] Hammond D.E. Hyde R. Kratchmarova I. Beynon R.J. Blagoev B. Clague M.J. Quantitative analysis of HGF and EGF-dependent phosphotyrosine signaling networks. J. Proteome Res. 2010; 9: 2734-2742); the experimental setup is summarized in supplemental Fig. S3. To analyze these data sets, the data were imported into GProX, specifying the IPI accession keys as protein identifiers, and the columns containing the quantitation ratios after EGF or HGF stimulation were allocated to two separate experiments. An overview of the analysis steps outlined in the following sections is shown in Fig. 3A.

One main goal of GProX is to provide a comfortable environment for browsing quantitative proteomics data in a spreadsheet-like fashion. Basic functions such as sort, find, and deletion or insertion of rows and columns, as well as arithmetic operations such as summing or averaging over entire columns, support the organization and modification of experimental data. Data subsets can also be allocated to new data tables, allowing data to be grouped by, e.g., functional category or regulation.
Furthermore, the experimental setup defined during data import can be easily modified in the course of an analysis session, e.g. to compare experimental properties or to account for data processing within GProX. Quantitative data in the form of intensity ratios often require transformation to other scales. With GProX, logarithmic, square root, and inverse transformations can be readily performed. From the sample data set, reverse hits (used to determine the false discovery rate ([13], [34] Elias J.E. Gygi S.P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods. 2007; 4: 207-214)) and common contaminant identifications such as trypsin, bovine serum albumin, and keratins were moved to a separate table, and the remaining protein quantitations were log2-transformed.

One strong feature of GProX is its on-the-fly plotting function, with which the quantitation data from one or more proteins can be displayed as line diagrams (Fig. 3B). This gives a quick overview of the regulation patterns of selected proteins and of differences between treatment conditions. If more than one protein is selected and more than one experimental condition is defined, quantitative data can be plotted in two different ways: (1) one plot is generated for each selected protein, including all experimental conditions, or (2) one plot is generated for each experimental condition, each plot containing data from all selected proteins for this condition. To exemplify the plotting function we show in Fig. 3B the regulation of key growth factor signaling proteins.
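The ratio transformation and the separation of reverse hits and contaminants described above can be sketched as follows. This is an illustrative Python example, not the GProX implementation (which uses R). In particular, identifying flagged entries by the accession prefixes `REV__` and `CON__` is an assumption borrowed from a MaxQuant-style convention; the source does not state how these entries are marked, and the function and column names are hypothetical.

```python
import math

# The three transformations named in the text.
TRANSFORMS = {
    "log2": math.log2,
    "sqrt": math.sqrt,
    "inverse": lambda x: 1.0 / x,
}

# Assumption: decoy and contaminant entries carry these accession prefixes;
# other pipelines may flag such entries differently.
FLAG_PREFIXES = ("REV__", "CON__")


def split_flagged(rows, accession_col="Accession"):
    """Separate reverse hits and contaminants from the remaining proteins."""
    kept, flagged = [], []
    for row in rows:
        target = flagged if row[accession_col].startswith(FLAG_PREFIXES) else kept
        target.append(row)
    return kept, flagged


def transform_ratios(rows, ratio_cols, kind="log2"):
    """Return new rows with the chosen transformation applied to the ratio columns."""
    func = TRANSFORMS[kind]
    out = []
    for row in rows:
        new_row = dict(row)  # leave the original row untouched
        for col in ratio_cols:
            value = row[col]
            # Missing values stay missing; a non-positive ratio raises for log2.
            new_row[col] = func(value) if value is not None else None
        out.append(new_row)
    return out


if __name__ == "__main__":
    table = [
        {"Accession": "IPI001", "EGF_5min": 4.0},
        {"Accession": "REV__IPI999", "EGF_5min": 1.0},
        {"Accession": "CON__P00761", "EGF_5min": None},
    ]
    proteins, removed = split_flagged(table)
    print(transform_ratios(proteins, ["EGF_5min"], "log2"))
    print(len(removed), "entries moved to a separate table")
```

A log2 scale makes up- and down-regulation symmetric around zero, which is why it is a common choice for abundance ratios.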
From the plots in Fig. 3B it is striking that, although the receptor proteins EGFR and HGFR are regulated oppositely between the stimulations, a very similar regulation is observed for the key effector kinases ERK1/2. Often it is also necessary to immediately obtain all available information about a given protein. Therefore, complete International Protein Index ([35] Kersey P.J. Duarte J. Williams A. Karavidopoulou Y. Birney E. Apweiler R. The International Protein Index: an integrated database for proteomics experiments. Proteomics. 2004; 4: 1985-1988) and UniProt ([36] UniProt Consortium. The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res. 2010; 38: D142-148) database sheets, which link out to further information sources, can be promptly displayed for the pro
