SwePep, a Database Designed for Endogenous Peptides and Mass Spectrometry
2006; Elsevier BV; Volume: 5; Issue: 6; Language: English
DOI: 10.1074/mcp.m500401-mcp200
ISSN: 1535-9484
Authors: Maria Fälth, Karl Sköld, Mathias Norrman, Marcus Svensson, David Fenyö, Per E. Andrén
Topic(s): Neuropeptides and Animal Physiology
Abstract: A new database, SwePep, specifically designed for endogenous peptides, has been constructed to significantly speed up the identification process from complex tissue samples utilizing mass spectrometry. In the identification process the experimental peptide masses are compared with the peptide masses stored in the database both with and without possible post-translational modifications. This intermediate identification step is fast and singles out peptides that are potential endogenous peptides and can later be confirmed with tandem mass spectrometry data. Successful applications of this methodology are presented. The SwePep database is a relational database developed using MySQL and Java. The database contains 4180 annotated endogenous peptides from different tissues originating from 394 different species as well as 50 novel peptides from brain tissue identified in our laboratory. Information about the peptides, including mass, isoelectric point, sequence, and precursor protein, is also stored in the database. This new approach holds great potential for removing the bottleneck that occurs during the identification process in the field of peptidomics. The SwePep database is available to the public.

The abbreviations used are: PTM, post-translational modification; CLIP, corticotropin-lipotropin intermediary peptide; GPCR, G-protein-coupled receptor; LTQ, linear trap quadrupole; UniProt, Universal Protein Resource; XML, extensible markup language.

Proteomic tools, including two-dimensional gel electrophoresis in combination with MS, are limited to the analysis of proteins >10 kDa, and therefore an important part of the proteome is generally ignored in proteomic studies. This part of the proteome consists of endogenous proteins and peptides that include well characterized families of neuropeptide transmitters, neuropeptide modulators, hormones, and fragments of functional proteins, some of which are essential in many biological processes (1: Hokfelt T. Millhorn D. Seroogy K. Tsuruo Y. Ceccatelli S. Lindh B. Meister B. Melander T. Schalling M. Bartfai T. Terenius L. Coexistence of peptides with classical neurotransmitters. Experientia. 1987; 43: 768-780; 2: Hokfelt T. Broberger C. Xu Z.Q. Sergeyev V. Ubink R. Diez M. Neuropeptides—an overview. Neuropharmacology. 2000; 39: 1337-1356). The peptides exert potent biological actions in the respiratory, cardiovascular, endocrine, inflammatory, and nervous systems (1, 2). The study of endogenously processed peptides has been termed "peptidomics" (3: Schulz-Knappe P. Zucht H.D. Heine G. Jurgens M. Hess R. Schrader M. Peptidomics: the comprehensive analysis of peptides in complex biological mixtures. Comb. Chem. High Throughput Screen. 2001; 4: 207-217). Peptidomics complements molecular biological approaches in its ability to characterize the processing of functional gene products. It allows direct observation of changes in the amount of peptides and small proteins and their post-translational modifications.

The main difficulties in the analysis of endogenous peptides are their rapid degradation during extraction and purification (4: Svensson M. Skold K. Svenningsson P. Andren P.E. Peptidomics-based discovery of novel neuropeptides. J. Proteome Res. 2003; 2: 213-219) and that their average tissue content is less than 0.1% of that of proteins (5: Minamino N. Tanaka J. Kuwahara H. Kihara T. Satomi Y. Matsubae M. Takao T. Determination of endogenous peptides in the porcine brain: possible construction of peptidome, a fact database for endogenous peptides. J. Chromatogr. B Anal. Technol. Biomed. Life Sci. 2003; 792: 33-48). Endogenous peptides also often contain post-translational modifications (PTMs) (e.g.
acetylation, amidation, and phosphorylation), adding to the difficulty of deciphering the obtained mass spectra.

An important functional group of the peptidome is the endogenous peptides in the brain. The neuropeptides range in length from 3 to 100 amino acid residues and are up to 50 times larger than classical neurotransmitters (6: Hokfelt T. Bartfai T. Bloom F. Neuropeptides: opportunities for drug discovery. Lancet Neurol. 2003; 2: 463-472). The neuroactive peptides are derived from the processing of secretory proteins that are formed in the cell body on polyribosomes attached to the cytoplasmic surface of the endoplasmic reticulum. They are then processed in the endoplasmic reticulum and moved to the Golgi apparatus for further processing. In the central nervous system, most neurons contain biologically active peptides together with classical neurotransmitters. Neuropeptides are implicated in the pathology of various neurological and psychiatric disorders such as depression, neurodegenerative diseases, and eating and sleeping disorders (2). Despite their biological and physiological importance, there is at present a lack of easily accessible information in the public databases regarding endogenous peptides, making it difficult to identify endogenous peptides from complex samples.

MS in combination with two-dimensional gel electrophoresis or LC has become the main tool in proteomics for the identification of peptides and proteins and typically generates large sets of data (7: Fenyo D. Identifying the proteome: software tools. Curr. Opin. Biotechnol. 2000; 11: 391-395). By using a search engine, the data are compared with protein sequence collections such as UniProt Knowledgebase (8: Bairoch A. Apweiler R. Wu C.H. Barker W.C. Boeckmann B. Ferro S. Gasteiger E. Huang H. Lopez R. Magrane M. Martin M.J. Natale D.A. O'Donovan C. Redaschi N. Yeh L.S. The Universal Protein Resource (UniProt). Nucleic Acids Res. 2005; 33: D154-D159) or the non-redundant (nr) protein sequence collection from the National Center for Biotechnology Information (NCBI). These protein sequence databases also offer additional information, including brief functional descriptions (if available), an annotation of sequence features (e.g. modifications), secondary and tertiary structure predictions, key references, and links to other databases. Lately a number of databases have become more oriented toward specific proteomic subareas (5; 9: Lu P. Szafron D. Greiner R. Wishart D.S. Fyshe A. Pearcy B. Poulin B. Eisner R. Ngo D. Lamb N. PA-GOSUB: a searchable database of model organism protein sequences with their predicted Gene Ontology molecular function and subcellular localization. Nucleic Acids Res. 2005; 33: D147-D153; 10: Xenarios I. Salwinski L. Duan X.J. Higney P. Kim S.M. Eisenberg D. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res. 2002; 30: 303-305). Although several of these databases are well organized and easy to use, they do not always fulfill all new demands. At present there is no searchable database specifically designed for identification of endogenous peptides.

In the present study we have developed a database for endogenous peptides and small proteins below 10 kDa.
The database consists of biologically active peptides such as classical neuropeptides and hormones, potential biologically active peptides, and uncharacterized peptides. Several examples of improved neuropeptide identification utilizing SwePep and MS are demonstrated.

SwePep is a Java (11: The Human Genome Issue. Nature. 2001; 409: 745-953) Enterprise Edition (J2EE) application implemented according to a multitier application model (12: Barish G. Building Scalable and High-Performance Java Web Applications Using J2EE Technology. Addison-Wesley, Boston, 2002). It consists of a dynamic web interface, a relational database, and a business tier, which uses the client input from the web interface to construct and execute queries to the database. The web interface was developed using hypertext markup language (HTML), and the dynamic content was developed using JavaServer Pages (JSP), which are compiled into Java servlets at runtime. This makes it possible for the web interface to communicate with the server-side functions and the database. When a user sends a request to the server through the web interface, the request is caught by a servlet. The servlet first validates the request and then processes it. A request to SwePep often involves database queries, and these are managed by Enterprise JavaBeans (EJB). After the request is processed, the control servlet sends a response back to the web interface, and the result of the request is displayed to the user.

The SwePep database is implemented as a relational database (13: Silberschatz A. Korth H. Sudarshan S. Database System Concepts. 4th Ed. McGraw-Hill, New York, 2002) using a MySQL database management system (11). SwePep is specifically designed for endogenous peptides.
Every peptide in the database is connected to the following information: name, sequence, precursor protein, position in precursor sequence, modifications, location, organisms, reference, mass, and pI. The database is designed to minimize data redundancy. Therefore some objects are split into two or more tables that are connected to each other, e.g. peptide and peptide type. This way the peptide sequence, mass, name, and pI are stored only once in the database even though the peptide occurs many times in different precursors (Fig. 1).

The information in SwePep is collected from three different sources: experimental data produced in our laboratory (4), peptide information from UniProt (version 49.0, released February 2006), and peer-reviewed publications. The data from UniProt will be updated every time a major release of UniProt is made. The rest of the data will be updated continuously. For all the peptides in the SwePep database, monoisotopic mass, average mass, and pI (14: Bjellqvist B. Hughes G.J. Pasquali C. Paquet N. Ravier F. Sanchez J.C. Frutiger S. Hochstrasser D. The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences. Electrophoresis. 1993; 14: 1023-1031) have been calculated according to their amino acid sequence.

Currently SwePep consists of 4180 unique endogenous peptides, and many of these are post-translationally modified. So far, ∼100 neuropeptides have been experimentally identified from brain tissue in our laboratory. The neuropeptides in SwePep have been derived from 1643 precursor proteins from 394 different species. All peptides have searchable descriptors such as mass (monoisotopic and average), modifications, precursor information, and organism affiliation.
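The calculation of a monoisotopic mass from an amino acid sequence, mentioned above, can be sketched in a few lines of Python. This is a minimal illustration rather than the SwePep implementation (which is written in Java): the function name `monoisotopic_mass` is hypothetical, the residue masses are the standard monoisotopic values rounded to five decimals, and only unmodified peptides built from the 20 standard amino acids are handled.

```python
# Standard monoisotopic residue masses (Da), rounded to five decimals.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N and C termini


def monoisotopic_mass(sequence):
    """Monoisotopic mass (Da) of an unmodified linear peptide."""
    return sum(MONO[aa] for aa in sequence) + WATER


if __name__ == "__main__":
    # Met-enkephalin and Leu-enkephalin sequences as listed in Table II.
    for name, seq in [("Met-enkephalin", "YGGFM"), ("Leu-enkephalin", "YGGFL")]:
        print(name, seq, round(monoisotopic_mass(seq), 4))
```

For Met-enkephalin (YGGFM) the sketch gives about 573.226 Da, which is consistent with the experimental mass of 573.324 Da and the mass difference of 0.098 Da reported for that peptide in Table II.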
Because the experimental data contain peptides and proteins in the mass range up to 10 kDa, the SwePep database also contains 25,047 small proteins with sequence length less than or equal to 120 amino acids. This makes it possible to identify more of the contents in experimental samples. The current number of peptides in each class of the SwePep database is shown in Table I.

Table I. Classification and number of peptides in SwePep
Source | Biologically active peptides | Potential biologically active peptides | Uncharacterized peptides
UniProt | 4136 | 0 | 0
Experimentally identified peptides (in house) | 37 | 28 | 16

Peptide and precursor protein data have been collected from UniProt by downloading the UniProt database in extensible markup language (XML) format. The XML file was searched for entries that had one or more annotated peptides. All entries with annotated peptides were saved into a new file that was used to automatically insert the entries into SwePep. The SwePep database is also populated with novel peptides from brain tissue identified in our laboratory from different species. For this data set SwePep also contains information about the experimental conditions such as sample information (i.e. species and treatment), mass spectral raw data, and processed data.

To ensure that the information in the SwePep database is reliable, all peptides that are stored in SwePep are sorted into three different classes: (i) biologically active peptides, (ii) potential biologically active peptides, and (iii) uncharacterized peptides.

(i) Biologically active peptides. This group contains the classical neuropeptides, such as substance P, neurotensin, enkephalins, and dynorphins, that are present in a neuron together with classical neurotransmitters. This group also contains peptides functioning as hormones, a class of peptides that are secreted into the blood stream to exert endocrine functions. All the neuropeptides and hormones in this group have known biological functions.
(ii) Potential biologically active peptides. This group contains pharmacologically uncharacterized peptides (between 3 and 100 amino acids) that potentially are biologically active. They are identified in tissues or body fluids, which have been instantly proteolytically deactivated postmortem or postsampling, and have characteristics similar to the neuropeptides and hormones, i.e. they have specific convertase processing sites (15: Steiner D.F. The proprotein convertases. Curr. Opin. Chem. Biol. 1998; 2: 31-39). Modifications such as amidation of the C terminus and N-terminal acetylation are regarded as important criteria because many bioactive peptides are amidated by conversion of a C-terminal glycine to a carboxamide.

(iii) Uncharacterized peptides. Peptides that do not fulfill the criteria of the groups above belong to this group. Among others, this group consists of peptides from samples not rapidly proteolytically deactivated postsampling.

Rats (Sprague-Dawley) and mice (C57/BL6) were sacrificed as previously described (4) (Murimachi Kikai, Tokyo, Japan). The brain regions of interest were thereafter rapidly dissected out and stored at −80 °C. The brain tissue was suspended in cold extraction solution (0.25% acetic acid) and homogenized by microtip sonication (Vibra cell 750, Sonics & Materials Inc., Newtown, CT) to a concentration of 0.2 mg of tissue/μl. The suspension was centrifuged at 20,000 × g for 30 min at 4 °C. The protein- and peptide-containing supernatant was transferred to a centrifugal filter (Microcon YM-10, Millipore, Bedford, MA) with a molecular mass limit of 10,000 Da and centrifuged at 14,000 × g for 45 min at 4 °C. Finally the peptide filtrate was immediately frozen and stored at −80 °C until analysis.
The peptide extract was separated using on-line nanoflow reversed phase capillary liquid chromatography (Ettan MDLC, GE Healthcare, Uppsala, Sweden) and analyzed with ESI-MS using a Q-TOF (Waters) or Finnigan LTQ or LTQ-FT (Thermo Electron, San Jose, CA) mass spectrometer (4).

Identification of endogenous peptides using MS data is time-consuming. The available software tools for identification are generally designed for proteins, which are cleaved to peptides by specific enzymes, such as trypsin, prior to MS analysis (7). However, endogenous peptides will contain few, if any, such cleavage sites, both because they are processed at other specific sites by processing enzymes (proprotein convertases) that release the bioactive peptide from the precursor and because of their small number of amino acid residues. This impairs the possibility of getting good peptide fragmentation data and a significant identification (16: Fenyo D. Beavis R.C. A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes. Anal. Chem. 2003; 75: 768-774; 17: Eriksson J. Chait B.T. Fenyo D. A statistical basis for testing the significance of mass spectrometric protein identification results. Anal. Chem. 2000; 72: 999-1005). A further difficulty is that endogenous peptides often contain PTMs, which makes them even more challenging to identify.
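The point about tryptic cleavage sites can be illustrated with a short Python sketch. It assumes the commonly used simplified trypsin rule (cleavage C-terminal to Lys or Arg, except when the next residue is Pro); the function name `tryptic_cleavage_sites` is hypothetical, and the rule is a simplification that is not taken from the source.

```python
def tryptic_cleavage_sites(sequence):
    """Return the 1-based positions after which trypsin is expected to cleave.

    Simplified rule (an assumption of this sketch): cleave after K or R
    unless the following residue is P. The C-terminal residue is ignored
    because there is nothing left to cleave off after it.
    """
    sites = []
    for i, aa in enumerate(sequence[:-1]):
        if aa in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    return sites


if __name__ == "__main__":
    # Sequences as listed in Table II.
    for name, seq in [
        ("Substance P", "RPKPQQFFGLM"),
        ("Met-enkephalin", "YGGFM"),
        ("Neurotensin", "QLYENKPRRPYIL"),
    ]:
        print(name, tryptic_cleavage_sites(seq))
```

Under this rule substance P and Met-enkephalin contain no tryptic cleavage site at all, and neurotensin contains a single one, which shows why search settings built around tryptic digestion fit endogenous peptides poorly.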
The main purpose of the SwePep database is to speed up the identification process of endogenous peptides and to increase the number of identified peptides from complex tissue samples. By classifying the peptides in SwePep into three different classes, (i) biologically active peptides, (ii) potential biologically active peptides, and (iii) uncharacterized peptides, it is possible to store peptides and protein fragments not proven to be biologically active. Peptides that belong to the group of potential or uncharacterized peptides are moved to the group of biologically active peptides if they demonstrate biological activity.

Previous studies suggest that the process of protein elimination and degradation should not be considered a random proteolysis yielding free amino acids subsequently utilized for various metabolic purposes. Instead it should be regarded as a complex process regulated by a system of tissue-specific enzymes and protein substrates. These peptides, complementary to the conventional regulatory systems, may be considered as another concept of a peptidergic regulatory system, giving rise to a large group of peptides, which is defined as the tissue-specific peptide pool (18: Karelin A.A. Blishchenko E. Ivanov V.T. Fragments of functional proteins: role in endocrine regulation. Neurochem. Res. 1999; 24: 1117-1124). For example, hemorphins are small peptides generated by enzymatic hydrolysis of hemoglobin or blood (19: Ivanov V.T. Karelin A.A. Philippova M.M. Nazimov I.V. Pletnev V.Z. Hemoglobin as a source of endogenous bioactive peptides: the concept of tissue-specific peptide pool. Biopolymers. 1997; 43: 171-188; 20: Piot J.M. Zhao Q. Guillochon D. Ricart G. Thomas D. Isolation and characterization of two opioid peptides from a bovine hemoglobin peptic hydrolysate. Biochem. Biophys. Res. Commun. 1992; 189: 101-110; 21: Brantl V. Gramsch C. Lottspeich F. Mertz R. Jaeger K.H. Herz A. Novel opioid peptides derived from hemoglobin: hemorphins. Eur. J. Pharmacol. 1986; 125: 309-310). Their physiological functions are under discussion because they are found in a variety of mammalian tissues and fluids (22: Nishimura K. Hazato T. Isolation and identification of an endogenous inhibitor of enkephalin-degrading enzymes from bovine spinal cord. Biochem. Biophys. Res. Commun. 1993; 194: 713-719; 23: Glamsta E.L. Marklund A. Hellman U. Wernstedt C. Terenius L. Nyberg F. Isolation and characterization of a hemoglobin-derived opioid peptide from the human pituitary gland. Regul. Pept. 1991; 34: 169-179; 24: Barkhudaryan N. Oberthuer W. Lottspeich F. Galoyan A. Structure of hypothalamic coronaro-constrictory peptide factors. Neurochem. Res. 1992; 17: 1217-1221; 25: Glamsta E.L. Morkrid L. Lantz I. Nyberg F. Concomitant increase in blood plasma levels of immunoreactive hemorphin-7 and β-endorphin following long distance running. Regul. Pept. 1993; 49: 9-18; 26: Moisan S. Harvey N. Beaudry G. Forzani P. Burhop K.E. Drapeau G. Rioux F. Structural requirements and mechanism of the pressor activity of Leu-Val-Val-hemorphin-7, a fragment of hemoglobin β-chain in rats. Peptides. 1998; 19: 119-131; 27: Moisan S. Drapeau G. Burhop K.E. Rioux F. Mechanism of the acute pressor effect and bradycardia elicited by diaspirin crosslinked hemoglobin in anesthetized rats. Can J. Physiol. Pharmacol. 1998; 76: 434-442). Hemorphin peptides were previously found in brain tissue not proteolytically deactivated (28: Skold K. Svensson M. Kaplan A. Bjorkesten L. Astrom J. Andren P.E. A neuroproteomic approach to targeting neuropeptides in the brain. Proteomics. 2002; 2: 447-454) but were not detected in tissue that had been proteolytically deactivated (4). However, hemorphins are claimed to have biological activity and produce constriction of coronary vessels and platelet aggregation (24) and to inhibit angiotensin-converting enzyme activity (29: Lantz I. Glamsta E.L. Talback L. Nyberg F. Hemorphins derived from hemoglobin have an inhibitory action on angiotensin converting enzyme activity. FEBS Lett. 1991; 287: 39-41).

The whole peptide identification procedure, starting with the experiment and ending with a list of identified peptides, is shown in Fig. 2. When using SwePep for peptide identification, the user typically starts by selecting a mass tolerance based on the mass accuracy of the mass spectrometer used in the peptide analysis. The experimental peptide masses, supplied in a file, are then matched against the theoretical masses calculated for the peptides in the database. The matching is performed both with and without annotated PTMs. It is also possible to add non-annotated modifications to the peptides to investigate other possible modifications. The results from the database search are presented as a list containing peptide name, precursor name, species, peptide sequence, possible PTMs, monoisotopic mass, and mass difference between experimental and theoretical mass. A peptide identity from SwePep is a suggestion for an identity and needs to be confirmed by analysis of the corresponding tandem mass spectrum. In most cases there is only one suggested peptide identification for each experimental mass.
This makes it easy to verify the results from SwePep. For the time being this confirmation has to be performed manually, but in the future tandem mass spectra will be stored in the database, and the confirmation will be performed automatically. A database search in SwePep takes less than a minute, making it an efficient way to identify known neuropeptides so that the effort can be put into the identification of novel peptides by de novo sequencing.

We have developed a new approach to study a large number of neuropeptides and used it for an investigation of the endogenous neuropeptide content of hypothalamic brain tissue samples from rat (4, 28). The MS data of the neuropeptide and small protein content in the hypothalamus were analyzed with an automated software program (DeCyder MS, GE Healthcare). The generated mass list consisting of deconvoluted mass data was matched against the SwePep database for neuropeptide matches. The absolute mass difference between the theoretical and experimental mass was required not to exceed 0.2 Da for a match to be valid. All positive matches were recorded, and subsequent analysis was performed on experimental data pertaining to these matches to streamline the identification procedure. The final validation step was either manual inspection of tandem mass spectra, searching of sequence collections with tandem MS data for peptide identification, or a combination of both. From hypothalamic mouse brain tissue DeCyder MS detected ∼400 specific peptide masses.
SwePep suggested 54 neuropeptide candidates, and of these, 31 neuropeptides were verified by tandem mass spectrometry (Table II).

Table II. SwePep-matched rat hypothalamic neuropeptides
UniProt accession number | Peptide name | Peptide sequence (position in precursor) | Annotated modification | Experimental mass (Da) | Mass difference (Da)
O35314 | Secretogranin I precursor | SFAKAPHLDL (585-594) | - | 1097.688 | 0.102
P01167 | Somatostatin-28-(1–12) | SANSNPAMAPRE (89-100) | - | 1243.659 | 0.098
P01186 | Arg-vasopressin | CYFQNCPRG (24-32) | - | 1086.624 | 0.186
P01186 | Vasopressin-neurophysin 2-copeptin precursor | VQLAGTQESVDSAKPRVY (151-168) | - | 1947.171 | 0.165
P01194 | Melanotropin γ | YVMGHFRWDRF (77-87) | C-Amidation | 1511.907 | 0.183
P01194 | Melanotropin α | SYSMEHFRWGKPV (124-136) | C-Amidation | 1621.903 | 0.121
P01194 | Melanotropin α | SYSMEHFRWGKPV (124-136) | - | 1622.814 | 0.048
P01194 | CLIP | AEEETAGGDGRPEPSPRE (103-120) | C-Amidation | 1881.996 | 0.151
P01322 | Insulin 1 A chain | GIVDQCCTSICSLYQLENYCN (90-110) | - | 2368.169 | 0.184
P04094 | Met-enkephalin | YGGFM (100-104) | - | 573.324 | 0.098
P04094 | Met-enkephalin-Arg-Phe | YGGFMRF (263-269) | - | 876.488 | 0.092
P04094 | Met-enkephalin-Arg-Gly-Leu | YGGFMRGL (188-195) | - | 899.504 | 0.072
P04094 | Proenkephalin A precursor | SPQLEDEAKELQ (198-209) | - | 1385.772 | 0.104
P04094 | Proenkephalin A precursor | SPQLEDEAKEL (198-208) | - | 1257.748 | 0.140
P04094 | Proenkephalin A precursor | GGFMRF (264-269) | - | 713.422 | 0.090
P04094 | Proenkephalin A precursor | VGRPEWWMDYQ (219-229) | - | 1465.805 | 0.160
P06300 | Leu-enkephalin | YGGFL (166-170) | - | 555.299 | 0.030
P06300 | α-Neoendorphin | YGGFLRKYP (166-174) | - | 1099.696 | 0.115
P06767 | Neurokinin A | HKTDSFVGLM (98-107) | C-Amidation | 1132.678 | 0.108
P06767 | Substance P | RPKPQQFFGLM (58-68) | C-Amidation | 1346.828 | 0.100
P06767 | C-terminal flanking peptide | ALNSVAYERSAMQNYE (111-126) | - | 1845.010 | 0.173
P07490 | Gonadoliberin I | QHWSYGLRPG (24-33) | Pyrrolidone carboxyl acid, C-amidation | 1181.702 | 0.129
P08435 | Neurokinin B | DMHDFFVGLM (82-91) | - | 1210.671 | 0.156
P10354 | WE-14 | WSRMDQLAKELTAE (361-374) | - | 1676.999 | 0.180
P10354 | Chromogranin A precursor | AYGFRDPGPQL (395-405) | - | 1219.721 | 0.123
P10362 | Secretoneurin | TNEIVEEQYTPQSLATLESVFQELGKLTGPSNQ (184-216) | - | 3649.899 | 0.099
P10683 | Galanin | GWTLNSAGYLLGPHAIDNHRSFSDKHGLT (33-61) | C-Amidation | 3162.765 | 0.190
P13432 | SMR1-related undecapeptide | VRGPRRQHNPR (23-33) | - | 1371.797 | 0.026
P13589 | Pituitary adenylate cyclase-activating polypeptide precursor | GMGENLAAAAVDDRAPLT (111-128) | - | 1771.030 | 0.173
P13668 | Stathmin | ASSDIQVKELEKRASGQAF (2-20) | Acetylation | 2105.264 | 0.189
P14200 | Neuropeptide-glutamic acid-isoleucine | EIGDEENSAKFPI (131-143) | C-Amidation | 1446.855 | 0.156
P14200 | Pro-MCH precursor | EIGDEENSAKFPIG (131-144) | - | 1504.853 | 0.149
P20068 | Tail peptide | ASYYY (165-169) | - | 665.371 | 0.102
P20068 | Neurotensin | QLYENKPRRPYIL (150-162) | - | 1688.949 | 0.013
P20068 | Neurotensin | QLYENKPRRPYIL (150-162) | Pyroglutamic acid | 1671.945 | 0.036
P20156 | VGF protein precursor | PPEPVPPPRAAPAPTHV (491-507) | - | 1729.067 | 0.136
P23436 | Cerebellin | SGSAKVAFSAIRSTNH (1-16) | - | 1631.942 | 0.104
P27682 | C-terminal peptide | SVPHFSEEEKEPE (198-210) | - | 1542.802 | 0.118
P27682 | C-terminal peptide | SVPHFSEEEKEPE (198-210) | Phosphorylation | 1622.814 | 0.164
P28841 | Neuroendocrine convertase 2 precursor | IKMALQQEGFD (94-104) | - | 1278.764 | 0.137
P49192 | CART protein precursor | IPIYE (82-86) | - | 633.377 | 0.040
P60042 | Somatostatin-14 | AGCKNFFWKTFTSC (103-116) | Disulfide bond | 1636.876 | 0.160
P60042 | Somatostatin-14 | AGCKNFFWKTFTSC (103-116) | - | 1638.915 | 0.183
P81278 | Prolactin-releasing peptide PrRP20 | TPDINPAWYTGRGIRPVGRF (33-52) | C-Amidation | 2271.374 | 0.171
P98087 | Cerebellin 2 | SGSAKVAFSATRSTNH (88-103) | - | 1619.922 | 0.121
Q62923 | Nociceptin | FGGFTGARKSARKLANQ (135-151) | - | 1808.094 | 0.114
Q62923 | Neuropeptide 2 | FSEFMRQYLVLSMQSSQ (154-170) | - | 2080.138 | 0.162
Q8BFS3 | Relaxin 3 A chain | DVLAGLSSSCCEWGCSKSQISSLC (117-140) | Disulfide bond | 2460.213 | 0.170
Q8BFS3 | Relaxin 3 B chain | RPAPYGVKLCGREFIRAVIFTCGGSRW (24-50) | - | 3038.772 | 0.187
Q9QXU9 | Big PEN | LENSSPQAPARRLLPP (245-260) | - | 1745.069 | 0.111
Q9QXU9 | Little SAAS | SLSAASAPLAETSTPLRL (42-59) | - | 1784.130 | 0.162
Q9QZQ4 | Urotensin-2 | QHGTAPECFWKYCI (110-123) | - | 1681.893 | 0.155
Q9R0R3 | Apelin-13 | QRPRLSHKGPMPF (65-77) | Pyrrolidone carboxyl acid | 1532.852 | 0.048

The fact that only 54 of the 400 peptides detected by DeCyder MS could be identified by SwePep clearly demonstrates the challenge of identifying endogenous peptides. There are only 195 endogenous peptides from the mouse in the database, and many of these originate from tissues other than the brain. This indicates that a large portion of our detected peptides from hypothalamus are novel and not annotated and therefore do not exist in SwePep.
Furthermore, of these 195 peptides, 73 have annotated disulfide bonds, which makes them harder to identify because disulfide bonds may inhibit fragmentation in tandem MS.

Characterizing all modifications is important for understanding the biological function and regulation of the peptides. Unfortunately it is both time-consuming and difficult to fully characterize peptides and proteins with respect to their modifications. Important modifications include acetylation, amidation, phosphorylation, and sulfation (2), and ∼300 different modifications have been reported for proteins (30: Jensen O.N. Modification-specific proteomics: characterization of post-translational modifications by mass spectrometry. Curr. Opin. Chem. Biol. 2004; 8: 33-41). For example, 50–90% of eukaryotic proteins synthesized in the cytoplasm are isolated with their N termini acetylated (31: Polevoda B. Sherman F. N-terminal acetyltransferases and sequence requirements for N-terminal acetylation of eukaryotic proteins. J. Mol. Biol. 2003; 325: 595-622), including the opioid neuropeptide dynorphin that is acetylated after it has been cleaved from its larger precursor (32: Robinson P. Toney K. James S. Bennett H.P. Mass spectrometric and biological characterization of guinea-pig corticotrophin. Regul. Pept. 1995; 56: 89-97). It is also estimated that about 30% of mammalian proteins are phosphorylated (33: Mann M. Ong S.E. Gronborg M. Steen H. Jensen O.N. Pandey A. Analysis of protein phosphorylation using mass spectrometry: deciphering the phosphoproteome. Trends Biotechnol. 2002; 20: 261-268).
Furthermore, disulfide bonds are frequent modifications among peptides. Because peptides are small, disulfide bonds provide the constraints they need to adopt a well defined three-dimensional structure. This adds another level of complexity because many disulfide-linked peptides remain intact in tandem MS, as mentioned above (34: Gorman J.J. Wallis T.P. Pitt J.J. Protein disulfide bond determination by mass spectrometry. Mass Spectrom. Rev. 2002; 21: 183-216). The fact that endogenous peptides are often modified is also reflected in SwePep, where the majority of the peptides are modified: for example, 122 of the 195 mouse peptides have an annotated modification, and 58 of those 122 have more than one. Storing information about modifications, and thereby accounting for the resulting changes in molecular mass, makes modified peptides easier to identify. In the hypothalamic brain tissue example above, we identified a number of neuropeptides with different PTMs using SwePep. Several of the identified neuropeptides, such as corticotropin-lipotropin intermediary peptide (CLIP) and substance P, had C-terminal amidation. N-terminally acetylated stathmin was identified, as was gonadoliberin I carrying both a pyrrolidone carboxylic acid and C-terminal amidation. Additionally, both a phosphorylated (at Ser14) and a non-phosphorylated form of CLIP were identified. Searching the SwePep database for peptides matching the experimental peptide masses 2505.01 and 2585.23 Da with a mass tolerance of 0.2 Da returned one matching peptide for each of the two masses. The suggested identities, Arg-CLIP and the phosphorylated species of Arg-CLIP, were confirmed by tandem mass spectrometry. Some of these neuropeptides would have been difficult to identify without the identity suggested by SwePep.
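The mass-matching step described above, in which an experimental mass is compared with stored peptide masses both with and without possible modifications, can be illustrated with a minimal Python sketch. It is not the SwePep implementation: the function names (`peptide_mass`, `candidate_forms`, `match`) are hypothetical, the residue masses and modification mass shifts are standard monoisotopic values rather than values taken from the database, and only one modification at a time is considered. The two neurotensin masses listed in the table of identified peptides (1688.949 and 1671.945 Da) are treated here as observed masses.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per linear peptide (free N and C termini)

# Monoisotopic mass shifts (Da) for modifications named in the text.
PTM_SHIFT = {
    "Acetylation": 42.01057,
    "C-Amidation": -0.98402,
    "Phosphorylation": 79.96633,
    "Pyroglutamic acid": -17.02655,  # cyclized N-terminal Gln (loss of NH3)
    "Disulfide bond": -2.01565,      # loss of two hydrogens per bond
}


def peptide_mass(sequence, modifications=()):
    """Monoisotopic neutral mass of a peptide plus any listed modifications."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return mass + sum(PTM_SHIFT[m] for m in modifications)


def candidate_forms(name, sequence, allowed_ptms):
    """Yield (label, mass) for the unmodified peptide and each single PTM."""
    yield name, peptide_mass(sequence)
    for ptm in allowed_ptms:
        yield f"{name} + {ptm}", peptide_mass(sequence, [ptm])


def match(experimental_mass, candidates, tolerance_da=0.2):
    """Return (label, deviation) for candidates within the mass tolerance."""
    return [
        (label, experimental_mass - mass)
        for label, mass in candidates
        if abs(experimental_mass - mass) <= tolerance_da
    ]


forms = list(candidate_forms("Neurotensin", "QLYENKPRRPYIL",
                             ["Pyroglutamic acid"]))
for observed in (1688.949, 1671.945):
    for label, deviation in match(observed, forms):
        print(f"{observed:.3f} Da -> {label} (deviation {deviation:+.3f} Da)")
```

With a 0.2 Da tolerance, each observed mass matches exactly one form: the heavier one the unmodified peptide and the lighter one the pyroglutamic acid form. Had the modified form not been among the candidates, the lighter mass would have gone unmatched.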
In a proteomic study of an animal model of Parkinson disease, we observed a decreased level of a 6.7-kDa peptide in mouse striatum using nano-LC ESI Q-TOF MS (35: Skold K. Svensson M. Nilsson A. Zhang X. Nydahl K. Caprioli R.M. Svenningsson P. Andren P.E. Decreased striatal levels of PEP-19 following MPTP lesion in the mouse. J. Proteome Res. 2006; 5: 262-269). Accurate mass data for the protein were subsequently acquired using nano-LC ESI LTQ-FT, and the MS data were compared with SwePep. The manufacturer specifies the mass accuracy of the LTQ-FT mass spectrometer as better than 2 ppm with external calibration (36: Metelmann-Strupat W. Strupat K. Peterman S. Muenster H. Accurate mass measurements using the Finnigan LTQ FT. Thermo Electron Corp., Waltham, MA, 2004); the search tolerance was set to 10 ppm to ensure that all possible peptide matches in the database were retrieved. The search returned two matches to the molecular mass of the peptide: acetylated PEP-19 (mass, 6714.2604 Da) from mouse/rat and small venom protein 1 precursor (mass, 6714.2433 Da) from parasitoid wasp. The mass was calculated from the most intense charge state at m/z 747.0338 (Fig. 3). Tandem MS confirmed that the protein was acetylated PEP-19.
Traditionally, several novel peptides have been discovered by searching for ligands of G-protein-coupled receptors (GPCRs). For example, the first neuropeptide GPCR ligand to be discovered was orphanin FQ/nociceptin, a ligand for an opioid-like receptor (37: Meunier J.C. Mollereau C. Toll L. Suaudeau C. Moisand C. Alvinerie P. Butour J.L. Guillemot J.C. Ferrara P. Monsarrat B. Mazarguil H. Vassart G. Parmentier M. Costentin J. Isolation and structure of the endogenous agonist of opioid receptor-like ORL1 receptor. Nature. 1995; 377: 532-535, 38: Reinscheid R.K. Nothacker H.P.
Bourson A. Ardati A. Henningsen R.A. Bunzow J.R. Grandy D.K. Langen H. Monsma Jr., F.J. Civelli O. Orphanin FQ: a neuropeptide that activates an opioidlike G protein-coupled receptor. Science. 1995; 270: 792-794). It is interesting to note that the human genome contains about 550 GPCR genes and that neuropeptides are ligands for about 20% of them. The classical transmitters constitute up to 55% of the GPCR ligands, and 25% of the receptors have no known ligand (6: Hokfelt T. Bartfai T. Bloom F. Neuropeptides: opportunities for drug discovery. Lancet Neurol. 2003; 2: 463-472).
Recently we were able to identify a number of novel endogenous peptides from rat hypothalamus. Moreover, post-translational modifications of some of these novel peptides were also identified. These novel peptides from rat hypothalamus have been added to SwePep. We have also identified an additional 30 novel peptides from various regions of the mouse and rat brain and added them to SwePep. The identities of these peptides will be published separately. Our technology, which includes instant deactivation of processing enzymes in the brain and highly sensitive MS analysis (4: Svensson M. Skold K. Svenningsson P. Andren P.E. Peptidomics-based discovery of novel neuropeptides. J. Proteome Res. 2003; 2: 213-219), may contribute to the identification of further novel biologically active neuropeptides, which will be added to the SwePep database.
We have developed a novel database for endogenous peptides, SwePep, that contains approximately 4200 endogenous peptides, hormones, potential neuropeptides, and uncharacterized peptides from 394 different species to facilitate and improve endogenous peptide identification utilizing MS. A light version of the SwePep database is accessible on the internet at www.swepep.org. The website will grow continuously.
It is possible to search for peptides by mass, name, organism affiliation, UniProt accession number, or any combination of these. The search results contain detailed information about each peptide, such as precursor name, precursor sequence, peptide name, mass, sequence, peptide function, and references.
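As an illustration of such a combined search, the Python sketch below revisits the 6.7-kDa peptide example: it converts the reported m/z to a neutral mass, searches within a 10 ppm window, and then narrows the result by organism. Several things here are assumptions and not part of SwePep: the table layout, the `search` and `neutral_mass` functions, the placeholder accession values, the use of an in-memory SQLite database in place of MySQL so that the example is self-contained, and the charge state z = 9, which is inferred from the reported m/z and the approximate 6.7-kDa mass, not stated in the text. The mouse/rat entry is stored as a single "mouse" row for simplicity.

```python
import sqlite3

PROTON = 1.007276  # proton mass in Da


def neutral_mass(mz, charge):
    """Neutral mass of an [M + zH]z+ ion observed at the given m/z."""
    return charge * (mz - PROTON)


def search(conn, mass=None, ppm=10.0, name=None, organism=None,
           accession=None):
    """Combine the supplied criteria with AND; omitted criteria are ignored."""
    clauses, params = [], []
    if mass is not None:
        window = mass * ppm / 1e6  # ppm tolerance converted to Da
        clauses.append("mass BETWEEN ? AND ?")
        params += [mass - window, mass + window]
    if name is not None:
        clauses.append("name LIKE ?")
        params.append(f"%{name}%")
    if organism is not None:
        clauses.append("organism = ?")
        params.append(organism)
    if accession is not None:
        clauses.append("accession = ?")
        params.append(accession)
    where = " AND ".join(clauses) or "1"  # no criteria: return every row
    sql = f"SELECT name, organism, mass FROM peptide WHERE {where} ORDER BY mass"
    return conn.execute(sql, params).fetchall()


# Hypothetical single-table schema holding the two entries named in the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE peptide (accession TEXT, name TEXT, organism TEXT, mass REAL)"
)
conn.executemany(
    "INSERT INTO peptide VALUES (?, ?, ?, ?)",
    [
        # Accessions are placeholders, not real UniProt identifiers.
        ("ACC-1", "PEP-19 (acetylated)", "mouse", 6714.2604),
        ("ACC-2", "Small venom protein 1 precursor", "parasitoid wasp",
         6714.2433),
    ],
)

observed = neutral_mass(747.0338, charge=9)  # z = 9 is an assumption
print(f"neutral mass: {observed:.4f} Da")
print(search(conn, mass=observed))                    # mass only
print(search(conn, mass=observed, organism="mouse"))  # mass and organism
```

Under these assumptions the mass-only search returns both entries, consistent with the two matches reported above, and adding the organism criterion leaves only acetylated PEP-19. In the study itself the identity was confirmed by tandem MS, not by an organism filter.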
References