Article Open access Peer-reviewed

The Application of New Software Tools to Quantitative Protein Profiling Via Isotope-coded Affinity Tag (ICAT) and Tandem Mass Spectrometry

2003; Elsevier BV; Volume: 2; Issue: 7; Language: English

10.1074/mcp.m300041-mcp200

ISSN

1535-9484

Authors

Priska D. von Haller, Eugene C. Yi, Samuel Donohoe, Kelly Vaughn, Andrew Keller, Alexey I. Nesvizhskii, Jimmy K. Eng, Xiaojun Li, David R. Goodlett, Ruedi Aebersold, Julian D. Watts

Topic(s)

Mass Spectrometry Techniques and Applications

Abstract

Proteomic approaches to biological research that will prove the most useful and productive require robust, sensitive, and reproducible technologies for both the qualitative and quantitative analysis of complex protein mixtures. Here we applied the isotope-coded affinity tag (ICAT) approach to quantitative protein profiling, in this case proteins that copurified with lipid raft plasma membrane domains isolated from control and stimulated Jurkat human T cells. With the ICAT approach, cysteine residues of the two related protein isolates were covalently labeled with isotopically normal and heavy versions of the same reagent, respectively. Following proteolytic cleavage of combined labeled proteins, peptides were fractionated by multidimensional chromatography and subsequently analyzed via automated tandem mass spectrometry. Individual tandem mass spectrometry spectra were searched against a human sequence database, and a variety of recently developed, publicly available software applications were used to sort, filter, analyze, and compare the results of two repetitions of the same experiment. In particular, robust statistical modeling algorithms were used to assign measures of confidence to both peptide sequences and the proteins from which they were likely derived, identified via the database searches. We show that by applying such statistical tools to the identification of T cell lipid raft-associated proteins, we were able to estimate the accuracy of peptide and protein identifications made. These tools also allow for determination of the false positive rate as a function of user-defined data filtering parameters, thus giving the user significant control over and information about the final output of large-scale proteomic experiments. With the ability to assign probabilities to all identifications, the need for manual verification of results is substantially reduced, thus making the rapid evaluation of large proteomic datasets possible. 
Finally, by repeating the experiment, information relating to the general reproducibility and validity of this approach to large-scale proteomic analyses was also obtained.

¹The abbreviations used are: 2DE, two-dimensional polyacrylamide gel electrophoresis; EM, expectation maximization; IADIFF, INTERACT differential; ICAT, isotope-coded affinity tag; LC, liquid chromatography; μLC-MS/MS, microcapillary-liquid chromatography tandem mass spectrometry; MIF, macrophage inhibitory factor; MS, mass spectrometry; MS/MS, tandem mass spectrometry; pcomp, computed probability that the given peptide sequence assignment is correct; Pcomp, computed probability that the given protein identification is correct; TCR, T cell receptor.

A main objective of proteomics research is the systematic identification and quantification of the proteins expressed in a cell, or contained within a cell compartment or other protein complex. The common approach to quantitative protein analysis to date has been the combination of protein separation, most commonly high-resolution two-dimensional polyacrylamide gel electrophoresis (2DE),¹ and tandem mass spectrometry (MS/MS).
For this approach, protein identification is accomplished by individual spot excision, in-gel digestion, and sequence identification by MS/MS. When desired, relative protein quantification is achieved by visualizing differences in the 2DE patterns from related samples via silver staining or radiolabeling (1–6). This method has proven quite successful for the cataloguing of large numbers of proteins in complex samples.
However, the approach is highly repetitive, labor intensive, and difficult to automate. In addition, it necessarily selects only for proteins that can be resolved by 2DE, missing many larger and smaller proteins, in addition to proteins with lower solubility, such as membrane proteins. Also, due to sample loading limitations for 2DE, it generally selects for only the most abundant proteins in a biological sample (4, 7), thus missing many lower abundance regulatory proteins, which are rarely detected when complex mixtures are analyzed. 2DE also typically resolves different post-translationally modified forms of the same proteins. Given the high degree and variety of post-translational modifications occurring on the proteins of eukaryotic organisms, this results in great difficulties in obtaining accurate quantitative data on the many proteins that separate into multiple spots, as well as multiple proteins that co-migrate to the same spot, during 2DE. However, because the in vivo activities of many proteins are regulated by post-translational modification, the ability to readily resolve differentially modified forms of a protein allows for the use of 2DE to monitor changes in the known "active" and "inactive" forms of many proteins.

The recently developed isotope-coded affinity tag (ICAT) technology instead allows for quantitative proteomic analysis based on differential isotopic tagging of related protein mixtures (8–11) and is summarized schematically in Fig. 1. ICAT reagents consist of three functional elements: a thiol-reactive group for the selective labeling of reduced Cys residues, a biotin affinity tag to allow for selective isolation of labeled peptides, and a linker synthesized in either an isotopically normal ("light") or "heavy" form (utilizing ²H or ¹³C) that allows for the incorporation of the stable isotope tags. In a typical experiment, protein disulfide bridges are reduced under denaturing conditions, and the free sulfhydryl groups of the proteins from the two related samples to be compared are labeled respectively with the isotopically "light" or "heavy" forms of the reagent. The samples are then combined and proteolyzed with trypsin, and the resulting peptides can be separated by any number of optional fractionation steps, including the removal of untagged peptides (i.e. those not containing a Cys residue) via avidin-affinity chromatography.
Peptide/protein identifications are made by MS/MS analyses of the individual fractions, followed by protein sequence database searching of the observed MS/MS spectra. Finally, the observed ratio between the signal intensities for the unfragmented isotopically "light" and "heavy" forms of the same peptide yields the relative abundances of that peptide, and hence the protein from which it was derived, in the original samples. We have applied the ICAT approach to the investigation of the role of detergent-resistant lipid raft membrane microdomains in T cell receptor (TCR) signaling in the human cell line, Jurkat. We also sought to evaluate the reproducibility, performance, and reliability of the method by comparing the results of two repetitions of the same experiment. In this paper, we present in-depth and systematic technical analyses and discussions of the identifications made within each dataset, as well as comparisons between various datasets. In particular, we show that the application of new, automated, statistical modeling algorithms greatly improved the accuracy of and confidence in both peptide and protein identifications made by assigning probability scores to each peptide and protein matched. While our general approach performed well in analyzing what would normally be challenging protein samples to approaches such as 2DE, the protein identification overlap between the two repetitions of the experiment, along with a number of observations made during the data processing, raised a number of caveats that should be kept in mind when performing and interpreting proteomic data. Furthermore, the use of statistical data analysis removed much of the need for manual verification of both peptide and protein identifications. These experiments thus illustrated how statistical tools of this nature will greatly facilitate the timely processing of large proteomic datasets, currently a time-consuming and frequently manual process. 
Also, the application of such tools for assigning measures of confidence to each peptide and protein identified should offer some form of standardization for the interpretation of, in particular, large proteomic datasets. In turn, this should enable researchers to perform any experiment, interpret their results consistently, and then compare the results to those from any other related experiment. Finally, the general application of statistical tools such as these should allow, for the first time, the transparent comparison of related datasets from multiple laboratories.

Experimental Procedures

A total of 5 × 10⁸ exponentially growing Jurkat T cells were resuspended at ∼2 × 10⁷/ml in RPMI 1640 medium supplemented to 10% fetal calf serum, split into two equal aliquots, and chilled on ice for 15 min. Cells were simultaneously treated with anti-TCR (OKT3) and anti-CD28 monoclonal antibodies, which were cross-linked with a secondary antibody for 2 min to simulate costimulation, essentially according to standard laboratory protocols (12–14). Detergent-resistant membranes (rafts) were purified essentially as described elsewhere (15).
Cells were lysed on ice in 25 mM Tris, pH 7.5, 150 mM NaCl, 10 mM β-glycerophosphate, 5 mM EDTA, 1 mM Na₃VO₄, 1 mM phenylmethanesulfonyl fluoride, 10 μg/ml soybean trypsin inhibitor, 2 μg/ml leupeptin, 1 μg/ml aprotinin, 0.1% Triton X-100, Dounce homogenized (10 strokes), and mixed with an equal volume of 80% sucrose in MNE buffer (25 mM 2-morpholinoethanesulfonic acid, 150 mM NaCl, 5 mM EDTA, pH 6.5). Rafts were then isolated by sucrose density step gradient ultracentrifugation (16–18 h, 200,000 × g, 4 °C). The low-density raft-containing fraction was further diluted with MNE buffer, and the rafts were pelleted by centrifugation (5 h, 200,000 × g, 4 °C). The lipid raft-containing pellet was then dissolved in 50 mM Tris, pH 8, 5 mM EDTA, 6 M urea, 0.05% SDS.

ICAT labeling and analysis were performed essentially according to the manufacturer's protocol (ICAT Kit for Protein Labeling; Applied Biosystems, Foster City, CA), with optimized conditions known to result in quantitative labeling (16). In short, following reduction of cysteines and labeling of control (d0-ICAT) and stimulated (d8-ICAT) samples, the samples were pooled and then diluted to ≤1 M urea, ≤0.01% SDS for proteolysis, using an excess of trypsin (Promega, Madison, WI).

The peptides were separated by cation exchange chromatography using a 4.6 × 200 mm Polysulfoethyl A column (5 μm particles, 300 Å pore size; Poly LC, Columbia, MD) at a flow rate of 800 μl/min. Peptides were eluted by a gradient of 0–25% B over 30 min, followed by 25–100% B over 20 min (buffer A: 5 mM K₂HPO₄, 25% CH₃CN, pH 3.0; buffer B: 5 mM K₂HPO₄, 25% CH₃CN, 600 mM KCl, pH 3.0). The elution profile of the cation exchange chromatography (Fig. 8A) determined which fractions were further analyzed.
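The two-segment salt gradient described above can be written as a piecewise-linear function of time. The following is only an illustrative sketch: the function names are hypothetical, buffer A is assumed to contribute no KCl (none is listed for it), mixing is assumed to be ideal, and system dwell volume is ignored.

```python
def percent_b(t_min: float) -> float:
    """Percent buffer B at time t (min): 0-25% over 30 min, then 25-100% over 20 min."""
    if t_min < 0 or t_min > 50:
        raise ValueError("gradient is only defined for 0-50 min")
    if t_min <= 30:
        return 25.0 * t_min / 30.0
    return 25.0 + 75.0 * (t_min - 30.0) / 20.0


def nominal_kcl_mm(t_min: float, kcl_in_b_mm: float = 600.0) -> float:
    """Nominal KCl concentration (mM) delivered at time t.

    Assumes buffer A contains no KCl and that A and B mix ideally.
    """
    return kcl_in_b_mm * percent_b(t_min) / 100.0


if __name__ == "__main__":
    for t in (0, 15, 30, 40, 50):
        print(f"{t:>2} min: {percent_b(t):5.1f}% B, {nominal_kcl_mm(t):5.1f} mM KCl")
```

Under these assumptions the shallow first segment spans 0–150 mM KCl, and the steeper second segment covers the remaining 150–600 mM.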
Forty-three cation exchange fractions (fractions 10–52) were individually processed over avidin cartridges (Applied Biosystems) according to the manufacturer's protocol (ICAT Kit for Protein Labeling; Applied Biosystems), to isolate the labeled Cys-containing peptides. Both the avidin column eluate and flow-through fractions were retained. To increase the concentration of Cys-containing peptides for microcapillary-liquid chromatography MS/MS (μLC-MS/MS) analysis, avidin column eluates were pooled in pairs (except fraction 52), making a total of 22 fractions for μLC-MS/MS. Because the flow-through fractions contained higher peptide concentrations, these were analyzed individually by μLC-MS/MS.

Three sets of samples were generated for subsequent μLC-MS/MS analysis: the avidin-affinity eluates (i.e. mostly Cys-containing ICAT-labeled peptides) from the two iterations of the biological experiment and the avidin-affinity flow-through samples (i.e. unlabeled peptides) from the first iteration of the biological experiment. The resultant three data subsets generated from the analysis of these samples were termed ICAT 1, ICAT 2, and Flow-through 1, respectively.

Fifty to 100% of each sample was loaded using an autosampler and sequentially analyzed by automated data-dependent μLC-MS/MS (17). Injections were made on a 10 cm × 100 μm capillary column packed in-house (Magic C18; Michrom BioResources, Auburn, CA). Peptides were eluted with a linear gradient of 10–40% B over 50 min at ∼200–300 nl/min (buffer A: 0.4% acetic acid, 0.005% heptafluorobutyric acid in H₂O; buffer B: 100% acetonitrile). An HP1100 solvent delivery system (Hewlett Packard, Palo Alto, CA) was used with precolumn flow splitting.
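The fraction counts given above for the avidin eluates (43 cation exchange fractions, pooled in pairs with fraction 52 left alone, giving 22 samples) can be checked with a short sketch. The text does not say which fractions were paired; pairing adjacent fractions starting from fraction 10 is an assumption made here for illustration, and the function name is hypothetical.

```python
def pool_in_pairs(first: int, last: int) -> list[tuple[int, ...]]:
    """Group consecutive fraction numbers into pairs; an odd final fraction stays alone.

    Pairing adjacent fractions is an assumption, not a detail given in the text.
    """
    fractions = list(range(first, last + 1))
    return [tuple(fractions[i:i + 2]) for i in range(0, len(fractions), 2)]


pools = pool_in_pairs(10, 52)

if __name__ == "__main__":
    print(len(range(10, 53)), "fractions ->", len(pools), "pooled samples")
    print("first pool:", pools[0], "last pool:", pools[-1])
```

With this pairing, fractions 10–51 form 21 pairs and fraction 52 is the single unpaired sample, which matches the stated total of 22.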
An LCQ-DEKA ion-trap mass spectrometer (ThermoFinnigan, San Jose, CA) with an in-house built micro-spray device was used for all analyses. Peptide fragmentation by collision-induced dissociation was carried out in an automated fashion using the dynamic-exclusion option, and the resultant MS/MS spectra were recorded. The uninterpreted MS/MS data were finally submitted to a suite of software tools for automated database searching and statistical interpretation of the search results. This process, summarized in Fig. 2, is described below, and more extensively under "Results and Discussion."

Automated database searching using SEQUEST™ software (18) was performed to identify peptide and protein sequence matches for each recorded MS/MS spectrum. Uninterpreted MS/MS spectra were searched against a locally maintained human protein sequence database (version dated 9/8/2002) with typical contaminants such as porcine trypsin (used for proteolysis) and bovine serum albumin (a major component of cell culture medium) additionally included. SEQUEST™ search parameters for ICAT-labeled samples were set as follows: static modification for d0-ICAT-labeled Cys was set to +442.22, with a +8 differential modification for d8-ICAT-labeled Cys; +16 for oxidized Met; mass tolerance ±3 Da; no proteolytic enzyme specified. SEQUEST™ search parameters for flow-through fractions were the same, but without the modifications for Cys. SEQUEST™ database search software is available from ThermoFinnigan.

SEQUEST™ output files were automatically submitted to PeptideProphet™ (19) for computation of the probability that each peptide sequence assignment is correct (pcomp). The resultant outputs from SEQUEST™ and PeptideProphet™ were displayed using INTERACT (9), a software tool that allows for web/intranet-based data display, and data filtering and sorting via a range of user-definable parameters. INTERACT was used to restrict the datasets by filtering at different pcomp cut-offs, and its sorting functions were used to determine the number of "single hit" peptides and proteins (i.e. database entries identified via only one peptide with a pcomp above the predetermined threshold) that were contained within each filtered version of the data. The in-house software tool INTERACT differential (IADIFF) was used for side-by-side comparison of identified peptide sequences contained within multiple INTERACT files. This allowed for determination of the overlap between the three datasets for both the peptide sequence matches made and the proteins (i.e. database entries) to which they corresponded. INTERACT also generates an Excel spreadsheet version of any filtered and/or sorted dataset for distribution and publication purposes.

The INTERACT data files for all three datasets (ICAT 1, ICAT 2, and Flow-through 1) were submitted to ProteinProphet™. ProteinProphet™ utilizes the list of peptide sequences and their respective pcomp scores to determine a minimal list of proteins (database entries) that can explain the observed data and to compute a probability (Pcomp) that each protein was indeed present in the original sample(s) (20). The ProteinProphet™ output groups together all peptides that (potentially) match a given protein (i.e. database entry). It deals with indistinguishable database entries by grouping them as one "protein." This commonly occurs when multiple sequences (mRNAs) and fragments of the same sequence are represented as multiple database entries. Highly homologous gene families are dealt with by formation of related "protein groups," again as single output results. ProteinProphet™ then generates a computed probability (Pcomp) for each protein or protein group match. These functions are discussed in detail below under "Results and Discussion." The ProteinProphet™ output is also web-based and can be readily exported to an Excel spreadsheet for sorting, distribution, and publication purposes. More information on PeptideProphet™, ProteinProphet™, and INTERACT can also be found on the Proteomics pages at www.systemsbiology.org/. These applications are available upon request and are open source.

Results and Discussion

The general experimental strategy employed for this study is summarized in Fig. 1. Briefly, lipid rafts were isolated from both control and stimulated Jurkat human T cells via standard protocols (15) with a few variations. Cell stimulation was via cross-linking of the TCR with the coreceptor CD28 (12–14). Proteins copurifying with Jurkat T cell lipid rafts were isolated via conventional detergent insolubility (in 0.1% Triton X-100) at 4 °C, followed by sucrose density ultracentrifugation (15, 21). Proteins from control cells were labeled with the isotopically normal ("light") ICAT reagent and those from stimulated cells with the isotopically heavy reagent. The two ICAT reagents differed by 8 mass units and are referred to as the d0- and d8-ICAT reagents, respectively. Samples were combined and proteolyzed with trypsin, the resultant peptides were fractionated by cation exchange chromatography, and individual fractions were further processed by avidin-affinity chromatography to enrich for ICAT-labeled peptides. Both the avidin-affinity eluate (ICAT-labeled peptides) and flow-through fractions (unlabeled peptides) were retained for subsequent μLC-MS/MS analyses, as described under "Experimental Procedures." The whole protocol was performed a second time to allow assessment of the reproducibility and reliability of the approach.
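Because the d0- and d8-ICAT reagents differ by 8 mass units per labeled Cys, each labeled peptide appears as a light/heavy pair whose spacing on the m/z axis depends on the number of labeled Cys residues and the charge state, and the intensity ratio of the pair reports relative abundance. The sketch below illustrates that arithmetic only. The function names are hypothetical, the nominal shift of 8 is taken from the text rather than from an exact isotopic mass, and the ratio is expressed as stimulated (d8) over control (d0) to follow the labeling scheme of this study. It is not the quantification software used by the authors.

```python
NOMINAL_SHIFT_PER_CYS = 8.0  # d8 minus d0, nominal mass units per labeled Cys (from the text)


def pair_spacing_mz(n_labeled_cys: int, charge: int) -> float:
    """Expected m/z spacing between the light and heavy forms of one peptide."""
    if n_labeled_cys < 1 or charge < 1:
        raise ValueError("need at least one labeled Cys and a positive charge state")
    return NOMINAL_SHIFT_PER_CYS * n_labeled_cys / charge


def stimulated_to_control_ratio(light_intensity: float, heavy_intensity: float) -> float:
    """Relative abundance as heavy (d8, stimulated) over light (d0, control) signal."""
    if light_intensity <= 0:
        raise ValueError("light intensity must be positive to form a ratio")
    return heavy_intensity / light_intensity


if __name__ == "__main__":
    # Hypothetical peptide with one labeled Cys observed as a doubly charged ion.
    print(pair_spacing_mz(1, 2))                        # spacing in m/z units
    print(stimulated_to_control_ratio(1000.0, 2500.0))  # made-up intensities
```

A singly labeled, doubly charged peptide therefore shows a pair spacing of 4 m/z units, while a peptide with two labeled Cys residues at charge 1 shows 16.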
From the two iterations of the experiment described above, the following fractions were carried forward for μLC-MS/MS analysis: all pooled avidin eluate fractions (i.e. Cys-containing, ICAT-labeled peptides) from both experiments, which will be referred to as the ICAT 1 and ICAT 2 datasets, respectively; and the avidin flow-through fractions (i.e. non-Cys-containing peptides) from the first (ICAT 1) experiment, which will be referred to as the Flow-through 1 dataset. All recorded MS/MS spectra were searched against a human protein sequence database using SEQUEST™ software (18). Peptide and protein identifications inferred from these search results were determined using the PeptideProphet™ (19) and ProteinProphet™ (20) software tools, respectively, summarized in Fig. 2, and further described below and under "Experimental Procedures."

Currently, MS/MS data are searched via a range of database search tools that generate scores relating in some way to the quality of the peptide sequence assigned to each spectrum. To date, determination of the final list of "correct" peptide identifications has typically been based on a "threshold approach," where data are filtered on the basis of these scores alone, with everything below the threshold being discarded.
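For comparison with the score-threshold approach just described, the sketch below shows what filtering on a computed probability adds: if the probabilities are accurate, the expected false positive rate of the retained set can be estimated directly, and "single hit" proteins can be counted at any cut-off. This is a simplified illustration, not the PeptideProphet™ or INTERACT implementation. The function name, the tuple layout, and all example identifiers and values are hypothetical.

```python
from collections import defaultdict


def filter_assignments(assignments, cutoff):
    """Keep assignments with probability >= cutoff and summarize the retained set.

    assignments: iterable of (protein_id, peptide_sequence, probability) tuples.
    Returns (accepted, expected_false_rate, single_hit_proteins).
    """
    accepted = [a for a in assignments if a[2] >= cutoff]
    # Each accepted assignment is wrong with probability (1 - p), so the mean of
    # (1 - p) estimates the fraction of false positives among those retained.
    expected_false_rate = (
        sum(1.0 - p for _, _, p in accepted) / len(accepted) if accepted else 0.0
    )
    peptides_by_protein = defaultdict(set)
    for protein_id, peptide, _ in accepted:
        peptides_by_protein[protein_id].add(peptide)
    single_hits = sorted(
        protein for protein, peptides in peptides_by_protein.items() if len(peptides) == 1
    )
    return accepted, expected_false_rate, single_hits


# Invented example data; none of these identifiers or values come from the study.
example = [
    ("PROT_A", "PEPTIDEK", 0.99),
    ("PROT_A", "SAMPLER", 0.95),
    ("PROT_B", "ANOTHERK", 0.90),
    ("PROT_C", "LOWSCOREK", 0.40),
]
accepted, rate, single_hits = filter_assignments(example, cutoff=0.9)

if __name__ == "__main__":
    print(len(accepted), "accepted; expected false positive rate:", round(rate, 4))
    print("single-hit proteins:", single_hits)
```

Raising or lowering the cut-off changes both the number of retained identifications and the estimated error, which is the trade-off a user explores when filtering at different probability thresholds.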
Protein identifications are subsequently determined from the database entries from which the peptide sequences were derived. Typically, visual inspection of spectra is performed by the user to verify spectral quality, and hence the "correctness" of peptide/protein identifications. This is particularly the case when scores are close to the preset threshold, or in cases of "single hits," whereby a protein is identified via only a single peptide sequence identification. This process is necessarily highly variable. Furthermore, each user/laboratory has their own opinion of a suitable minimum threshold score to set. This problem is compounded by the fact that the various laboratories use both a range of database search engines, each with their own unique scoring system, and different types of mass spectrometers, each producing MS/MS spectra with their own unique characteristics. In fact, due to a range of

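Reference(s)

1. Boucherie H., Sagliocco F., Joubert R., Maillet I., Labarre J., Perrot M. Two-dimensional gel protein database of Saccharomyces cerevisiae. Electrophoresis. 1996; 17: 1683-1699
2. Gygi S.P., Rochon Y., Franza B.R., Aebersold R. Correlation between protein and mRNA abundance in yeast. Mol. Cell. Biol. 1999; 19: 1720-1730
3. Link A.J., Hays L.G., Carmack E.B., Yates J.R. 3rd. Identifying the major proteome components of Haemophilus influenzae type-strain NCTC 8143. Electrophoresis. 1997; 18: 1314-1334
4. Garrels J.I., McLaughlin C.S., Warner J.R., Futcher B., Latter G.I., Kobayashi R., Schwender B., Volpe T., Anderson D.S., Mesquita-Fuentes R., Payne W.E. Proteome studies of Saccharomyces cerevisiae: Identification and characterization of abundant proteins. Electrophoresis. 1997; 18: 1347-1360
5. Shevchenko A., Jensen O.N., Podtelejnikov A.V., Sagliocco F., Wilm M., Vorm O., Mortensen P., Boucherie H., Mann M. Linking genome and proteome by mass spectrometry: Large-scale identification of yeast proteins from two dimensional gels. Proc. Natl. Acad. Sci. U. S. A. 1996; 93: 14440-14445
6. Bardel J., Louwagie M., Jaquinod M., Jourdain A., Luche S., Rabilloud T., Macherel D., Garin J., Bourguignon J. A survey of the plant mitochondrial proteome in relation to development. Proteomics. 2002; 2: 880-898
7. Gygi S.P., Corthals G.L., Zhang Y., Rochon Y., Aebersold R. Evaluation of two-dimensional gel electrophoresis-based proteome analysis technology. Proc. Natl. Acad. Sci. U. S. A. 2000; 97: 9390-9395
8. Gygi S.P., Rist B., Gerber S.A., Turecek F., Gelb M.H., Aebersold R. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 1999; 17: 994-999
9. Han D.K., Eng J., Zhou H., Aebersold R. Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry. Nat. Biotechnol. 2001; 19: 946-951
10. Smolka M.B., Zhou H., Aebersold R. Quantitative protein profiling using two-dimensional gel electrophoresis, isotope-coded affinity tag labeling, and mass spectrometry. Mol. Cell. Proteomics. 2002; 1: 19-29
11. Flory M.R., Griffin T.J., Martin D., Aebersold R. Advances in quantitative proteomics using stable isotope tags. Trends Biotechnol. 2002; 20: S23-S29
12. Heller M., Goodlett D.R., Watts J.D., Aebersold R. A comprehensive characterization of the T-cell antigen receptor complex composition by microcapillary liquid chromatography-tandem mass spectrometry. Electrophoresis. 2000; 21: 2180-2195
13. Heller M., Watts J.D., Aebersold R. CD28 stimulation regulates its association with N-ethylmaleimide-sensitive fusion protein and other proteins involved in vesicle sorting. Proteomics. 2001; 1: 70-78
14. Watts J.D., Sanghera J.S., Pelech S.L., Aebersold R. Phosphorylation of serine 59 of p56lck in activated T cells. J. Biol. Chem. 1993; 268: 23275-23282
15. Zhang W., Trible R.P., Samelson L.E. LAT palmitoylation: its essential role in membrane microdomain targeting and tyrosine phosphorylation during T cell activation. Immunity. 1998; 9: 239-246
16. Smolka M.B., Zhou H., Purkayastha S., Aebersold R. Optimization of the isotope-coded affinity tag-labeling procedure for quantitative proteome analysis. Anal. Biochem. 2001; 297: 25-31
17. Yi E.C., Marelli M., Lee H., Purvine S.O., Aebersold R., Aitchison J.D., Goodlett D.R. Approaching complete peroxisome characterization by gas-phase fractionation. Electrophoresis. 2002; 23: 3205-3216
18. Eng J., McCormack A.L., Yates J.R. 3rd. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994; 5: 976-989
19. Keller A., Nesvizhskii A.I., Kolker E., Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002; 74: 5383-5392
20. Nesvizhskii A.I., Keller A., Kolker E., Aebersold R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 2003; (in press)
21. Brown D.A., Rose J.K. Sorting of GPI-anchored proteins to glycolipid-enriched membrane subdomains during transport to the apical cell surface. Cell. 1992; 68: 533-544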