Peptizer, a Tool for Assessing False Positive Peptide Identifications and Manually Validating Selected Results
2008; Elsevier BV; Volume: 7; Issue: 12; Language: English
10.1074/mcp.m800082-mcp200
ISSN: 1535-9484
Authors: Kenny Helsens, Evy Timmerman, J Vandekerckhove, Kris Gevaert, Lennart Martens
Topic(s): Mass Spectrometry Techniques and Applications
Abstract: False positive peptide identifications are a major concern in the field of peptidecentric, mass spectrometry-driven gel-free proteomics. They occur in regions where the score distributions of true positives and true negatives overlap. Removal of these false positive identifications necessarily involves a trade-off between sensitivity and specificity. Existing postprocessing tools typically rely on a fixed or semifixed set of assumptions in their attempts to optimize both the sensitivity and the specificity of peptide and protein identification using MS/MS spectra. Because of the expanding diversity in available proteomics technologies, however, these postprocessing tools often struggle to adapt to emerging technology-specific peculiarities. Here we present a novel tool named Peptizer that solves this adaptability issue by making use of pluggable assumptions. This research-oriented postprocessing tool also includes a graphical user interface to perform efficient manual validation of suspect identifications for optimal sensitivity recovery. Peptizer is open source software under the Apache2 license and is written in Java.
The protein set of a biological system is the topic of research in proteomics, with bottom-up proteomics approaches relying on peptides as the fundamental analytical unit. Typically proteins are extracted prior to being digested into peptides, generally by a specific protease such as trypsin. In most work flows, the highly complex peptide sample obtained after digestion is then separated in one or more chromatographic dimensions before being analyzed by a mass spectrometer. Peptides are ionized and fragmented in this instrument, yielding fragment ion spectra as the final experimental output (1: Domon B. Aebersold R. Mass spectrometry and protein analysis. Science. 2006; 312: 212-217). Data interpretation algorithms are then used to identify the peptide of origin from the fragment ion spectrum. The final step in the identification procedure consists of assembling a protein list from the identified peptides (2: Martens L. Hermjakob H. Proteomics data validation: why all must provide data. Mol. Biosyst. 2007; 3: 518-522). As a first and crucial step of data interpretation, coupling of a fragment ion spectrum to a peptide sequence has attracted much effort aimed at optimizing this process. A review of the variety of methods and tools available for this purpose was published recently (3: Matthiesen R. Methods, algorithms and tools in computational proteomics: a practical point of view. Proteomics. 2007; 7: 2815-2832).
The most commonly applied method is based on sequence database searching by database search engines such as SEQUEST (4: Eng J.K. McCormack A.L. Yates J.R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994; 5: 976-989), Mascot (5: Perkins D.N. Pappin D.J. Creasy D.M. Cottrell J.S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999; 20: 3551-3567), X!Tandem (6: Craig R. Cortens J.P. Beavis R.C. Open source system for analyzing, validating, and storing protein identification data. J. Proteome Res. 2004; 3: 1234-1242), Virtual Expert Mass Spectrometrist (7: Matthiesen R. Trelle M.B. Hojrup P. Bunkenborg J. Jensen O.N. VEMS 3.0: algorithms and computational tools for tandem mass spectrometry based identification of post-translational modifications in proteins. J. Proteome Res. 2005; 4: 2338-2347), or Open Mass Spectrometry Search Algorithm (8: Geer L.Y. Markey S.P. Kowalak J.A. Wagner L. Xu M. Maynard D.M. Yang X. Shi W. Bryant S.H. Open mass spectrometry search algorithm. J. Proteome Res. 2004; 3: 958-964). The overall concept behind these algorithms is similar and consists of the generation of theoretical fragment ion spectra from sequence database entries against which experimental fragment ion spectra are matched. The difference between the algorithms is usually found in the spectral comparison method and scoring scheme (9: Sadygov R.G. Cociorva D. Yates III, J.R. Large-scale database searching using tandem mass spectra: looking up the answer in the back of the book. Nat. Methods. 2004; 1: 195-202).
The most difficult part of this analysis is not necessarily finding the best match from the sequence database but finding out whether this best match is actually valid. Indeed an experimental spectrum cannot always be compared with the actual theoretical spectrum of its original precursor because this precursor may be absent from the database or because the precursor peptide carried one or more unanticipated modifications. Even so, this experimental spectrum may still be matched with a considerable score to a theoretical fragmentation spectrum derived from an unrelated precursor. To filter out such background matches, several search engines include probability-based scoring algorithms (9) in which the score of a proposed peptide identification can be compared against a threshold score for a given confidence level. In addition, postprocessing tools have been developed that analyze the detailed output of a search engine to obtain a revised score that should further optimize sensitivity and specificity (10: Brosch M. Swamy S. Hubbard T. Choudhary J. Comparison of mascot and X!Tandem performance for low and high accuracy mass spectrometry and the development of an adjusted mascot threshold. Mol. Cell. Proteomics. 2008; 7: 962-970; 11: Li F. Sun W. Gao Y. Wang J. RScore: a peptide randomicity score for evaluating tandem mass spectra. Rapid Commun. Mass Spectrom. 2004; 18: 1655-1659; 12: Savitski M.M. Nielsen M.L. Zubarev R.A. New data base-independent, sequence tag-based scoring of peptide MS/MS data validates Mowse scores, recovers below threshold data, singles out modified peptides, and assesses the quality of MS/MS techniques. Mol. Cell. Proteomics. 2005; 4: 1180-1188; 13: Keller A. Nesvizhskii A.I. Kolker E. Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002; 74: 5383-5392). Typically such algorithms rely on certain assumptions about the identifications to model true positive and true negative score distributions. PeptideProphet (13) for example relies on a mixture model approach that models SEQUEST score distributions according to fixed assumptions such as tryptic correctness of the identified peptides. Ultimately a revised probabilistic score is calculated that should allow discrimination between true and false positives with increased accuracy. In certain cases, however, only a subset of all peptide identifications obtained are of relevance to the biological system under study. In these cases, expert manual validation of the identifications is a more commonplace strategy for quality control. Protein modification studies for example often find biological relevance in a small subset of all experimentally obtained data (14: Zhan X. Desiderio D.M. Nitroproteins from a human pituitary adenoma tissue discovered with a nitrotyrosine affinity column and tandem mass spectrometry. Anal. Biochem. 2006; 354: 279-289; 15: Zhan X. Du Y. Crabb J.S. Gu X. Kern T.S. Crabb J.W. Targets of tyrosine nitration in diabetic rat retina. Mol. Cell. Proteomics. 2008; 7: 864-874).
In addition, the so-called "single hit wonders", which often populate the majority of identified peptides or proteins in gel-free proteomics, should not be simply discarded but must be treated intelligently as they potentially contain valuable biological information (16: Hardwidge P.R. Rodriguez-Escudero I. Goode D. Donohoe S. Eng J. Goodlett D.R. Aebersold R. Finlay B.B. Proteomic analysis of the intestinal epithelial cell response to enteropathogenic Escherichia coli. J. Biol. Chem. 2004; 279: 20127-20136; 17: Veenstra T.D. Conrads T.P. Issaq H.J. What to do with "one-hit wonders"? Electrophoresis. 2004; 25: 1278-1279). The manual validation required to assure the reliability of the biological conclusions drawn from such peptide identifications can be performed by using the visualization tools included with the search engine or by specialized applications such as CHOMPER (18: Eddes J.S. Kapp E.A. Frecklington D.F. Connolly L.M. Layton M.J. Moritz R.L. Simpson R.J. CHOMPER: a bioinformatic tool for rapid validation of tandem mass spectrometry search results associated with high-throughput proteomic strategies. Proteomics. 2002; 2: 1097-1103), DTASelect (19: Tabb D.L. McDonald W.H. Yates III, J.R. DTASelect and Contrast: tools for assembling and comparing protein identifications from shotgun proteomics. J. Proteome Res. 2002; 1: 21-26), or myProMS (20). These tools present a specific set of details on a peptide identification and its associated spectrum for user validation. Finally a semimanual option was recently added to PeptideProphet by allowing the user to enable or disable some of the modeling assumptions from which the overall score is derived (21: Choi H. Nesvizhskii A.I. Semisupervised model-based validation of peptide identifications in mass spectrometry-based proteomics. J. Proteome Res. 2008; 7: 254-265).
An important side effect of the evolution of proteomics technologies toward more specialized and targeted approaches (22: Gevaert K. Van Damme P. Ghesquiere B. Impens F. Martens L. Helsens K. Vandekerckhove J. A la carte proteomics with an emphasis on gel-free techniques. Proteomics. 2007; 7: 2698-2718; 23: Stahl-Zeng J. Lange V. Ossola R. Eckhardt K. Krek W. Aebersold R. Domon B. High sensitivity detection of plasma proteins by multiple reaction monitoring of N-glycosites. Mol. Cell. Proteomics. 2007; 6: 1809-1817), however, relates to the corresponding changes in the actual assumptions that can be made about the identifications. These changes effectively introduce new parameters that can be used to further enhance the separation of false and true positives, yet are necessarily largely ignored by tools built upon fixed, generalized assumptions. To allow this expanding array of technologies and associated identification parameters to be used effectively in the postprocessing and validation of proteomics data, here we present the Peptizer tool. Built upon a dynamic profiling framework that operates on pluggable assumptions, Peptizer can be quickly and efficiently configured with any a priori knowledge that is available to the user. Each assumption or parameter is coded in an autonomous agent, which is allowed to cast a vote on each peptide identification. In a second layer, the votes of these agents are aggregated using a pluggable algorithm, which outputs a final score that is used to judge whether an identification represents a potential false positive. We show that elimination of these suspicious identifications increases specificity albeit at the cost of a noticeable loss in sensitivity through removal of certain true positives.
A sophisticated and highly efficient manual validation interface is also included that can be used to compensate in part for this loss in sensitivity.
The MS/MS spectra used in this study have been published previously (24: Staes A. Van Damme P. Helsens K. Demol H. Vandekerckhove J. Gevaert K. Improved recovery of proteome-informative, protein N-terminal peptides by combined fractional diagonal chromatography (COFRADIC). Proteomics. 2008; 8: 1362-1370). Full experimental details are provided in the supplemental information. Briefly human K562 cells were lysed by cycles of freeze-thawing followed by reduction and alkylation of cysteines. Primary free amines were then trideuteroacetylated by N-hydroxysuccinimide trideuteroacetate. Alkylated and acetylated proteins were digested by trypsin, and the generated peptide mixture was separated by strong cation exchange at pH 3 to enrich for α-amino-blocked peptides in the strong cation exchange non-binding fraction. The sample was then acidified to oxidize methionines before the primary N-terminal COFRADIC (combined fractional diagonal chromatography) separation (25: Gevaert K. Van Damme P. Martens L. Vandekerckhove J. Diagonal reverse-phase chromatography applications in peptide-centric proteomics: ahead of catalogue-omics? Anal. Biochem. 2005; 345: 18-29). Fractions 4 min wide were collected and treated with 2,4,6-trinitrobenzenesulfonic acid. These modified primary fractions were then loaded for the secondary COFRADIC run wherein the α-amino-blocked peptides, which show no altered chromatographic properties, are collected. The secondary fractions were analyzed by LC-MS/MS using a microfluidic interface (Agilent Chip Cube) on an Agilent XCT-Ultra ion trap mass spectrometer operated as described previously (26: Staes A. Timmerman E. Van Damme J. Helsens K. Vandekerckhove J. Vollmer M. Gevaert K. Assessing a novel microfluidic interface for shotgun proteome analyses. J. Sep. Sci. 2007; 30: 1468-1476). The MS/MS spectra were searched by Mascot version 2.2 against the human subset of the UniProtKB/Swiss-Prot sequence database, release 53.2 (June 26, 2007), concatenated with a shuffled version of this database generated by DBToolkit (27: Martens L. Vandekerckhove J. Gevaert K. DBToolkit: processing protein databases for peptide-centric proteomics. Bioinformatics (Oxf.). 2005; 21: 3584-3585). The following parameters were used in the Mascot searches: peptide mass tolerance and peptide fragment tolerance were set at ±0.5 Da, and allowed precursor charges were set to 1+, 2+, and 3+. Fixed modifications were oxidation of methionine to its sulfoxide derivative, trideuteroacetylation of lysine, and carbamidomethylation of cysteine. Pyroglutamate formation (N-terminal Gln), pyrocarbamidomethylcysteine formation (N-terminal carbamidomethylated cysteines), acetylation and trideuteroacetylation of the α-N terminus, and deamidation (Gln and Asn) were considered as variable modifications. Endoproteinase Arg-C/P was set as the proteolytic enzyme, and at most one missed cleavage was allowed. The Mascot instrument setting parameter was set to ESI-TRAP. Only MS/MS spectra receiving an ion score equal to or exceeding the Mascot identity threshold score at the 95% confidence level were retained for further inspection by Peptizer. All experimental fragmentation spectra (32,403), peptide identifications (2,739) made in the "forward" protein database, and corresponding experimental details will be made publicly available via the proteomics identifications (PRIDE) database (28: Martens L. Hermjakob H. Jones P. Adamski M. Taylor C. States D. Gevaert K. Vandekerckhove J. Apweiler R. PRIDE: the proteomics identifications database. Proteomics. 2005; 5: 3537-3545) under experiment accession number 3261. To estimate the false positive distribution we performed Mascot searches against a concatenated decoy database as described previously (29: Elias J.E. Gygi S.P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods. 2007; 4: 207-214).
Peptizer was developed as an open source project under the Apache2 license in Java 1.5. Peptizer relies on MascotDatfile (30: Helsens K. Martens L. Vandekerckhove J. Gevaert K. MascotDatfile: an open-source library to fully parse and analyse MASCOT MS/MS search results. Proteomics. 2007; 7: 364-366) to process Mascot result files and can also interface with the ms_lims software package (31: Piggee C. LIMS and the art of MS proteomics. Anal. Chem. 2008; 80: 4801-4806).
Manual validation was performed by an experienced mass spectrometrist. The scientist was blinded to the origin of the peptide identifications (i.e. from the decoy or target set proteins) and was told to apply stringent criteria during the validation. The net effect of the manual validation was obtained by inspecting the unblinded results after completion of the validation.
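The decoy search mentioned above lends itself to a small worked calculation. The sketch below assumes the estimator commonly associated with concatenated target-decoy searches (29), in which every decoy hit above threshold is taken to stand for one equally likely false hit in the target set. The text here does not state which exact formula was applied, and the class name, method names, and counts are hypothetical.

```java
/** Hypothetical helper; not part of Peptizer. */
final class DecoyEstimate {

    /**
     * Estimated false discovery rate for a concatenated target-decoy search:
     * 2 * decoyHits / (targetHits + decoyHits).
     */
    static double falseDiscoveryRate(int targetHits, int decoyHits) {
        if (targetHits < 0 || decoyHits < 0) {
            throw new IllegalArgumentException("hit counts must be non-negative");
        }
        int total = targetHits + decoyHits;
        if (total == 0) {
            return 0.0; // nothing passed the threshold, so nothing to estimate
        }
        return 2.0 * decoyHits / total;
    }

    public static void main(String[] args) {
        // Illustrative counts only; these are not results from the study.
        int targetHits = 950;
        int decoyHits = 50;
        System.out.println("Estimated FDR: " + falseDiscoveryRate(targetHits, decoyHits));
    }
}
```

Comparing this estimate before and after a filtering step gives one way to express the change in specificity, while the number of target hits removed gives a rough handle on the sensitivity cost.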
Peptizer was configured to use the agents listed in Table I for detecting potential false positive identifications in this data set. The agent configuration text file, which can be loaded in Peptizer, can be found at the project Website. The "best hit" agent aggregator was used to combine the individual agent votes. This aggregator simply summed all votes and marked the peptide identification as suspicious if the result was equal to or greater than 2 (or when an agent with veto rights declined).
Table I. Agent configuration
Agent | Veto | Parameter | Vote
Deamidation | True | Count: 2 | Declines if 2 or more deamidations (a)
Suspect residue | True | Sites: Arg; His | Declines if a His or internal Arg residue is present (b)
Delta threshold | False | Delta: 10 | Declines if score delta between ion score and identity threshold is more than 10
Free NH2 | False | NA | Declines if N terminus is unmodified
Homology | False | NA | Declines if ion score or identity threshold is beyond the homology threshold
Length | False | Length: 9 | Declines if the peptide has fewer than 9 amino acids
More confident hits | False | Delta: 20 | Declines if there is more than one confident identification
N term acetylation | False | NA | Recommends if the N terminus is acetylated (c)
Proline peak | False | Intensity: 0.4 | Declines if there is no intense fragment ion N-terminal to an internal proline residue
b-ion coverage | False | Percentage: 0.10 | Declines if b-ion coverage is less than 10%
y-ion coverage | False | Percentage: 0.25 | Declines if y-ion coverage is less than 25%
Start site | False | Low: 2; high: 200 | Recommends if the peptide starts at protein position 1 or 2, declines if above protein position 200, and reserves in between (c)
a When using MS/MS spectra obtained with low resolution mass spectrometers we typically enable deamidation as a variable modification to recover peptide identifications when the second isotope (not the monoisotopic ion) was selected for fragmentation. This modification tends to occur more frequently in false positive peptide identifications creating isobaric amino acid combinations amongst others.
b Peptides that contain an internal basic residue were suspicious here because they should have been retained on the strong cation exchange column during sample preparation (24).
c The N-terminal COFRADIC procedure includes an amino acetylation step prior to digestion, and about 95% of all identified peptides isolated by this procedure are α-N-acetylated. Such acetylated peptides are less likely to be false positives because they are simply more likely to occur. For the same reason, peptides that start in protein position 1 or 2 (methionine removal) are more likely to occur in the "true data set."
Peptizer was developed as a postprocessing tool aimed at separating true and false positive peptide identifications in a highly configurable manner without relying on any built-in assumptions. Indeed considerable variations in expected output are often found between distinct research methodologies that all convey some form of a priori knowledge that can ultimately be used to separate identification candidates at the postprocessing level. Because existing tools commonly rely on fixed assumptions that are derived from generalized or idealized research methods, they are limited in the amount of a priori experimental information they can take into account. In contrast, Peptizer is inherently designed with the necessary flexibility to integrate any available a priori knowledge. The peptide identifications are tested by evaluating a series of user-selectable and extensible properties. The result of this evaluation can be to decline, reserve, or recommend the identification based on that property. The results across all considered properties are then combined in an overall score for identification reliability that can ultimately be used as a filter. In Peptizer, a property is inspected by an Agent, and the combination of multiple Agent scores is performed by an Aggregator. These two components are shown in Fig. 1 and are discussed in detail in the following sections.
An Agent in Peptizer typically inspects a single property of a peptide identification and reports a score (or "vote") to indicate whether it declines, reserves, or recommends the identification (score of +1, 0, or −1, respectively). An individual Agent can be given veto privilege, which means that a decision to decline an identification by such an Agent will directly result in declining the identification irrespective of the votes of the other Agents. Examples of properties that an Agent can inspect include the following: the peptide sequence coverage by fragment ions, the length of a peptide, the peptide modification status, the difference between peptide ion score and identity threshold, and the difference between best scoring hit and second best hit among many others. Furthermore apart from being readily included in or excluded from a profile, each Agent can be parameterized as well. The Agent that inspects peptide length, for instance, can be provided with a cutoff length below which to decline an identification. Another example is the Agent that inspects for sequence coverage by b-ions, which also takes a threshold level of coverage below which identifications are declined by the Agent. As a final example, consider the Agent that inspects identifications for missed cleavages; in this case, both the cleavage specificity of the protease as well as the number of tolerated missed cleavages are Agent parameters. Cleavage specificity is therefore easily adapted when evaluating data from an experimental protocol that uses a different protease. As outlined above, all peptide identifications are inspected by a voting panel composed of user-selected Agents that each decline, reserve, or recommend an identification by casting a vote. These individual votes must then be aggregated into an overall score for the identification on which recommendation or rejection is ultimately based (see Fig. 1). 
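The Agent concept described above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not Peptizer's actual interface: the type names `Vote`, `PeptideHit`, `Agent`, `LengthAgent`, and `StartSiteAgent` are hypothetical, and only the vote values (+1 decline, 0 reserve, −1 recommend), the veto flag, and the example parameters (length 9; start site low 2, high 200) are taken from the text and Table I.

```java
import java.util.Arrays;
import java.util.List;

/** Vote values as described in the text: decline = +1, reserve = 0, recommend = -1. */
enum Vote {
    DECLINE(1), RESERVE(0), RECOMMEND(-1);

    final int score;

    Vote(int score) { this.score = score; }
}

/** Hypothetical, minimal stand-in for a peptide identification. */
final class PeptideHit {
    final String sequence;
    final int startPosition; // 1-based start site in the parent protein

    PeptideHit(String sequence, int startPosition) {
        this.sequence = sequence;
        this.startPosition = startPosition;
    }
}

/** Hypothetical Agent contract: inspect one property, cast one vote. */
interface Agent {
    Vote inspect(PeptideHit hit);

    boolean hasVeto();
}

/** Declines peptides shorter than a configurable cutoff length. */
final class LengthAgent implements Agent {
    private final int minLength;

    LengthAgent(int minLength) { this.minLength = minLength; }

    public Vote inspect(PeptideHit hit) {
        return hit.sequence.length() < minLength ? Vote.DECLINE : Vote.RESERVE;
    }

    public boolean hasVeto() { return false; }
}

/** Recommends start sites up to low, declines above high, reserves in between. */
final class StartSiteAgent implements Agent {
    private final int low;
    private final int high;

    StartSiteAgent(int low, int high) {
        this.low = low;
        this.high = high;
    }

    public Vote inspect(PeptideHit hit) {
        if (hit.startPosition <= low) return Vote.RECOMMEND;
        if (hit.startPosition > high) return Vote.DECLINE;
        return Vote.RESERVE;
    }

    public boolean hasVeto() { return false; }
}

class AgentDemo {
    public static void main(String[] args) {
        // Made-up peptide used only to exercise the two Agents.
        PeptideHit hit = new PeptideHit("PEPTIDE", 350);
        List<Agent> panel = Arrays.asList(new LengthAgent(9), new StartSiteAgent(2, 200));
        for (Agent agent : panel) {
            Vote vote = agent.inspect(hit);
            System.out.println(agent.getClass().getSimpleName() + " votes " + vote + " (" + vote.score + ")");
        }
    }
}
```

Keeping the parameters in the constructor mirrors the idea that the same Agent can be reused with a different cutoff for a different experimental protocol.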
A first method in which Agent votes can be combined is by simple summation of the Agent scores. If the end result is above a preset threshold (e.g. 0), the identification is rejected. A more pessimistic approach counts only the number of Agents that decline the peptide identification. If that number is higher than a preset cutoff, the peptide identification is considered bad. Obviously an Aggregator can also be much more sophisticated than these simple examples, utilizing a support vector machine, neural network, or other learning algorithm for instance. Interestingly Peptizer also supports pluggable Aggregators, thus allowing complete flexibility at both the Agent and Aggregator level. It is worth noting that the Peptizer framework can therefore provide an extremely convenient infrastructure basis for the development and implementation of novel computational strategies for discovering false positive identification profiles. Peptizer is released as open source under the Apache2 software license, and binaries as well as source code can be downloaded. Although it is made freely available under a permissive license, the source code is not required to build extensions to Peptizer, nor is a recompilation of the application necessary to include novel Agents or Aggregators. A typical Agent is only about 20 lines of code, whereas a typical, simple Aggregator is about twice that size. Peptizer loads its Agents and Aggregators from a simple eXtensible Markup Language (XML)-based configuration file upon application start-up, so simply adding a newly developed Agent into this configuration file will make it available for inclusion in the voting panel of the application, and the same holds true for Aggregators. The effort required to provide Peptizer with new Agents or Aggregators is thus minimized by design, allowing rapid adoption of novel experimental methodologies and their corresponding a priori information through custom-developed Agents and Aggregators. 
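The two simple aggregation strategies described above can be sketched as follows. The names `CastVote`, `VoteAggregator`, `SumAggregator`, and `DeclineCountAggregator` are hypothetical and do not reflect Peptizer's real classes. The summing variant is assumed to flag an identification when the sum is equal to or greater than the threshold, or when a veto holder declines, as described for the "best hit" aggregator configuration; the counting variant flags when the number of declines is higher than the cutoff.

```java
import java.util.Arrays;
import java.util.List;

/** One cast vote: +1 decline, 0 reserve, -1 recommend, plus the caster's veto right. */
final class CastVote {
    final int score;
    final boolean veto;

    CastVote(int score, boolean veto) {
        this.score = score;
        this.veto = veto;
    }
}

/** Hypothetical Aggregator contract: true means the identification is flagged as suspect. */
interface VoteAggregator {
    boolean isSuspect(List<CastVote> votes);
}

/** Sums all votes; flags when the sum reaches the threshold or a veto holder declines. */
final class SumAggregator implements VoteAggregator {
    private final int threshold;

    SumAggregator(int threshold) { this.threshold = threshold; }

    public boolean isSuspect(List<CastVote> votes) {
        int sum = 0;
        for (CastVote vote : votes) {
            if (vote.veto && vote.score > 0) {
                return true; // a declining Agent with veto rights decides alone
            }
            sum += vote.score;
        }
        return sum >= threshold;
    }
}

/** More pessimistic: counts declines only, so recommendations cannot offset them. */
final class DeclineCountAggregator implements VoteAggregator {
    private final int cutoff;

    DeclineCountAggregator(int cutoff) { this.cutoff = cutoff; }

    public boolean isSuspect(List<CastVote> votes) {
        int declines = 0;
        for (CastVote vote : votes) {
            if (vote.score > 0) {
                declines++;
            }
        }
        return declines > cutoff;
    }
}

class AggregatorDemo {
    public static void main(String[] args) {
        // Two declines and one recommendation, none cast with veto rights.
        List<CastVote> votes = Arrays.asList(
                new CastVote(1, false), new CastVote(1, false), new CastVote(-1, false));
        System.out.println("Sum aggregator flags: " + new SumAggregator(2).isSuspect(votes));
        System.out.println("Count aggregator flags: " + new DeclineCountAggregator(1).isSuspect(votes));
    }
}
```

With the same panel output the two strategies can disagree, because a recommendation lowers the sum but leaves the decline count untouched. A learning-based Aggregator would fit the same contract by replacing the body of `isSuspect`.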
Although Peptizer currently only accepts Mascot ".dat" result files as input, the source of peptide identifications can also be modified. However, to extend the reach of Peptizer to other search engine output files, a basic understanding of programming in Java is required as parsing of these more complex files can be more involved. All these extensions to Peptizer can be achieved by implementing well documented interfaces, thus providing a clean and efficient develop-by-contract approach. Peptizer can be used in one of two modes: fully automatic command line execution, or semiautomatic operation by means of a user-friendly graphical user interface (GUI). Both modes address a distinct group of users: although the average user will work most comfortably in GUI mode, more experienced users will benefit from the automated and scriptable command line execution. An important diff
Reference(s)