Multibatch TMT Reveals False Positives, Batch Effects and Missing Values
2019; Elsevier BV; Volume: 18; Issue: 10; Language: English
10.1074/mcp.ra119.001472
ISSN: 1535-9484
Authors: Alejandro J. Brenes, Jens Hukelmann, Dalila Bensaddek, Angus I. Lamond
Topic(s): Advanced Biosensing Techniques and Applications
Abstract: Multiplexing strategies for large-scale proteomic analyses have become increasingly prevalent, tandem mass tags (TMT) in particular. Here we used a large iPSC proteomic experiment with twenty-four 10-plex TMT batches to evaluate the effect of integrating multiple TMT batches within a single analysis. We identified a significant inflation rate of protein missing values as multiple batches are integrated and show that this pattern is aggravated at the peptide level. We also show that without normalization strategies to address the batch effects, the high precision of quantitation within a single multiplexed TMT batch is not reproduced when data from multiple TMT batches are integrated. Further, the incidence of false positives was studied by using Y chromosome peptides as an internal control. The iPSC lines quantified in this data set were derived from both male and female donors, hence the peptides mapped to the Y chromosome should be absent from female lines. Nonetheless, these Y chromosome-specific peptides were consistently detected in the female channels of all TMT batches. We then used the same Y chromosome-specific peptides to quantify the level of ion coisolation as well as the effect of primary and secondary reporter ion interference. These results were used to propose solutions to mitigate the limitations of multi-batch TMT analyses. We confirm that including a common reference line in every batch increases precision by facilitating normalization across the batches and we propose experimental designs that minimize the effect of cross population reporter ion interference.

Highlights
• Revealed inflation of missing values as multiple TMT 10-plex batches are integrated.
• Analyzed the impact of integrating multiple TMT 10-plex batches on the quantification accuracy of both high and low abundance proteins.
• Established reliable detection of false positives caused by coisolation and reporter ion interference, highlighted by the incidence of Y chromosome peptides in all female channels.
• Optimized new experimental design set-ups to minimize cross population reporter ion interference via insights into coisolation and reporter ion interference.
High-throughput, shotgun proteomics, using data dependent acquisition (DDA),¹ now enables the comprehensive study of proteomes, allowing the identification of 10,000 or more proteins from cells and tissues (1. Bekker-Jensen D.B. Kelstrup C.D. Batth T.S. Larsen S.C. Haldrup C. Bramsen J.B. Sorensen K.D. Hoyer S. Orntoft T.F. Andersen C.L. Nielsen M.L. Olsen J.V. An Optimized Shotgun Strategy for the Rapid Generation of Comprehensive Human Proteomes. Cell Syst. 2017; 4: 587-599 e584; 2. Beck M. Schmidt A. Malmstroem J. Claassen M. Ori A. Szymborska A. Herzog F. Rinner O. Ellenberg J. Aebersold R. The quantitative proteome of a human cell line. Mol. Syst. Biol. 2011; 7: 549; 3. Meier F. Geyer P.E. Virreira Winter S. Cox J. Mann M. BoxCar acquisition method enables single-shot proteomics at a depth of 10,000 proteins in 100 minutes. Nat. Methods. 2018; 15: 440-448). However, to achieve such deep proteome coverage using DDA, extensive prefractionation of extracts before mass spectrometry (MS) analysis is frequently required (1; 4. Camerini S. Mauri P. The role of protein and peptide separation before mass spectrometry analysis in clinical proteomics. J. Chromatogr. A. 2015; 1381: 1-12). To evaluate the statistical significance of the resulting data, a minimum of 3 replicates for each sample/condition is also necessary (5. Rost H.L. Malmstrom L. Aebersold R. Reproducible quantitative proteotype data matrices for systems biology. Mol. Biol. Cell. 2015; 26: 3926-3931; 6. Turck C.W. Falick A.M. Kowalak J.A. Lane W.S. Lilley K.S. Phinney B.S. Weintraub S.T. Witkowska H.E. Yates N.A. Association of Biomolecular Resource Facilities Proteomics Research Group. The Association of Biomolecular Resource Facilities Proteomics Research Group 2006 study: relative protein quantitation. Mol. Cell. Proteomics. 2007; 6: 1291-1298). The data acquisition time involved is increased still further for experiments that analyze the multi-dimensional characteristics of the proteome, for example, studying differences in protein subcellular localization, turnover rates, post-translational modifications (PTMs) and protein-protein interactions (7. Larance M. Lamond A.I. Multidimensional proteomics for cell biology. Nat. Rev. Mol. Cell Biol. 2015; 16: 269-280; 8. Boisvert F.M. Ahmad Y. Gierlinski M. Charriere F. Lamont D. Scott M. Barton G. Lamond A.I. A quantitative spatial proteomics analysis of proteome turnover in human cells. Mol. Cell. Proteomics. 2012; 11 (M111 011429); 9. Larance M. Kirkwood K.J. Tinti M. Brenes Murillo A. Ferguson M.A. Lamond A.I. Global membrane protein interactome analysis using in vivo crosslinking and mass spectrometry-based protein correlation profiling. Mol. Cell. Proteomics. 2016; 15: 2476-2490).

¹ The abbreviations used are: DDA, data dependent acquisition; PTM, post-translational modification; RII, reporter ion interference; CII, coisolation interference; TMT, tandem mass tags.
To cope with the challenges of large-scale proteomics analyses, strategies have been developed to allow multiple samples to be analyzed in parallel, through multiplexing isotopically tagged peptides (10. Hennrich M.L. Romanov N. Horn P. Jaeger S. Eckstein V. Steeples V. Ye F. Ding X. Poisa-Beiro L. Lai M.C. Lang B. Boultwood J. Luft T. Zaugg J.B. Pellagatti A. Bork P. Aloy P. Gavin A.C. Ho A.D. Cell-specific proteome analyses of human bone marrow reveal molecular features of age-dependent functional decline. Nat. Commun. 2018; 9: 4004; 11. Munoz I.M. Morgan M.E. Peltier J. Weiland F. Gregorczyk M. Brown F.C. Macartney T. Toth R. Trost M. Rouse J. Phosphoproteomic screening identifies physiological substrates of the CDKL5 kinase. EMBO J. 2018; 37 (pii): e99559). The most widely used MS multiplexing methods, TMT (12. Thompson A. Schafer J. Kuhn K. Kienle S. Schwarz J. Schmidt G. Neumann T. Johnstone R. Mohammed A.K. Hamon C. Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS. Anal. Chem. 2003; 75: 1895-1904) and iTRAQ (13. Ross P.L. Huang Y.N. Marchese J.N. Williamson B. Parker K. Hattan S. Khainovski N. Pillai S. Dey S. Daniels S. Purkayastha S. Juhasz P. Martin S. Bartlet-Jones M. He F. Jacobson A. Pappin D.J. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell. Proteomics. 2004; 3: 1154-1169), use isobaric tags for simultaneous peptide identification and quantification. TMT has increased in popularity and is now widely used (14. Isasa M. Rose C.M. Elsasser S. Navarrete-Perea J. Paulo J.A. Finley D.J. Gygi S.P. Multiplexed, proteome-wide protein expression profiling: yeast deubiquitylating enzyme knockout strains. J. Proteome Res. 2015; 14: 5306-5317; 15. McAlister G.C. Nusinow D.P. Jedrychowski M.P. Wuhr M. Huttlin E.L. Erickson B.K. Rad R. Haas W. Gygi S.P. MultiNotch MS3 enables accurate, sensitive, and multiplexed detection of differential expression across cancer cell line proteomes. Anal. Chem. 2014; 86: 7150-7158). This reflects the ability of multiplexed TMT to increase sample throughput in proteomics studies and reduce the "missing values" problem that arises from the stochastic sampling inherent in DDA proteomics (16. Lazar C. Gatto L. Ferro M. Bruley C. Burger T. Accounting for the multiple natures of missing values in label-free quantitative proteomics data sets to compare imputation strategies. J. Proteome Res. 2016; 15: 1116-1125; 17. Webb-Robertson B.J. Wiberg H.K. Matzke M.M. Brown J.N. Wang J. McDermott J.E. Smith R.D. Rodland K.D. Metz T.O. Pounds J.G. Waters K.M. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics. J. Proteome Res. 2015; 14: 1993-2001). Thus, within a single multiplexed TMT batch, the number of missing values at the protein level is low, frequently <2% (14). Further, the precision of the quantification within a multiplexed TMT batch is high (18. O'Connell J.D. Paulo J.A. O'Brien J.J. Gygi S.P. Proteome-wide evaluation of two common protein quantification methods. J. Proteome Res. 2018; 17: 1934-1942). However, it is less clear how well multiplexed TMT performs for very large-scale analyses involving numerous TMT batches.

In this manuscript, we use a proteomic data set of human iPSCs, involving 24 separate 10-plex TMT batches (19. Brenes A. Bensaddek D. Hukelmann J.L. Afzal V. Lamond A.I. The iPSC proteomic compendium. bioRxiv. 2018; https://www.biorxiv.org/content/10.1101/469916v1). We compare the quantitation of data both within and between 10-plex batches and focus our analysis on 4 main issues: (1) missing values, (2) accuracy of quantification, (3) false positives and (4) the effect of both reporter ion interference (RII) and coisolation interference (CII). We show there is an inflationary effect on missing values as data from multiple batches are integrated, both at the protein and peptide level. We evaluated reproducibility both by studying the coefficient of variation (CV) within each 10-plex TMT batch, and by comparing a reference line (technical replicates of the iPSC line "bubh_3") that was common to every batch. Furthermore, the incidence of false positives was studied by using Y chromosome peptides as an internal control. The iPSC lines quantified in this data set were derived from 163 different donors, both male and female, hence the peptides mapped to the Y chromosome should be absent from female lines. Nonetheless, we confirm that these Y chromosome-specific peptides were consistently detected in the female channels of all TMT batches. Finally, by using these Y chromosome peptides, we quantified the effect of ion coisolation and reporter ion interference upon TMT quantification accuracy.

The study consists of 240 iPSC replicates, 217 biological replicates and 24 technical replicates, derived from 163 different donors. The study comprises twenty-four 10-plex TMT batches.
Each batch consisted of 1 common reference line (technical replicates of iPSC cell line "bubh_3") and 9 different iPSC cell lines. The technical replicates were used for the data normalization strategy described below. Out of the 240 replicates analyzed, 142 were derived from female donors and 98 from male donors.

For protein extraction, iPSC cell pellets were washed with ice cold PBS and redissolved immediately in 200 μl of lysis buffer (8 M urea in 100 mM triethyl ammonium bicarbonate (TEAB)) and mixed at room temperature for 15 min. Cellular DNA was sheared using ultrasonication (6 × 20 s on ice). The proteins were reduced using tris-carboxyethylphosphine (TCEP; 25 mM) for 30 min at room temperature, then alkylated in the dark for 30 min using iodoacetamide (50 mM). Total protein was quantified using the EZQ assay (Thermo Fisher Scientific, Waltham, MA). For the first digestion with mass spectrometry grade lysyl endopeptidase, Lys-C (Wako, Japan), the lysates were diluted 4-fold with 100 mM TEAB, then further diluted 2.5-fold before a second digestion with trypsin. Lys-C and trypsin were used at an enzyme to substrate ratio of 1:50 (w/w). The digestions were carried out overnight at 37 °C, then stopped by acidification with trifluoroacetic acid (TFA) to a final concentration of 1% (v/v). Peptides were desalted using C18 Sep-Pak cartridges (Waters, Milford, MA) following the manufacturer's instructions.

For tandem mass tag (TMT)-based quantification, the dried peptides were re-dissolved in 100 mM TEAB (50 μl) and their concentration was measured using a fluorescent assay (CBQCA, Thermo Fisher Scientific). For each 10-plex TMT batch, 100 μg of peptides from each cell line to be compared, in 100 μl of TEAB, were labeled with a different TMT tag (20 μg/ml in 40 μl acetonitrile) (Thermo Fisher Scientific), for 2 h at room temperature.
After incubation, the labeling reaction was quenched using 8 μl of 5% hydroxylamine (Thermo Fisher Scientific) for 30 min and the different cell lines/tags were mixed and dried in vacuo.

The TMT samples were fractionated using off-line, high-pH reverse-phase (RP) chromatography: samples were loaded onto a 4.6 × 250 mm XBridge BEH130 C18 column with 3.5-μm particles (Waters). Using a Dionex bioRS system, the samples were separated using a 25-min multistep gradient of solvents A (10 mM formate at pH 9) and B (10 mM ammonium formate pH 9 in 80% acetonitrile), at a flow rate of 1 ml/min. Peptides were separated into 48 fractions, which were consolidated into 24 fractions. The fractions were subsequently dried and the peptides re-dissolved in 5% formic acid and analyzed by LC-MS/MS.

Samples were analyzed using an Orbitrap Fusion Tribrid mass spectrometer (Thermo Fisher Scientific), equipped with a Dionex ultra-high-pressure liquid-chromatography system (RSLCnano). RPLC was performed using a Dionex RSLCnano HPLC (Thermo Fisher Scientific). Peptides were injected onto a 75 μm × 2 cm PepMap-C18 pre-column and resolved on a 75 μm × 50 cm RP-C18 EASY-Spray temperature-controlled integrated column-emitter (Thermo Fisher Scientific), using a four-hour multistep gradient from 5% B to 35% B with a constant flow rate of 200 nl/min. The mobile phases were 2% ACN incorporating 0.1% FA (solvent A) and 80% ACN incorporating 0.1% FA (solvent B). The spray was initiated by applying 2.5 kV to the EASY-Spray emitter and the data were acquired under the control of Xcalibur software in a data-dependent mode using top speed and 4 s duration per cycle. The survey scan was acquired in the Orbitrap covering the m/z range from 400 to 1,400 Th with a mass resolution of 120,000 and an automatic gain control (AGC) target of 2.0 × 10^5 ions. The most intense ions were selected for fragmentation using CID in the ion trap with 30% CID collision energy and an isolation window of 1.6 Th.
The AGC target was set to 1.0 × 10^4 with a maximum injection time of 70 ms and a dynamic exclusion of 80 s, and the scan rate was set to "Rapid." During the MS3 analysis for more accurate TMT quantifications, 5 fragment ions were coisolated using synchronous precursor selection with a window of 2 Th and further fragmented using an HCD collision energy of 55%. The fragments were then analyzed in the Orbitrap with a resolution of 60,000. The AGC target was set to 1.0 × 10^5 and the maximum injection time was set to 105 ms.

All of the TMT batches were analyzed on the same Orbitrap Fusion MS instrument (Thermo Fisher Scientific). Between each individual TMT experiment, one blank was run, followed by analysis of a 15 peptide Retention Time Calibration (RTC) standard, to evaluate retention time drift. This was followed by analysis of an MCF10a total cell digest standard to evaluate peptide and protein identifications. The last step consisted of analysis of two blanks, one with an oscillating gradient and one with the gradient matching the samples to be run.

The data from all twenty-four 10-plex TMT batches were analyzed using MaxQuant (20. Cox J. Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008; 26: 1367-1372; 21. Tyanova S. Temu T. Cox J. The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat. Protoc. 2016; 11: 2301-2319) v. 1.6.3.3. The FDR threshold was set to 1% for each of the respective Peptide Spectrum Match (PSM) and Protein levels.
The data were searched with the following parameters: type was set to Reporter ion MS3 with 10plex TMT; stable modification of carbamidomethyl (C); variable modifications of oxidation (M), acetylation (protein N terminus), deamidation (NQ) and glutamine to pyro-glutamate (N terminus); a threshold of 2 missed tryptic cleavages; and reporter mass tolerance set to 0.03 ppm. Minimum peptide length was set to 7 amino acids. Proteins and peptides were identified using UniProt (SwissProt December 2018). The run parameters are accessible at ProteomeXchange (22. Vizcaino J.A. Deutsch E.W. Wang R. Csordas A. Reisinger F. Rios D. Dianes J.A. Sun Z. Farrah T. Bandeira N. Binz P.A. Xenarios I. Eisenacher M. Mayer G. Gatto L. Campos A. Chalkley R.J. Kraus H.J. Albar J.P. Martinez-Bartolome S. Apweiler R. Omenn G.S. Martens L. Jones A.R. Hermjakob H. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 2014; 32: 223-226) via the PRIDE repository (23. Vizcaino J.A. Csordas A. del-Toro N. Dianes J.A. Griss J. Lavidas I. Mayer G. Perez-Riverol Y. Reisinger F. Ternent T. Xu Q.W. Wang R. Hermjakob H. 2016 update of the PRIDE database and its related tools. Nucleic Acids Res. 2016; 44: D447-D456), along with the full MaxQuant (20) quantification output (PXD010557).

All proteins that were marked as "Reverse," "Potential Contaminants," or "Only identified by site" were discarded. The final subset comprised 9,640 proteins. Peptides marked as "Potential contaminants" or "Reverse" were also filtered from the analysis. The final peptide data set comprised 178,491 peptides.
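The filtering step above can be illustrated with a minimal sketch. It assumes a MaxQuant-style table in which flagged rows carry a "+" in the relevant column; the column names, the "+" marker and the example rows are assumptions made for illustration and are not taken from the deposited output.

```python
# Hypothetical flag columns; check the actual header names in the search output.
FLAG_COLUMNS = ("Reverse", "Potential contaminant", "Only identified by site")


def drop_flagged(rows, flag_columns=FLAG_COLUMNS):
    """Return only the rows in which none of the flag columns is marked '+'."""
    return [
        row for row in rows
        if not any(row.get(col, "").strip() == "+" for col in flag_columns)
    ]


# Toy rows standing in for lines of a protein table (IDs are made up).
example_rows = [
    {"Protein IDs": "P1", "Reverse": "", "Potential contaminant": "",
     "Only identified by site": ""},
    {"Protein IDs": "REV__P2", "Reverse": "+", "Potential contaminant": "",
     "Only identified by site": ""},
    {"Protein IDs": "CON__P3", "Reverse": "", "Potential contaminant": "+",
     "Only identified by site": ""},
    {"Protein IDs": "P4", "Reverse": "", "Potential contaminant": "",
     "Only identified by site": "+"},
]

kept = drop_flagged(example_rows)
kept_ids = [row["Protein IDs"] for row in kept]
```

For the peptide table, the same helper could be called with only the two flag columns that apply at the peptide level.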
Protein copy numbers were calculated following the proteomic ruler approach (24. Wisniewski J.R. Hein M.Y. Cox J. Mann M. A "proteomic ruler" for protein copy number and concentration estimation without spike-in standards. Mol. Cell. Proteomics. 2014; 13: 3497-3506). For protein, p, uCN_{b,c,p} is the uncorrected protein copy number:

$$\mathrm{uCN}_{b,c,p} = \frac{\text{protein MS3 signal}_{b,c,p} \times \frac{A}{M_p} \times 6.85 \times 10^{-12}}{\sum_{h \in b,c} \text{histones MS3 signal}_{h}}$$

for batch $b \in \{1, 2, \ldots, 24\}$ and channel $c \in \{126C, 127N, \ldots, 131N\}$, where A is Avogadro's constant, M_p is the molar mass of the protein p, protein MS3 signal is the protein MS3 intensity and histones MS3 signal is the MS3 intensity for all histones, h. These uncorrected copy numbers, which will be referred to here as "raw copy numbers", were used to study the coefficient of variation (CV).

To control for technical variation between the 24 different 10-plex batches, a correction factor, cf, was applied to every protein, p, in every batch, b, to adjust the protein copy numbers:

$$\mathrm{cf}_{b,p} = \frac{\mathrm{uCN}_{b,126C,p}}{\sum_{b} \mathrm{uCN}_{b,126C,p} / 24}$$

for $b \in \{1, 2, \ldots, 24\}$, where uCN_{b,126C,p} is the protein copy number derived from reporter channel 126C (the reference channel). The normalized copy number, normCN, is calculated for protein, p, in all batches, b, and all channels, c:

$$\mathrm{normCN}_{b,c,p} = \frac{\mathrm{uCN}_{b,c,p}}{\mathrm{cf}_{b,p}}$$

for batch $b \in \{1, 2, \ldots, 24\}$ and channel $c \in \{126C, 127N, \ldots, 131N\}$.

First, to estimate missing values within this DDA analysis, a list of unique proteins/peptides that were detected with at least 1 reporter intensity greater than zero was compiled for each batch. To determine the number of missing values within each 10-plex TMT batch, the number of unique proteins/peptides per reporter channel was compared with the number of unique proteins/peptides identified within the batch. This approach was applied to generate the missing value calculations for each of the 24 individual 10-plex TMT batches.
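As an illustration of the copy number and reference-channel normalization formulas given earlier in this section, the following is a minimal sketch for a single protein. The function names and toy values are hypothetical, the divisor 24 is generalized to the number of batches supplied, and the interpretation of the constant 6.85 × 10^-12 is an assumption, since the text does not state its units.

```python
AVOGADRO = 6.02214076e23  # Avogadro's constant, per mole
RULER_CONSTANT = 6.85e-12  # constant from the uCN formula (assumed: DNA mass per cell, g)


def uncorrected_copy_number(protein_signal, molar_mass, histone_signals):
    """uCN for one protein in one batch and channel.

    protein_signal: protein MS3 intensity in that channel
    molar_mass: molar mass of the protein (g/mol)
    histone_signals: MS3 intensities of all histones in the same channel
    """
    return protein_signal * (AVOGADRO / molar_mass) * RULER_CONSTANT / sum(histone_signals)


def correction_factors(reference_cn_by_batch):
    """cf per batch: reference-channel uCN divided by its mean over all batches."""
    mean_reference = sum(reference_cn_by_batch.values()) / len(reference_cn_by_batch)
    return {batch: cn / mean_reference for batch, cn in reference_cn_by_batch.items()}


def normalized_copy_number(ucn, cf):
    """normCN: uncorrected copy number divided by the batch correction factor."""
    return ucn / cf


# Toy example: reference-channel copy numbers of one protein in three made-up batches.
reference = {"batch_1": 100.0, "batch_2": 200.0, "batch_3": 300.0}
cf = correction_factors(reference)
normalized_reference = {b: normalized_copy_number(reference[b], cf[b]) for b in reference}
```

In this toy case the three reference values are all mapped to their common mean, which is the intended effect of the correction: after normalization the reference line has the same copy number in every batch, and the other channels in a batch are rescaled by the same factor.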
To assess the effect of integrating multiple TMT batches, random sampling was performed to estimate how missing values are affected by a progressive increase in the number of 10-plex TMT batches analyzed. This was performed in an incremental fashion starting from 2 and finishing with 22 batches (PT6388 was not used for this analysis), with 500 iterations per level. At the first level 2 batches were selected at random without replacement, and at the last level 22 batches were selected at random, again without replacement. This was performed with the R function "sample()", which is part of base R. At each level a new list of proteins/peptides detected with at least 1 reporter ion intensity greater than zero within any of the integrated TMT batches was calculated, and the number of proteins/peptides with intensity greater than 0 per reporter channel was evaluated against the new list.

The coefficient of variation (CV) in protein abundance levels was calculated using the log10 transformed protein copy numbers:

$$\mathrm{CV} = \frac{S}{\bar{X}} \times 100$$

For each protein the CV is equal to the copy number standard deviation (S) divided by the mean copy number ($\bar{X}$) times 100. The protein CV within each 10-plex TMT batch was calculated for all 10 cell lines within the same batch, using all proteins detected in every reporter channel. The reference line CV was calculated using proteins that were detected in the TMT10-126C (reference line) channel across all of the 24 10-plex TMT batches.

For each 10-plex TMT batch, a concordance correlation value was calculated for all cell lines within the same batch. The calculations were performed using the "correlation()" function from the R package "agricolae" version 1.2.8. The same process was applied to calculate the concordance correlation values for the reference lines, i.e. using reporter channel 126C in all TMT batches.
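The batch-sampling estimate of missing values and the CV formula described in this section can be sketched as follows. This is an illustrative Python analogue rather than the R code used in the study: the data layout (batch, then channel, then the set of identifiers with intensity greater than zero), the function names and the toy data are all assumptions, and the sample standard deviation is assumed for S.

```python
import random
import statistics


def missing_fraction(batches):
    """Mean fraction of missing values per channel for a set of integrated batches.

    batches: {batch_id: {channel: set of protein/peptide IDs with intensity > 0}}
    """
    # Everything detected in at least one channel of any integrated batch.
    detected_anywhere = set().union(
        *(ids for channels in batches.values() for ids in channels.values())
    )
    per_channel = [
        1 - len(ids) / len(detected_anywhere)
        for channels in batches.values()
        for ids in channels.values()
    ]
    return statistics.mean(per_channel)


def sampled_missing_fraction(batches, n_batches, iterations=500, seed=0):
    """Average missing fraction over random draws of n_batches (no replacement)."""
    rng = random.Random(seed)
    batch_ids = sorted(batches)
    results = []
    for _ in range(iterations):
        chosen = rng.sample(batch_ids, n_batches)
        results.append(missing_fraction({b: batches[b] for b in chosen}))
    return statistics.mean(results)


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean * 100 (pass log10 copy numbers)."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Toy data: each batch is complete on its own, but the two batches differ.
toy = {
    "b1": {"126C": {"P1", "P2"}, "127N": {"P1", "P2"}},
    "b2": {"126C": {"P1", "P3"}, "127N": {"P1", "P3"}},
}
single = missing_fraction({"b1": toy["b1"]})
combined = missing_fraction(toy)
```

In the toy data neither batch has missing values on its own, yet integrating the two leaves a third of the entries missing in every channel, which is the inflation effect the sampling procedure is designed to measure.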
The replicate normalized intensity, rni, was calculated per peptide, q:

$$\mathrm{rni}_{b,c,q} = \log_{10}\left(\frac{\text{peptide MS3 signal}_{b,c,q}}{\operatorname{median}(I_{b,c})}\right), \qquad I_{b,c} = \{\text{peptide MS3 signal}_{b,c,q} : \forall q\}$$

given batch, b, and channel, c, for batch $b \in \{1, 2, \ldots, 24\}$ and channel $c \in \{126C, 127N, \ldots, 131N\}$.

The median normalized intensity, mni, for peptide, q, is the median of all batches, b, and channels, c:

$$\mathrm{mni}_{q} = \operatorname{median}(\mathrm{rni}_{b,c,q})$$

for all batches $b \in \{1, 2, \ldots, 24\}$ and channels $c \in \{126C, 127N, \ldots, 131N\}$.

The global median is the median of mni for all peptides, q:

$$\text{global median} = \operatorname{median}(\mathrm{mni}_{q})$$

The reporter ion interference (RII) targets are based on a typical product data sheet for 10-plex TMT Label Reagents from ThermoFisher Scientific, as summarized in Table I below.

Table I. Reporter ion interference classification for all TMT batches, specifying the reporter mass tag, the reporter channel within the MaxQuant output and the target channels for primary (+1 Da) and secondary (−1 Da) reporter ion interference

Mass tag      Reporter channel   −1 Da (secondary RII)   +1 Da (primary RII)
TMT10-126     1                  –                       127C
TMT10-127N    2                  –                       128N
TMT10-127C    3                  126                     128C
TMT10-128N    4                  127N                    129N
TMT10-128C    5                  127C                    129C
TMT10-129N    6                  128N                    130N
TMT10-129C    7                  128C                    130C
TMT10-130N    8                  129N                    131
TMT10-130C    9                  129C                    –
TMT10-131     10                 130N                    –

To study the effect of reporter ion interference across different TMT channels, we selected a subset of 69 peptides that were specific to the following list of protein coding genes uniquely located on the Y chromosome: "CDY1," "CDY2A," "DDX3Y," "EIF1AY," "KDM5D," "NLGN4Y," "PCDH11Y," "RPS4Y1," "TBL1Y," "USP9Y," and "UTY." This approach of using peptide values from Y chromosome specific genes depends upon there being a diverse mixture of male and female donor-derived iPSC lines in each 10-plex TMT batch. However, two of the 24 TMT batches comprised exclusively female donor-derived iPSCs, which had been shown not to have Y chromosome derived DNA in QC analyses (25. Kilpinen H. Goncalves A. Leha A. Afzal V. Alasoo K. Ashford S. Bala S. Bensaddek D. Casale F.P. Culley O.J. Danecek P. Faulconbridge A. Harrison P.W.
Kathuria A. McCarthy D. McCarthy S.A. Meleckyte R. Memari Y. Moens N. Soares F. Mann A. Streeter I. Agu C.A. Alderton A. Nelson R. Harper S. Patel M. White A. Patel S.R. Clarke L. Halai R. Kirton C.M. Kolb-Kokocinski A. Beales P. Birney E. Danovi D. Lamond A.I. Ouwehand W.H. Vallier L. Watt F.M. Durbin R. Stegle O. Gaffney D.J. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature. 2017; 546: 370-375). For these female donor-specific batches, any peptide assigned to Y chromosome specific genes was excluded from the analysis. An additional batch, PT6388, displayed irregular behavior and was hence also discarded from the subsequent analysis. A final subset of 65 Y chromosome-specific peptides was used for this analysis (see supplemental data for list).

The peptide ratios comparing male channels versus female channels, mpr, subjected to different reporter ion interference conditions, cond, were calculated per 10-plex TMT batch, b, for peptide, q, using the replicate normalized intensities:

$$\mathrm{mpr}_{b,q} = \frac{\operatorname{median}(\mathrm{RNI}_{b,\mathrm{male},q})}{\operatorname{median}(\mathrm{RNI}_{b,\mathrm{cond},q})}$$

$$\mathrm{RNI}_{b,\mathrm{male},q} = \{\mathrm{rni}_{b,c,q} : \forall\ \text{male channels},\ c\}, \qquad \mathrm{RNI}_{b,\mathrm{cond},q} = \{\mathrm{rni}_{b,c,q} : \forall\ \text{cond channels},\ c\}$$

for $b \in \{1, 2, \ldots, 24\}$ and cond ∈ {primary RII, secondary RII, double RII, no RII}. The box plot comparing male replicates to the different reporter ion interference conditions used these peptide batch ratios and was plotted using "ggplot2" version 3.0.0 (26. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York, 2016).

The peptide ratios, npr, comparing different reporter ion interference (RII) conditions, cond, in female channels to female channels with no reporter ion interference, noRII, were calculated within each 10-plex TMT batch, b, for peptide, q, using the replicate normalized intensities:

$$\mathrm{npr}_{b,q} = \frac{\operatorname{median}(\mathrm{RNI}_{b,\mathrm{cond},q})}{\operatorname{median}(\mathrm{RNI}_{b,\mathrm{noRII},q})}$$

$$\mathrm{RNI}_{b,\mathrm{cond},q} = \{\mathrm{rni}_{b,c,q} : \forall\ \text{cond channels},\ c\}, \qquad \mathrm{RNI}_{b,\mathrm{noRII},q} = \{\mathrm{rni}_{b,c,q} : \forall\ \text{noRII channels},\ c\}$$

for $b \in \{1, 2, \ldots, 24\}$ and cond ∈ {primary RII, secondary RII, double RII}. These results were stratified by the global median, where peptides with a median normalized intensity greater than or equal to the global median were considered 'High intensity' and those lower than the global median were considered 'Low intensity'. The box plot comparing different reporter ion interference conditions to the replicates not affected by reporter ion interference used these peptide batch ratios and was plotted using "ggplot2" version 3.0.0 (26).

A known advantage of using TMT is the low proportion of missing values present within a single TMT batch. Recent studies report fewer than 1% missing values at the protein level (18), although data are usually not reported at the peptide level. We started by analyzing the iPSC 10-plex TMT data for th
Reference(s)