Article Open access Peer-reviewed

Proteome‐scale mapping of binding sites in the unstructured regions of the human proteome

2022; Springer Nature; Volume: 18; Issue: 1 Language: English

10.15252/msb.202110584

ISSN

1744-4292

Authors

Caroline Benz, Muhammad Ali, Izabella Krystkowiak, Leandro Simonetti, Ahmed Sayadi, Filip Mihalič, Johanna Kliche, Eva Andersson, Per Jemth, Norman E. Davey, Ylva Ivarsson

Topic(s)

Glycosylation and Glycoproteins Research

Abstract

Article, 19 January 2022, Open Access, Source Data, Transparent process

Caroline Benz; orcid.org/0000-0002-5166-3598; Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden; contributed equally to this work. Contribution: Validation (equal), Investigation (equal), Writing - original draft (equal), Writing - review & editing (equal)
Muhammad Ali; orcid.org/0000-0002-8858-6776; Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden; contributed equally to this work. Contribution: Investigation (equal), Methodology (equal)
Izabella Krystkowiak; orcid.org/0000-0002-8863-7086; Division of Cancer Biology, The Institute of Cancer Research, London, UK; contributed equally to this work. Contribution: Software (equal), Methodology (equal), Writing - review & editing (equal)
Leandro Simonetti; orcid.org/0000-0003-1283-9770; Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden. Contribution: Investigation (equal), Visualization (equal), Writing - original draft (equal)
Ahmed Sayadi; Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden. Contribution: Investigation (equal), Visualization (equal)
Filip Mihalic; orcid.org/0000-0002-6840-2319; Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden. Contribution: Investigation (equal)
Johanna Kliche; orcid.org/0000-0003-3179-4635; Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden. Contribution: Investigation (equal)
Eva Andersson; Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden. Contribution: Investigation (equal)
Per Jemth; orcid.org/0000-0003-1516-7228; Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden. Contribution: Formal analysis (equal), Supervision (equal), Funding acquisition (equal)
Norman E Davey (Corresponding Author); [email protected]; orcid.org/0000-0001-6988-4850; Division of Cancer Biology, The Institute of Cancer Research, London, UK. Contribution: Conceptualization (equal), Data curation (equal), Software (equal), Supervision (equal), Funding acquisition (equal), Investigation (equal), Visualization (equal), Methodology (equal), Writing - original draft (equal), Writing - review & editing (equal)
Ylva Ivarsson (Corresponding Author); [email protected]; orcid.org/0000-0002-7081-3846; Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden. Contribution: Conceptualization (equal), Data curation (equal), Supervision (equal), Writing - original draft (equal), Project administration (equal), Writing - review & editing (equal)

Author Information

Caroline Benz (1), Muhammad Ali (1), Izabella Krystkowiak (2), Leandro Simonetti (1), Ahmed Sayadi (1), Filip Mihalic (3), Johanna Kliche (1), Eva Andersson (3), Per Jemth (3), Norman E Davey (*, 2) and Ylva Ivarsson (*, 1)
(1) Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden
(2) Division of Cancer Biology, The Institute of Cancer Research, London, UK
(3) Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
*Corresponding author. Tel: +44 20 3437 7662; E-mail: [email protected]
*Corresponding author. Tel: +46 18 4714038; E-mail: [email protected]

Molecular Systems Biology (2022) 18: e10584, https://doi.org/10.15252/msb.202110584

Abstract

Specific protein–protein interactions are central to all processes that underlie cell physiology. Numerous studies have together identified hundreds of thousands of human protein–protein interactions. However, many interactions remain to be discovered, and low affinity, conditional, and cell type-specific interactions are likely to be disproportionately underrepresented.
Here, we describe an optimized proteomic peptide-phage display library that tiles all disordered regions of the human proteome and allows the screening of ~1,000,000 overlapping peptides in a single binding assay. We define guidelines for processing, filtering, and ranking the results and provide PepTools, a toolkit to annotate the identified hits. We uncovered >2,000 interaction pairs for 35 known short linear motif (SLiM)-binding domains and confirmed the quality of the produced data by complementary biophysical or cell-based assays. Finally, we show how the amino acid-resolution binding site information can be used to pinpoint functionally important disease mutations and phosphorylation events in intrinsically disordered regions of the proteome. The optimized human disorderome library paired with PepTools represents a powerful pipeline for unbiased proteome-wide discovery of SLiM-based interactions.

Synopsis

An optimized phage peptidome that tiles the disordered regions of the human proteome is presented, allowing the field of motif-based interactions to transition into high-throughput. Guidelines and tools for data analysis are provided.

An optimized second-generation human disorderome (HD2) phage library tiles all disordered regions from the human proteome.
Different peptide display parameters are tested, including display on the major or minor coat proteins of the M13 phage, and splitting the library design based on the subcellular localization of the peptide-containing proteins.
PepTools is a dedicated toolkit to annotate peptides and to identify consensus motifs.
More than 2,000 motif-based interactions are presented, together with information on potential disease mutations or phosphorylation sites that might affect the interactions.

Introduction

System-wide insights into protein–protein interactions (PPIs) are crucial for a comprehensive description of cellular function and organization, and a molecular understanding of genotype-to-phenotype relationships.
Impressive advances are being made toward illuminating the human interactome. For example, Luck et al (2020) recently provided the human reference interactome (HuRI), a map of about 53,000 human PPIs generated by all-by-all yeast-two-hybrid (Y2H) screening. Moreover, Huttlin et al (2021) released BioPlex 3.0, a dataset generated through affinity-purification coupled to mass spectrometry (AP-MS) that contains close to 120,000 interactions. However, a hidden interactome of low affinity, transient, and conditional interactions remains undiscovered. A significant portion of these unknown interactions are likely mediated by short linear motifs (SLiMs) found in the intrinsically disordered regions (IDRs) of the human proteome (Tompa et al, 2014). Given that IDRs are predicted to constitute up to 40% of the residues in higher eukaryotic proteomes (Pancsa & Tompa, 2012; Xue et al, 2012), the consensus is that tens of thousands of human motif-based interactions remain undiscovered.

Here, we focus on proteome-wide screening of SLiM-based interactions involving a folded domain in one protein and a short peptide present in an IDR in another protein. On average, a SLiM interface buries only 3–4 residues in the binding pocket of the folded binding partner and the interactions are often of low-to-mid micromolar affinities (Van Roey et al, 2014; Ivarsson & Jemth, 2019). SLiM-based interactions are prevalent and crucial for dynamic processes such as cell signaling and regulation. They commonly direct the transient complex association, scaffolding, modification state, half-life, and localization of a protein. The Eukaryotic Linear Motif (ELM) database, which is the most comprehensive, manually curated database of SLiMs, currently holds 2,092 experimentally validated human SLiM instances (Kumar et al, 2020).
Most of these interactions have been characterized through low-throughput experiments, as the properties that make SLiM-based interactions suited for their physiological function make them difficult to capture experimentally by classical large-scale PPI discovery methods. For example, the stringent washing steps in large-scale AP-MS protocols bias selections toward stronger binders. In contrast, the resolution of modern variants of Y2H is ~20 μM (Cluet et al, 2020), which overlaps with the affinity range of most motif classes. However, Y2H is limited to proteins that can translocate to the nucleus, are not toxic in yeast, and do not cause autoactivation (Dreze et al, 2010). Many SLiM-based PPIs rely on additional binding sites present in the interacting proteins, which further complicates their identification (Ivarsson & Jemth, 2019; Bugge et al, 2020). Consequently, it is likely that the majority of SLiMs remain to be discovered (Tompa et al, 2014).

Proteomic peptide-phage display (ProP-PD) offers a large-scale approach to simultaneously identify novel SLiM-based PPIs and the binding motifs (Fig 1A) (Ivarsson et al, 2014; Davey et al, 2017). In ProP-PD, a phage-encoded peptide library is computationally designed to display the disordered regions of a target proteome. The designed peptides are displayed on the M13 phage, which has a circular single-stranded DNA (ssDNA) genome encapsulated by five coat proteins (Huang et al, 2012; Marvin et al, 2014). Approximately 2,700 copies of the major coat protein P8 cover the length of the phage, and five copies of the minor coat protein P3, which is necessary for infection, are presented at one end of the phage (Fig 1B). The approach is similar to combinatorial peptide-phage display, which has been extensively applied to identify SLiM specificity determinants (Teyra et al, 2020), but it displays designed rather than randomized sequences.
We have previously constructed a first-generation human disorderome (HD1) (Davey et al, 2017) displayed on the major coat protein P8 and used it to identify interactors and binding sites for several proteins, including the docking interactions of the phosphatases PP2A (Wu et al, 2017), PP4 (Ueki et al, 2019), and calcineurin (Wigington et al, 2020). However, the HD1 library suffers from limitations that have hampered the exploitation of the full power of the approach, with a main limitation being a low coverage of the library design in the constructed phage library due to low quality of the oligonucleotide pool obtained from the commercial provider and suboptimal tiling of the IDRs (Davey et al, 2017). The field has also been limited by a lack of guidelines on how to design ProP-PD experiments, postprocess the results, and attribute confidence to the selected peptides.

Figure 1. ProP-PD workflow, library design and quality, and initial evaluation of selection results

(A) Schematic visualization of library design, cloning process, phage selection, and data analysis.
(B) Two main library parameters were explored: (i) comparing selection results from the whole HD2 library versus sublibraries grouped by subcellular localization, and (ii) the display of the HD2 peptide library design on phage proteins P8 (multivalent, HD2 P8) and P3 (monovalent, HD2 P3), respectively.
(C) Comparison of the percentage of peptides that are reproduced in pairwise comparisons between replicate selections for the same bait (blue), for the same control bait (green) and for different bait proteins (red).
(D) Comparison of the percentage of selected peptides that are overlapping in pairwise comparisons between replicate selections for the same bait (blue), for the same control bait (green), and for different bait proteins (red).
(E) Comparison of the log10 enrichment probability of the ELM defined motif consensus in peptides selected for the correct consensus-binding bait (blue) and all other baits (red).
(F) Comparison of the CompariMotif similarity of the de novo SLiMFinder-defined enriched motif in the overlapping and replicated peptides against the established ELM consensus for the bait (blue) and against all other ELM classes (red).
(G) Selection quality metrics split per bait. Data include metrics from panels (C) through (F). Enriched de novo consensus shows the P-value of the SLiMFinder-discovered enriched motif, and Enriched Interactors shows the probability of the selection returning the observed number of previously validated interactors for the bait by chance. Asterisk denotes no motif defined for the bait. Data for the panel are available in Dataset EV4.

Data information: Boxen plots (C–F) are used to more accurately visualize the distribution of values. The central section has two blocks each containing 25% of the data split by the median (denoted by a dark black bar) and each additional block represents 50% of the data of the previous block. Sample sizes are (C) and (D): n(bait-bait) = 358, n(control-control) = 156 and n(bait-other) = 23,276; (E): n(bait-bait) = 61 and n(bait-other) = 7,633; (F): n(bait-bait) = 40 and n(bait-other) = 1,560.

In this study, we present a novel resource for the interactomics community. We describe an optimized human disorderome library (HD2), an online toolkit for annotation and analysis of selected peptide ligands termed PepTools (http://slim.icr.ac.uk/tools/peptools/), and general guidelines on how to analyze the results. We evaluated the HD2 ProP-PD library by using it in selections against a benchmarking set of 34 bait protein domains representing 30 distinct domain families with known motif-mediated interaction partners listed in the ELM database (Kumar et al, 2020). We also screened against the HEAT repeat of importin subunit beta-1 (KPNB1 HEAT), which is a challenging test case due to its typically low affinity for individual peptide ligands (Milles et al, 2015).
Selections against the novel HD2 library captured 65 (19.3%) of the 337 known SLiM-mediated interactions for the screened protein domains, which is twice the recall achieved for SLiM-based interactions by Y2H- and MS-based screens. We uncovered 2,161 potential SLiM-mediated interactions and defined the binding sites of these interactions at amino acid resolution. Biophysical characterization demonstrated that the selections capture interactions across a broad affinity span, from low nanomolar to millimolar. Using importin subunit alpha-3 (KPNA4) we validated the functional relevance of novel interactions. We further systematically tested parameters to define the optimal analysis setup by examining the use of cell compartment-specific sublibraries, and the display on the minor coat protein P3 instead of the major coat protein P8. Finally, we explored the effects of phosphorylation or disease-related mutations on the interactions, thus highlighting the advantage of simultaneous PPI screening and binding site identification. The approach outlined here is generally applicable and will be of great value when exploring interactions involving the IDRs of the human proteome.

Results

ProP-PD library design, construction, and quality control

We designed a phage-encoded library of peptides representing the IDRs of the intracellular human proteome (Fig 1A, Dataset EV1). These disordered regions were tiled as 16-amino-acid-long peptides that overlap by 12 amino acids. The library contains 938,427 peptides from 16,969 proteins and covers approximately one-third of the proteome tiled with overlapping peptides. An interactive website to explore the full library design is available at http://slim.icr.ac.uk/phage_libraries/human/proteins.html.
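The tiling scheme described above (16-residue peptides overlapping by 12 residues, that is, a step of 4 residues) can be illustrated with a minimal sketch. The function name `tile_region`, the 1-based coordinate convention, the skipping of regions shorter than one peptide, and the extra tile placed flush with the end of the region are all assumptions made for illustration; they are not taken from the published library design pipeline.

```python
from typing import List, Tuple


def tile_region(sequence: str, region_start: int,
                length: int = 16, overlap: int = 12) -> List[Tuple[int, int, str]]:
    """Tile one disordered region into overlapping peptides.

    Returns (start, end, peptide) tuples with 1-based inclusive protein
    coordinates. `region_start` is the protein position of sequence[0].
    """
    step = length - overlap  # 16 - 12 = 4 residues between tile starts
    if length <= 0 or step <= 0:
        raise ValueError("length must be positive and larger than overlap")
    if len(sequence) < length:
        # Assumption: regions shorter than one peptide are skipped here.
        return []

    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, step))
    # Assumption: add one tile flush with the region end so that the final
    # residues are not dropped when the region length is not a multiple of step.
    if starts[-1] != last_start:
        starts.append(last_start)

    return [(region_start + s, region_start + s + length - 1, sequence[s:s + length])
            for s in starts]


if __name__ == "__main__":
    # Hypothetical 26-residue region starting at protein position 1.
    for start, end, peptide in tile_region("ACDEFGHIKLMNPQRSTVWYACDEFG", 1):
        print(start, end, peptide)
```

Because consecutive tiles share 12 residues, a short binding motif that is cut by one tile boundary is usually still fully contained in a neighbouring tile, which is what makes overlapping hits informative later in the analysis.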
The library was subdivided into different, partially overlapping, pools based on the cellular localization of the peptide-containing proteins (cytoplasmic, endomembrane, cytoplasmic and nuclear, and nuclear based on localization annotation; Fig 1B) to allow for compartment-specific sampling of the interaction space. The point of subdividing the library into pools based on subcellular localization is to reduce the number of competing interactions. The sequences were displayed using an M13 phage system where fusion proteins of the designed peptides and a coat protein are encoded by a phagemid, and an M13KO7 helper phage provides all genes necessary for phage infection, replication, assembly, and budding (Ledsgaard et al, 2018). Fusion of the peptides to the P8 protein results in the display of peptides on 5–40% of the ~2,700 copies of the P8 protein on each phage (Fig 1B) (Malik et al, 1996). We also generated a version of the HD2 library displayed on the minor coat protein P3 (HD2 P3; Fig 1B), which results in monovalent display. Next-generation sequencing (NGS) of the phage libraries confirmed that ~90% of the designed peptide sequences were present in the constructed libraries, and the extrapolated library coverage percentage surpassed 95% (Dataset EV1, Appendix Fig S1). As each amino acid of the IDRs is covered by at least two overlapping peptides, this design ensures full coverage of the human IDRs by the library. We thus confirmed that the constructed phage libraries have high coverage and are of high quality.

Phage selections and initial evaluation of selection results

We established a benchmarking set of 34 SLiM-binding domains from 30 domain families (Table 1, Dataset EV2). The selected bait domains were chosen to represent the diversity of motif types recognized by motif-binding pockets (Table 1, Appendix Fig S2, Dataset EV3, http://slim.icr.ac.uk/data/proppd_hd2_pilot).
In addition, we included the HEAT domain of KPNB1 as a challenging test case based on its typically low ligand affinity (Milles et al, 2015). A set of protein domains not expected to bind to the library peptides were chosen as negative controls, namely the phospho-peptide-binding proteins 14-3-3 protein sigma (SFN 14-3-3) (Yaffe et al, 1997) and interferon regulatory factor 3 (IRF3 IRF-3) (Liu et al, 2015), the N-acetyl-peptide-binding PONY domain of the DCN1-like protein 1 (DCUN1D1 PONY) (Scott et al, 2011), and the C-terminal-binding Cap-Gly domain of CAP-Gly domain-containing linker protein 1 (CLIP1 Cap-Gly) (Kumar et al, 2020). As the libraries described here do not display free N-terminal or C-terminal residues, and no post-translational modifications are introduced, these domains should represent valid negative controls. GST was used as an additional negative control as all bait proteins were GST-tagged.

Table 1. Overview of the baits and the outcome of the ProP-PD selections

| Gene | Domain | Motifs found | Motifs in library | Observed motif | Expected motif |
| --- | --- | --- | --- | --- | --- |
| ANKRA2 | Ank | 2 | 4 | [LMP]xLPx[FIL] | PxLPx[IL]x{1,3}[VLF] |
| AP2B1 | Adaptin | 2 | 8 | [FW]xx[AFLP] | [DE]x{1,2}Fxx[FL]xxxR |
| CALM1 | EF-hand | 0 | 19 | WxxL | [ACLIVTM]xx[ILVMFCT]Qxxx[RK] |
| CEP55 | EABR | 1 | 3 | PPxxxY | AxGPPx{2,3}Y |
| CLTC | Clathrin-propeller | 0 | 9 | LIx[FW] | L[IVLMF]x[IVLMF][DE] |
| CRK | SH3 | 2 | 11 | Px[LV]Px[KR] | PxxPx[KR] |
| EIF4E | eIF4E | 2 | 6 | – | YxxxxL[VILMF] |
| EPS15 | EH | 10 | 37 | NPF | NPF |
| KEAP1 | KELCH | 1 | 7 | TGE | [DNS]x[DES][TNS]GE |
| KLC1 | TPR | 0 | 8 | – | [LMTAFSRI]xW[DE] |
| KPNA4 | Arm | 0 | 18 | KRxxx[DES] | Polybasic |
| KPNB1 | HEAT | 0 | 2 | [AILPV][FY]xF | FxFG |
| MAD2L1 | HORMA | 0 | 2 | – | [KR][IV][LV]xxxxxP |
| MAP1LC3A | Atg8 | 5 | 14 | [FWY]xx[ILV] | [EDST]x{0,2}[WFY]xx[ILV] |
| MAP1LC3B | Atg8 | 3 | 15 | [FHWY]xx[ILV] | [EDST]x{0,2}[WFY]xx[ILV] |
| MDM2 | SWIB | 3 | 5 | FxxxWxxL | FxxxWxxx[VIL] |
| NEDD4 | WW4 | 2 | 8 | [LP]PxY | PPxY |
| OXSR1 | OSR1-C | 4 | 6 | RFx[IV] | RFx[IV] |
| PABPC1 | PABP | 10 | 19 | AxxF[VY]P | [LFP][NS][PIVTAFL]xAxx[FY]x[PYLF] |
| PDCD6IP | Alix-V-domain | 0 | 0 | YPxL | [LM]YPx[LI] |
| PEX14 | Pex14 | 0 | 9 | [FLM]xxxW | Fxxx[WF] |
| PTK2 | Focal-AT | 2 | 5 | – | [LV][DE]x[LM][LM]xxL |
| SIN3A | PAH | 1 | 6 | [FILMVW]xxL[LV] | [FHYM]xA[AV]x[VAC]L[MV]x[MI] |
| SPSB1 | SPRY | 0 | 1 | – | [ED][LIV]NNN |
| SUFU | SUFU | 0 | 2 | – | [SV][CY]GH[LIF][LAST][GAIV]. |
| SUMO1 | Rad60-SLD | 6 | 29 | [IV]DLxxD | [VILPTM][VIL][DESTVILMA][VIL] |
| TLN1 | PTB | 0 | 13 | Wxx[NS]x[IL] | NPx[FY] |
| TNKS | Ank | 2 | 16 | Rxx[AP]xG | Rxx[PGAV][DEIP]G |
| TSG101 | UEV | 1 | 10 | [AP][ST]AP | P[TS]AP |
| USP7 | MATH | 1 | 9 | [AP][GS]xS | [PA]xxS |
| VASP | EVH1 | 2 | 11 | [FW]PxP[LP] | [FYWL]PxPP |
| WDR5 | WD40 | 0 | 11 | – | [SCA]AR[STCA] |
| YAP1 | WW1 | 4 | 9 | [LP]PxY | PPxY |
| YES1 | SH3 | 0 | 5 | RxLPxxP | [RKY]xxPxxP |
| ZMYND11 | MYND | 0 | 2 | [MP]Px[LY] | PxLxP |
| GST | GST | – | – | – | – |
| DCUN1D1 | PONY | 0 | 2 | – | ^M[MIL]x[MIL] |
| SFN | 14-3-3 | 0 | 58 | – | LxIS |
| IRF3 | FHA | 0 | 3 | – | Rxx[ST]xP |
| CLIP1 | Cap-Gly | 0 | 4 | – | xW[RK][DE]GCY$;[ED]x{0,2}[ED]x{0,2}[EDQ]x{0,1}[YF]$ |

Overview of the bait constructs screened in the current study, the number of validated motifs discovered in selection for each bait, the number of validated motifs present in the HD2 library, the enriched motif consensus in the peptides selected for each bait, and the expected consensus for each bait. The last five rows (GST to CLIP1) are the baits used as negative controls, shaded gray in the original table. In the original table, bold and underlined characters indicate matches between the motifs reported in ELM and the motif generated based on ProP-PD results. Sequence logos of the observed and expected motifs are available for comparison at http://slim.icr.ac.uk/data/proppd_hd2_pilot.

The HD2 libraries, the HD1 library (displayed on P8), and a combinatorial peptide phage display library with high complexity (displayed on P8, estimated 10^10 diversity) (Ilari et al, 2015) were used in triplicate selections against the immobilized bait proteins for four rounds of phage selections. The peptide-coding regions of the binding-enriched phage pools were barcoded and analyzed by NGS (Appendix Fig S3).
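The motif consensus patterns in Table 1 use a regular-expression-like notation in which a lowercase x stands for any residue. As a minimal sketch of how such a consensus could be checked against a selected peptide, the helper names `consensus_to_regex` and `find_motif` and the example peptide below are hypothetical; this is not how PepTools or SLiMFinder are implemented. Entries that are not patterns (such as "Polybasic" or "–") and the ";"-separated alternatives listed for CLIP1 are not handled here.

```python
import re
from typing import List, Tuple


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Convert a Table 1 style consensus into a compiled regular expression.

    Assumption: the only translation needed is lowercase 'x' -> '.', since
    character classes, {m,n} repeats, '^' and '$' are already valid regex syntax.
    """
    return re.compile(consensus.replace("x", "."))


def find_motif(consensus: str, peptide: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for every match, including
    overlapping ones, by wrapping the pattern in a lookahead."""
    pattern = consensus_to_regex(consensus).pattern
    return [(m.start() + 1, m.group(1))
            for m in re.finditer(f"(?=({pattern}))", peptide)]


if __name__ == "__main__":
    # Hypothetical peptide containing a PPxY-like site.
    print(find_motif("[LP]PxY", "GSAPPPYTGS"))
```

The lookahead form is used so that two motif instances sharing residues are both reported, which matters for proline-rich peptides where candidate sites can overlap.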
The peptide sequences were mapped to the human proteome with PepTools (http://slim.icr.ac.uk/tools/peptools/), our novel web-based tool for annotating protein regions, built on the annotation framework of the PSSMSearch tool (Krystkowiak et al, 2018) (Dataset EV4: http://slim.icr.ac.uk/data/proppd_hd2_pilot).

Next, we analyzed the selected peptides for each bait to understand the ability of the ProP-PD approach to specifically and reproducibly enrich for binders. We found an enrichment of replicated peptides in selections against the same bait proteins, as expected for successful selections (Fig 1C). Overlapping peptides were more frequently found in selections for the same bait as compared to unrelated screens (Fig 1D). Moreover, the expected ELM consensus for a bait was often enriched in identified peptides selected for that bait (Fig 1E), and the consensus motif discovered de novo based on the identified peptides matched the key residues of the expected ELM consensus for the bait (Fig 1F and Dataset EV3, http://slim.icr.ac.uk/data/proppd_hd2_pilot). Replicated peptides, overlapping peptides, and enriched binding determinants are hence strong indicators of a successful selection.

We further analyzed the results on the bait protein level (Fig 1G), and found that only four of the bait proteins from the benchmarking set had selection quality statistics that were similar to the negative controls, indicating little or no enrichment for specific binders (MAD2L1, SPSB1, SUFU, and WDR5). The low enrichment of ligands observed for these domains with well-characterized motif-binding preferences might relate to protein quality issues (including, for example, incompatibility with the immobilization method) (Kumar et al, 2020).

Benchmarking of metrics for ranking of ProP-PD results

Next, we benchmarked the discriminatory power of several criteria for filtering and prioritization of the selected peptides to establish a robust protocol for data analysis.
The data returned from successful ProP-PD selections contain enriched bait-binding peptides and noise introduced by spurious peptides identified because of the depth of the sequencing. We used four metrics to define peptide quality: (i) reproducible occurrence in replicate selections, (ii) identification of a region with overlapping peptide hits, (iii) the presence of a shared consensus motif, and (iv) strong enrichment as indicated by high NGS read counts (Fig 2A). We evaluated the discriminatory power of each of the metrics using a ProP-PD motif benchmarking dataset (Dataset EV5; http://slim.icr.ac.uk/data/proppd_hd2_pilot) compiled from the ELM database and structures of SLiM-domain complexes available in the Protein Data Bank (PDB). The benchmarking dataset contains 337 motif instances that have previously been reported to bind to the 34 benchmarking bait proteins and that are represented in the HD2 P8 library. We found, as expected, that peptides that were discovered through the HD2 P8 selections and overlapped with the benchmarking dataset were more frequently found in replicate selections (P = 2.82 × 10^-19), identified with overlapping peptides (P = 9.75 × 10^-58) and contained the de novo consensus established for the ProP-PD-derived peptides using SLiMFinder (P = 4.41 × 10^-49; Fig 2B–D). Previously validated motif instances also had higher than average normalized peptide counts (P = 3.68 × 10^-9; counts were normalized by dividing the NGS counts observed for each peptide in a replicate selection against a given bait by the total NGS counts for that bait selection; Fig 2E, see also Appendix Fig S4A). The results support that the four metrics have predictive power in terms of discriminating genuine binding peptides from the non-specific background binding events (Fig 2G). Cut-off values were determined for each of the four metrics through receiver operating characteristic (ROC) curve analysis (Fig 2A).
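The count normalization and the per-metric cut-offs described above can be sketched as follows. This is only an illustration under stated assumptions: the names `normalize_counts`, `PeptideMetrics`, `Cutoffs`, and `binary_criteria` are hypothetical, and the cut-off values are placeholders, since the ROC-derived thresholds are not given in the text.

```python
from dataclasses import dataclass
from typing import Dict


def normalize_counts(counts: Dict[str, int]) -> Dict[str, float]:
    """Divide each peptide's NGS count by the total count of the selection."""
    total = sum(counts.values())
    if total == 0:
        return {peptide: 0.0 for peptide in counts}
    return {peptide: count / total for peptide, count in counts.items()}


@dataclass
class PeptideMetrics:
    replicates_found: int      # (i) number of replicate selections containing the peptide
    has_overlapping_hit: bool  # (ii) another selected peptide overlaps the same region
    has_consensus_motif: bool  # (iii) peptide contains the shared consensus motif
    normalized_count: float    # (iv) NGS count as a fraction of the selection total


@dataclass
class Cutoffs:
    # Placeholder values for illustration only; the real cut-offs were
    # determined by ROC analysis and are not stated in the text.
    min_replicates: int = 2
    min_normalized_count: float = 0.001


def binary_criteria(metrics: PeptideMetrics, cutoffs: Cutoffs) -> Dict[str, bool]:
    """Turn the four quality metrics into four pass/fail flags."""
    return {
        "replicated": metrics.replicates_found >= cutoffs.min_replicates,
        "overlapping": metrics.has_overlapping_hit,
        "consensus": metrics.has_consensus_motif,
        "enriched": metrics.normalized_count >= cutoffs.min_normalized_count,
    }


if __name__ == "__main__":
    fractions = normalize_counts({"pepA": 30, "pepB": 10})
    flags = binary_criteria(PeptideMetrics(3, True, True, fractions["pepA"]), Cutoffs())
    print(fractions, flags)
```

Normalizing by the selection total makes counts comparable between replicates that were sequenced to different depths, which is why the enrichment flag is applied to the fraction rather than to the raw read count.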
The resulting binary confidence criteria obtained for the individual metrics were combined for each peptide to create a single score termed “Confidence level” (Fig 2F). Peptides were classified into four categories based on their confidence level (“High” for a confidence level of 4, “Medium” for a confidence level of 2 or 3, “Low” for a confidence level of 1, and “Filtered” for all other peptides). As expected, we identified no or few medium/high confidence peptides for the negative control baits. One notable exception was the overlapping and replicated 1836-PSWLADIPPWVPKDRP-1851 peptide from microtubule-associated protein 1A (MAP1A) selected by the SFN 14-3-3. The aspartate side chain of the MAP1A(1836–1851) peptide may mimic the negative charge of a phospho-serine, as previously shown for other unphosphorylated 14-3-3-binding peptides (Petosa et al, 1

Reference(s)