Article · Open access · Peer reviewed

SIDERITE: Unveiling hidden siderophore diversity in the chemical space through digital exploration

2024; Wiley; Volume: 3; Issue: 2; Language: English

10.1002/imt2.192

ISSN

2770-5986

Authors

Ruolin He, Shaohua Gu, Jiazheng Xu, Xuejian Li, Haoran Chen, Zhengying Shao, Fanhao Wang, Jiqi Shao, Wen‐Bing Yin, Long Qian, Zhong Wei, Zhiyuan Li

Topic(s)

Plant Parasitism and Resistance

Abstract

In this work, we introduced the Siderophore Information Database (SIDERITE), a digitized database containing 649 unique siderophore structures. Leveraging this digitized data set, we gained a systematic overview of siderophores from their clustering patterns in chemical space. Building upon this, we developed a functional group-based method for predicting new iron-binding molecules, with experimental validation. Extending our approach to the collection of open natural products (COCONUT) database, we predicted 3199 siderophore candidates, whose structural diversity is largely unexplored. Our study provides a valuable resource for accelerating the discovery of novel iron-binding molecules and advancing our understanding of siderophores.

Siderophores are a diverse family of secondary metabolites that bind and chelate iron with high affinity; iron is one of the essential elements for cellular processes, including replication and respiration [1, 2]. The significance of siderophores lies in their vital role in ensuring microbial survival and growth. Pathways associated with siderophore synthesis and uptake are widely present in microorganisms [3], constituting complex ecological games [4]. As a special type of natural product, siderophores exhibit notable antibacterial and antifungal activities, making them promising candidates for the development of novel therapeutics [5]. Despite their importance, our current understanding of siderophores is still limited because of their high diversity.

Thanks to the efforts of many researchers over the past few decades, significant progress has been made in systematically analyzing siderophores. In 2010, Robert C. Hider and Xiaole Kong provided a valuable resource in a seminal review, which included the structural features of 294 siderophores in its appendix [6].
According to a review in 2014, over 500 different types of siderophores have been identified, with 270 having been structurally characterized [7]. The only siderophore database (http://bertrandsamuel.free.fr/siderophore_base/index.php), which contained 262 siderophores, was last updated in 2013 and is no longer maintained. Moreover, new siderophore molecules and even new siderophore functional group types [8] are constantly being discovered in various microorganisms. Information regarding these siderophores is currently dispersed across various publications and needs to be systematically recorded.

Another significant challenge in achieving a systematic overview of siderophores is the lack of digitization, which hinders computational investigations. In the field of natural products, large digitized databases such as the collection of open natural products (COCONUT) [9] record their molecules in Simplified Molecular Input Line Entry System (SMILES) format. SMILES is a commonly used format for storing and analyzing chemical molecules; it translates a chemical structure into a string of symbols that is easily readable by computer software [10]. This format enables large-scale computational investigations, such as machine learning [10]. However, there is no systematically curated digital data set of siderophores. Digitized natural product databases do not offer publicly accessible and searchable instances of "siderophore" and contain only a fraction of currently known siderophores.

Taken together, establishing a comprehensive siderophore database is crucial for gaining a deeper understanding of siderophore synthesis, function, and application. To fulfill this need, we have developed the Siderophore Information Database (SIDERITE), a user-friendly platform that includes 649 unique structures in SMILES format, covering all known siderophores up to May 2023.
Leveraging SIDERITE's digitized records, we presented the most comprehensive statistics of siderophores to date, covering biosynthetic pathways, producer sources, and several chemical characteristics. The dispersed distribution of known siderophores within the chemical landscape of natural products hints at a vast, largely uncharted territory of undiscovered diversity. Building upon this quantitative overview, we proposed a functional group-based method to discover new siderophores in batches, with experimental validation.

The Siderophore Information Database (SIDERITE, http://siderite.bdainformatics.org) contains 872 records covering all known siderophores up to May 2023. In addition to siderophore records from previous databases and reviews [6, 11-14], 224 records were curated from individual research articles for the first time (Table S1). Beyond the expanded collection, SIDERITE records the siderophore structures in SMILES format. Notably, in comparison to other siderophore collections, SIDERITE holds the largest collection of siderophores and stands out for being freely accessible and digitized (Table S2).

Digitizing siderophores enables computational analysis, particularly in unifying siderophores based on their chemical structures. By comparing the canonical SMILES of siderophores, we identified 649 unique siderophore structures out of the 872 total records (Table S3). During this process, we observed that many siderophores share identical structures but have different names, such as bacillibactin and corynebactin. This observation indicates that the same siderophores were discovered in different species or by different research groups [15, 16]. Therefore, for each unique siderophore structure, we merged the corresponding records and designated one of their names as the official "Siderophore name" while recording the other names as "Siderophore other name." Digitizing siderophores also enables computational analysis of statistics (Figure S1).
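The merging step described above can be illustrated with a minimal sketch. It assumes each record already carries a canonical SMILES string computed by a cheminformatics toolkit, and that the first name encountered becomes the official one; the text does not say how the official name was chosen, so that rule is an assumption. The SMILES values and the name `siderophore_x` are placeholders for illustration, not real structures or database entries.

```python
from collections import defaultdict


def merge_by_canonical_smiles(records):
    """Merge (name, canonical_smiles) records that share one structure.

    Assumption: the first name seen for a structure becomes the official
    "Siderophore name"; the rest go to "Siderophore other name".
    """
    groups = defaultdict(list)
    for name, smiles in records:
        groups[smiles].append(name)

    merged = []
    for smiles, names in groups.items():
        merged.append({
            "smiles": smiles,
            "siderophore_name": names[0],
            "siderophore_other_name": names[1:],
        })
    return merged


# Placeholder keys stand in for real canonical SMILES strings.
records = [
    ("bacillibactin", "<canonical-SMILES-A>"),
    ("corynebactin", "<canonical-SMILES-A>"),  # same structure, other name
    ("siderophore_x", "<canonical-SMILES-B>"),  # hypothetical entry
]

unique = merge_by_canonical_smiles(records)
print(len(records), "records ->", len(unique), "unique structures")
```

Exact string comparison only works if every SMILES was canonicalized by the same toolkit with the same settings, since one molecule can be written as many different valid SMILES strings.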
Siderophores are known to exhibit remarkable structural diversity [6] (Figure S2). Converting siderophores into SMILES format enables us to quantify their chemical similarity more effectively, both within the SIDERITE database and against other natural products. To systematically assess the structural diversity of siderophores, we first located all 649 SIDERITE structures in the vast chemical space of the COCONUT database by merging all molecules in these two databases, which together encompass over 4 × 10⁵ natural products. By TMAP visualization of chemical similarity (Figure S3, described in the method section), we observed that the 649 siderophores could be grouped into 25 distinct clusters, which were separated from each other by natural products from COCONUT. The clustering result shows that the structural diversity of siderophores is unevenly distributed (Figure 1 and Table S3). Most of these clusters (16 out of 25) only contain a few members ( 60%) to the known 649 siderophores cataloged in SIDERITE. The remaining 2915 molecules are strong candidates for novel siderophores with relatively unexplored chemical structures. This analysis underscores the notion that the structural diversity of siderophores remains largely concealed, inviting further in-depth exploration and investigation.

Subsequently, we searched the 3199 candidates for purchasable molecules for experimental verification. Forty-eight molecules (Table S7 and Figure 2B) were available in a commercial natural product library (the Natural Product Library for high throughput screening, catalog number L6000, TargetMol, June 2023). Among these molecules, 22 are soluble in water, while the remaining 26 have poor solubility (Table S7). To address this, we dissolved the poorly soluble molecules in dimethyl sulfoxide instead of water. Solutions of these 48 molecules were then tested by the CAS assay, a universal colorimetric method that detects iron-binding molecules [17].
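The similarity-based split of the candidates mentioned earlier in this section (Table S6 reports a Tanimoto similarity matrix) can be sketched as follows. This is an illustration only: fingerprints are modeled as plain sets of "on" bit indices, the fingerprint type used by the authors is not stated here, and the 0.6 cutoff with a strictly-greater comparison is an assumed reading of the "60%" figure in the text. All molecule names and bit sets are made up.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def split_by_max_similarity(candidates, known, threshold=0.6):
    """Split candidates by their best similarity to any known structure.

    threshold=0.6 is an assumption based on the "60%" in the text.
    """
    similar, novel = [], []
    for name, fingerprint in candidates.items():
        best = max(
            (tanimoto(fingerprint, k) for k in known.values()),
            default=0.0,
        )
        if best > threshold:
            similar.append(name)
        else:
            novel.append(name)
    return similar, novel


# Toy fingerprints (hypothetical, for illustration only).
known = {"known_1": {1, 2, 3, 4}, "known_2": {10, 11, 12}}
candidates = {"cand_a": {1, 2, 3, 4, 5}, "cand_b": {20, 21, 22}}

similar, novel = split_by_max_similarity(candidates, known)
print("similar to known:", similar)
print("structurally novel:", novel)
```

Taking the maximum over all known structures means a candidate counts as similar if it resembles any single cataloged siderophore, which matches the idea of separating near-duplicates of known chemistry from unexplored structures.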
The high positive rate from the CAS assay supports the effectiveness of our functional group-based method. Among the tested molecules, 20 out of 22 (90.9%) water-soluble compounds and 20 out of 26 (76.9%) water-insoluble compounds exhibited iron-binding activity, as evidenced by a noticeable change in the color of the CAS dye (Figure 2G,H and Table S8). Moreover, most molecules with negative CAS results (compounds 20, 24, 31, 33, 37, 41, and 44) exhibited unusual color patterns that hindered accurate assessment. For instance, their original solutions were very dark in color, or they reacted with the CAS reagents and formed precipitates or turbidity. In fact, compounds 24, 37, 41, and 44 did induce a color change in the CAS assay solution, but their precipitation interfered with the optical density measurement. Taken together, only one (compound 13) out of the 48 molecules was confirmed to lack iron-chelating ability.

In our work, we introduced the most comprehensive Siderophore Information Database (SIDERITE, https://siderite.bdainformatics.org/), the first digitized siderophore repository, with 649 unique structures in SMILES format. This digitized repository empowers researchers to transcend the limitations of manual approaches, paving the way for data-driven discoveries in the siderophore field. On the basis of these digitized structures, we developed a computational method for discovering novel iron-binding molecules with high accuracy and found remarkable structural diversity that is largely uncharted in siderophore research. SIDERITE provides a repository for novel siderophore discoveries. We provide tutorial materials and feedback channels in the database and on the GitHub page (see Supporting Information Material for details) and are committed to maintaining the SIDERITE database continually and updating it based on feedback received from our users.

Ruolin He and Shaohua Gu collected the data and drafted the manuscript.
Ruolin He performed the majority of the computational analysis in this research. Shaohua Gu and Zhong Wei designed the experiments. Ruolin He, Fanhao Wang, and Jiqi Shao proposed the functional group-based method. Jiazheng Xu and Zhengying Shao were responsible for the execution and analysis of the CAS experiments. Xuejian Li, Haoran Chen, and Long Qian contributed to constructing the SIDERITE website. Wen-Bing Yin offered insightful comments. Wen-Bing Yin, Zhong Wei, and Zhiyuan Li revised the manuscript. Zhong Wei and Zhiyuan Li oversaw the project. Zhiyuan Li conceptualized the project. All authors have read the final manuscript and approved it for publication.

We thank Professor Luhua Lai for her insightful comments and suggestions. This work was supported by the National Natural Science Foundation of China (42107140, 32071255, T2321001, and 42325704), the National Key Research and Development Program of China (2021YFF1200500), the National Postdoctoral Program for Innovative Talents (BX2021012), and the China Postdoctoral Science Foundation (2020M680212).

The authors declare no conflict of interest.

No animals or humans were involved in this study.

The data underlying this article are available in Zenodo at https://zenodo.org/doi/10.5281/zenodo.10369626. The code underlying this article is available on GitHub at https://github.com/RuolinHe/SIDERITE. The database is available at http://siderite.bdainformatics.org. Supporting Information Materials (methods, figures, tables, scripts, graphical abstract, slides, videos, Chinese translated version, and updated materials) may be found in the online DOI or iMeta Science http://www.imeta.science/.

Figure S1: The statistics of 649 unique siderophores in SIDERITE.
Figure S2: Known siderophore functional groups (ligands).
Figure S3: Displaying 25 clusters of 649 siderophores in the COCONUT database by TMAP.
Figure S4: Visualization of 649 siderophores with functional group hydroxamate number by TMAP.
Figure S5: Visualization of 649 siderophores with functional group catecholate number by TMAP.
Figure S6: Visualization of 649 siderophores with functional group phenolate number by TMAP.
Figure S7: Visualization of 649 siderophores with functional group carboxylate number by TMAP.
Figure S8: Visualization of 649 siderophores with functional group carboxylate in citrate number by TMAP.
Figure S9: Visualization of 649 siderophores with functional group alpha-hydroxycarboxylate number by TMAP.
Figure S10: Visualization of 649 siderophores with functional group hydroxyphenyloxazoline number by TMAP.
Figure S11: Visualization of 649 siderophores with functional group hydroxyphenylthiazoline number by TMAP.
Figure S12: Visualization of 649 siderophores with functional group alpha-aminocarboxylate number by TMAP.
Figure S13: Visualization of 649 siderophores with functional group alpha-hydroxyimidazole number by TMAP.
Figure S14: Visualization of 649 siderophores with functional group alpha-hydroxycarboxylate in citrate number by TMAP.
Figure S15: Visualization of 649 siderophores with functional group diazeniumdiolate number by TMAP.
Figure S16: Visualization of 649 siderophores with functional group 2-nitrosophenol number by TMAP.
Figure S17: The predicted properties of 649 siderophores.
Figure S18: The distribution of C/N and C/O ratios in the different biosynthetic types and clusters.
Figure S19: The distribution of nitrogen atom and oxygen atom numbers in the different biosynthetic types and clusters.
Figure S20: The SIDERITE database usage and interface.
Table S1: 872 siderophore information records.
Table S2: Comparison of databases related to siderophores.
Table S3: Detailed information on 649 siderophores with unique structures.
Table S4: 15 functional group structures used in potential siderophore search.
Table S5: 8 modified siderophore functional groups in the negative control.
Table S6: Tanimoto similarity matrix between 649 siderophores in SIDERITE and 3199 molecules with potential iron-binding activities in COCONUT database.
Table S7: Detailed information on 48 molecules with potential iron-binding activities in the CAS assay test.
Table S8: The raw OD630 values of 48 molecules with potential iron-binding activities in the CAS assay test.

Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.
