The environmental fate of organic pollutants through the global microbial metabolism
2007; Springer Nature; Volume: 3; Issue: 1; Language: English
DOI: 10.1038/msb4100156
ISSN 1744-4292
Authors: Manuel J. Gómez, Florencio Pazos, Francisco J Guijarro, Víctor de Lorenzo, Alfonso Valencia
Topic(s): Analytical Chemistry and Chromatography
Article, 5 June 2007, Open Access

Author Information
Manuel J Gómez1, Florencio Pazos2,3, Francisco J Guijarro2, Víctor de Lorenzo*,2 and Alfonso Valencia4
1Centro de Astrobiología (INTA-CSIC), Ctra. Torrejón Ajalvir, Km 4. Torrejón de Ardoz, Madrid, Spain
2Centro Nacional de Biotecnología (CSIC), Darwin 3, Cantoblanco, Madrid, Spain
3Bioalma, Ronda de Poniente 4, Tres Cantos, Madrid, Spain
4Centro Nacional de Investigaciones Oncológicas, Calle Melchor Fernández Almagro 3, Madrid, Spain
*Corresponding author. Centro Nacional de Biotecnología (CSIC), Campus de Cantoblanco, 28049 Madrid, Spain. Tel.: +34 91 585 4536; Fax: +34 91 585 4506; E-mail: [email protected]
Molecular Systems Biology (2007) 3:114 https://doi.org/10.1038/msb4100156

Abstract

The production of new chemicals for industrial or therapeutic applications exceeds our ability to generate experimental data on their biological fate once they are released into the environment. Typically, mixtures of organic pollutants are freed into a variety of sites inhabited by diverse microorganisms, which structure complex multispecies metabolic networks.
A machine learning approach has been instrumental in exposing a correlation between the frequency of 149 atomic triads (chemotopes) common in organo-chemical compounds and the global capacity of microorganisms to metabolise them. Depending on the type of environmental fate defined, the system can correctly predict the biodegradative outcome for 73–87% of compounds. This system is available to the community as a web server (http://www.pdg.cnb.uam.es/BDPSERVER). The application of this predictive tool to chemical species released into the environment provides an early instrument for tentatively classifying the compounds as biodegradable or recalcitrant. Automated surveys of lists of industrial chemicals currently employed in large quantities revealed that herbicides are the group of functional molecules most difficult to recycle into the biosphere through the inclusive microbial metabolism.

Synopsis

The number of new molecules generated by the chemical and pharmaceutical industry has boomed in the last few years owing to the emergence of combinatorial chemistry along with the demand for novel industrial, agricultural and therapeutic products. The number of natural or man-made organic compounds present in the biosphere is somewhere between 8 and 16 million species, of which as many as 40 000 are predominant in our daily lives. Microorganisms are key players in determining the environmental fate of novel compounds because they can use such compounds as carbon and energy sources. Microbial metabolism may not only cause the complete elimination of a given chemical compound but can also generate chemical species that are as toxic or as persistent as the original ones. In the case of complete metabolism, microbial biodegradation can be exploited for waste treatment and used in directed bioremediation processes in situ or ex situ.
Therefore, knowing whether a novel chemical compound is likely to be metabolised by microorganisms is crucial for assessing the environmental risks associated with its production, transportation, utilisation and disposal. However, after 50 years of research on microbial biodegradation, detailed knowledge about catabolic pathways is available for only about 900 chemical species. New pesticides and pharmaceuticals are being produced at rates that cannot be matched by experimental attempts to predict the outcome when they are spilled or released into the environment. This makes it essential to develop systems that can predict the fate of chemical compounds before experimentally assessing the capacity of the microbiota to degrade them.

We have approached the problem from a Systems Biology perspective on the global biodegradation network, which is formed by integrating all known metabolic reactions exerted by microorganisms on unusual, mostly man-made chemical compounds. The outcome of such analyses reveals that microbial communities integrate their catabolic abilities in a sort of catabolic gene landscape, which is dominated by the scale-free organisation of the corresponding topology. Furthermore, the organisation of the biodegradation network assembled with reactions from many different strains is indistinguishable from that typical of single organisms. The catalytic capacities of such a supermetabolism appear to be much more than the mere sum of those of the members of the community. In this way, given compounds can be degraded by means of reactions contributed by different partners (culturable or not) of given microbial consortia. Against this background, we have addressed the challenge of predicting the environmental fate of chemical species from an experience-based perspective, using a (micro)biological logic rather than a purely (bio)chemical appraisal, that is, making the most of available information about known microbial catabolic reactions on organic pollutants.
To this end, we have exploited the wealth of knowledge on the genetic and genomic basis of microbial metabolism available at the University of Minnesota Biodegradation and Biocatalysis Database (UMBBD) to train a rule-based classification system for detecting the association between certain chemical compound descriptors and environmental fates. Such descriptors are based on the deconstruction of chemical structures into atomic triads (also referred to as chemotopes). The Biodegradation Prediction System is based on the deconstruction of any given chemical formula into frequencies of each of the 149 possible chemical triads and their combination with molecular weight and water solubility data for assembling the compound vector. A machine learning system was then used to identify rules that associate such compound vectors with environmental fates as inferred from the analysis of the metabolic network that represents the global biodegradation potential of microbial organisms. Finally, a system to predict the fate of new chemical compounds, using the previously identified rules, was implemented as a web server (www.pdg.cnb.uam.es/BDPSERVER). Automated surveys made on lists of chemicals currently employed for diverse large-scale operations revealed that herbicides seem to be the group of functional molecules that have the least favourable prospects of recycling through the global microbial biodegradation network.

Introduction

The number of new molecules generated by the chemical and pharmaceutical industry has boomed in the last few years owing to the emergence of combinatorial chemistry along with the demand for novel industrial, agricultural and therapeutic products (Dolle, 2004). The number of natural or man-made organic compounds present in the biosphere is somewhere between 8 and 16 million molecular species, of which as many as 40 000 are predominant in our daily lives (Hou et al, 2003).
Microorganisms are key players in determining the environmental fate of novel compounds because they can use such compounds as carbon and energy sources (Mishra et al, 2001). Microbial metabolism may not only cause the complete elimination of a given chemical compound but can also generate chemical species that are as toxic or as persistent as the original ones. In the case of complete metabolism, microbial biodegradation can be exploited for waste treatment and used in directed bioremediation processes in situ or ex situ (Diaz, 2004). Therefore, knowing whether a novel chemical compound is likely to be metabolised by microorganisms is crucial for assessing the environmental risks associated with its production, transportation, utilisation and disposal (Wackett and Ellis, 1999; Wackett, 2004b). However, after 50 years of research on microbial biodegradation, detailed knowledge about biodegradative pathways is available for only about 900 chemical species (Urbance et al, 2003; Ellis et al, 2006). New pesticides and pharmaceuticals are being produced at rates that cannot be matched by experimental attempts to determine the outcome when they are spilled or released into the environment. This makes it essential to develop systems that can predict the fate of chemical compounds (Wackett and Hershberger, 2001; Wackett, 2004b) before experimentally assessing the capacity of the microbiota to degrade them. Although hydrophobicity, water solubility and the presence of xenophores (Klopman et al, 1992; Wackett and Ellis, 1999; Hou et al, 2003) have been invoked for assessing the biodegradability of given compounds, there are many examples in which the presence or absence of certain functional groups does not match the experimental results.
As an alternative, we have approached the problem of predicting the environmental fate of chemical species from an experience-based perspective, using a (micro)biological logic rather than a purely (bio)chemical appraisal, that is, making the most of available information about known microbial catabolic reactions on organic pollutants. To this end, we have exploited the wealth of knowledge on the genetic and biochemical basis of microbial metabolism available at the University of Minnesota Biodegradation and Biocatalysis Database (UMBBD; Ellis et al, 2003, 2006) and the Biodegradative Strain Database of the Michigan State University (BSD; Urbance et al, 2003) to train a rule-based classification system (Quinlan, 1993) for detecting the association between certain chemical compound descriptors and environmental fates. Such descriptors are based on the deconstruction of chemical structures into atomic triads (also referred to as chemotopes). A machine learning system (Quinlan, 1993) was then used to identify explicit rules that associate compound vectors with environmental fates as inferred from the analysis of the metabolic network that represents the global biodegradative potential of microorganisms. Finally, a scheme to predict the fate of new chemical compounds, using the previously identified rules, was implemented as a web server. The results obtained include the evaluation of the prediction capacity of the system and its application to several sets of compounds provided by the European Chemicals Bureau or obtained from the database PubChem Compound, for most of which there are no data on their biological fate. Herbicides seem to be the group of functional molecules that have the least favourable prospects of recycling through the global microbial biodegradation network.

Results

Deconstructing organic chemicals into atomic triad-based compound vectors

At the time of starting this work, the UMBBD contained information on 850 compounds and 903 reactions (Ellis et al, 2003, 2006). The first issue at stake was whether structural features of the target molecules could be significantly correlated with their known environmental fate. To this end, we resorted to describing each chemical structure as a set of 152 descriptors that represented atomic triad frequencies, molecular weight (MW) and water solubility, the latter expressed both quantitatively and qualitatively. Such atomic triads (or chemotopes) included 149 groups of three consecutive, connected atoms that can be identified on the structure of a compound, taking into account the type of connecting chemical bonds. For example, the atomic triad C–C–H is different from C=C–H, whereas C=C–H is equal to H–C=C (Figure 1). The choice of atomic triads instead of focusing on reactive groups or functional motifs reflected the tradeoff between having significant structural information and the handling of a minimal number of attributes (see the Discussion section). Deconstruction of each compound in this way is achieved by first translating the SMILES (Weininger, 1988) representation of each molecule, which is available from UMBBD, into other forms of chemical depiction that include explicit information regarding atom connectivity and chemical bond types. Then, the frequency with which the different atomic triads appear in each compound is recorded. MW is also available from UMBBD and compound solubility is, in some cases, accessible through links to the corresponding entry in ChemFinder (Figure 2A). The collection of atomic triad frequencies, the MW and the solubility were then assembled to generate molecular descriptors, henceforth referred to as compound vectors (Figures 1 and 2A).

Figure 1. Deconstruction of acetaldehyde into its constituent atomic triads (chemotopes).
The figure shows a simple example of generation of the compound vectors mentioned in the text. To this end, the chemical structure of acetaldehyde is shown along with its corresponding SMILES string and its composition in terms of atomic triads. One instance of the atomic triad H–C–H is boxed on the chemical structure of the molecule. The vector representing the properties of acetaldehyde regarding its degradability is assembled from its solubility, MW and the corresponding set of atomic triad frequencies as indicated.

Figure 2. Rationale for developing an experience-based biodegradation prediction system. (A) represents the strategy to generate environmental fate classifiers with the learning machine c4.5, in the form of sets of propositional rules, starting from information gathered from the biodegradation database UMBBD. (B) sketches the functioning and queries of BDPServer.

Through these criteria, nine compounds out of the 850 listed were not associated with any vector because they had fewer than three atoms or because their entries in UMBBD did not include SMILES strings. A total of 718 distinct vectors represented the remaining set of 841 compounds, indicating that the correspondence between compounds and vectors is not one-to-one. A one-to-one relation between compounds and vectors existed for 625 compounds, whereas 93 vectors described the remaining 216 compounds. Many-to-one relations between compounds and vectors are explained by the fact that positional isomers in which functional groups have changed between equivalent positions may share the same pattern of atomic triad frequencies even if they have different connectivity and different SMILES strings. That is the case for pyrogallol versus phloroglucinol, and also for 2-formyl-1-indanone versus 1-formyl-2-indanone.
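
The triad counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the molecule is entered by hand as a list of atoms and bonds (the original work derives connectivity from SMILES strings), the function name `triad_counts` and the text labels such as `C-C-H` are hypothetical, and MW and solubility are left out of the vector.

```python
from collections import Counter
from itertools import combinations

def triad_counts(atoms, bonds):
    """Count atomic triads: three connected atoms plus their two bond types."""
    # Adjacency list: atom index -> list of (neighbour index, bond symbol).
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b, bond in bonds:
        neighbours[a].append((b, bond))
        neighbours[b].append((a, bond))

    counts = Counter()
    for centre, linked in neighbours.items():
        # Every unordered pair of neighbours around a central atom is one triad.
        for (left, bond_l), (right, bond_r) in combinations(linked, 2):
            forward = (atoms[left], bond_l, atoms[centre], bond_r, atoms[right])
            backward = forward[::-1]
            # C=C-H and H-C=C are the same triad, so keep one canonical reading.
            counts["".join(min(forward, backward))] += 1
    return counts

# Acetaldehyde (SMILES: CC=O) with explicit hydrogens, entered by hand.
ATOMS = ["C", "C", "O", "H", "H", "H", "H"]
BONDS = [(0, 1, "-"), (1, 2, "="), (0, 3, "-"), (0, 4, "-"), (0, 5, "-"), (1, 6, "-")]

print(triad_counts(ATOMS, BONDS))
```

Because the count depends only on which atom types are bonded to which, two molecules whose graphs yield the same triad frequencies receive the same vector, and stereochemistry, which is absent from such a graph, cannot be distinguished.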
In addition, as stereoisomers have the same atomic connectivity and identical composition in terms of atomic triads, they are encoded by the same vectors. This kind of information was expected to introduce some noise into the predictive system, although (as explained below) not as much as one might anticipate. Description of chemicals as compound vectors of this sort (Figures 1 and 2A) was used to feed the training algorithm for classification of the molecule according to its fate in the global network shaped by the global microbial metabolism (see below).

Classification of compounds according to their environmental fate

Once each compound had been expressed in a vector form, the reactions in which the chemical is known to take part, as a substrate or as a product, were retrieved from the database (Ellis et al, 2003, 2006; Pazos et al, 2005). To categorise the environmental outcome of the complete list of 850 chemical species under study, we exploited all known metabolic reactions for organic chemicals (independent of their specific bacterial host) to delineate a global network of microbial catalysis (Pazos et al, 2003). Such an inclusive biodegradation network has been described before as an entity with topological properties that resemble single-cell metabolic transactions. Although a network of this kind includes interconnected pathways that may not stand alone in a single organism (MacNaughton et al, 1999; Pelz et al, 1999; Whiteley and Bailey, 2000; Koizumi et al, 2002; Dennis et al, 2003; Zhou, 2003), it does represent the known biodegradative potential of microbial communities at a global scale (Pazos et al, 2003). Such a pooled biodegradation network (Pazos et al, 2003, 2005) was employed to pinpoint the channelling of every compound into one of three final destinations (Figure 3) as follows.

Figure 3. Categorisation of the three partially overlapping sets of metabolic pathways that form the global biodegradation network.
The sets of chemicals and their metabolic products were defined according to their final environmental sinks: (i) NBs, chemicals that cannot be degraded (nonbiodegradable) and metabolic precursors of molecules that cannot be degraded; (ii) CMs, chemicals that belong to the central metabolism and precursors that are biologically processed to central metabolites; (iii) CDs, molecules that are directly channelled to production of CO2; (iv) CMCDs, the sum of CMs and CDs. The general trend of these types of compounds towards recalcitrance or biodegradation is sketched on top.

The first sink was composed of 38 compound entries in UMBBD that were annotated as belonging to the central metabolism. We extended this category of chemicals by including all molecules that participated in pathways through the network leading them to the central metabolism. In this way, a group of 533 chemical species were defined as central metabolism path compounds (CMs). On the other hand, we labelled as recalcitrant, nonbiodegradable compounds those that do not participate as substrates in any reaction documented in UMBBD, and thus can never reach the central metabolism or be biodegraded otherwise. After scrutiny of the global biodegradation network, 108 compounds of the database unequivocally fulfilled that criterion. In addition, two pairs of somewhat special compounds (arsenate/arsenite, benzyldisulphide/benzylmercaptane) that were linked by bidirectional reactions but had no other outgoing connections were also classified as nonbiodegradable. The operative list of recalcitrant compounds included, therefore, 112 compounds. Yet, as before, we extended the nonbiodegradable set to those molecular species that were directly or indirectly connected to recalcitrant compounds as precursors of ultimately intractable chemicals (Figure 3).
The extended set of such molecules included 353 specimens, which were operatively tagged as nonbiodegradable path compounds (NB). This set of compounds overlapped by 112 compounds with the previously defined set of CM compounds. This indicated that many chemicals can either be degraded upon being channelled into the central metabolism or accumulated in the environment if diverted into nonproductive reactions. The nonredundant set that contained all CM and NB compounds included 774 chemical species. The 76 remaining molecules were not connected to either central metabolism or nonbiodegradable compounds. Instead, they belong to various pathways that go straight into carbon dioxide and water, without converting into any of the typical intermediates of the central metabolism. Although they are of course biodegradable, the lack of connections to the central metabolism rules out their classification as CMs. On the other hand, they cannot be classified as NB compounds either, as CO2 is not a bona fide recalcitrant, terminal molecule: it can be captured back to metabolism by a formylmethanofuran dehydrogenase reaction or (in practice) by many other CO2-fixing microbial processes. We thus established a separate, extended type of compounds, which were directly or indirectly positioned in pathways leading to CO2. This group, which includes 329 molecular specimens, was termed carbon dioxide path compounds (CDs). One further extension of this criterion was to take CO2 and central metabolism as the same final fate, and group all compounds connected to them. The resulting set thus comprises central metabolism and carbon dioxide path compounds (CMCDs) and includes 634 chemicals. CMCDs correspond to what can be considered intuitively as the set of biodegradable compounds. In summary, as shown in Figure 3, each compound can be ascribed to one or more of three environmental fates (CM, NB and CD), in which the sum of CMs and CDs forms the operative biodegradable (CMCD) category.
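
The assignment of fates by following paths through the pooled network amounts to a reachability search on a directed graph. The sketch below is a minimal illustration on an invented five-reaction network, not UMBBD data; the names `upstream_of` and `classify_fates` are hypothetical, and it assumes that a recalcitrant end point is any compound that is never a substrate, other than central metabolites and CO2. The special case of compound pairs linked only by bidirectional reactions is not handled.

```python
from collections import defaultdict

def upstream_of(targets, reactions):
    """Return the targets plus every compound with a reaction path into them."""
    producers = defaultdict(set)              # product -> set of substrates
    for substrate, product in reactions:
        producers[product].add(substrate)
    seen, stack = set(targets), list(targets)
    while stack:
        node = stack.pop()
        for substrate in producers[node] - seen:
            seen.add(substrate)
            stack.append(substrate)
    return seen

def classify_fates(compounds, reactions, central, co2="CO2"):
    """Build the overlapping CM, NB, CD and CMCD sets on a reaction graph."""
    substrates = {s for s, _ in reactions}
    # Assumption: a dead end is a compound that is never a substrate,
    # excluding central metabolites and CO2 themselves.
    dead_ends = {c for c in compounds
                 if c not in substrates and c not in central and c != co2}
    cm = upstream_of(central, reactions)
    nb = upstream_of(dead_ends, reactions)
    cd = upstream_of({co2}, reactions)
    return {"CM": cm, "NB": nb, "CD": cd, "CMCD": cm | cd}

# Hypothetical toy network (not UMBBD data): (substrate, product) pairs.
REACTIONS = [("A", "B"), ("B", "pyruvate"), ("B", "X"), ("E", "D"), ("D", "CO2")]
COMPOUNDS = {"A", "B", "X", "D", "E", "pyruvate", "CO2"}
FATES = classify_fates(COMPOUNDS, REACTIONS, central={"pyruvate"})

for name, members in FATES.items():
    print(name, sorted(members))
```

In this toy network, compounds A and B fall in both CM and NB because B can be channelled either to a central metabolite or to the dead-end compound X. The same inclusion-exclusion logic underlies the counts reported above: 533 CM plus 353 NB compounds minus the 112 shared ones gives the 774 nonredundant species.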
The four types of biodegradative fates (CM, NB, CD and CMCD, Figure 3) overlapped to a significant extent. To refine further the sorting of the chemicals and to generate better classifiers for the compounds, we established four separate, binary categorisation schemes that would label each chemical as belonging to each of the groups or to the cognate negated classes. Accordingly, we defined the following four classification schemes: (i) CM or No CM, (ii) NB or No NB, (iii) CD or No CD and (iv) CMCD or No CMCD. Obviously, the most important categorisation for our purposes is the last (CMCD or No CMCD), as it reflects either eventual recalcitrance or amenability to biological recycling. Yet, the other classifiers do hold considerable practical value as well (see the Discussion section).

Matching compound vectors to environmental fates

Once a vectorial description of each chemical of the UMBBD had been established and a clear classification of outcomes through the global metabolism delineated, we set out to discover relationships between them. As mentioned above, one early difficulty to this end is that one-to-one relations between compounds and vectors existed for only 625 compounds, whereas 93 vectors redundantly described the remaining 216 compounds. A similar scenario occurs with stereoisomers, which are encoded by the same compound vector but may differ in their accessibility to biodegradation. To assess the importance of these cases in the global process, we determined the number of instances in which compounds that share the same frequencies of atomic triads happen to have the same environmental fate. Starting with the 216 compounds that were associated with 93 vectors, we identified all possible pairs of compounds that had the same pattern of atomic triad distribution.
Out of the resulting 163 cases, the number of pairs consisting of two compounds with identical fate in the different classification schemes was as follows: 141 (87%) for CM or No CM; 126 (77%) for NB or No NB; 111 (68%) for CD or No CD; and 142 (87%) for CMCD or No CMCD. These results indicated that in most cases (average, 80%), the environmental fate of structural isomers and stereoisomers after passing them through the global biodegradation network is the same, although in some cases the specific reactions involved might be different.

A second consideration was related to the structural similarity between the different types of compounds. One could suspect that chemicals belonging to the same group (CM, NB, CD, CMCD, or the corresponding negated classes) might share some structural features, especially if they are part of the same metabolic pathway. To examine this issue rigorously, chemical compound similarity was estimated for each pair of compounds using their atomic triad frequencies for calculating a modified version of the Tanimoto association coefficient τ (Holliday et al, 2002). This coefficient reflects the ratio between the number of atomic triads that two compounds have in common and the number of atomic triads that they do not have in common, and can be used as a measure of the distance between compounds with respect to their chemical similarity. The distribution of such distances for the whole set of compound pairs (Supplementary Figure S1A) indicated that the collection of chemicals was quite diverse. Although 90% of the pairs had τ values […]

[…] 19 –O–C–C >1 –O–C–C ⩽3 THEN, the compound belongs to the NB class (Confidence 90.6%)

Examples (14 cases)

No.    Class    Compound
451    NB       1-Methoxyphenanthrene
454    NB       9-Phenanthrol
389    NB       9-Fluorenol
535    NB       1-Phenanthrylsulfate
493    NB       4-Phenanthrol
525    NB       2-Phenanthrol
513    NB       2,2′-Biphenyldimethanol
539    NB       4-Phenanthrylsulfate
538    NB       3-Phenanthrylsulfate
537    NB       2-Phenanthrylsulfate
529    NB       9-Phenanthrylsulfate
494    NB       3-Phenanthrol
450    NB       1-Phenanthrol
390    NB       9-Fluorenone

NB, nonbiodegradable path compound.

Each of the four final classifiers was composed of a set of 16–23 rules, and each rule was composed, on average, of 3.3 attribute-based tests (standard deviation 2.07, range 1–12). To gain some insight into the relationship between chemical structure (i.e., the frequency of triplets) and environmental fate, the rules were reanalysed to assess the weight of each of the attributes. Out of the 152 traits (149 frequencies of atomic triads, MW, quantitative solubility and qualitative solubility), only 52 were included as part of the propositional rules of all classifiers. These attributes are listed in Supplementary Figure S2, together with a graphical depiction of the frequencies with which each of them appears in the rules. For example, attribute-based tests referring to the frequency of atomic triad O–C=O come out in about 45% of the rules that make up the classifier for the scheme CD or No CD. Also, although MW and solubility are taken into account by the classifier NB or No NB, they are useless for the classifier CM or No CM (Supplementary Figure S2). This reflects that not all attributes have the same importance for each of the environmental fates.

To assess the predictive capacity of the system, we followed a fivefold cross-validation strategy. For this, the data set was divided into five blocks; four of them were used as a training set, to generate the classifiers (rules), and the remaining block was used as a test set. This allowed measuring the ability of the classifiers to predict the environmental fate of chemicals not included in the training set.
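
The fivefold cross-validation scheme just described can be sketched as follows. This is a minimal illustration, not the procedure used in the original work: the rule learner is replaced by a trivial majority-class stand-in (`train_majority` and `predict_majority` are hypothetical names), the data are invented placeholders, and the random assignment of compounds to blocks is an assumption.

```python
import random
from collections import Counter

def train_majority(vectors, labels):
    """Stand-in learner (assumption): remember the most frequent class."""
    return Counter(labels).most_common(1)[0][0]

def predict_majority(model, vector):
    """Stand-in predictor: always answer with the remembered class."""
    return model

def five_fold_accuracy(vectors, labels, train, predict, seed=0):
    """Split the data into five blocks; train on four, test on the fifth."""
    indices = list(range(len(vectors)))
    random.Random(seed).shuffle(indices)
    blocks = [indices[i::5] for i in range(5)]
    scores = []
    for held_out in range(5):
        test_idx = blocks[held_out]
        train_idx = [i for b, block in enumerate(blocks)
                     if b != held_out for i in block]
        model = train([vectors[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        hits = sum(predict(model, vectors[i]) == labels[i] for i in test_idx)
        scores.append(hits / len(test_idx))   # fraction correct on this block
    return scores

# Invented placeholder data: ten compound vectors and their binary fates.
VECTORS = [[i] for i in range(10)]
LABELS = ["CMCD"] * 8 + ["No CMCD"] * 2

scores = five_fold_accuracy(VECTORS, LABELS, train_majority, predict_majority)
print(sum(scores) / len(scores))
```

Any learner with the same `train` and `predict` shape could be passed in place of the stand-in; each compound is used for testing exactly once, so the averaged score reflects performance on chemicals unseen during training.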