Article Open access Peer-reviewed

The Linear Pentadecapeptide Gramicidin Is Assembled by Four Multimodular Nonribosomal Peptide Synthetases That Comprise 16 Modules with 56 Catalytic Domains

2004; Elsevier BV; Volume: 279; Issue: 9; Language: English

10.1074/jbc.m309658200

ISSN

1083-351X

Authors

Nadine Kessler, Holger Schuhmann, Sabrina Morneweg, Uwe Linne, Mohamed A. Marahiel

Topic(s)

Antimicrobial Peptides and Activities

Abstract

Linear gramicidin is a membrane channel-forming pentadecapeptide that is produced via the nonribosomal pathway. It consists of 15 hydrophobic amino acids with alternating l- and d-configuration forming a β-helix-like structure. It has an N-formylated valine and a C-terminal ethanolamine. Here we report cloning and sequencing of the entire biosynthetic gene cluster as well as initial biochemical analysis of a new reductase domain. The biosynthetic gene cluster was identified on two nonoverlapping fosmids and a 13-kilobase pair (kbp) interbridge fragment covering a region of 74 kbp. Four very large open reading frames, lgrA, lgrB, lgrC, and lgrD with 6.8, 15.5, 23.3, and 15.3 kbp, were identified and shown to encode nonribosomal peptide synthetases with two, four, six, and four modules, respectively. Within the 16 modules identified, seven epimerization domains in alternating positions were detected as well as a putative formylation domain fused to the first module LgrA and a putative reductase domain attached to the C-terminal module of LgrD. Analysis of the substrate specificity by phylogenetic studies using the residues of the substrate-binding pockets of all 16 adenylation domains revealed a good agreement of the substrate amino acids predicted with the sequence of linear gramicidin. Additional biochemical analysis of the three adenylation domains of modules 1, 2, and 3 confirmed the colinearity of this nonribosomal peptide synthetase assembly line. Module 16 was predicted to activate glycine, which would then, being the C-terminal residue of the peptide chain, be reduced by the adjacent reductase domain to give ethanolamine, thereby releasing the final product N-formyl-pentadecapeptide-ethanolamine. However, initial biochemical analysis of this reductase showed only a one-step reduction yielding the corresponding aldehyde in vitro.

Gramicidin is a pentadecapeptide antibiotic produced by Bacillus brevis ATCC 8185 during its sporulation phase (1: Hotchkiss R.D., Dubois R.J. J. Biol. Chem. 1940; 132: 791-792).
The primary structure of gramicidin A was determined as formyl-Val-Gly-Ala-d-Leu-Ala-d-Val-Val-d-Val-Trp-d-Leu-Trp-d-Leu-Trp-d-Leu-Trp-ethanolamine (2: Sarges R., Witkop B. J. Am. Chem. Soc. 1965; 87: 2011-2020). The other naturally occurring isoforms, gramicidin B and C, have either phenylalanine or tyrosine replacing tryptophan at position 11, respectively. Gramicidin D refers to the naturally produced mixture of gramicidins A, B, and C of ∼80% A, 5% B, and 15% C (3: Weinstein S., Wallace B.A., Morrow J.S., Veatch W.R. J. Mol. Biol. 1980; 143: 1-19). In all three gramicidin isoforms, an isoleucine residue instead of a valine at position 1 has been observed (∼5%; 2: Sarges R., Witkop B. J. Am. Chem. Soc. 1965; 87: 2011-2020). Several facts about the sequence of gramicidin are striking. First, the amino acid sequence contains solely hydrophobic residues. Second, the N terminus is blocked by N-formylation of the first residue (valine), and third, the C terminus is blocked with ethanolamine. These three features explain the very poor solubility in water but very good solubility in various organic solvents, as gramicidin is unable to adopt a net charge or form a zwitterion at any pH. The last important feature is the alternating l- and d-amino acid composition except for position 2 (glycine). Through this, the molecule forms a helix where all side chains point outwards, resulting in the formation of a β-helix-like channel. Usually two molecules of gramicidin dimerize, giving either a double helix ("pore" form) or a helical dimer ("channel" form) (for more details, see Ref. 4: Wallace B.A. J. Struct. Biol. 1998; 121: 123-141). Once inserted into a membrane, it is specific for the transport of monovalent cations across the lipid bilayer and thus collapses the transmembrane ion potentials.
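The alternation of l- and d-residues in the gramicidin A sequence given above can be checked mechanically. The following is a minimal Python sketch, not part of the original study; the function name `parse_residues` and the labels "L", "D", and "achiral" are illustrative choices, and the sequence string is copied from the text.

```python
# Primary structure of gramicidin A as written in the text.
GRAMICIDIN_A = (
    "formyl-Val-Gly-Ala-d-Leu-Ala-d-Val-Val-d-Val-"
    "Trp-d-Leu-Trp-d-Leu-Trp-d-Leu-Trp-ethanolamine"
)


def parse_residues(sequence):
    """Return a list of (position, residue, configuration) tuples.

    Hypothetical helper: a 'd' token marks the residue that follows it
    as d-configured; Gly is reported as achiral.
    """
    tokens = sequence.split("-")
    core = tokens[1:-1]  # drop the N-terminal formyl and C-terminal ethanolamine
    residues = []
    next_is_d = False
    for token in core:
        if token == "d":
            next_is_d = True
            continue
        if token == "Gly":
            config = "achiral"
        else:
            config = "D" if next_is_d else "L"
        residues.append((len(residues) + 1, token, config))
        next_is_d = False
    return residues


residues = parse_residues(GRAMICIDIN_A)
d_positions = [pos for pos, _, config in residues if config == "D"]
print(len(residues), d_positions)
```

Under these assumptions the parser finds 15 residues with d-configured residues at the even positions 4 through 14, which is the pattern the text describes, with glycine at position 2 as the only break in the alternation.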
The features of the gramicidin sequence described above have led to the assumption that it must be synthesized via the nonribosomal pathway. The nonribosomal peptide synthetases (NRPSs; see Footnote 1) have been researched intensively over the past decades, and their mechanism is well understood by now. NRPSs are large multifunctional enzymes, carrying out several reactions in a specific and coordinated manner. Their products are a large and diverse group of bioactive natural products. These products typically consist of 3–22 carboxyl or amino acids, including nonproteinogenic ones, so the structures generated are extremely diverse. As these products often contain hydroxylated, N-methylated, glycosylated, or d-amino acids that contribute to their biological activity, they have many applications in medicine (e.g. penicillin, vancomycin, and cyclosporine), agriculture, and biochemical research. NRPSs are composed of modules where each module possesses all catalytic units needed to activate and covalently bind an amino (or carboxyl) acid and to perform a condensation reaction by peptide bond formation. Each module can be divided further into domains where each domain carries out a specific catalytic reaction repeatedly. Three core domains of NRPSs have been identified as the minimal requirement, namely adenylation (A), thiolation (PCP), and condensation (C) domains (5: Marahiel M.A., Stachelhaus T., Mootz H.D. Chem. Rev. 1997; 97: 2651-2674).

Footnote 1: The abbreviations used are: NRPS, nonribosomal peptide synthetases; kbp, kilobase pairs; ORF, open reading frame; PCP, peptidyl carrier proteins; HPLC, high pressure liquid chromatography; RBS, ribosome binding site; aa, amino acids; A, adenylation; C, condensation; E, epimerization; F, formylation; M, methylation; R, reductase; Cy, cyclization; Te, termination; formyl-THF, N10-formyl-tetrahydrofolate.
The A domain selects the cognate amino acid specifically and activates it as aminoacyl adenylate at the expense of ATP (6: Dieckmann R., Lee Y.O., van Liempt H., von Dohren H., Kleinkauf H. FEBS Lett. 1995; 357: 212-216; 7: Stachelhaus T., Marahiel M.A. J. Biol. Chem. 1995; 270: 6163-6169). Next, the activated amino acid is transferred onto the thiol moiety of the downstream PCP domain, giving an energy-rich thioester bond (8: Weber T., Marahiel M.A. Structure. 2001; 9: R3-R9). The C domain is located between two adjacent A-PCP domain pairs. It catalyzes the condensation of the thioester-bound intermediates, thereby elongating the peptide chain by 1 amino acid (9: Stachelhaus T., Mootz H.D., Bergendahl V., Marahiel M.A. J. Biol. Chem. 1998; 273: 22773-22781). The chain that is then attached to the downstream PCP domain is subsequently used in the condensation reaction of the next C domain and by consequent elongation handed on until it reaches the terminal PCP domain of the NRPS. The peptide chain is then released by either a termination (Te) or a reductase (R) domain. A Te domain gives cyclic, branched cyclic, or hydrolyzed products (10: Stachelhaus T., Marahiel M.A. FEMS Microbiol. Lett. 1995; 125: 3-14), whereas an R domain performs a reduction step forming an aldehyde or alcohol at the C terminus (11: Gaitatzis N., Kunze B., Müller R. Proc. Natl. Acad. Sci. U. S. A. 2001; 98: 11136-11141). Optional domains have been found to modify the bound substrate through an epimerization (E (12: Stein T., Kluge B., Vater J., Franke P., Otto A., Wittmann-Liebold B. Biochemistry. 1995; 34: 4633-4642; 13: Linne U., Doekel S., Marahiel M.A. Biochemistry.
2001; 40: 15824-15834)), N-methylation (M (14: Haese A., Schubert M., Herrmann M., Zocher R. Mol. Microbiol. 1993; 7: 905-914)), or cyclization (Cy (15: Konz D., Klens A., Schorgendorfer K., Marahiel M.A. Chem. Biol. 1997; 4: 927-937)) reaction, enlarging the above mentioned structural diversity of the product. A first example for a putative formylation (F) domain has been found in Anabaena strain 90 (16: Rouhiainen L., Paulin L., Suomalainen S., Hyytiainen H., Buikema W., Haselkorn R., Sivonen K. Mol. Microbiol. 2000; 37: 156-167). We describe here the cloning and sequencing of the gramicidin biosynthetic gene cluster encoding four large NRPSs and provide biochemical data for the colinearity of this gigantic assembly line.

General Methods—Antibiotics were used in the following concentrations: 50 μg/ml kanamycin; 100 μg/ml ampicillin; 12.5 μg/ml chloramphenicol. All oligonucleotides used in this study were purchased from MWG Biotech or Qiagen Operon. Cells were usually grown in LB medium at 37 °C overnight for plasmid and fosmid preparations. Plasmid and fosmid preparations were carried out using the QIAprep spin Miniprep kit (Qiagen). All restriction and modification enzymes were purchased from New England Biolabs. BLAST searches were carried out using either the BLASTX function or the BLASTP function at the NCBI homepage (www.ncbi.nlm.nih.gov/blast/Blast.cgi; 17: Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J. Nucleic Acids Res. 1997; 25: 3389-3402). Multiple sequence alignments were made using MegAlign (DNAStar, GATC Biotech). Kinetic data were analyzed using Sigma Plot 8.0 with the Enzyme Kinetics Module 1.1 (SPSS Inc.).

Isolation of Genomic DNA and PCR with Degenerate Primers—In this study, all genomic DNA from B.
brevis ATCC 8185 was isolated and purified using the Genomic-tip 100/G kit (Qiagen) as described in the manufacturer's manual. From a DNA sequence alignment of the conserved cores E5 and E6 of E domains, the following primers were deduced and successfully used in a PCR: NK 5, 5′-aaa ggg atc gg(ct) tac ga(gc) at-3′, and NK 6, 5′-cga (ca)gt (tc)aa cca (tg)cc (ga)a(tc) cgt-3′. The E domains used in the alignment were from Bacillus subtilis ATCC 21332 (srfA), B. brevis ATCC 9999 (grsA), and B. brevis ATCC 8185 (tycA, tycB). The PCR was carried out using the Roche Applied Science PCR kit using the following parameters for annealing: 5 cycles starting at 60 °C and decreasing by 0.5 °C/cycle; 5 cycles starting at 57.5 °C and decreasing by 0.4 °C/cycle; and 25 cycles at 55.5 °C. Elongation time was set to 5 min and 20 s. Only two of the DNA bands observed were dependent upon the addition of both primers to the PCR, having sizes of ∼7 and ∼10.5 kbp, the latter being the corresponding fragment from the tyc operon. The 7-kbp fragment was extracted from the gel using Qiaex II (Qiagen) and cloned into pCR-XL-TOPO (Invitrogen) according to the manufacturer's manual, and one of the few resulting clones carried the right insert as confirmed by sequencing. Sequencing reactions were carried out by the chain termination method (18: Sanger F., Nicklen S., Coulson A.R. Proc. Natl. Acad. Sci. U. S. A. 1977; 74: 5463-5467) with dye-labeled dideoxy terminators from a PRISM ready reaction dyedeoxy terminator cycle sequencing kit with AmpliTaq FS polymerase (Applied Biosystems) according to the manufacturer's protocol and analyzed on an ABI 310 genetic analyzer.

Mapping and Inverse PCR—The up- and downstream region of the 7-kbp fragment (encoding ′E-stop start-C-A-PCP-C-A-PCP-E′) was mapped using the enzymes SspI, XmaI, SmaI, EcoRI, AgeI, BamHI, AvaI, BglI, NcoI, and PstI in a Southern blot screen (19: Southern E.M. J. Mol. Biol.
1975; 98: 503-517) carried out as described in the ECL random prime labeling and detection system manual (Amersham Biosciences/Buchler Instruments). The 5′ and 3′ DNA regions coding for the E domains of the 7-kbp region were used as probes. The most promising result was an ∼5.5-kbp PstI fragment extending upstream of the 7-kbp fragment. 10 μg of chromosomal DNA from B. brevis ATCC 8185 was digested with 10 units of PstI overnight at 37 °C in a total volume of 20 μl. This sample was split the next day into 2-, 5-, and 13-μl aliquots and was religated overnight at 16 °C by adding 400 units of T4 DNA ligase to each sample and adjusting the final volume to 20, 20, and 30 μl, respectively. From each ligation, 1 μl was used as template in a PCR with primers NK 20 (5′-ggc aga tgc gaa agc gct tc-3′) and NK 53 (5′-gaa gac gtg ttt ggc tcc-3′) using the following parameters for annealing: 10 cycles starting at 57 °C and decreasing by 0.2 °C/cycle; 20 cycles starting at 55 °C and decreasing by 0.1 °C/cycle. In all three cases, a single band at ∼5.5–6 kbp was observed and, after gel purification with Qiaex II, cloned into pCR-XL-TOPO. Sequencing confirmed that the insert was the upstream part expected.

Generating a Fosmid Library—The fosmid library was generated using the CopyControl fosmid library production kit (Epicentre) according to the manual except that four times the recommended amount of DNA was used. Shotgun sequencing of the selected fosmids was carried out in publication quality by Qiagen with 13- and 16-fold coverage of fosmids 3 and 5, respectively.

Cloning, Overproduction, and Purification of FAT, ATE, CAT(3), and PCP-R—All constructs were cloned using the pBAD directional TOPO kit (Invitrogen) as described in the manual.
The corresponding genes cloned into pBAD202 were as follows: FAT from bp 11,945 to 14,244 (2,298 bp), ATE from bp 15,521 to 18,763 (3,242 bp), CAT(3) from bp 18,795 to 21,899 (3,104 bp), and PCP-R from bp 71,410 to 72,858 (1,448 bp). The corresponding vectors were transformed into Escherichia coli BL21 (DE3), and the desired proteins were overproduced as described in the manual. The recombinant proteins carried an N-terminal thioredoxin fusion and a C-terminal V5 epitope with His6 tag and were purified using Ni2+-nitrilotriacetic acid affinity chromatography as described before (20: Mootz H.D., Marahiel M.A. J. Bacteriol. 1997; 179: 6843-6850). Purified proteins were checked by SDS-PAGE (21: Laemmli U.K. Nature. 1970; 227: 680-685) and concentrated using Vivaspin 20-ml concentrators (membrane 50,000 MWCO polyethersulfone (PES); purchased from Vivascience/Sartorius) to a concentration of about 3–5 mg/ml.

ATP-PPi Exchange Assay and Kinetic Studies—The ATP-PPi exchange assay was conducted essentially as reported previously (9: Stachelhaus T., Mootz H.D., Bergendahl V., Marahiel M.A. J. Biol. Chem. 1998; 273: 22773-22781). The enzyme amounts were typically 50 pmol for FAT and ATE and 250 pmol for CAT(3). Reactions were started by the addition of the amino acids and ATP and incubated at 37 °C for 15 min in Hepes buffer, pH 8.0. The apparent Km values for the amino acids were determined with substrate concentrations ranging from 0.1 to 10 mM. The experimental procedure was reported previously (7: Stachelhaus T., Marahiel M.A. J. Biol. Chem.
1995; 270: 6163-6169).

Reductase Assay—The purified PCP-R enzyme was dialyzed against a buffer containing 5 mM Hepes, 5 mM NaCl, pH 7. The assay contained the following ingredients: 60 μM enzyme, 5 μM Sfp, 100 μM substrate (peptidyl-CoA, see "Results"), 300 μM NADPH, 10 mM MgCl2, 10 μM MnCl2. The volume was adjusted to 100 μl using a buffer containing 20 mM Hepes, 50 mM NaCl, pH varying from 5 to 8. The four negative controls lacked enzyme, Sfp, substrate, or NADPH. The assay was thoroughly mixed and incubated at 20 °C for 30 min. The reaction was stopped by the addition of 1 ml of MeOH, kept on ice for >30 min, and then centrifuged at maximum speed in a conventional tabletop centrifuge for 30 min. The solvent from the supernatant was then removed under vacuum at 30 °C for 3 h, and the pellet was resuspended in 100 μl of 50% methanol containing 0.05% formic acid and analyzed using HPLC (1100 series, Agilent) coupled with a 1100MSD-A ESI-quadrupole mass spectrometer (Agilent). Samples (95 μl of each) were applied to a 125/2-Nucleodur-C18-Gravity column (Macherey-Nagel) with a particle diameter of 3 μm. The gradient of solvent A (water, 0.05% formic acid) and solvent B (acetonitrile) used was as follows: linear from 30% B to 60% B within 10 min, increasing to 95% B within 2 min, and holding 95% B for an additional 3 min at a flow rate of 0.3 ml/min and a column temperature of 45 °C. UV detection was carried out at 215 nm. Mass-sensitive detector (MSD) parameters were as follows. Mass range was set from 500 to 760 atomic mass units in positive ion mode, the gain was set to 2.0, and the fragmentor was set to 70. The drying gas flow (N2) was 13 liters/min, the nebulizer pressure was 30 p.s.i.g., the drying gas temperature was 350 °C, and the capillary voltage was 4300 V.
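The HPLC gradient described above can be written down as a set of breakpoints. The Python sketch below is illustrative only and is not taken from the instrument method: it assumes that the gradient starts at injection (t = 0) and that each segment is linear between the stated breakpoints; the names `GRADIENT` and `percent_b` are hypothetical.

```python
# (time in min, % solvent B) breakpoints read from the text:
# 30% -> 60% B within 10 min, -> 95% B within 2 min, hold 95% B for 3 min.
GRADIENT = [(0.0, 30.0), (10.0, 60.0), (12.0, 95.0), (15.0, 95.0)]
FLOW_ML_PER_MIN = 0.3


def percent_b(t_min, breakpoints=GRADIENT):
    """Linearly interpolate % solvent B at time t_min (assumed segment shape)."""
    if not breakpoints[0][0] <= t_min <= breakpoints[-1][0]:
        raise ValueError("time lies outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


run_time_min = GRADIENT[-1][0]
total_volume_ml = run_time_min * FLOW_ML_PER_MIN  # mobile phase used per run

for t in (0, 5, 10, 11, 12, 15):
    print(f"{t:>4} min: {percent_b(t):.1f}% B")
```

With these assumptions a run lasts 15 min and consumes about 4.5 ml of mobile phase at 0.3 ml/min; re-equilibration time, which the text does not state, is not included.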
Cloning and Sequencing of the lgr Region—We based our model of the lgr synthetase on the assumption that it would be a linear NRPS (type A) as defined by Mootz et al. (22: Mootz H.D., Schwarzer D., Marahiel M.A. Chembiochem. 2002; 3: 490-504). Because gramicidin contains 6 d-amino acids, we postulated that the corresponding NRPS should be of the modular structure F-A-T-C-A-PCP-(C-A-PCP-C-A-PCP-E)6-C-A-PCP-C-A-PCP-R. The PCR strategy using degenerate primers as well as the subsequent inverse PCR was successful (see "Experimental Procedures"). Sequencing and analysis of the amplified DNA fragment revealed an NRPS with the domain structure ′A-PCP-C-A-PCP-E-3′ 5′-C-A-PCP-C-A-PCP-E′ of a so far undetermined origin. According to the specificity conferring code (23: Stachelhaus T., Mootz H.D., Marahiel M.A. Chem. Biol. 1999; 6: 493-505), the substrate specificity of the identified four A domains was determined to be Ala, Val, Val, and Val, respectively, meaning that the structural element of l-Ala-d-Val-l-Val-d-Val should be present in the product. As this is the case in gramicidin, we then generated a fosmid library. Using probes from the DNA region encoding the first A and the last E domain of the 13-kbp fragment in a Southern blot screen, we identified and sequenced two nonoverlapping fosmids covering the whole up- and downstream region of the fragment, yielding a total of 74 kbp of genetic information (accession number AJ566197). The overall G + C content of the region sequenced is 55.43%, which is slightly higher than the G + C content of the tyc operon (20: Mootz H.D., Marahiel M.A. J. Bacteriol. 1997; 179: 6843-6850) and higher than that reported for other bacilli.
The lgr Synthetases Are Encoded by Four Large ORFs Termed lgrA, lgrB, lgrC, and lgrD—Sequence analysis revealed 11 significant open reading frames as judged by BLAST searches (17: Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J. Nucleic Acids Res. 1997; 25: 3389-3402) (Fig. 1), among them four very large ORFs spanning almost 61 kbp of the region sequenced, with a G + C content of 56.3%. The first of these four ORFs, lgrA (6,822 bp), begins with a GTG start codon at position 11,945 bp of the region sequenced, preceded by a putative RBS. The gene product, LgrA (2273 aa and 257,822 Da), has high similarity to other known peptide synthetases, as do LgrB, LgrC, and LgrD. The first 200 amino acids of LgrA show particularly high similarity to methionyl-tRNA formyltransferases, suggesting this part to be a formylation domain (see below). It is followed by A-PCP-C-A-PCP-E, where the E domain is highly unexpected (see below). The amino acids activated by the two A domains are proposed to be Val/Ile (A1) and Gly (A2) from the product sequence and Leu/Val/Ile (A1) and Cys/Gly (A2) when analyzed using the amino acid specificity conferring code (raynam.chm.jhu.edu/~nrps/index.html; 198 binding pocket constituents are used where the specificity has been proven experimentally (24: Challis G.L., Ravel J., Townsend C.A. Chem. Biol. 2000; 7: 211-224)) (Table I). lgrB (15,489 bp) starts with a GTG start codon 29 bp downstream of the stop codon of lgrA and is preceded, 7 bp upstream, by a putative RBS. LgrB (5162 aa and 577,862 Da) harbors four modules with the domain structure (C-A-PCP-C-A-PCP-E)2. The prediction of the A domain specificity is difficult for modules 1 and 3 but clearly Leu for module 2 and Val for module 4. According to the product, we expect the specificity to be Ala, Leu, Ala, and Val, respectively.
However, module 1 has the highest similarities regarding the binding pocket with the second module of LgrA (Gly). lgrC (23,271 bp) starts with an ATG 23 bp upstream of the stop codon of lgrB and is also preceded by a putative RBS. The corresponding gene product LgrC (7756 aa and 866,306 Da) is composed of six modules bearing the domain structure (C-A-PCP-C-A-PCP-E)3. The specificity is expected to be Val-Val-Trp-Leu-(Trp/Tyr/Phe)-Leu according to the product. The prediction confirms this for the Val- and Leu-activating domains 1, 2, 4, and 6 but remains unclear for modules 3 and 5. The last large ORF, lgrD (15,258 bp), starts 76 bp downstream of the stop codon of lgrC and is also preceded by a putative RBS. LgrD (5085 aa and 567,448 Da) consists of four modules with the domain structure C-A-PCP-C-A-PCP-E-C-A-PCP-C-A-PCP-R. A putative termination loop (35 bp) starts 7 bp after the stop codon. The putative reductase domain at the C terminus of LgrD has high similarity to reductases from polyketide synthases and other NADPH-dependent reductases such as MxcG (11: Gaitatzis N., Kunze B., Müller R. Proc. Natl. Acad. Sci. U. S. A. 2001; 98: 11136-11141) and SafA (25: Pospiech A., Bietenhader J., Schupp T. Microbiology (Reading). 1996; 142: 741-746). We expect the four A domains to activate Trp, Leu, Trp, and Gly according to the product, strongly suggesting that ethanolamine is the product of a reduced glycine residue. The prediction is again unclear for the Trp-activating A domains 1 and 3 but clearly Leu for domain 2 and Cys/Gly for domain 4. An alignment of the residues responsible for the substrate specificity of the A domains shows clearly that the A domains that activate similar or identical amino acids cluster (Fig.
2).

Table I. Amino acid residues responsible for substrate specificity (residue positions follow PheA numbering)

Domain   235 236 239 278 299 301 322 330 331 517   Predicted      Product
LgrA1    D   G   L   Y   I   G   G   I   M   K     Leu/Val/Ile    Val/Ile
LgrA2    D   I   A   N   L   C   I   I   Y   K     Cys/Gly?       Gly
LgrA3    D   V   A   N   F   A   I   I   Y   K     Cys/Gly?       Ala
LgrA4    D   A   W   F   L   G   Q   V   V   K     Leu            Leu
LgrA5    D   L   Y   P   N   A   L   T   Y   K     Cys/Arg/Pro?   Ala
LgrA6    D   A   F   W   L   G   G   T   F   K     Val            Val
LgrA7    D   A   F   W   L   G   G   T   F   K     Val            Val
LgrA8    D   A   F   W   L   G   G   T   F   K     Val            Val
LgrA9    D   V   S   S   I   G   C   V   C   K     Lys?           Trp
LgrA10   D   A   W   F   L   G   Q   V   V   K     Leu            Leu
LgrA11   D   V   S   A   I   G   C   V   T   K     Lys?           Trp/Phe/Tyr
LgrA12   D   A   W   F   L   G   Q   V   V   K     Leu            Leu
LgrA13   D   V   S   S   I   G   C   V   C   K     Lys?           Trp
LgrA14   D   A   W   F   L   G   Q   V   V   K     Leu            Leu
LgrA15   D   V   S   S   E   G   C   V   G   K     Lys?           Trp
LgrA16   D   L   Y   T   I   A   L   V   Y   K     Cys/Gly?       Gly

Fig. 2. Phylogenetic tree of a multiple sequence alignment of all 16 binding pocket constituents as described in Table I. The putative specificity was assigned using the sequence of the product. It is shown that those binding pockets of A domains that supposedly activate the same or similar substrate cluster together.

Amplification, Expression, and Biochemical Investigation of the First Three Internal A Domains—DNA fragments encoding module 1 (FAT), module 2 lacking the C domain (ATE), and the complete module 3 (CAT) were amplified from the corresponding fosmid and cloned into pBAD vectors as described under "Experimental Procedures." The enzymes were overproduced in E. coli BL21 (DE3) as His6-tagged proteins and purified using Ni2+-nitrilotriacetic acid affinity chromatography. All proteins were obtained in soluble form in good yield and purity. Determination of the A domain specificity (Fig. 3) was carried out as described above. We found module 1, FAT, to activate l-Val (100%), l-Ile (48%), and l-Leu (12%; the highest activity was set at 100%; background was usually below 1%). No other proteinogenic amino acids were activated.
The apparent Km for Val was determined to be 0.84 mM, and the apparent Km for Ile was determined to be 2.4 mM, clearly showing that Val is the preferred substrate of the A domain and explaining its dominating presence in the product, gramicidin. The truncated module 2, ATE, was found to activate Gly alone (100%) without any considerable side specificities. Module 3, CAT, activated Gly (100%), l-Ala (50%), and to a minor extent, l-Leu (7%), l-Pro (6%), and l-Val (5%). The apparent Km values for Gly and Ala were determined to be 2.2 and 1.5 mM, respectively, showing a preference of the A domain for Ala, which is solely found in the product at this position.

E Domain of LgrA—As the alignment of all seven lgr-E domains shows (Fig. 4), the first E domain is not as well conserved as the others. Core motif 7 is missing, and motifs 1, 2, and 4 are only poorly conserved. The catalytically essential His (HHXISDG(WV)S) in core 2 (26: Stachelhaus T., Walsh C.T. Biochemistry. 2000; 39: 5775-5787) has been mutated to a Gln residue, suggesting this E domain to be inactive. This is supported by the fact that the cognate amino acid is Gly, which is achiral.

Formylation Domain—The first 200 aa of LgrA show high similarity to methionyl-tRNA formyltransferases from other bacteria such as Bacillus anthracis (33% identity, 52% similarity), Fusobacterium nucleatum (34% identity, 58% similarity), or Clostridium tetani (35% identity, 56% similarity; Fig. 5). However, the overall similarity with the putative F domain from Anabaena strain 90 (16: Rouhiainen L., Paulin L., Suomalainen S., Hyytiainen H., Buikema W., Haselkorn R., Sivonen K. Mol. Microbiol. 2000; 37: 156-167) is rather low (only 17.2% identity). Nevertheless, we found 35% identity and 50% similarity when comparing only the amino acids 70–170 of both proteins (N-terminal region).
Taking a closer look at this F domain of ApdA from Anabaena strain 90 (size ∼550 aa), we found a stretch of about 270 aa at its C terminus, which clearly belongs to the C terminus of a poorly conserved C domain. From sequence analysis, we propose N10-formyl-tetrahydrofolate (formyl-THF) to be the C1 carrier that donates the formyl group for the N-formylation of the first amino acid Val. As shown in the alignment (Fig. 5), the formyl-THF-binding motif SLLP is present, located within the conserved N-terminal part as is the case for other formyltransferases (27: Schmitt E., Blanquet S., Mechulam Y. EMBO J. 1996; 15: 4749-4758). From the alignment (Fig. 5), we propose a core motif IN(VL)HXSLLPXXRG for F domains as well as formyltransferases that utilize formyl-THF. Future biochemical studies will have to establish the function and mechanism of the F domain.

Reductase Domain—The putative reductase domain has high similarity with several NADPH-dependent reductases from other NRPSs and polyketide

Reference(s)