Article Open access Peer-reviewed

Three-dimensional Models of Proteases Involved in Patterning of the Drosophila Embryo

2003; Elsevier BV; Volume: 278; Issue: 13 Language: English

10.1074/jbc.m211820200

ISSN

1083-351X

Authors

Thierry Rose, Ellen K. LeMosy, Angelene M. Cantwell, Dolly Banerjee-Roy, James B. Skeath, Enrico Di,

Topic(s)

Enzyme Production and Characterization

Abstract

Three-dimensional models of the catalytic domains of Nudel (Ndl), Gastrulation Defective (Gd), Snake (Snk), and Easter (Ea), and their complexes with substrate suggest a possible organization of the enzyme cascade controlling the dorsoventral fate of the fruit fly embryo. The models predict that Gd activates Snk, which in turn activates Ea. Gd can be activated either autoproteolytically or by Ndl. The three-dimensional models of each enzyme-substrate complex in the cascade rationalize existing mutagenesis data and the associated phenotypes. The models also predict unanticipated features like a Ca2+ binding site in Ea and a Na+ binding site in Ndl and Gd. These binding sites are likely to play a crucial role in vivo as suggested by mutant enzymes introduced into embryos as mRNAs. The mutations in Gd that eliminate Na+ binding cause an apparent increase in activity, whereas mutations in Ea that abrogate Ca2+ binding result in complete loss of activity. A mutation in Ea predicted to introduce Na+ binding results in apparently increased activity with ventralization of the embryo, an effect not observed with wild-type Ea mRNA.

The abbreviations used are: Ea, Easter zymogen; Ea*, activated Easter; Gd, Gastrulation Defective zymogen; Gd*, activated Gastrulation Defective; Ndl, Nudel zymogen; Ndl*, activated Nudel; Snk, Snake zymogen; Snk*, activated Snake; Spz, Spätzle; PDB, Protein Data Bank; FB, Flybase; SP, Swiss Protein.

Several genes in the dorsal group (1. LeMosy E.K. Hong C.C. Hashimoto C. Trends Cell Biol. 1999; 9: 102-107) are involved in extracellular events that lead to dorsoventral polarization of the Drosophila melanogaster embryo. nudel, pipe, and windbeutel are expressed by somatic follicle cells during mid-oogenesis, whereas easter, gastrulation defective, snake, and spätzle are expressed by the nurse cells and oocyte. These genes were identified in several large-scale genetic screens for maternal effect mutations that cause homozygous mutant females to produce embryos with abnormal cell fates (2. Anderson K.V. Nüsslein-Volhard C. Nature. 1984; 311: 223-227). Among the dorsal group genes, nudel, gastrulation defective, snake, and easter encode proteins containing serine protease domains of the trypsin family. Easter (Ea), Snk, Gd, and Ndl are expressed and secreted during oogenesis as inactive zymogens into a thin, fluid-filled perivitelline space that lies between the eggshell and the oocyte. Genetic and molecular studies suggest that these proteins act in a proteolytic cascade many hours later in the early embryo (1. LeMosy E.K. Hong C.C. Hashimoto C. Trends Cell Biol. 1999; 9: 102-107, 3. LeMosy E.K. Tan Y.Q. Hashimoto C. Proc. Natl. Acad. Sci. U. S. A. 2001; 98: 5055-5060, 4. Han J.H. Lee S.H. Tan Y.Q. LeMosy E.K. Hashimoto C. Proc. Natl. Acad. Sci. U.
S. A. 2000; 97: 9093-9097, 5. Dissing M. Giordano H. DeLotto R. EMBO J. 2001; 20: 2387-2393). The cascade resembles in its general organization those controlling the innate immune response and blood coagulation (6. Krem M.M. Di Cera E. Trends Biochem. Sci. 2002; 27: 67-74). Ovulation of the egg in some way triggers the self-activation of Ndl into Ndl*. Gd can be activated either by Ndl* or by self-activation in the presence of Snk. Subsequently, Gd* activates diffusible Snk and Snk* activates diffusible Ea. The result of this cascade is cleavage by Ea* of the diffusible dimeric nerve growth factor-like Spz (7. DeLotto Y. DeLotto R. Mech. Dev. 1998; 72: 141-148). The processed Spz appears to function as a dimer to activate the transmembrane receptor Toll only on the embryo surface that will become ventralized through the Toll signaling pathway. In contrast to the significant knowledge garnered from previous in vivo studies, quantitative information on the activity and specificity of the members of the cascade has so far eluded characterization with purified proteins. Several questions remain regarding the activation of Ndl and Gd (1. LeMosy E.K. Hong C.C. Hashimoto C. Trends Cell Biol. 1999; 9: 102-107, 3. LeMosy E.K. Tan Y.Q. Hashimoto C. Proc. Natl. Acad. Sci. U. S. A. 2001; 98: 5055-5060, 4. Han J.H. Lee S.H. Tan Y.Q. LeMosy E.K. Hashimoto C. Proc. Natl. Acad. Sci. U. S. A. 2000; 97: 9093-9097, 5. Dissing M. Giordano H. DeLotto R. EMBO J. 2001; 20: 2387-2393) and the specificity of Gd* and Snk*. Answering these timely and important questions would benefit from knowledge of the structural organization of the enzymes involved in the cascade. However, none of the members of the cascade has been crystallized so far or even expressed successfully for detailed in vitro characterization.
Hence, we felt that the construction of three-dimensional models of Ndl*, Gd*, Snk*, and Ea* in complex with their targets could fill a critical structure-function gap in the field as recently shown for thrombin interactions with the platelet receptors (8. Ayala Y.M. Cantwell A.M. Rose T. Bush L.A. Arosio D. Di Cera E. Proteins. 2001; 45: 107-116) and fibrinogen (9. Rose T. Di Cera E. J. Biol. Chem. 2002; 277: 18875-18880). The value of these models stems from their timeliness and the new insight offered for future mutagenesis studies, as illustrated in the present work by the effect on embryo polarization when putative cation binding sites of examined proteases were mutated.

The fly sequences came from the strain Berkeley in the Flybase (FB) and Swiss Protein (SP) databases: Ea (FBgn0000533, SP-P13582), Gd (FBgn0000808, SP-O62589), Ndl (FBgn0002926, SP-P98159), and Snk (FBgn0003450, SP-P05049). These sequences were aligned with 1800 serine proteases from the trypsin family pulled from the non-redundant database at the National Center for Biotechnology Information (NCBI) (National Library of Medicine, National Institutes of Health, Bethesda, MD) and the Flybase image at the NCBI using trypsin homologues as seeds with the BLAST program and aligned together with ClustalX as described recently (10. Rose T. Di Cera E. J. Biol. Chem. 2002; 277: 19243-19246). Sequences were clustered into 100 groups from a neighbor-joining tree built with 500 bootstrap replicates in ClustalX. One hundred sequences were selected, one per cluster. Three-dimensional models of 70 structures were built by comparative modeling based on 12 of 20 crystal structures of serine proteases used in the sequence core. These models were used to refine the alignment of the 100-sequence core (10. Rose T. Di Cera E. J. Biol. Chem. 2002; 277: 19243-19246).
The theoretical three-dimensional models of activated protease domains Ndl* (central or Ndl1*-(1146–1385) and C-terminal or Ndl2*-(2017–2616)); Gd*-(256–528); Snk*-(191–430); and Ea*-(127–392) were constructed by comparative modeling using the program Modeller 4 (11. Sali A. Blundell T.L. J. Mol. Biol. 1993; 234: 779-815). The following crystal structures of serine proteases downloaded from the Protein Data Bank (PDB) (12. Berman H.M. Westbrook J. Feng Z. Gilliland G. Bhat T.N. Weissig H. Shindyalov I.N. Bourne P.E. Nucleic Acids Res. 2000; 28: 235-242) were used as templates: trypsin (PDB code 1tld, 1.50 Å resolution); chymotrypsin (PDB code 4cha, 1.68 Å); tPA (PDB code 1rtf, 2.30 Å); plasmin (PDB code 1bui, 2.65 Å); plasma kallikrein (PDB code 2pka, 2.05 Å); thrombin (PDB code 1ppb, 1.92 Å); factor Xa (PDB code 1hcg, 2.20 Å); factor IXa (PDB code 1rfn, 2.80 Å); factor VIIa (PDB code 1dan, 2.00 Å); and activated protein C (PDB code 1aut, 2.80 Å). These proteases were chosen because they span the breadth of diversity of trypsin-related domains and regulatory cation binding sites. Alignments extracted from the 100-sequence core were optimized manually during preliminary comparative modeling processes according to the distance violation from templates provided by the Modeller program output files. Two hundred models were built for each protease with different frameshifts of alignment in the poorly conserved loops and different seeds for the random number generator. Models were then checked and ranked for stereochemistry, structural topology features, and amino acid spatial distribution with Procheck (13. Laskowski R.A. Moss D.S. Thornton J.M. J. Mol. Biol. 1993; 231: 1049-1067), WhatCheck (14. Hooft R.W.W. Vriend G. Sander C. Abola E.E. Nature. 1996; 381: 272), and Verify3D (15. Luthy R. Bowie J.U. Eisenberg D. Nature. 1992; 356: 83-85).
Conformers in the same clusters were pooled when the root mean square deviations of their backbone was 0.50, where $r_i$ is the percentage of overall solvent accessibility and $r_{sc,i}$ is the percentage of side chain solvent accessibility. Ca2+ and Na+ binding sites were identified with the program VALE (16. Nayal M. Di Cera E. Proc. Natl. Acad. Sci. U. S. A. 1994; 88: 817-821) using a grid of 0.1 Å, water molecule radius of 1.4 Å, and a minimum threshold for the sum of oxygen-cation bond-strength contributions of 0.8. Three-dimensional models of protease-substrate complexes were built by comparative modeling in a thorough or quick mode. In the thorough mode, the protease-fragment complexes were threaded over thrombin-peptide crystal structures (8. Ayala Y.M. Cantwell A.M. Rose T. Bush L.A. Arosio D. Di Cera E. Proteins. 2001; 45: 107-116, 9. Rose T. Di Cera E. J. Biol. Chem. 2002; 277: 18875-18880). We used the following templates: peptide-Ac-DFLAEGGGVR from PDB (1bbr and 1ucy); PPACK from PDB (1ppb); hirugen peptide-NGDFEEIPEEYL from PDB (1hah); and peptide-LDPR from PDB (1nrs). Fifty three-dimensional models were built and ranked in terms of stereochemistry quality and lowest potential binding energies. The accepted computer-generated models of protease-peptide substrate complexes had root mean square deviations of <1.5 Å for the protease backbone and peptide residues <10 Å from protease residues. Models containing a ligand with root mean square deviations of <2 Å from a higher ranked model were discarded. We selected the best ten models, extracted the ligand, docked it on the best free protease three-dimensional model as a starting point for a new modeling process, and optimized the best complex as described above for the free proteases.
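The redundancy rule described above (discard a model whose ligand lies within 2 Å root mean square deviation of a higher ranked model) can be sketched as a greedy filter. This is only an illustration, not the authors' pipeline: the function names are hypothetical, coordinates are assumed to be already superposed on a common protease frame, and each candidate is compared against the higher ranked models that were retained.

```python
from math import sqrt

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    No superposition is performed; the points are assumed to share a frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return sqrt(squared / len(coords_a))

def filter_redundant(ranked_ligands, cutoff=2.0):
    """Return indices of ligands to keep, best ranked first.

    A ligand is dropped when its RMSD to any retained, higher ranked
    ligand is below `cutoff` (2 Angstrom in the text).
    """
    kept = []
    for index, coords in enumerate(ranked_ligands):
        if all(rmsd(coords, ranked_ligands[k]) >= cutoff for k in kept):
            kept.append(index)
    return kept

# Toy data (made up): the second ligand is a 0.5 Angstrom shift of the first,
# the third is displaced by 5 Angstrom.
best = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
near = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]
far = [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
print(filter_redundant([best, near, far]))
```

With the toy coordinates, the near-duplicate is dropped and the displaced pose is retained.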
The thorough mode screened 1 of 50 three-dimensional models of enzyme-target peptide complexes and was used to screen every putative activation cleavage site of zymogens with every selected protease to assess activator-activated pairs. The quick mode was used to screen possible cleavage sites all along sequence targets in and out of the catalytic domain. We used only one of seven peptide three-dimensional models to template the position P1–P11, chosen according to the length of the loop between P1 and the closest hydrophobic side chain from P4 to P10 (seven possibilities). Five three-dimensional models of the complex were provided by Modeller runs, and then the best one was minimized as in the thorough mode.

We examined the relative binding free energies of substrates on proteases by applying an empirical method on bound and free components. We used the potential energy of the system as an enthalpy term (force field CFF91), a conformational entropy term based on solvent-accessible surface area (SASA) of residues, and a hydration free energy term based on finite difference approximation of the Poisson-Boltzmann equation. The predicted free energy of association between receptor (R) and peptide (P), $\Delta G$, was calculated considering that free R and P have the same conformation as in the complex RP from $\Delta G = \Delta G_{RP} - \Delta G_{R} - \Delta G_{P}$ with $\Delta G_{x} = \Delta G_{x,\mathrm{gas}}(\varepsilon=1) - \Delta G_{x,\mathrm{hyd}}(\varepsilon=80)$ and $x$ = RP, R, or P. The value of $\Delta G_{x,\mathrm{gas}}$ was calculated from its enthalpic and entropic contributions expressed as $\Delta G_{x,\mathrm{gas}} = \Delta H_{x,\mathrm{gas}} - T\Delta S_{x,\mathrm{gas}}$ with $\Delta H_{x,\mathrm{gas}} = E_{x,\mathrm{vdw}} + E_{x,\mathrm{coul}}$ and $\Delta S_{x,\mathrm{gas}} = \Delta S_{x,\mathrm{conf,gas}} + \Delta S_{x,\mathrm{rt,gas}} + \Delta S_{x,\mathrm{vib,gas}}$. The enthalpy $\Delta H_{x,\mathrm{gas}}$ is a function of the van der Waals ($E_{\mathrm{vdw}}$) and coulombic ($E_{\mathrm{coul}}$) components, whereas $\Delta S_{x,\mathrm{gas}}$ is defined in terms of the rotational, configurational, and vibrational components. $E_{\mathrm{vdw}}$ and $E_{\mathrm{coul}}$ were computed from the CFF91 force-field without cut-off with $\varepsilon = 2$.
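As a bookkeeping aid, the gas-phase part of the scheme above (enthalpy from van der Waals plus coulombic energy, entropy as a sum of components, association energy as complex minus free receptor minus free peptide) might be organized as follows. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, the hydration term is left out, and all numbers are invented purely to exercise the arithmetic.

```python
from dataclasses import dataclass

@dataclass
class GasPhaseTerms:
    """Gas-phase energy terms for one species x (RP, R, or P).

    Energies are in kcal/mol and entropies in kcal/mol/K (assumed units).
    """
    e_vdw: float
    e_coul: float
    s_conf: float
    s_rt: float
    s_vib: float = 0.0  # not computed in the text; kept for completeness

    def enthalpy(self) -> float:
        # H_gas = E_vdw + E_coul
        return self.e_vdw + self.e_coul

    def entropy(self) -> float:
        # S_gas = S_conf + S_rt + S_vib
        return self.s_conf + self.s_rt + self.s_vib

    def free_energy(self, temperature: float = 298.0) -> float:
        # G_gas = H_gas - T * S_gas
        return self.enthalpy() - temperature * self.entropy()

def association_free_energy(complex_terms, receptor_terms, peptide_terms,
                            temperature: float = 298.0) -> float:
    """Gas-phase G(RP) - G(R) - G(P) at the given temperature."""
    return (complex_terms.free_energy(temperature)
            - receptor_terms.free_energy(temperature)
            - peptide_terms.free_energy(temperature))

# Invented values, for illustration only.
rp = GasPhaseTerms(e_vdw=-100.0, e_coul=-50.0, s_conf=0.10, s_rt=0.0)
r = GasPhaseTerms(e_vdw=-60.0, e_coul=-30.0, s_conf=0.08, s_rt=0.0)
p = GasPhaseTerms(e_vdw=-10.0, e_coul=-5.0, s_conf=0.05, s_rt=0.0)
print(round(association_free_energy(rp, r, p), 2))
```

Keeping each species in its own record makes it easy to see which terms cancel when two peptides of the same length are compared on the same protease.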
The value of the conformational entropy $\Delta S_{x,\mathrm{conf,gas}}$ was computed from the loss of side and main chain rotation freedom using the definition shown in Equation 1,

$$T\Delta S_{x,\mathrm{conf,gas}} = T\Delta S_{x,\mathrm{conf}_{sc},\mathrm{gas}} + T\Delta S_{x,\mathrm{conf}_{mc},\mathrm{gas}} = \sum_i \Delta f_1(r_{sc,i})\,T\Delta S_i + \sum_i RT \ln \Delta f_2\!\left(\frac{r_{i-1}+r_i+r_{i+1}}{3}\right)\rho_i \qquad \text{(Equation 1)}$$

where $f_1(r_{sc,i}) = r_{sc,i}^{8}/(r_{sc,i}^{8}+0.5)$ and $r_{sc,i}$ is the relative accessibility of the $i$th residue side chain, $r_{sc,i} = \mathrm{SASA}_{sc,i}/\mathrm{SASA}_{sc,i,GXG}$. $\mathrm{SASA}_{sc,i,GXG}$ refers to the side chain solvent-accessible surface area of amino acid X in the tripeptide Gly-X-Gly. The empirical scales of side chain rotation freedom, $\Delta S_i$, were taken from Pickett and Sternberg (17. Pickett S.D. Sternberg M.J.E. J. Mol. Biol. 1993; 231: 825-839). The function $f_1$ decreases the entropy values when the accessibility of the side chain is <50%. The loss of freedom of the residue $i$ main chain dihedral angles ϕ and ψ was roughly considered as a function of the steric hindrance around residues $i-1$ to $i+1$, affecting the access of allowed and core regions in the Ramachandran graph. $r_i$ is the smallest value of $\mathrm{SASA}_{mc,i}/\mathrm{SASA}_{mc,i,GXG}$ (accessibility of the main chains only) or $\mathrm{SASA}_{i}/\mathrm{SASA}_{i,GXG}$ (overall accessibility), weighted by the attenuation function $f_2(x) = x^{7}/(x^{7}+0.5)$. The accessible area fraction, $\rho_i$, was fixed for each residue dihedral pair ϕ and ψ of X from the tripeptide Ala-X-Ala in the Ramachandran graph: 0.28 for X = Pro; 0.56 for X = Gly; and 0.40 for all other amino acids according to the allowed and core regions in Procheck graphs (13. Laskowski R.A. Moss D.S. Thornton J.M. J. Mol. Biol. 1993; 231: 1049-1067). The size of the different ligands is very similar, and the resulting difference in the loss of rotational and translational entropy upon binding, $\Delta S_{\mathrm{rt,gas}}$, between different ligands is negligible. For 25-residue peptides associated with proteases modeled by quick and thorough mode, $T\Delta S_{\mathrm{rt,gas}}$ was ∼18–20 kcal/mol at 298 K (18. Finkelstein A.V. Janin J. Protein Eng. 1989; 3: 1-3).
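The building blocks of Equation 1 (the two attenuation functions, the three-residue accessibility window, and the per-residue Ramachandran fraction) are simple enough to sketch directly. The code below is illustrative only: the function names are hypothetical, the constant 0.5 in both attenuation functions is taken exactly as printed in this text, and the full entropy sum is not assembled because the grouping of terms in the extracted equation is ambiguous.

```python
def f1(r_sc: float) -> float:
    """Side chain attenuation function, as printed: r^8 / (r^8 + 0.5)."""
    return r_sc ** 8 / (r_sc ** 8 + 0.5)

def f2(x: float) -> float:
    """Main chain attenuation function, as printed: x^7 / (x^7 + 0.5)."""
    return x ** 7 / (x ** 7 + 0.5)

def relative_accessibility(sasa: float, sasa_gxg: float) -> float:
    """Accessibility relative to the same residue in Gly-X-Gly."""
    if sasa_gxg <= 0:
        raise ValueError("reference SASA must be positive")
    return sasa / sasa_gxg

def window_mean(r: list, i: int) -> float:
    """Mean accessibility over residues i-1, i, i+1 (interior residues only)."""
    if i < 1 or i > len(r) - 2:
        raise ValueError("residue i needs a neighbor on each side")
    return (r[i - 1] + r[i] + r[i + 1]) / 3.0

def rho(residue: str) -> float:
    """Accessible Ramachandran area fraction quoted in the text."""
    fractions = {"Pro": 0.28, "Gly": 0.56}
    return fractions.get(residue, 0.40)

# A fully exposed side chain versus a half-buried one.
print(f1(1.0), f1(0.5))
print(f2(window_mean([0.2, 0.4, 0.6], 1)), rho("Pro"))
```

Both attenuation functions rise steeply with accessibility, so buried residues contribute little to the computed loss of conformational freedom.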
$\Delta S_{\mathrm{vib,gas}}$ was not computed in the absence of experimental data on the examined structures or their normal mode vibrations. The main modes of vibrations are weakly affected for peptides of similar length, targeting the same site of a protease in a slightly different conformation. The values of $\Delta G_{x,\mathrm{hyd}}$ were calculated from their electrostatic energy $G_e$ and non-polar energy of hydration $G_n$ as $\Delta G_{\mathrm{hyd}} = \Delta G_e + \Delta G_n$. The electrostatic energies $\Delta G_e$ were computed using the finite difference Poisson-Boltzmann method implemented in the program DelPhi (19. Nicholls A. Sharp K.A. Honig B. Proteins. 1991; 11: 281-296), averaged from eight 1-Å resolution grids shifted by 0.5 Å in one, two, or three of the x, y, and z directions. The choice of the grid position and resolution affects final values (the mean ± S.D. is 0.8–1.8 kcal/mol). $G_e$ values were computed for the transfer of the solute in water from ε = 2.0 to 80.0, $G_e(80.0, 2.0)$, and then in gas from ε = 2.0 to 1.0, $G_e(1.0, 2.0)$, as $\Delta G_e = G_e(80.0, 2.0) - G_e(1.0, 2.0)$. The radius was fixed to 1.4 Å for solvent molecules and 2 Å for ions. Ionic strength was set at 145 mM, and the protonation state and partial charge distribution were assigned by the program Biopolymer according to the pH fixed at 7.0. The non-polar contribution $G_n$ was considered as linearly dependent on the molecule solvent-accessible surface area using a surface tension coefficient of 25 cal/mol/Å² (20. Sharp K.A. Proteins. 1998; 33: 39-48), i.e. $\Delta G_n = 25\,\Delta\mathrm{SASA}$. Based on the above definitions, the free energy for the receptor-peptide complex becomes $\Delta G = \Delta H_{\mathrm{gas}} - T\Delta S_{\mathrm{rt,gas}} - T\Delta S_{\mathrm{vib,gas}} - T\Delta S_{\mathrm{conf,gas}} + \Delta G_{\mathrm{hyd}}$. Some of the terms cancel if we compare the association of same length peptides bound to the same protease. The approximation of the relative binding free energy is given by $\Delta\Delta G \sim \Delta\Delta H_{\mathrm{gas}} - T\Delta\Delta S_{\mathrm{conf,gas}} + \Delta\Delta G_{\mathrm{hyd}}$.
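The hydration terms and the final relative free energy reduce to a few lines of arithmetic, sketched here under stated assumptions: the function names are hypothetical, the surface tension coefficient of 25 cal/mol/Å² is converted to kcal/mol to match the other terms, and the input values are invented for illustration.

```python
SURFACE_TENSION_CAL = 25.0  # cal/mol/A^2, the coefficient quoted in the text

def nonpolar_hydration(delta_sasa: float) -> float:
    """Non-polar term in kcal/mol from a SASA change in A^2."""
    return SURFACE_TENSION_CAL * delta_sasa / 1000.0

def electrostatic_hydration(g_water: float, g_gas: float) -> float:
    """G_e(80.0, 2.0) - G_e(1.0, 2.0), both in kcal/mol."""
    return g_water - g_gas

def relative_binding_free_energy(ddh_gas: float, t_dds_conf: float,
                                 ddg_hyd: float) -> float:
    """ddG ~ ddH_gas - T*ddS_conf,gas + ddG_hyd (all in kcal/mol).

    Valid only when comparing same-length peptides on the same protease,
    where the rotational/translational and vibrational terms cancel.
    """
    return ddh_gas - t_dds_conf + ddg_hyd

# Invented numbers: burying 800 A^2 of surface, then comparing two peptides.
print(nonpolar_hydration(-800.0))
print(relative_binding_free_energy(-5.0, -1.5, 2.0))
```

Because only differences enter the last function, systematic errors shared by both peptides drop out, which is the practical motivation for reporting relative rather than absolute values.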
This approach does not allow comparison of the binding of a peptide to two different proteases unless the vibrational entropy variation upon binding is comparable. The $\Delta\Delta G$ values refer to selected conformations and are affected by the choice of the “best model” according to global potential energy of the system and the goodness of its stereochemistry. The mean ± S.D. is ∼2.4 kcal/mol between the $\Delta\Delta G$ values of the 10 best models of Snk* when it is bound to the activation site of Ea. Lower deviations were estimated as 1.2 kcal/mol for Ea* with Spz peptide, 1.7 kcal/mol for Ndl1* with Gd peptide, and 1.4 kcal/mol for Gd* with Snk peptide.

The plasmid pNB-GD2 containing a full-length gd cDNA was obtained from J. L. Marsh (University of California, Irvine, CA) (21. Konrad K.D. Goralski T.J. Mahowald A.P. Marsh J.L. Proc. Natl. Acad. Sci. U. S. A. 1998; 95: 6819-6824). The plasmid pGEM7Zf(+) containing a full-length ea cDNA was obtained from K. V. Anderson (Sloan-Kettering Institute) (22. Chasan R. Anderson K.V. Cell. 1989; 56: 291-400). Mutations were introduced using the QuikChange Exchange kit (Stratagene). We mutated Phe-225 to Ala and Pro in the putative Na+ binding site of Gd. We also mutated separately Phe-225 to Ile, Ser, and Tyr to create the putative Na+ binding site of Ea, and we mutated Glu-70 to Ala and Lys in the putative Ca2+ binding site of Ea. mRNAs encoding wild-type and mutant Gd and Ea were transcribed from plasmids by using the SP6 mMessage mMachine kit (Ambion, Austin, TX) and were dissolved in water in a range of concentrations from 0.06 to 1 mg/ml as estimated by UV absorbance (4. Han J.H. Lee S.H. Tan Y.Q. LeMosy E.K. Hashimoto C. Proc. Natl. Acad. Sci. U. S. A. 2000; 97: 9093-9097). The mutations and allelic combinations used here were described previously: gd^7/gd^7 (23. Konrad K.D. Goralski T.J. Mahowald A.P. Dev. Biol. 1988; 127: 133-142) and ea^4/ea^5022rx1 (24. Jin Y.S. Anderson K.V. Cell.
1990; 60: 873-881). Embryos (0.5–1.5 h post-fertilization) were injected centrally at 40–60% egg-length after the removal of the outer eggshell layer according to a standard procedure (2. Anderson K.V. Nüsslein-Volhard C. Nature. 1984; 311: 223-227). Injected embryos were visually examined during gastrulation, and their cuticles were prepared for examination as described previously (25. Anderson K.V. Jürgens G. Nüsslein-Volhard C. Cell. 1985; 42: 779-789, 26. Jürgens G. Nüsslein-Volhard C. Roberts D.B. Drosophila: A Practical Approach. IRL Press at Oxford University Press, Oxford, United Kingdom 1986: 199-227). The injection of mRNAs encoding wild-type Gd or Ea served as a positive control (4. Han J.H. Lee S.H. Tan Y.Q. LeMosy E.K. Hashimoto C. Proc. Natl. Acad. Sci. U. S. A. 2000; 97: 9093-9097).

Zymogens Gd (528 amino acids), Snk (430 amino acids), and Ea (392 amino acids) are organized in three domains: an N-terminal signal that is cleaved during protein secretion and a zymogen that gives rise to A (N-terminal) and catalytic B (C-terminal) chains (Fig. 1). The A and B chains remain covalently linked through disulfide bridges after proteolytic activation. The topology of Ndl is more complex and unusual because it carries two S1a protease domains. The first catalytic domain (Ndl1*-(1145–1385)) is central, and the second (Ndl2*-(2017–2616)) is C-terminal. Eleven low density lipoprotein (LDL) receptor-binding repeats intercalate the two protease domains (27. Willnow T.E. Orth K. Herz J. J. Biol. Chem. 1994; 269: 15827-15832). Four LDL receptor repeats are inserted in the second protease catalytic domain. To locate cleavage positions (↓) of signal peptides, we used the program SignalP (28. Nielsen H. Engelbrecht J. Brunak S. von Heijne G. Int. J. Neural. Syst. 1997; 8: 581-599).
This yielded the following sites of cleavage: Ndl 1–47↓48–2616 (VYH↓GL, score 0.54, threshold 0.48); Gd 1–19↓20–528 (TKA↓VA, score 0.85, threshold 0.48); Snk 1–27↓28–430 (LEA↓LD, score 0.75, threshold 0.48); Ea 1–21↓22–392 (SAG↓QF, score 0.82, threshold 0.48), and Spz 1–25↓26–326 (YEA↓KE, score 0.93, threshold 0.48). Alignment of Ea, Snk, Gd, and Ndl with other serine proteases suggests the following cleavage sites for zymogen activation: Ndl1* 48–1144↓1145–2616 (GDGR↓IVGG; trypsin-like cleavage); Snk* 28–183↓184–430 (SVPL↓IVGG; chymotrypsin-like cleavage); and Ea* 22–127↓128–392 (LSNR↓IYGG; trypsin-like cleavage) (Fig. 1). The predicted underivatized Ea* A chain (106 amino acids, theoretical mass 12,086 Da), Ea* B chain (265 amino acids, theoretical mass 28,951 Da), Snk* A chain (156 amino acids, theoretical mass 17,372 Da), and Snk* B chain (247 amino acids, theoretical mass 27,319 Da) agree with Western blots described by Dissing et al. (5. Dissing M. Giordano H. DeLotto R. EMBO J. 2001; 20: 2387-2393). In the case of Gd, no basic or hydrophobic residue occupies the canonical position, and the closest putative cleavage site is either 30 or 22 residues upstream, specifically Gd* 20–211↓212–528 (GEPK↓SSDG; trypsin-like cleavage) or Gd* 20–220↓221–528 (TSPV↓FVDD; chymotrypsin-like cleavage). The fragment 212–528 expressed in S2 insect cells is an active protease (3. LeMosy E.K. Tan Y.Q. Hashimoto C. Proc. Natl. Acad. Sci. U. S. A. 2001; 98: 5055-5060). DeLotto (29. DeLotto R. EMBO Rep. 2001; 2: 721-726) proposed the cleavage site Gd* 20–136↓137–528 (EHIR↓KLSF; trypsin-like cleavage) located 83 residues upstream of the canonical activation site. The proposed cleavage site is 12 residues upstream of a type A von Willebrand repeat motif (LLLDXXEXXVRXXD) as described for complement factor B and C2.
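The chain lengths quoted for Ea* and Snk* follow directly from the cleavage positions, since a segment spanning residues a to b contains b − a + 1 residues. A small sketch (hypothetical helper names; only the positions are taken from the text) makes the arithmetic explicit:

```python
def segment_length(start: int, end: int) -> int:
    """Number of residues in the inclusive range start..end."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1

def chains_from_cleavage(signal_end: int, activation_end: int, total_length: int):
    """Return the (start, end) ranges of the A and B chains.

    signal_end     -- last residue of the signal peptide (e.g. 21 for Ea)
    activation_end -- last residue before the activation cleavage (e.g. 127)
    total_length   -- length of the full precursor (e.g. 392)
    """
    a_chain = (signal_end + 1, activation_end)
    b_chain = (activation_end + 1, total_length)
    return a_chain, b_chain

# Positions quoted above: Ea 1-21|22-127|128-392 and Snk 1-27|28-183|184-430.
for name, args in {"Ea": (21, 127, 392), "Snk": (27, 183, 430)}.items():
    a_chain, b_chain = chains_from_cleavage(*args)
    print(name, segment_length(*a_chain), segment_length(*b_chain))
```

Running this reproduces the 106 and 265 residues given for the Ea* A and B chains and the 156 and 247 residues given for Snk*.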
With such a cleavage, the predicted underivatized Gd* A chain (116 amino acids, theoretical mass 13,396 Da) and Gd* B chain (390 amino acids, theoretical mass 43,579 Da) agree with the Western blots described by Dissing et al. (5. Dissing M. Giordano H. DeLotto R. EMBO J. 2001; 20: 2387-2393). The alignment of Spz with the protease activation sites proposes 26–220↓221–326 (VSSR↓VGGS; trypsin-like cleavage) as the best site of cleavage to produce the fragments documented by SDS-polyacrylamide gel electrophoresis (5. Dissing M. Giordano H. DeLotto R. EMBO J. 2001; 20: 2387-2393). Ndl* could also be processed to release only the central catalytic domain Ndl1* (241 amino acids, theoretical mass 26,862 Da) by a second cleavage 1145–1385↓1386–2616 (TTPR↓LLPK; trypsin-like cleavage) as shown by LeMosy et al. (30. LeMosy E.K. Kemler D. Hashimoto C. Development. 1998; 125: 4045-4053). The cleavage site 1386–2016↓2017–2616 (NLMR↓LLNV; trypsin-like cleavage) is also detectable in the C-terminal domain Ndl2 (600 amino acids). Ea and Snk show 25% identity overall, 33% within the B chain, and feature the same potential disulfide bridges (Fig. 1). Alignment with other proteases suggests that only one disulfide bridge, 1-122 in the chymotrypsin numbering, links the A and B chains. (Footnote 2: Underlined numbers refer to positions aligned with chymotrypsin(ogen). Non-underlined positions refer to the corresponding zymogen precursor sequence.) The disulfide bonds 42-58, 168-182, and 191-220 are highly conserved in the catalytic B chain of serine proteases. Gd is proposed to retain the disulfide bonds 42-58 and 168-182 as well as 1-122 linking the A and B chains (Fig. 1). Ndl could have five disulfide bonds in Ndl1* (1-122 between the A and B chains and 42-58, 136-201, 168-182, and 191-220 within the B chain). Ndl2* features only the 42-58 disulfide bond within the B chain and 1-122 between the A and B chains (Fig. 1).
Ndl1*-(1145–1385), Ndl2*-(2195–2616), Gd*-(212–528), Snk*-(184–430), and Ea*-(128–392) were folded using the trypsin scaffold (CATH 2.40.10.20; SCOP B.47.1.1) with two orthogonal six-stranded β-barrels flanking the active site groove hosting the catalytic triad (Fig. 2). Insertions or deletions relative to chymotrypsin occur in loops at the protein surface and outside the active site. Although the identity between Ndl2* and other trypsin-like proteases is low, it spreads uniformly among all domains and especially at the level of the two β-barrels. Ndl2* features an unusual catalytic triad, where the nucleophile Ser-195 is coupled to Glu-102 and Ser-57 that replace the canonical Asp-102 and His-57. There is no other example of a Ser-Glu-Ser catalytic triad among 1800 other serine proteases in the NCBI (www.ncbi.nlm.nih.gov) and MEROPS (www.merops.co.uk) databases, which suggests that Ndl2* may not participate in the cascade as an active protease. The LDL domain is inserted away from the potential active site in the 186-loop, where insertions of various lengths also exist in thrombin and tissue-plasminogen activator. Fig. 2 displays the water-accessible surfaces of Ndl1*, Gd*, Snk*, and Ea*. The overall architecture of the active site is similar in all models, but their surfaces show notable differences in amino acid composition. The four proteases feature the catalytic triad His-57, Asp-102, Ser-195, and the important ancillary residues Cys-42 and Cys-58 (SS-linked), Gly-193, Gly-196, Gly-211, and Ser-214. The Cys-168/Cys-182 disulfide bond stabilizes the intervening loop that forms part of the binding site and is conserved in all four proteases. The Cys-191/Cys-220 disulfide bond is present in Ndl1*, Snk*, and Ea* but not in Gd*. This bond bridges the 186-loop and 220-loop that shape the bottom of the primary specificity pocket. Binding site pockets around residue 189 and the hydrophobic core around residues 99, 174, and 215 differ among the four proteases.
The presence of Asp-189 in the S1 (31. Schechter I. Berger A. Biochem. Biophys. Res. Commun. 1967; 27: 157-162) pocket shows that specificity is unambiguously trypsin-like for Ndl1* and Ea*, whereas Ser-189 suggests a chymotrypsin-like specificity for Gd*. The presence of Gly-189 in Snk* makes the prediction of specificity ambiguous. The shape and volume of the S1 pocket in Snk* could accommodate a variety of side chains. Leukocyte elastase, which carries Gly-189 and cleaves after Val in P1, shows a 23% identity with Snk* in the catalytic B chain. The structure of elastase (PDB code 1ppg) complexed with the tetrapeptide AAPV (32. Wei A.Z. Mayr I. Bode W. FEBS Lett. 1988; 234: 367-373) shows that Val-190 defines the S1 specificity toward hydrophobic P1 residues. In the model of Snk*, the unusual His-190 (His-371) points out of the S1 pocket and interacts with Asp-194 (Asp-375), thereby leaving the S1 pocket free to interact with a variety of side chains besides hydrophobic residues. We predicted the position of potential protease targets in every protease sequence based on 25-resi

Reference(s)