Intrinsic Protein Disorder, Amino Acid Composition, and Histone Terminal Domains
2005; Elsevier BV; Volume: 281; Issue: 4; Language: English
10.1074/jbc.r500022200
ISSN: 1083-351X
Authors: Jeffrey C. Hansen, Xu Lu, Eric D. Ross, Robert W. Woody
Topic(s): RNA and protein synthesis mechanisms
Abstract: Core and linker histones are the most abundant protein components of chromatin. Even though they lack intrinsic structure, the N-terminal “tail” domains (NTDs) of the core histones and the C-terminal tail domain (CTD) of linker histones bind to many different macromolecular partners while functioning in chromatin. Here we discuss the underlying physicochemical basis for how the histone terminal domains can be disordered and yet specifically recognize and interact with different macromolecules. The relationship between intrinsic disorder and amino acid composition is emphasized. We also discuss the potential structural consequences of acetylation and methylation of lysine residues embedded in intrinsically disordered histone tail domains.
The core (H2A, H2B, H3, H4) and linker (H1 family) histones make up the fundamental protein components of chromatin fibers (1. Wolffe A. Chromatin: Structure and Function. 3rd Ed. Academic Press, San Diego, 1998; 2. Hansen J.C. Annu. Rev. Biophys. Biomol. Struct. 2002; 31: 361-392).
The N-terminal “tail” domains (NTDs) of the core histones and the C-terminal tail domain (CTD) of linker histones are intrinsically disordered, yet they are able to bind to many different macromolecular partners in chromatin. For example, the histone H3 and H4 NTDs interact with sites on other nucleosomes during chromatin condensation (3. Dorigo B. Schalch T. Bystricky K. Richmond T.J. J. Mol. Biol. 2003; 327: 85-96; 4. Gordon F. Luger K. Hansen J.C. J. Biol. Chem. 2005; 280: 33701-33706) and bind to proteins such as Sir3p (5. Hecht A. Laroche T. Strahl-Bolsinger S. Gasser S.M. Grunstein M. Cell. 1995; 80: 583-592) and p300 (6. An W. Roeder R.G. J. Biol. Chem. 2003; 278: 1504-1510). The H1 CTD interacts with linker DNA in a chromatin fiber (1, 2) and also binds to proteins such as DFF40/CAD (7. Widlak P. Kalinowska M. Parseghian M.H. Lu X. Hansen J.C. Garrard W.T. Biochemistry. 2005; 44: 7871-7878). This article focuses on the roles of intrinsic protein disorder in histone function. We highlight recent findings indicating that amino acid composition is the key determinant of molecular recognition by the histone tail domains and other intrinsically disordered protein regions. We also discuss how acetylation and methylation of lysine residues may modulate macromolecular interactions by altering the local physicochemical properties of intrinsically disordered histone domains.
Abbreviations used: NTD, N-terminal “tail” domain; CTD, C-terminal tail domain; IDP, intrinsically disordered protein.
Proteins (or sizeable regions of proteins) that lack a well defined conformation under native conditions are referred to as “intrinsically disordered.” Many intrinsically disordered proteins (IDPs) are functional, adopting a well defined conformation upon interacting with a target molecule. Thus, the principle that protein function requires a well defined conformation must be modified; an isolated protein need not have a unique conformation, but the protein-target complex must. A corollary is that if the protein in question interacts with more than one target, it may adopt a corresponding number of different conformations. One of the more surprising aspects of IDPs is their ubiquity, especially in eukaryotes. The most rigorous analysis to date predicts that long IDP regions are found in an average of 33% of eukaryotic proteins but in only 2% of archaeal and 4% of eubacterial proteins (8. Ward J.J. Sodhi J.S. McGuffin L.J. Buxton B.F. Jones D.T. J. Mol. Biol. 2004; 337: 635-645). Conformational adaptability is generally considered to be one of the driving forces for the evolution of IDPs.
IDPs have several features that distinguish them from classical globular proteins. Experimentally, IDPs are recognized by a far-UV CD spectrum characteristic of unordered proteins, sharp peaks in NMR, low dispersion of chemical shifts, negative 1H-15N nuclear Overhauser effects, a radius of gyration or hydrodynamic radius comparable with that of the protein in concentrated urea or guanidinium chloride, and a marked susceptibility to proteases (9. Dunker A.K. Lawson J.D. Brown C.J. Williams R.M. Romero P. Oh J.S. Oldfield C.J. Campen A.M. Ratliff C.M. Hipps K.W. Ausio J. Nissen M.S. Reeves R. Kang C. Kissinger C.R. Bailey R.W. Griswold M.D. Chiu W. Garner E.C. Obradovic Z. J. Mol. Graph. Model. 2001; 19: 26-59; 10. Uversky V.N. Protein Sci. 2002; 11: 739-756).
The absence of order in a crystal structure often is indicative of an intrinsically disordered domain as well. IDPs can be predicted from amino acid sequence data with good accuracy. In one study of more than 900 nonhomologous proteins, predictions of disordered regions more than 40 residues in length gave less than 6% false positives (11. Dunker A.K. Brown C.J. Lawson J.D. Iakoucheva L.M. Obradovic Z. Biochemistry. 2002; 41: 6573-6582). Predictive algorithms score sequences according to the flexibility, hydropathy, charge, and other physicochemical properties of the amino acid residues (12. Uversky V.N. Gillespie J.R. Fink A.L. Proteins Struct. Funct. Genet. 2000; 41: 415-427; 13. Romero P. Obradovic Z. Li X. Garner E.C. Brown C.J. Dunker A.K. Proteins Struct. Funct. Genet. 2001; 42: 38-48; 14. Vucetic S. Brown C.J. Dunker A.K. Obradovic Z. Proteins Struct. Funct. Genet. 2003; 52: 573-584; 15. Jones D.T. Ward J.J. Proteins Struct. Funct. Genet. 2003; 53: 573-578).
Compositional bias is a common feature of IDPs (9, 14). Some amino acids are substantially more abundant in IDPs than in the average folded protein, whereas others are rare or absent. The bias generally favors hydrophilic amino acids and discriminates against hydrophobic residues. Thus, IDPs are generally rich in Arg, Gln, Glu, Lys, Pro, and Ser.
They are deficient in Cys, Ile, Leu, Phe, Trp, Tyr, and Val. The other amino acids are present at levels comparable with those in the average folded protein (Met, Thr) or are enriched in some IDPs and depleted in others (Ala, Asn, Asp, Gly, His). As will be discussed below, there is now compelling evidence indicating that the relationship between amino acid composition and IDPs is far more complex than the simple trends described above. The compositional bias of IDPs accounts for their inability to fold; the paucity of hydrophobic groups precludes the formation of a hydrophobic core about which the chain can fold. Further, many IDPs have a large excess of basic or acidic amino acids and hence are highly charged at neutral pH. The charge on such proteins acts to destabilize a compact structure. Interaction with target proteins or nucleic acids overcomes these problems, allowing the IDP to undergo a concerted folding-binding process. Hydrophobic groups of the IDP are buried in the IDP-target interface, interacting with exposed hydrophobic groups of the target. The target usually has a charge opposite to that of the IDP, at least locally, leading to a lower charge density in the complex.
What advantages of IDPs account for their abundance, especially in eukaryotes? 1) The coupling of folding and binding provides enhanced specificity at the expense of binding affinity (16. Spolar R.S. Record Jr., M.T. Science. 1994; 263: 777-784; 17. Kriwacki R.W. Hengst L. Tennant L. Reed S.I. Wright P.E. Proc. Natl. Acad. Sci. U. S. A. 1996; 93: 11504-11509). The negative ΔS of folding is paid for by a large negative ΔH of binding. 2) The binding energy for an IDP is more favorable than for a compact protein with the same number of residues because the area of the interface is substantially larger for the IDP (18. Gunasekaran K. Tsai C.J. Kumar S. Zanuy D. Nussinov R. Trends Biochem. Sci.
2003; 28: 81-85). 3) The flexibility of an IDP allows it to bind a number of different targets. 4) Flexibility also permits more rapid binding to the target through the mechanism of “flycasting” (19. Shoemaker B.A. Portman J.J. Wolynes P.G. Proc. Natl. Acad. Sci. U. S. A. 2000; 97: 8868-8873). An IDP is extended and thus presents multiple points of attachment for the bimolecular step of encounter complex formation. The encounter complex can then undergo rapid unimolecular steps to the stable complex. 5) Flexibility permits ready access of a side chain to modifying enzymes and to targets that recognize the modification. 6) A flexible, extended structure can be rapidly degraded by intracellular proteases, providing a facile pathway for down-regulation. Susceptibility to proteases is also a potential disadvantage for IDPs. Survival in the cell implies that IDPs must be complexed to targets most of the time.
The coupled process by which an IDP folds and binds to its target bears some resemblance to the induced fit concept of enzyme-substrate binding and allostery. However, in induced fit, ligand binding perturbs an equilibrium between two compact, well defined protein conformations, whereas binding of an IDP to a target involves a disorder → order transition of the IDP concomitant with formation of a macromolecular complex. The target may have a compact, well defined conformation, or it may be an IDP itself. In the final state, the IDP-target complex has a compact structure with classical protein folds (at least as a core), possibly with one or more flexible appendages.
Most of the functions of IDPs are related to molecular recognition of DNA, RNA, and other proteins. Fully or partially disordered proteins are especially common in processes such as transcription, cell cycle regulation, signal transduction, and chaperoning the folding of proteins and RNA (20. Tompa P.
FEBS Lett. 2005; 579: 3346-3354; 21. Dyson H.J. Wright P.E. Nat. Rev. Mol. Cell Biol. 2005; 6: 197-208; 22. Fink A.L. Curr. Opin. Struct. Biol. 2005; 15: 35-41). Partially disordered regions are commonly found at the amino and carboxyl ends of proteins but can be present at internal sites as well. IDPs have been grouped into two main categories based on function: mediators of macromolecular interactions and entropic connectors/springs (20). Because of their ubiquity, other functions are likely to be identified as well.
Linker histones comprise a family of nucleosome-binding proteins that stabilize condensed chromatin and regulate genome function (1, 2; 23. Bustin M. Catez F. Lim J.H. Mol. Cell. 2005; 17: 617-620). The linker histones of most eukaryotes have a very simple domain organization, consisting of a central winged helix fold, a short N-terminal extension, and a long basic C-terminal domain (Fig. 1). Little is known about the NTD region. The winged helix domain interacts with nucleosomes (1, 2). The CTD is ∼100 residues in length, enriched in Lys, Ala, and Pro, and unstructured in aqueous solution (24. van Holde K.E. Chromatin. Springer-Verlag, New York, 1988). The determinants required to stabilize chromatin fibers in highly condensed conformations lie in the CTD (25. Allan J.
Mitchell T. Harborne N. Bohm L. Crane-Robinson C. J. Mol. Biol. 1986; 187: 591-601; 26. Lu X. Hansen J.C. J. Biol. Chem. 2004; 279: 8701-8707). There are six somatic linker histone isoforms in most higher eukaryotes. Although the primary sequence of the isoform CTDs has diverged (24), the amino acid composition of the CTDs is surprisingly similar (Table 1). Each of the CTDs consists of ∼40% Lys, ∼20–35% Ala, and ∼15% Pro. Ser, Thr, Gly, and Val are present in all isoform CTDs in smaller, variable amounts. His, Tyr, Trp, Met, and Cys are never found in any of the isoform CTDs, and the other seven amino acids are sporadically present once or twice in a particular CTD. Val is the only hydrophobic amino acid found in all CTDs. The characteristic amino acid composition of the linker histone CTDs suggests that this domain functions as an IDP region. Recent experimental evidence supports this idea and has focused attention on the relationship between intrinsic disorder and amino acid composition.

TABLE 1. Amino acid composition of linker histone CTDs, core histone NTDs, macroH2A connector region, and yeast prion domains
Values are percentages of total amino acids. The start methionines have been excluded from the residue counts. Class: D, disorder-producing; N, neutral; O, order-producing (see Ref. 43. Weathers E.A. Paulaitis M.E. Woolf T.B. Hoh J.H. FEBS Lett. 2004; 576: 348-352). PD, prion domain.

IDP region                Residues   Lys   Pro   Gly   Arg   Asn   Gln   Ser   Glu   Asp   Met   Ala   Thr   Val   His   Phe   Ile   Leu   Cys   Trp   Tyr
Class                                  D     D     D     D     D     D     D     D     D     D     N     N     O     O     O     O     O     O     O     O
Mouse H1° CTD                   97  41.2  12.4   1.0   2.1     0     0   7.2   2.1   1.0     0  17.5   5.2   9.3     0   1.0     0     0     0     0     0
Human H1° CTD                   97  42.3  12.4   1.0   1.0     0     0   7.2   2.1   1.0     0  17.5   6.2   6.2     0   1.0   1.0   1.0     0     0     0
Human H1-1 CTD                  98  40.8  13.3   5.1     0     0     0   3.1     0     0     0  24.5   5.1   8.2     0     0     0     0     0     0     0
Human H1-2 CTD                 105  39.0  14.3   5.7     1     0     1   3.8     0     0     0  23.8   5.7   4.8     0     0   1.0     0     0     0     0
Human H1-3 CTD                 108  41.7  12.0   3.7     0     0     0   1.9     0     0     0  34.3   2.8   3.7     0     0     0     0     0     0     0
Human H1-4 CTD                 104  41.3  13.5   3.8     0     0     0   2.9     0     0     0  33.7   3.8   1.0     0     0     0     0     0     0     0
Human H1-a CTD                  97  38.1  11.3   3.1   2.1   1.0     0   7.2     0     0     0  19.6  10.3   6.2     0     0     0   1.0     0     0     0
Human macroH2A connector        38  34.2  13.2   7.9   2.6     0   2.6  10.5   2.6     0     0  13.2   2.6   2.6     0     0   5.3   2.6     0     0     0
Human H2B NTD                   27  33.3  14.8   7.4     0     0   3.7   7.4   3.7   3.7     0  18.5   3.7   3.7     0     0     0     0     0     0     0
Xenopus H2B NTD                 27  33.3  14.8   7.4     0     0   3.7   7.4   3.7   3.7     0  14.8   7.4   3.7     0     0     0     0     0     0     0
Human H3 NTD                    37  24.3   5.4  10.8   8.1     0   5.4   2.7     0     0     0  18.9  13.5   5.4     0     0     0   2.7   2.7     0     0
Xenopus H3 NTD                  37  21.6   5.4  10.8  10.8     0   5.4   5.4     0     0     0  21.6  13.5   2.7     0     0     0   2.7     0     0     0
Xenopus H4 NTD                  27  18.5     0  29.6  14.8   3.7   3.7   3.7     0   3.7     0   3.7     0   3.7   3.7     0   3.7   7.4     0     0     0
Xenopus H2A NTD                 13  23.1     0  30.8  15.4     0   7.7   7.7     0     0     0   7.7   7.7     0     0     0     0     0     0     0     0
Yeast Ure2p PD                  88   1.1     0   5.7   4.5  37.5  11.4  11.4   3.4   2.3   1.1   1.1   5.7   4.5   1.1   2.3   3.4   3.4     0     0     0
Yeast Sup35p PD                113   0.9   4.4  16.8   1.8  17.7  28.3   3.5     0   1.8     0   4.4     0     0     0   1.8     0   0.9     0     0  17.7

CTD truncation mutants were used to define the location of the amino acid residues involved in mouse H1° CTD function during chromatin condensation (26). The determinants for both linker DNA binding and chromatin fiber stabilization were localized to two distinct, separated regions of the CTD (Fig. 1). The functional regions are somewhat enriched in Val, but otherwise the amino acid composition of all CTD regions examined was similar.
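
The percentages in Table 1 are simple residue counts divided by domain length, grouped by the D/N/O classes given in the table header. A minimal Python sketch of that bookkeeping follows. The function names and the toy sequence are assumptions made for illustration only; the sequence is invented and is not a real histone CTD.

```python
from collections import Counter

# Residue classes taken from the Table 1 column labels
# (D = disorder-producing, N = neutral, O = order-producing).
DISORDER = set("KPGRNQSEDM")
NEUTRAL = set("AT")
ORDER = set("VHFILCWY")


def composition(seq):
    """Return {residue: percent of total residues} for a one-letter sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def class_fractions(seq):
    """Sum the per-residue percentages within each D/N/O class."""
    comp = composition(seq)

    def total(group):
        return sum(pct for aa, pct in comp.items() if aa in group)

    return {"D": total(DISORDER), "N": total(NEUTRAL), "O": total(ORDER)}


# Hypothetical Lys/Ala/Pro-rich toy sequence (20 residues), for illustration only.
toy = "KKAAPKKAVKKPAAKKSPKK"

if __name__ == "__main__":
    for aa, pct in sorted(composition(toy).items(), key=lambda kv: -kv[1]):
        print(f"{aa}: {pct:.1f}%")
    print(class_fractions(toy))
```

For the toy sequence this gives 50% Lys, 25% Ala, and 15% Pro, or 70% D, 25% N, and 5% O by class. Applying the same functions to two regions with different primary sequences is one way to check whether their compositions are nevertheless similar.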
The two functional CTD regions can be interchanged (X. Lu and J. C. Hansen, unpublished data), even though their primary sequences are different. This suggests that the key properties involved in DNA binding and chromatin condensation are amino acid composition and location of the CTD region relative to the winged helix domain, not primary sequence. The H1 CTD also has been shown to mediate the protein-protein interactions involved in H1-dependent activation of the apoptotic nuclease, DFF40/CAD (7). The CTD region that binds to the enzyme is large and partially overlaps with the two CTD regions that bind linker DNA and stabilize condensed chromatin (Fig. 1). Interestingly, all somatic linker histone isoforms activated the enzyme identically in vitro. Moreover, all free CTD peptides that were at least 47 residues in length could bind to and activate the enzyme, regardless of their primary sequence and original location in the intact CTD. Thus, amino acid composition and location of the CTD region relative to the winged helix domain also appear to be the determinants of CTD-protein interactions. Together, the studies of linker histone CTD involvement in chromatin condensation and DFF40/CAD activation demonstrate that the H1 CTD is an IDP region capable of interacting with both DNA and proteins and suggest that CTD function is linked to a distinctive amino acid composition.
The functions of the core histone NTDs have been investigated extensively (2, 3; 4. Gordon F. Luger K. Hansen J.C. J. Biol. Chem.
2005; 280: 33701-33706; 27. Hansen J.C. Tse C. Wolffe A.P. Biochemistry. 1998; 37: 17637-17641). These domains currently are of particular interest because specific patterns of NTD acetylation and methylation regulate gene expression and other nuclear processes (28. Fischle W. Wang Y. Allis C.D. Curr. Opin. Cell Biol. 2003; 15: 172-183; 29. Kurdistani S.K. Grunstein M. Nat. Rev. Mol. Cell Biol. 2003; 4: 276-284; 30. Henikoff S. Proc. Natl. Acad. Sci. U. S. A. 2005; 102: 5308-5309). The NTDs are not observed in the crystal structures of the nucleosome (31. Luger K. Curr. Opin. Genet. Dev. 2003; 13: 127-135). Free NTD peptides are disordered (see Ref. 27). In nucleosomes, the NTDs adopt increased α-helical content when bound to DNA (32. Baneres J.L. Martin A. Parelló J. J. Mol. Biol. 1997; 273: 503-508; 33. Wang X. Moore S.C. Laszckzak M. Ausió J. J. Biol. Chem. 2000; 275: 35013-35020). All four of the core histone NTDs participate in the internucleosomal interactions that drive chromatin fiber condensation (3, 4). In addition, the H3 and H4 NTDs bind to proteins such as Sir3p and p300 (5. Hecht A. Laroche T. Strahl-Bolsinger S. Gasser S.M. Grunstein M. Cell.
1995; 80: 583-592; 6. An W. Roeder R.G. J. Biol. Chem. 2003; 278: 1504-1510). The amino acid composition of the core histone NTDs is shown in Table 1. The NTDs have a low percentage of hydrophobic residues and are highly enriched in Lys, Gly, and Arg residues. By all available criteria, the core histone NTDs also possess the characteristics of an IDP region. Unlike the linker histone CTDs, their primary sequences are highly conserved.
A closer examination of the amino acid composition of the core histone NTDs reveals several interesting trends (Table 1). The composition of the H2A and H4 NTDs is very similar but differs significantly from that of the linker histone CTDs. Specifically, the H2A and H4 NTDs have no Pro, more Gly and Arg, and less Ala than the linker histone CTDs. On the other hand, the amino acid composition of the H2B and H3 NTDs is surprisingly similar to that of the linker histone CTDs (Table 1). Based on amino acid composition, at least two different types of IDP regions are involved in histone function.
It is of note that the characteristic amino acid composition of the H1 CTDs also is found in other proteins. A region of 38 residues in the core histone variant, macroH2A, has an amino acid composition very similar to that of the linker histone CTDs (Table 1). However, in this case, the IDP region is located internally and connects two structured domains (Fig. 1). The amino acid composition of the macroH2A connector domain suggests that it is an IDP region. The broader implication is that the linker histone CTD actually is a specific type of IDP region that is found in different locations within different proteins. Further support for the existence of specific types of IDP regions comes from studies of yeast prions (infectious proteins).
The yeast prion proteins Ure2p and Sup35p each contain an N-terminal prion domain that is sufficient for prion formation but dispensable for the normal function of the protein (34. Wickner R.B. Edskes H.K. Roberts B.T. Baxa U. Pierce M.M. Ross E.D. Brachmann A. Genes Dev. 2004; 18: 470-485). In both cases the prion domains are intrinsically disordered, but upon conversion to the prion conformation, they self-associate to form self-propagating amyloid-like fibrils (35. Chien P. Weissman J.S. DePace A.H. Annu. Rev. Biochem. 2004; 73: 617-656). The prion conformation is a folded β-domain structure, as with other amyloid-forming proteins. Randomizing the primary sequence of the Sup35p and Ure2p prion domains while keeping the amino acid composition constant does not inhibit prion formation (36. Ross E.D. Edskes H.K. Terry M.J. Wickner R.B. Proc. Natl. Acad. Sci. U. S. A. 2005; 102: 12825-12830; 37. Ross E.D. Baxa U. Wickner R.B. Mol. Cell. Biol. 2004; 24: 7206-7213), indicating that amino acid composition, not primary sequence, is the fundamental determinant of amyloid formation in these systems. The amino acid composition of the prion domains is consistent with that of an IDP but differs significantly from that of the H1 CTDs (Table 1). In particular, Ure2p and Sup35p are highly enriched in Asn and Gln rather than Lys, Ala, and Pro. An intriguing possibility is that amino acid composition determines which type of secondary structure is formed when an IDP region binds to a target, e.g. Asn/Gln-rich IDP regions may form β-domains, whereas Lys/Ala/Pro IDP regions may form α-helices. (Although Pro is generally considered to be a strong helix breaker, it is also a strong helix initiator, frequently occurring as the N-terminal residue of α-helical segments (38. Richardson J.S. Richardson D.C. Science.
1988; 240: 1648-1652).) In summary, the relationship between amino acid composition and IDP function is complex. The same is true for the relationship between amino acid composition and primary sequence. Using amino acid composition as a criterion, it appears that there are many different types of functional IDP regions, just as there are many different types of functional protein folds.
Lysine residues within the intrinsically disordered core histone NTDs are modified through addition of methyl or acetyl groups. Specific patterns of NTD acetylation and methylation are involved in the regulation of transcription, replication, and other nuclear processes (28, 29, 30). These patterns of modifications often function by establishing or disrupting specific binding surfaces for other proteins. For example, the proteins HP1 (39. Jacobs S.A. Khorasanizadeh S. Science. 2002; 295: 2080-2083; 40. Nielsen P.R. Nietlispach D. Mott H.R. Callaghan J. Bannister A. Kouzarides T. Murzin A.G. Murzina N.V. Laue E.D. Nature. 2002; 416: 103-107) and polycomb (41. Fischle W. Wang Y. Jacobs S.A. Kim Y. Allis C.D. Khorasanizadeh S. Genes Dev. 2003; 17: 1870-1881) can bind to an H3 NTD peptide only if the peptide is di- or trimethylated on Lys-9. A question that has remained largely unanswered at the molecular level is how acetylation and methylation influence the ability of the core histone NTDs to participate in specific protein-protein interactions.
Acetylation and methylation both affect the charge density, size, and hydrophobicity of the Lys side chain. Hydrophobicity may be particularly important because there are very few hydrophobic amino acids in IDP regions. Acetylation of Lys makes formation of secondary structures more favorable by decreasing the positive charge density and enhancing hydrophobic character. The free charged NH group is converted into a neutral amide linkage capped with a hydrophobic methyl group. Methylation of Lys leaves the positive charge density unaltered but replaces up to three polar NH groups capable of hydrogen bonding with hydrophobic methyl groups. Acetylation and methylation of Lys ultimately create unique amino acids with unusual properties. Hence we do not believe that acetylation and methylation simply create patterns of “marks” that are recognized by other proteins. Rather, we feel that acetylation and methylation alter the fundamental IDP properties of the NTDs as a prerequisite for coupled NTD folding and target binding. This view is supported by the finding that nonspecific hyperacetylation of the core histone NTDs increases their average α-helical content (33). Moreover, Dion et al. (42. Dion M.F. Altschuler S.J. Wu L.F. Rando O.J. Proc. Natl. Acad. Sci. U. S. A. 2005; 102: 5501-5506) have shown that H4 acetylation at Lys-5, -8, and -12 functions in vivo through a nonspecific, cumulative mechanism.
X-ray and NMR studies of complexes between methylated H3 peptides and the HP1 and polycomb chromodomains have provided insight at the molecular level (39; 40. Nielsen P.R. Nietlispach D. Mott H.R. Callaghan J. Bannister A. Kouzarides T. Murzin A.G. Murzina N.V. Laue E.D. Nature.
2002; 416: 103-107; 41). In all such complexes, the disordered methylated H3 peptide adopts an extended chain structure, actually serving to fill in a β-sheet in several cases. The extended chain structure optimizes interactions of both the backbone and side chain groups of the peptide with those of the protein. Importantly, modified Lys residues are recognized by specific features. For example, chromodomains have three aromatic side chains that form a hydrophobic cage that interacts with the methyl group(s) of the methylated Lys side chain. Without the hydrophobicity imparted by the methyl groups, binding would not be possible and the NTD peptide would not assume the extended chain secondary structure.
The biological need for the core histone NTDs and linker histone CTD to interact with many modifying enzymes and recognition modules with widely varying structures can be readily accommodated if these domains are intrinsically disordered. We envision that the histone terminal domains interact with their targets through several different modes. In many cases, they bind as extended chains to sites that recognize the local sequence properties, as in the case of the recognition motifs discussed in the previous section. In other cases, these IDP regions can fold into α-helical segments, β-hairpins, or other simple motifs, burying hydrophobic groups introduced by modifications and/or in combination with hydrophobic groups on the target. If binding depends primarily on the stability of the secondary structure formed, it may be more important to conserve amino acid composition rather than primary sequence. In this regard, the sequence conservation of the core histone NTDs may be related to maintaining unique post-translational modification sites more so than the primary sequence per se.
The IDP regions of linker histones and yeast amyloid proteins challenge the paradigm that the primary amino acid sequence and corresponding main and side chain interactions dictate formation of a unique local polypeptide conformation with the lowest free energy state. In these systems, a specific amino acid composition is conserved and correlated with protein function, whereas the primary sequence varies. A recent study of 718 IDP sequences using support vector machine analysis (43. Weathers E.A. Paulaitis M.E. Woolf T.B. Hoh J.H. FEBS Lett. 2004; 576: 348-352) concluded that amino acid composition is the only parameter needed to accurately recognize IDPs and that IDP regions are defined by physical properties of a short stretch of amino acids rather than the interactions dictated by the primary sequence of amino acids. Evidence is mounting that in many cases there is a direct correlation between amino acid composition, intrinsic disorder, and protein function. Even in situations where the primary sequence is conserved (such as the core histone NTDs), local amino acid composition may be the key property required for molecular recognition. Examination of the relationships between amino acid composition and IDP function in a wide range of biological systems is likely to reveal new principles of protein structure and molecular recognition.
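
The distinction drawn here between composition and primary sequence is the basis of the randomization experiments cited for the yeast prion domains: permuting the residues of a domain changes its sequence but leaves its composition exactly as it was. The Python sketch below illustrates that idea only. The function name and the Asn/Gln-rich toy sequence are assumptions invented for illustration, and the code does not reproduce the published scrambled constructs.

```python
import random
from collections import Counter


def scramble(seq, seed=None):
    """Return a random permutation of seq.

    Shuffling only reorders residues, so the amino acid composition
    (the count of each residue) is unchanged by construction.
    """
    rng = random.Random(seed)  # local generator; seed makes the result repeatable
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


# Hypothetical Asn/Gln-rich toy sequence, not a real prion domain.
toy = "NQNQYGGNQSNQYGRNQ"
shuffled = scramble(toy, seed=0)

if __name__ == "__main__":
    print(toy)
    print(shuffled)
    print("same composition:", Counter(shuffled) == Counter(toy))
```

Comparing residue counts with `Counter` confirms that only the order of residues differs between the two strings, which is the property a composition-versus-sequence test relies on.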