Article · Open access · Peer-reviewed

Solution NMR structure of the ARID domain of human AT-rich interactive domain-containing protein 3A: A human cancer protein interaction network target

2010; Wiley; Volume: 78; Issue: 9; Language: English

10.1002/prot.22718

ISSN

1097-0134

Authors

Gaohua Liu, Yuanpeng J. Huang, Rong Xiao, Dongyan Wang, Thomas Acton, G.T. Montelione

Topic(s)

Cancer Mechanisms and Therapy

Abstract

The Northeast Structural Genomics Consortium (NESG) has constructed a Human Cancer Protein Interaction Network (HCPIN), providing structure–function annotations of key proteins associated with human cancer and developmental biology.1 The long-range goal of the HCPIN project is to provide a comprehensive 3D structure–function database of human-cancer-associated proteins and protein complexes in the context of functional networks, using both experimental structures and high-quality homology models (e.g., models built on protein templates with >80% sequence identity).1 The human AT-rich interactive domain-containing protein 3A (ARID3A) is one of these HCPIN proteins. As part of this large-scale human cancer biology project, the AT-rich interactive domain (ARID) of human ARID3A, residues 218–351 (NESG ID HR4394C), was selected for structural characterization.

ARID3A, an ARID family transcription factor also called dead ringer-like protein 1 (Dril1), B-cell regulator of IgH transcription (Bright), and E2F-binding protein 1 (E2FBP1), belongs to the ARID family of DNA-binding proteins, which are known to play important roles in embryonic patterning, cell lineage gene regulation, cell cycle control, chromatin remodeling, and transcriptional regulation.2-5 ARID proteins have been identified in all sequenced higher eukaryotic genomes; the consensus sequence of the ARID domain spans about 100 amino acid residues.3 ARID proteins are partitioned into three structural classes: (i) minimal ARID proteins, whose core domain is formed by six α-helices; (ii) ARID proteins that supplement the core domain with an N-terminal α-helix; and (iii) extended-ARID proteins, which contain the core domain plus additional α-helices at their N- and C-termini.6 The 15 distinct human ARID family proteins can be divided into seven subfamilies based on the degree of sequence identity between individual members.3 The majority of ARID subfamilies (five out of seven) bind DNA without obvious sequence preference, although DNA-binding affinity varies somewhat between subfamilies. Structural studies have identified the DNA major-groove contact site as a modified helix-turn-helix motif.3, 4

The third mammalian ARID subfamily, ARID3, contains three human proteins (ARID3A, ARID3B, and ARID3C). These paralogs are the most direct mammalian counterparts of the Drosophila "dead ringer" protein Dri.3 Like Drosophila Dri, these human ARID3 proteins have both N- and C-terminal extensions beyond the consensus ARID core sequence, a feature not found in other human ARID family members.3, 4 ARID domains generally function as dsDNA-binding domains. The ARID domain of human ARID3A, the subject of this study, shares ∼78% sequence identity with the ARID domain of Drosophila Dri, an essential ARID-containing transcription factor.

ARID3A is ubiquitously expressed in all tissues examined.7 It is a direct p53 target gene.2 It can immortalize primary mouse fibroblasts, bypass RASV12-induced cellular senescence, and collaborate with RASV12 or MYC in mediating oncogenic transformation.2, 5 ARID3A also activates immunoglobulin heavy-chain transcription and forms heterodimers with E2F to stimulate E2F-dependent transcription.5, 8 Although the regulatory mechanisms of ARID3A function remain largely unknown, SUMO (small ubiquitin-related modifier) modification of ARID3A was recently shown to play a crucial role in its transcriptional activity.5 ARID3A interacts with the SUMO-conjugating enzyme Ubc9 and is sumoylated at Lys398 both in vitro and in vivo.
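Sequence-identity figures such as the ∼78% ARID3A/Dri value, or the >80% template criterion for HCPIN homology models, are computed over a pairwise alignment. The sketch below is a minimal illustration, assuming the two sequences are already aligned (gap character `-`) and that identity is normalized by the number of columns where neither sequence has a gap; other normalizations (e.g., by the length of the shorter sequence) are also in common use and give different numbers. The function names and toy sequences are hypothetical and are not the real ARID sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes both strings come from the same pairwise alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a.upper() == b.upper())
    return 100.0 * identical / len(pairs)


def passes_template_cutoff(aligned_target: str, aligned_template: str,
                           cutoff: float = 80.0) -> bool:
    """Hypothetical filter mirroring a '>80% identity' template criterion."""
    return percent_identity(aligned_target, aligned_template) > cutoff


if __name__ == "__main__":
    # Hypothetical toy alignment (not real ARID sequences).
    target = "MKLV-EQRW"
    template = "MKIVAEQRW"
    print(round(percent_identity(target, template), 1))
    print(passes_template_cutoff(target, template))
```

Because the gapped column is excluded, the toy pair scores 7 identities over 8 compared columns.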
PIASy, a member of the protein inhibitor of activated STAT (PIAS) family, functions as a specific SUMO E3 ligase for ARID3A and promotes its sumoylation both in vitro and in vivo.5 The ARID DNA-binding domain of ARID3A was selected as an NESG target as part of our HCPIN human cancer biology theme project.1 The NMR structure of the ARID domain of human ARID3A reported here provides a structural basis for elucidating the regulatory mechanisms of ARID3A function and the molecular mechanism of its interactions with DNA. It may also be of value in future drug discovery and design.

The ARID domain of ARID3A from Homo sapiens (UniProtKB/Swiss-Prot ID Q99856/ARI3A_HUMAN, residues 218–351) was cloned, expressed, and purified following standard, largely automated NESG protocols to produce a uniformly 13C,15N-enriched protein sample.9 Briefly, the truncated ARI3A_HUMAN (218–351) gene was cloned into a modified pET14-15C expression vector (Novagen), as described in Ref. 9, yielding the plasmid pHR4394C-14.6. The resulting construct contains 11 non-native residues at the N-terminus (MGHHHHHHSHM) that facilitate protein purification. Escherichia coli BL21 (DE3) pMGK cells were transformed with pHR4394C-14.6 and cultured in MJ9 minimal medium10 containing (U-15NH4)2SO4 and U-13C-glucose as the sole nitrogen and carbon sources, respectively. U-13C,15N ARID3A was purified using a two-step ÄKTAxpress™ (GE Healthcare) protocol consisting of IMAC (HisTrap HP) and gel filtration (HiLoad 26/60 Superdex 75) chromatography. The final yield of purified U-13C,15N ARID3A (>98% homogeneous by SDS-PAGE; 18.2 kDa by MALDI-TOF mass spectrometry) was ∼38 mg/L. In addition, a U-15N, 5% 13C-enriched sample was generated for stereospecific assignment of isopropyl methyl groups. Both the U-13C,15N and the 5% 13C,U-15N ARID3A samples were dissolved at concentrations of ∼0.9 mM in 95% H2O/5% 2H2O buffer containing 20 mM MES, 200 mM NaCl, 10 mM DTT, 5 mM CaCl2, and 0.02% NaN3, at pH 6.5.
Static light scattering data9 and rotational correlation time measurements (Supporting information Figures S3 and S4) indicate that the protein is monomeric in solution under the conditions used for these NMR studies. All NMR spectra were recorded at 25°C using cryogenic NMR probes. Triple-resonance NMR data were collected on a Varian INOVA 600 MHz spectrometer, while simultaneous 3D 15N/13Caliphatic/13Caromatic-edited NOESY11 (mixing time 100 ms) spectra in H2O and 3D 13C-edited NOESY (mixing time 100 ms) spectra in 2H2O were acquired on a Bruker AVANCE 800 MHz spectrometer. 2D constant-time [13C,1H]-HSQC spectra, with 28 ms and 42 ms constant-time delays, were recorded for the 5% 13C-enriched sample on the Varian INOVA 600 MHz spectrometer to obtain stereospecific assignments for the isopropyl groups of valines and leucines.12 All NMR data were processed using the program NMRPipe13 and analyzed using the program XEASY.14 Spectra were referenced to external DSS.

Sequence-specific resonance assignments were determined as described previously.15 Backbone assignments (HN/N/C', Hα/Cα, and Hβ/Cβ) were obtained in a largely automated fashion with the program AUTOASSIGN.16 These assignments, together with random-coil side-chain chemical shift values, were then used to simulate peak lists that facilitated manual analysis of side-chain resonance assignments. Simultaneous 3D 15N/13Caliphatic/13Caromatic-NOESY and CCH-TOCSY spectra were then analyzed manually to obtain nearly complete side-chain assignments. Assignments were obtained for 93% of the backbone and side-chain chemical shifts assignable with the NMR experiments listed above (excluding N-terminal NH3+, Lys NH3+, Arg NH2, the OH of Ser, Thr, and Tyr, 13Cγ of Asp and Asn, 13Cδ of Glu and Gln, and aromatic 13Cγ shifts; Supporting information Table S1). Chemical shifts were deposited in the BioMagResBank on 06/14/2009 with accession code 16348.
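The choice of 28 ms and 42 ms constant-time delays can be rationalized with the standard result that, in a constant-time 13C evolution period of length T, a carbon with n directly bonded 13C neighbors is modulated by cos^n(πJT). The sketch below assumes a one-bond aliphatic 13C–13C coupling J of about 35 Hz, a typical value that is not given in the paper. Under that assumption T ≈ 28 ms is close to 1/J, so a methyl whose neighbor is 13C appears inverted, and T ≈ 42 ms is close to 3/(2J), so that methyl is nearly nulled; a methyl whose neighbor is 12C is unaffected. With only 5% labeled glucose, a 13C methyl usually has a 13C neighbor only when both carbons came from the same intact labeled precursor fragment, which is what distinguishes the two diastereotopic methyls. Which methyl (pro-R or pro-S) carries the intact-fragment neighbor follows from the biosynthetic pathway and is deliberately not encoded here; the constants and function names are illustrative assumptions.

```python
import math

J_CC_HZ = 35.0  # assumed typical one-bond aliphatic 13C-13C coupling (Hz), not from the paper


def ct_modulation(t_ct_s: float, n_coupled: int = 1, j_hz: float = J_CC_HZ) -> float:
    """Constant-time modulation factor cos^n(pi * J * T) for n coupled 13C neighbours."""
    return math.cos(math.pi * j_hz * t_ct_s) ** n_coupled


def expected_signal(p_neighbor_13c: float, t_ct_s: float, j_hz: float = J_CC_HZ) -> float:
    """Average relative signal of a 13C methyl whose single carbon neighbour is 13C
    with probability p: a 13C neighbour gives cos(pi J T), a 12C neighbour gives 1."""
    return p_neighbor_13c * ct_modulation(t_ct_s, 1, j_hz) + (1.0 - p_neighbor_13c)


LABEL_FRACTION = 0.05  # 5% U-13C-glucose, as for the sample described above
NATURAL_13C = 0.011    # natural abundance of 13C
# Methyl from the same intact labelled fragment as its neighbour
# (idealized: ignores bond cleavage/scrambling during biosynthesis).
P_INTACT = 1.0
# Methyl whose neighbour came from a different precursor molecule.
P_RANDOM = LABEL_FRACTION + (1.0 - LABEL_FRACTION) * NATURAL_13C

if __name__ == "__main__":
    for t_ms in (28.0, 42.0):
        t = t_ms / 1000.0
        print(f"T = {t_ms:.0f} ms: intact-fragment methyl {expected_signal(P_INTACT, t):+.2f}, "
              f"other methyl {expected_signal(P_RANDOM, t):+.2f}")
```

Comparing the signs and intensities of each methyl pair between the two spectra is what lets the coupled and uncoupled methyls be told apart.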
The locations of regular secondary structure elements were then identified from backbone chemical shift data.17 A NOESY peak list containing expected intraresidue, sequential, and α-helical medium-range NOE peaks was initially generated and then manually edited by visual inspection of the simultaneous-NOESY spectrum. Subsequent manual peak picking was used to identify the remaining, primarily long-range, NOEs.15 The programs CYANA18, 19 and AUTOSTRUCTURE20, 21 were used in parallel to automatically assign long-range NOEs. Assignments obtained identically by both programs ("consensus assignments")15 were retained and served as the starting point for iterative cycles of noise/artifact peak removal, peak picking, and NOE assignment. Statistical summaries of the 1H–1H upper-bound distance constraints used for structure calculations are given in Table I. In addition, backbone dihedral angle constraints were derived from chemical shifts using the program TALOS+28 for residues located in well-defined secondary structure elements (Table I). No predetermined distance constraints were used at any stage of the structure determination. The final structure calculation was performed with CYANA 3.0, and the 20 conformers with the lowest target function values were refined in an "explicit water bath"29 using the program CNS.30 The coordinates were deposited in the Protein Data Bank on 06/14/2009 (accession code 2KK0).
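The "consensus assignment" step, which keeps only those NOESY cross-peak assignments on which CYANA and AUTOSTRUCTURE agree, amounts to intersecting two peak-to-assignment maps. The sketch below is a minimal illustration, assuming each program's output has already been reduced to a hypothetical dictionary from peak ID to an unordered pair of proton names; the programs' real file formats are not modeled. A second helper illustrates selecting the conformers with the lowest target-function values before refinement. All names and data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: peak ID -> unordered pair of assigned proton names.
Assignment = FrozenSet[str]


def consensus_assignments(cyana: Dict[int, Assignment],
                          autostructure: Dict[int, Assignment]) -> Dict[int, Assignment]:
    """Keep only peaks that both programs assigned to the same proton pair."""
    return {peak: assign for peak, assign in cyana.items()
            if autostructure.get(peak) == assign}


def lowest_target_function(conformers: List[Tuple[str, float]], n: int = 20) -> List[str]:
    """Return IDs of the n conformers with the lowest target-function values."""
    return [cid for cid, _ in sorted(conformers, key=lambda c: c[1])[:n]]


if __name__ == "__main__":
    def pair(a: str, b: str) -> Assignment:
        return frozenset((a, b))

    # Hypothetical peaks and proton names.
    cyana = {1: pair("HA 10", "HN 11"), 2: pair("QD1 20", "HA 40"),
             3: pair("HB2 15", "QB 30")}
    autostructure = {1: pair("HN 11", "HA 10"), 2: pair("QD1 20", "HA 41"),
                     3: pair("HB2 15", "QB 30")}
    print(sorted(consensus_assignments(cyana, autostructure)))  # peaks 1 and 3 agree
```

Using unordered pairs means that the same two protons listed in a different order still count as an identical assignment.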
Structural statistics and global structure quality factors, including Verify3D,23 ProsaII,24 PROCHECK,25 and MolProbity26 raw and statistical Z-scores, were computed using the PSVS 1.3 software package.27 The global goodness-of-fit of the final structure ensembles with the NOESY peak list data was determined using the RPF analysis program.31 All structure figures were made using MOLMOL22 or PyMOL 1.1.32

The solution NMR structure of the ARID3A ARID domain consists of eight α-helices, α0–α7 (residues 231–234, 239–254, 272–282, 285–291, 294–300, 310–321, 324–329, and 335–346), and a short β hairpin formed by antiparallel strands β1 and β2 (residues 264–265 and 268–269) [Fig. 1(a,b)]. Helices α0 and α1 form a V shape, helices α2–α4 and helices α5–α7 form two U shapes, and the V and the two U shapes pack orthogonally to one another to form the protein fold. Structural statistics for this largely helical domain are given in Table I; the solution NMR structure exhibits good quality assessment scores. The ARID core structure, including helices α1–α6 and the short β hairpin, is well defined. Backbone amides of residues Asn260, Val269, Arg295, and N-terminal residues 224–230 were not observed in [15N,1H]-HSQC spectra due to line broadening; these apparently exchange-broadened residues are located in the region of the ARID core structure to which the extended N-terminal helix α0 is anchored. The N-terminal region, residues 218–230, is flexible in solution. In contrast, helix α7, formed by the C-terminal extended sequence characteristic of the ARID3 subfamily, is well defined and packs against the ARID core. Most residues buried in the helical bundle are hydrophobic, and the structure is stabilized mainly by a hydrophobic core that is well defined by an extensive network of interhelical NOEs.

Figure 1. NMR structures of ARID3A.
(a) Backbone trace of residues 218–351 of the 20 representative CNS conformers of ARID3A, after superposition of the backbone N, Cα, and C' atoms of the regular secondary structure elements for minimal root-mean-square deviation (RMSD). The N- and C-termini are labeled "N" and "C". (b) Ribbon drawing of residues 218–351 of the ARID3A conformer with the lowest CNS energy. α-helices α0–α7, β strands β1–β2, and loops L1–L2 are labeled and colored; the N- and C-termini are labeled "N" and "C". (c) Comparison of the ARID domain of human ARID3A (red) with Drosophila "dead ringer" Dri in the presence of dsDNA (blue, PDB code 1kqq) and in the absence of dsDNA (green, PDB code 1c20), after superposition of backbone N, Cα, and C' atoms of regular secondary structure elements for minimal RMSD. Side-chains of the 16 ARID3A residues corresponding to those involved in DNA interactions in the Dri-DNA complex are colored red. (d) Surface representation of the ARID3A ARID domain conformer with the lowest CNS energy. The structure on the right is rotated by 180° about the vertical axis. Surface colors represent the electrostatic potential. All figures were prepared with the programs MOLMOL22 or PyMOL.32

Several 3D structures of ARID family members, both with and without bound dsDNA, are available in the PDB.6, 33 A search for structurally similar proteins in the PDB using the DALI34 server revealed significant structural similarity between ARID3A and these other ARID domains, including the Dri ARID from Drosophila melanogaster bound to dsDNA33 [Fig. 1(c); PDB code 1kqq, DALI Z score 17.6, RMSD 2.1 Å, 78% SeqID], the Dri ARID without dsDNA6 [Fig. 1(c); PDB code 1c20, DALI Z score 12.3, RMSD 3.1 Å, 78% SeqID], ARID1B (PDB code 2eh9, DALI Z score 12.6, RMSD 2.0 Å, 43% SeqID), the ARID domain from the human JARID1C protein35 (PDB code 2jrz, DALI Z score 11.5, RMSD 2.5 Å, 25% SeqID), and the ARID domain from the histone H3K4 demethylase RBP236 (PDB code 2jxj, DALI Z score 9.8, RMSD 2.4 Å, 32% SeqID).

The ARID domains of ARID3A and Dri have similar global folds, as expected from their high sequence similarity (78% SeqID; Supporting information Figure S5), but they also show significant structural differences (described below). ARID3A has a large, positively charged (basic) surface on one face that may facilitate recognition of the negatively charged DNA duplex [Fig. 1(d)], and a mainly negatively charged surface on the opposite side of the molecule. Similar surface electrostatic distributions are observed for Drosophila Dri.33 The structure of Dri bound to dsDNA33 shows that the side-chains of 16 Dri residues are in close contact with dsDNA, including (i) the modified helix-turn-helix motif (α4–loop L2–α5 region) that directly contacts the DNA major groove, and (ii) the β-hairpin loop L1 and the C-terminal extended helix α7, which contact DNA regions outside the major groove.3, 33 All of these residues are solvent exposed, and 15 of the 16 are identical between ARID3A and Dri (Supporting information Figure S5); hence ARID3A likely binds dsDNA by a mechanism similar to that of Dri.

Although the ARID domains of ARID3A and Dri have similar overall structures, there are also some differences between them. Interestingly, the ARID3A structure is more similar to the Dri ARID domain bound to dsDNA than to the Dri structure determined in the absence of dsDNA, especially in the β-hairpin loop L1 region: the backbone RMSD values to dsDNA-bound and unbound Dri are 2.1 Å and 3.1 Å, respectively [Fig. 1(c)].6 In the ARID domain of ARID3A, the N-terminal extension is three residues shorter than that of Drosophila Dri; because the resonances of several residues in this N-terminal segment are too weak to be observed, presumably due to exchange broadening, we cannot accurately characterize the length of helix α0 in this human ARID domain. These exchange-broadened residues are helical in a CS-Rosetta37 structure of the same construct (unpublished results).

The two loops, L1 between β1 and β2 and L2 between α4 and α5, are better defined in the solution NMR structure of ARID3A than in unbound Dri; indeed, loop L2 is disordered in unbound Dri.6 Because RMSD values across an ensemble of NMR structures should be used only qualitatively to estimate the precision and/or flexibility of an NMR structure,38 the higher degree of order in loops L1 and L2 of the human ARID domain compared with Dri was carefully investigated; it is supported by extensive NOESY and heteronuclear NOE data summarized in Supporting information Figures S2 and S6. Besides the unobserved backbone amides of Asn260 and Val269, located where the N-terminal helix α0 is anchored, the backbone amide resonances of the nearby residues Ile262, Met265, Ala266, and Leu270 are also slightly broadened, suggesting some intermediate conformational exchange of loop L1, or exchange between states in which an ordered loop L1 does and does not contact helix α0. Such exchange can produce mutually inconsistent distance constraints, because the NOEs arise from multiple members of the conformational ensemble; satisfying these constraints simultaneously can result in a tighter bundle than one might expect for a loop with modest flexibility.38 Thus, although this loop is much more ordered in the ARID3A ARID domain than in the Dri domain in the absence of DNA, we cannot rule out the possibility that this DNA-binding loop has modest conformational flexibility in the ARID3A ARID domain.
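Backbone RMSD values such as the 2.1 Å and 3.1 Å comparisons with DNA-bound and unbound Dri are computed after an optimal rigid-body superposition of equivalent atoms. The sketch below shows one standard way to do this, the SVD-based Kabsch superposition, assuming the two coordinate sets are already matched atom for atom (e.g., backbone N, Cα, and C' atoms of aligned secondary-structure residues). The residue correspondence and atom selection behind the published values are not reproduced here, and DALI itself scores similarity with a different, distance-matrix-based method. The function name is illustrative.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid-body superposition.

    Rows of p and q must refer to the same atoms (e.g., matched backbone N, CA, C').
    """
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched atoms")
    # Remove translation by centering both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Cross-covariance matrix and its singular value decomposition.
    h = p_c.T @ q_c
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the centred p onto the centred q and compute the RMSD.
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.normal(size=(30, 3)) * 5.0
    # A rotated and translated copy should superimpose with RMSD close to zero.
    rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    moved = coords @ rot_z.T + np.array([10.0, -3.0, 2.0])
    print(f"{kabsch_rmsd(coords, moved):.6f}")
```

The reflection check matters for real structures: without it, the optimal "superposition" could be a mirror image, which would underestimate the RMSD.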
The 3D structure of ARID3A shows that the side-chains of Met265 and Ala266 pack into a hydrophobic pocket formed by the side-chains of Ala310, Thr313, and Leu314 from helix α5 and Leu303, Pro304, and Ile307 from loop L2. This packing is supported by the observed NOE distance constraints (Supporting information Table S2 and Fig. S6). This feature is also observed in the Dri structure in the presence of dsDNA,33 but not in Dri in the absence of dsDNA.6 It has been suggested that dsDNA binding drives a rotation of the N-terminal subdomain (comprising helices α0 and α1 and the β hairpin) relative to the remainder of the polypeptide.33 However, the similarity between the human ARID3A ARID domain in the absence of dsDNA and the DNA-bound Dri ARID domain indicates less conformational rearrangement upon DNA binding by human ARID3A than is observed for the Drosophila "dead ringer" Dri protein, and/or a different equilibrium distribution between the apo and dsDNA-binding forms of these human and Drosophila ARID domain constructs. Interestingly, of the 16 residues that contact DNA in the Dri-dsDNA complex, only one differs between Dri and human ARID3A: Gln338 in Dri is replaced by the basic residue Arg295 in human ARID3A, which may affect dsDNA-binding affinity. Comparisons of DNA binding by the human ARID3A and Drosophila Dri ARID domains may thus provide useful insights into the functional role of conformational rearrangements in dsDNA recognition by this important class of eukaryotic transcription factors.

Additional Supporting Information may be found in the online version of this article. Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.

References