Modular Organization of Phylogenetically Conserved Domains Controlling Developmental Regulation of the Human Skeletal Myosin Heavy Chain Gene Family
2002; Elsevier BV; Volume: 277; Issue: 31; Language: English
10.1074/jbc.m203162200
ISSN 1083-351X
Authors: Stéphane König, James M. Burkman, Julie C. Fitzgerald, Marilyn A. Mitchell, Leonard T. Su, Hansell H. Stedman
Topic(s): Cardiomyopathy and Myosin Studies
Abstract: The mammalian skeletal myosin heavy chain locus is composed of a six-membered family of tandemly linked genes whose complex regulation plays a central role in striated muscle development and diversification. We have used publicly available genomic DNA sequences to provide a theoretical foundation for an experimental analysis of transcriptional regulation among the six promoters at this locus. After reconstruction of annotated drafts of the human and murine loci from fragmented DNA sequences, phylogenetic footprint analysis of each of the six promoters using standard and Bayesian alignment algorithms revealed unexpected patterns of DNA sequence conservation among orthologous and paralogous gene pairs. The conserved domains within 2.0 kilobases of each transcriptional start site are rich in putative muscle-specific transcription factor binding sites. Experiments based on plasmid transfection in vitro and electroporation in vivo validated several predictions of the bioinformatic analysis, yielding a picture of synergistic interaction between proximal and distal promoter elements in controlling developmental stage-specific gene activation. Of particular interest for future studies of heterologous gene expression is a 650-base pair construct containing modules from the proximal and distal human embryonic myosin heavy chain promoter that drives extraordinarily powerful transcription during muscle differentiation in vitro.
Abbreviations: MyHC, myosin heavy chain; ANOVA, analysis of variance; contig, group of overlapping clones; ChemE, conserved human embryonic MyHC element; MRF, muscle regulatory factor; HSD, honestly significant difference.

Throughout metazoan evolution, the range of molecular mechanisms used to achieve functional diversity in muscle partially reflects the anatomic complexity of the species under consideration. In vertebrates, the maximal contractile speed and energy utilization rate of individual muscle fibers are largely controlled by the structure of the motor protein myosin (reviewed in Ref. 1: Schiaffino S. Reggiani C. Physiol. Rev. 1996; 76: 371-422). The conventional myosins of striated muscle are hexameric proteins composed of paired trimers of myosin heavy chain (which contains the ATP-splitting motor domain) and one each of the essential and regulatory light chains.
The human genome has at least 11 distinct striated myosin heavy chain (MyHC) genes (2: Desjardins P. Burkman J. Shrager J. Allmond L. Stedman H. Mol. Biol. Evol. 2002; 4: 375-393), six of which are abundantly expressed in skeletal muscle and are tandemly linked at a single locus on chromosome 17 (3: Leinwand L.A. Fournier R.E.K. Nadal-Ginard B. Shows T. Science. 1983; 221: 766-768, 4: Yoon S.J. Seiler S. Kucherlapati R. Leinwand L. Proc. Natl. Acad. Sci. U. S. A. 1992; 89: 12078-12082, 5: Shrager J. Desjardins P. Burkman J. Konig S. Stewart D. Su L. Shah M. Bricklin E. Tewari M. Hoffman R. Rickels M. Jullien E. Rubinstein N. Stedman H. J. Muscle Res. Cell Motil. 2000; 21: 345-355). The members of the human skeletal MyHC locus have 38 coding exons and two 5′ noncoding exons. Unlike striated MyHC isoform diversification in Drosophila, which reflects a complex pattern of alternative exon splicing (6: Hastings G.A. Emerson C.P.J. J. Cell Biol. 1991; 114: 263-276, 7: Bernstein S. Milligan R. J. Mol. Biol. 1997; 271: 1-6), the process in vertebrates relies on sequential activation of a family of distinct genes, each encoding a single MyHC (8: Mahdavi V. Strehler E.E. Periasamy M. Wieczorek D. Izumo S. Grund S. Strehler M.-A. Nadal-Ginard B. Emerson C.F.D. Nadal-Ginard B. Siddique M.A. Molecular Biology of Muscle Development. Alan R. Liss, New York 1986: 345-361). By virtue of the extraordinary abundance of these proteins (35% of muscle protein), a large body of experimental data on MyHC isoform switching was assimilated before isolation of the encoding genes (9: Whalen R. Sell S. Butler-Browne G. Schwartz K. Bouveret P. Pinset-Harstom I. Nature. 1981; 292: 805-809).
During embryonic myogenesis in mammals, the transcriptionally 5′-most gene in the skeletal MyHC locus is activated first. During fetal muscle development, the perinatal gene is selectively activated in a large proportion of the muscle cells with progressive down-regulation of the embryonic gene until birth. During subsequent postnatal growth, the perinatal MyHC is progressively replaced by the three "type II" (adult fast twitch) MyHCs. Throughout adult life, reversible transitions are possible in response to cycles of degeneration and regeneration or alterations in the hormonal environment, innervation, and pattern of locomotive recruitment of individual muscle fibers (1: Schiaffino S. Reggiani C. Physiol. Rev. 1996; 76: 371-422, 10: Pette D. Staron R. Int. Rev. Cytol. 1997; 170: 143-223). A sixth gene at the skeletal MyHC locus is expressed only in selected extraocular and laryngeal muscle fibers (11: Wieczorek D. Periasamy M. Butler-Browne G. Whalen R. Nadal-Ginard B. J. Cell Biol. 1985; 101: 618-629). The order of sequential activation during development contrasts with the physical order of the linked genes: 5′-embryonic, IIa, IId/x, IIb, perinatal, extraocular-3′. A seventh MyHC abundantly expressed in type I (slow twitch) skeletal muscle fibers is encoded by the physically unlinked cardiac β MyHC. At all times, the stoichiometry of myofibril assembly must be maintained such that inactivation or down-regulation of one skeletal MyHC gene must be precisely offset by activation or up-regulation of another. Two exceptional attributes of the mammalian skeletal MyHC locus, size and tandem gene number, have limited the study of transcriptional regulatory mechanisms within this gene family.
Predating the modern era of general information about transcription factor binding sites, studies of a 1.4-kb region of genomic DNA upstream from the rat embryonic MyHC gene suggested a complex pattern of regulation by counteracting cis-elements (12: Bouvagnet P.F. Strehler E.E. White G.E. Strehler-Page M.A. Nadal-Ginard B. Mahdavi V. Mol. Cell. Biol. 1987; 7: 4377-4389). Studies of a large number of other single copy genes expressed during myogenesis have more recently defined roles and binding specificities for several transcription factors, providing a basis for general models of promoter and enhancer function during commitment and terminal differentiation. Transcription factors of the MyoD/MRF and MEF-2 families play pivotal and synergistic roles in the activation and maintenance of the myogenic differentiation pathway (13: Weintraub H. Davis R. Tapscott S. Thayer M. Krause M. Benezra R. Blackwell T.K. Turner D. Rupp R. Hollenberg S. Science. 1991; 251: 761-766, 14: McKinsey T.A. Zhang C.L. Olson E.N. Curr. Opin. Genet. Dev. 2001; 11: 497-504). Although the precise mechanisms by which these factors achieve transcriptional activation remain unclear, there is increasing evidence for a central role of histone modification and chromatin remodeling. In addition to MRF and MEF-2, numerous ubiquitous and muscle-specific transcription factors have been found to play important roles in gene up-regulation during terminal myogenesis. A partial list reflecting the extent of characterization includes AP1, AP2, GR, Oct-1, SRF, Sp1, TEF, and TR (15: Andreucci J.J. Grant D. Cox D.M. Tomc L.K. Prywes R. Goldhamer D.J. Rodrigues N. Bedard P.A. McDermott J.C. J. Biol. Chem. 2002; 277: 16426-16432, 16: Perkins K.J. Burton E.A. Davies K.E. Nucleic Acids Res.
2001; 29: 4843-4850, 17: Sun X. Fischer D.R. Pritts T.A. Wray C.J. Hasselgren P.O. Am. J. Physiol. Regul. Integr. Comp. Physiol. 2002; 282: R509-R518, 18: Santalucia T. Moreno H. Palacin M. Yacoub M.H. Brand N.J. Zorzano A. J. Mol. Biol. 2001; 314: 195-204, 19: Gupta M. Kogut P. Davis F.J. Belaguli N.S. Schwartz R.J. Gupta M.P. J. Biol. Chem. 2001; 276: 10413-10422, 20: Lakich M.M. Diagana T.T. North D.L. Whalen R.G. J. Biol. Chem. 1998; 273: 15217-15226). Factors traditionally associated with other cell lineages (e.g. GATA and hematopoiesis) have also been found to play important roles in the expression of some muscle-specific genes (21: McGrew M.J. Bogdanova N. Hasegawa K. Hughes S.H. Kitsis R.N. Rosenthal N. Mol. Cell. Biol. 1996; 16: 4524-4534). For all of the factors listed above, information about binding specificities has been incorporated into publicly available nucleotide weight matrices (22: Wingender E. Chen X. Hehl R. Karas H. Liebich I. Matys V. Meinhardt T. Pruss M. Reuter I. Schacherer F. Nucleic Acids Res. 2000; 28: 316-319). The sequence specificity of transcription factors and the recruitment of these proteins into multisubunit complexes dictate intriguing patterns of evolutionary sequence conservation in noncoding DNA. In recent years, a number of World Wide Web-based tools have been developed to facilitate the comparative analysis of DNA sequence emerging from the various genome projects.
Although the publicly available murine genome sequence currently consists of highly fragmented files in draft form, islands of synteny with the human genome allow reconstruction of orthologous gene alignments that expedite the prospective identification of transcriptional control regions. As currently implemented, "percentage identity plot" analysis provides a low resolution snapshot of large genomic regions (23: Schwartz S. Zhang Z. Frazer K.A. Smit A. Riemer C. Bouck J. Gibbs R. Hardison R. Miller W. Genome Res. 2000; 10: 577-586), whereas "Bayesian phylogenetic footprinting" allows a detailed look at regions of up to 2 kb (24: Wasserman W.W. Palumbo M. Thompson W. Fickett J.W. Lawrence C.E. Nat. Genet. 2000; 26: 225-228). The latter approach has been applied to other muscle-specific promoters with the finding that the majority of relevant transcription factor binding sites are concentrated within the small percentage of total sequence that has the highest probability of alignment (i.e. meets the most stringent criteria for conservation). The complementary use of transcription factor matrices to screen phylogenetically conserved DNA sequence for high affinity binding sites has the capacity to yield a detailed annotation of putative cis-acting regulatory domains. Hypotheses generated on the basis of this annotation can provide a useful basis for the experimental study of gene regulation in vitro and in vivo. In the present study, we use draft murine genome sequence from several fragmented draft files, initially identified on the basis of BLAST (25: Altschul S.F. Gish W. Miller W. Myers E.W. Lipman D.J. J. Mol. Biol. 1990; 215: 403-410) comparisons with the human genome, to reconstruct a working model for the complete murine skeletal MyHC locus.
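The screening of conserved sequence with transcription factor matrices described above can be illustrated with a minimal sketch. It is not the TESS or Transfac implementation: the count matrix, the pseudocount, the uniform background, and the score threshold are all invented for illustration, and the function names `log_odds_matrix` and `scan` are hypothetical. The sketch converts a count matrix to log-odds weights and reports only those matches that fall entirely within positions flagged as conserved.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
# The counts are invented for illustration; this is not a Transfac entry.
COUNTS = [
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 9, "T": 1},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts to log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, conserved, threshold):
    """Return (start, score) for each window that scores at or above the
    threshold and lies entirely within conserved positions.

    `conserved` is a list of booleans, one per base of `sequence`
    (an assumed stand-in for the output of a footprinting step).
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        if not all(conserved[start:start + width]):
            continue  # skip windows touching non-conserved sequence
        if any(base not in "ACGT" for base in window):
            continue  # skip ambiguous bases such as N
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits
```

Calling `scan("TTCAGTTTCAGT", log_odds_matrix(COUNTS), mask, 5.0)` with a mask that marks only the first six bases as conserved keeps the first match and discards the identical match in the non-conserved half, which is the filtering behavior the text describes.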
Each of the six human MyHC promoter regions is analyzed in detail using bioinformatic tools, as exemplified by those cited above, to yield annotated maps reflecting both the level of sequence conservation and the precise location of matches to transcription factor matrices. Regions of strong conservation between the orthologous human and murine MyHC promoters occur in patches scattered throughout at least 2 kb immediately upstream of the transcriptional start sites. Paralogous promoter comparisons reveal surprisingly strong homology within 300 bp of the transcriptional start sites, providing evidence for a conserved proximal "core" promoter domain. Transfection of myotubes in vitro and electroporation of myofibers in vivo are used to further dissect the relevant domains of four of these promoters with the findings that 1) isolated proximal promoter regions can retain developmental stage specificity despite homology to a core consensus sequence and 2) an E box-rich distal region of the human embryonic MyHC promoter dramatically amplifies the in vitro activity of all of the proximal core promoters tested in cis. These findings suggest several testable hypotheses regarding potential short and long range interactions between transcriptional control elements within the skeletal MyHC tandem gene locus.

Deduced full-length cDNA sequences derived from an annotated version of the human skeletal myosin heavy chain locus (2: Desjardins P. Burkman J. Shrager J. Allmond L. Stedman H. Mol. Biol. Evol. 2002; 4: 375-393) were used to query the high throughput genomic DNA sequence databases accessible through the National Center for Biotechnology Information Homepage (www.ncbi.nlm.nih.gov/BLAST/). Species-specific repeats in the reconstructed human locus were identified using the REPEATMASKER algorithm as implemented on the Washington University server (ftp.genome.washington.edu/cgi-bin/RepeatMasker).
As soon as draft sequences appeared online to confer complete coverage of all six of the genes at the murine locus, we ordered the draft sequence fragments to correspond to the order of the orthologous human sequences. A reconstructed murine sequence draft spanning the entire locus was annotated, initially by using GENSCAN as implemented on the MIT server (genes.mit.edu/GENSCAN) and later by cross-checking with the human annotated sequence to verify exon splice site predictions. We next used sequence from our full-length cDNA clones to prospectively identify the transcriptional start sites for four of the genes at the human locus. These predictions were confirmed and extended to include all six genes by using the Promoter Prediction by Neural Network program as implemented on the Berkeley Drosophila Genome server (www.fruitfly.org/seq_tools/promoter.html). Orthologous regions were easily identified in the annotated murine MyHC locus. Regions upstream of the predicted human and murine transcriptional start sites were analyzed with several programs in the MacVector package (version 6.0.1; Oxford Molecular Group) as implemented on the Mac OS platform and as cited throughout. In addition, these regions were analyzed using the TESS and Bayesian Phylogenetic Footprint Homepage programs, as implemented on the University of Pennsylvania (www.cbil.upenn.edu/cgi-bin/tess/tess?SEA-FR-Query) and Wadsworth servers (bayesweb.wadsworth.org/cgi-bin/bayes_align12.pl), respectively. The outputs from the latter analyses were merged by importing the results into Microsoft Word files and reproducing the color coding of the nucleotide sequence to identify phylogenetically conserved regions. The complete files are available on request from the corresponding author. Only the general results have been integrated into the figures in this paper.

All of the promoter constructions were made in the pGL3-basic plasmid, a promoterless vector encoding the firefly luciferase (Promega).
The different promoters were cloned using the NheI and BglII sites of the pGL3-basic plasmid. For this purpose, all of the promoters were amplified by 30 cycles of PCR using the proofreading Pwo polymerase (Roche Molecular Biochemicals) according to the manufacturer's protocol. All primers were designed to have a 58 °C annealing temperature and to contain the NheI site at the 5′ end or the BglII site at the 3′ end, as shown in Table I, to allow unidirectional cloning. The plasmid CMV-βGal (CLONTECH) was used for the quantification of transfection efficiency. The hybrid plasmids between the embryonic and the IIb promoters (KNX, KNF) were obtained by the ligation of two PCR fragments through a SalI site introduced into the primer (Table I). The KOF insert was obtained by a recombinant PCR to join both independent PCR fragments.

Table I. Sequences of primers used for the cloning of the different deleted or mutated promoters

Primers                    Sequence 5′ → 3′                            Plasmid
Embryonic promoter
  PCR.ProE/Nhe1-Sens       GGGGGCTAGCGAACGGAGGTG                       KNE, KNS, KNM
  PCR.ProE/Bgl2-Anti       GGGGAGATCTCCGGGGCTTTTATAG
Embryonic cis-elements
  PCR-EmbF13-sens          TTCCTCTGCGTCTTCTCGAAGC                      KLZ, KNS, KLT, KMA
  PCR-EmbB25-anti          TCTATGCTGCCAGCTCGTATTCC
  PCR-EmbF12-sens          ATGCGTCTGTAGCATCCACAGGAC                    KLY
  PCR-EmbB24-anti          AGGGGGGGCTGTCTAACAAATG
  PCR-Emb1064/Kpn1-sens    GGGGGGTACCAACCACAGGGTGCCC                   KOQ
  PCR-EmbB25/Sst1-anti     GGGGGAGCTCTCTATGCTGCCAGCTCG
  PCR-Emb1064/Kpn1-sens    GGGGGGTACCAACCACAGGGTGCCC                   KOP
  PCR-EmbE5E6.Sst1-anti    GGGGGAGCTCGGAAAGACCAGCTGCTG
  PCR-EmbE5/Kpn1-sens      GGGGGGTACCTCTGCAGGCCACC                     KOT
  PCR-EmbB25/Sst1-anti     GGGGGAGCTCTCTATGCTGCCAGCTCG
Site-directed mutagenesis (a)
  Mut-EmbE5-sens           CCCCATCCCAGCAGAATCTGGTCTTTCCTAATTGG         KOV, KPB, KPC
  Mut-EmbE6-sens           CACAGGTCCTCAGGAATCTGTCGCCGGGAATAC           KOW, KPB, KPD
  Mut-Emb2.MEF2-sens       GATGGATCCTGCTCATTTCGAGTTATACCTTTCCCTTGGC    KPA, KPH
  Mut-Emb1-seqA-sens (YY1) CATTCTGCGGTCGGAACATGTTGATGGATCCTGCTC        KOZ, KPH
  Mut-Emb.MEF2rev-sens     GATGGATCCTGCTCATTTCTAATTATACCTTTCCCTTGGC    KPI, KPJ

(a) The introduced mutations are underlined. Only the forward primers are shown; the reverse primers used are the reverse complements of those shown.

The C2C12 cell line proliferates at low density in Dulbecco's modified Eagle's medium plus 20% fetal calf serum. Differentiation into myotubes was induced by replacing the proliferation medium with Dulbecco's modified Eagle's medium + 0.5% fetal calf serum + ITS (insulin, transferrin, and selenium from Roche Molecular Biochemicals at 5 μg/ml each). These cells were transfected using FuGene6 (Roche Molecular Biochemicals). Briefly, at day 1, myoblasts were plated at 200,000 cells/well in a six-well plate. At day 2, 2 h prior to the transfection, the medium was removed and replaced by differentiation medium. Transfection was achieved by incubating 6 μl of FuGene6 in 94 μl of Dulbecco's modified Eagle's medium at room temperature. After 5 min, 2 μg of the luciferase reporter plasmid and 1 μg of CMV-βGal plasmid were added, and the mixture was incubated for another 15 min. The transfection mixture (100 μl) was added to the cells drop by drop. Cells were collected at day 5 for luciferase assay using 200 μl of 1× lysis buffer per well according to the luciferase assay system (Promega). We used 10 μl of cell extracts for the determination of the luciferase activity. The β-galactosidase activity was measured using the chemiluminescent reporter gene assay system for the detection of β-galactosidase (Galacto-Light, Tropix, PE Biosystems) with 10 μl of cell extracts. Electroporation of rat tibialis anterior muscles was performed as per Ref. 30: Aihara H. Miyazaki J. Nat. Biotechnol.
1998; 16: 867-870. All electroporation and subsequent euthanasia and procurement procedures were performed in accordance with University of Pennsylvania IACUC protocols. Liquid nitrogen frozen muscles were pulverized to powder and solubilized with 1× lysis buffer (luciferase assay system; Promega). After a 5-min incubation and one freeze/thaw cycle, samples were centrifuged, and 20 μl of supernatant were processed for luciferase and β-galactosidase assays. The luciferase results are expressed as the ratio between the luciferase activity and the β-galactosidase activity. Furthermore, for each series of transfections, we transfected the promoterless plasmid pGL3-basic for the quantification of background. The normalized luciferase results obtained with the pGL3-basic plasmid are defined as 1. All of the other activities of the series are then expressed as the x-fold trans-activation above the background. Data sets on levels of marker gene expression were analyzed (JMP; SAS Institute, Inc., Cary, NC) to determine arithmetic means, S.D. values, S.E. values of the means, and the probabilities associated with one-way analysis of variance (ANOVA), Student's t test, and the Tukey-Kramer HSD test (i.e. the probability of incorrectly rejecting the null hypothesis about the difference between means among multiple groups within a data set). The Tukey-Kramer HSD provides a conservative calculation of statistical significance in the analysis of intergroup comparisons that minimizes the risk of type I error by increasing the quantile multiplied into the S.E. values to create the least significant difference.

To provide a basis for phylogenetic analysis of putative transcriptional control domains within the skeletal MyHC locus, we periodically screened high throughput DNA sequence databases until a representation of the entire murine locus could be identified in draft form.
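The reporter normalization described in the procedures above (luciferase activity divided by β-galactosidase activity, then expressed relative to the pGL3-basic background defined as 1) can be written out as a short sketch. The function names are hypothetical, and averaging the background replicates before dividing is an assumption, since the text does not state how replicate background wells were combined. The statistics themselves were computed in JMP; the mean and S.E. helper here only illustrates the arithmetic.

```python
import math
from statistics import mean, stdev


def normalized_activity(luciferase, beta_gal):
    """Luciferase signal corrected for transfection efficiency."""
    return luciferase / beta_gal


def fold_activation(samples, background_samples):
    """Express each (luciferase, beta_gal) pair as x-fold over pGL3-basic.

    Assumption: the background is the mean normalized activity of the
    pGL3-basic replicates in the same transfection series.
    """
    background = mean(normalized_activity(l, b) for l, b in background_samples)
    return [normalized_activity(l, b) / background for l, b in samples]


def mean_and_sem(values):
    """Arithmetic mean and standard error of the mean (needs >= 2 values)."""
    return mean(values), stdev(values) / math.sqrt(len(values))
```

By construction, passing the pGL3-basic replicates themselves as `samples` gives fold values whose mean is 1, matching the definition of background in the text.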
Three overlapping bacterial artificial chromosomes spanned the entire locus as depicted in Fig. 1A, although the data contained in the relevant accession files were highly fragmented at the time of this analysis. Based on synteny with the human locus, we reconstructed a single file that contained a properly ordered and annotated draft of the murine locus. As expected, coding regions exhibit greater than 90% sequence homology, without gaps. Although the orthologous genes occupy approximately the same relative proportion of the length of the entire locus, the murine locus is only 80% of the size of the human locus. From the standpoint of transcriptional regulation, the regions upstream of the first coding exons of the orthologous and paralogous genes are of greatest interest. We identified transcriptional start sites by comparison with full-length cDNA sequences or by analyzing homologous intergenic regions for TBP binding site and initiator consensus sequences using Promoter Prediction by Neural Network (NNPP; Lawrence Berkeley National Laboratory). An initial screen of 8 kb upstream of each transcriptional start site revealed a pattern exemplified by the orthologous embryonic and IIb promoters in which there is strong conservation interrupted by blocks of nonconserved sequence, which in most cases precisely coincide with the boundaries of species-specific repeats as annotated previously using the REPEATMASKER algorithm. There are blocks of >100-bp DNA sequence that attain 60% conservation (note peak height in the raised relief image) out to at least 8 kb upstream of both the human embryonic and IIb MyHC transcriptional start sites (Fig. 1, B and C, embryonic MyHC upstream 8 × 8 kb and IIb MyHC upstream 8 × 8 kb). The conserved regions identified as conserved human embryonic MyHC element (ChemE)-1, -2, and -3 are defined below.
With the sole exception of the extraocular MyHC promoter pair, the pattern of conservation within 2 kb of the start sites provided striking evidence for selection against mutation by insertion or deletion. This pattern suggests that the order of and/or the physical distance between transcription factor binding sites spread over at least 2 kb is crucial to the proper developmental regulation of MyHC gene activation/repression. To address the possibility of a shared core promoter structure, we extended the pairwise comparison to include paralogous human MyHC promoters (Fig. 1D). Homology among promoter paralogs is uniquely detected within the most proximal 300 bp, as shown for the IIa, IId/x, IIb, and perinatal genes. A limitation of this analysis for the identification of transcriptional regulatory elements is its dependence on user-defined parameters such as window size, DNA scoring matrix, and gap penalty. These biases have been largely eliminated through the recent introduction of a Bayesian statistical method based on a Gibbs sampling algorithm (24: Wasserman W.W. Palumbo M. Thompson W. Fickett J.W. Lawrence C.E. Nat. Genet. 2000; 26: 225-228, 26: Zhu J. Liu J.S. Lawrence C.E. Bioinformatics. 1998; 14: 25-39). The relationship between level of sequence conservation and distance upstream from the transcriptional start sites is quantitatively displayed in Bayesian "phylogenetic footprint" plots for five of the promoter pairs (Fig. 2). The z axis in each of the three-dimensional plots indicates the local probability of mouse-human DNA sequence alignment based on a modification of the Bayes block aligner algorithm (24: Wasserman W.W. Palumbo M. Thompson W. Fickett J.W. Lawrence C.E. Nat. Genet. 2000; 26: 225-228).
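The dependence of window-based conservation analysis on user-defined parameters, noted above, can be made concrete with a minimal sketch. It is not the MacVector or percentage identity plot implementation and is unrelated to the Bayes block aligner; it assumes the two sequences have already been aligned to equal length with "-" marking gaps, and the function names, window sizes, and cutoff are hypothetical choices for illustration.

```python
def window_identity(aligned_a, aligned_b, window):
    """Percent identity in every full window of two pre-aligned sequences.

    Assumes equal-length inputs in which '-' marks a gap; gap columns
    never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [
        int(a == b and a != "-")
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    ]
    return [
        100.0 * sum(matches[i:i + window]) / window
        for i in range(len(matches) - window + 1)
    ]


def conserved_blocks(identities, window, cutoff):
    """Merge overlapping or adjacent windows that meet the cutoff into
    half-open (start, end) alignment-column ranges."""
    blocks = []
    for i, pct in enumerate(identities):
        if pct < cutoff:
            continue
        start, end = i, i + window
        if blocks and start <= blocks[-1][1]:
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            blocks.append((start, end))
    return blocks
```

For the toy pair "ACGTACGTAAAA" and "ACGTACGTCCCC" with a 75% cutoff, a 4-column window reports a conserved block of columns 0 to 9, whereas an 8-column window reports columns 0 to 10. The same alignment thus yields different block boundaries depending only on the window size, which is the kind of parameter sensitivity the text attributes to this class of analysis.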
The overall pattern of gene conservation as a function of distance from the transcriptional start site is similar for the perinatal, IIa, IId/x, and IIb genes. Relative to these four genes, the embryonic gene has a comparatively larger block of conserved sequence in the interval −600 to −1000 (ChemE-2) and a smaller block of conserved sequence in the interval +1 to −300 (ChemE-1). In addition, 4.1 kb upstream of the human embryonic gene transcriptional start site is a 0.5-kb block of extraordinarily conserved noncoding sequence (ChemE-3), a pattern unique among the six genes at this locus. Finally, sequence conservation among the paralogous genes is typified by the human IIa versus IId/x pairing, with four distinct spikes of homology within 300 bp of the transcriptional start. A position weight matrix-based search for binding sites in a transcription factor database was filtered to identify candidate protein-DNA interactions in the phylogenetically conserved portion of each sequence pair, as shown schematically above the footprints (Fig. 2, lettering above individual plots). Three groups of transcription factors known to participate in the regulation of muscle-specific gene expression are identified as having high concentrations of sites within these conserved domains: MRFs, MEF-2, and SRF. Additional transcription factors identified by this approach include AP-2, AhR, CCAAT enhancer-binding protein, GR, HEB, Oct-1, RREB-1, Sp1, STAT4, TEF-1, and YY1. As previously demonstrated for other muscle-specific promoters, the concentration of predicted sites is higher in the conserved than in the divergent sequence domains (24: Wasserman W.W. Palumbo M. Thompson W. Fickett J.W. Lawrence C.E. Nat. Genet. 2000; 26: 225-228). The overall distribution of these sites is not appreciably conserved among the paralogous promoters except in the most proximal 300 bp of sequence. We used ClustalW (27: Thompson J.D. Higgins D.G. Gibson T.J.
Nucleic Acids Res. 1994; 22: 4673-4680) to align all of the proximal promoters for each orthologous gene pair (Fig. 3). This alignment draws attention to four core domains and provides a consensus sequence for these and the intervening regions. A computational analysis of each of the individual sequences and the consensus sequence identifies several "high affinity" matches to the Transfac matrices for MEF-2, MyoD, Oct-1, CCAAT enhancer-binding protein, and TBP, the first of which is outlined on the sequence alignment. In this alignment, the embryonic promoter pair is distinguished from the other four pairs in having the highest affinity MEF-2 site (28: Fickett J.W. Mol. Cell. Biol. 1996; 16: 437-441) nearest to the transcriptional start site (proximal domain 2 as compared with proximal domain 3 for the others). In fact, this site achieves the highest binding score of any predicted MEF-2 site within 2 kb of an MyHC promoter. Three MEF-2 consensus sequences are identified in the IIa, IId/x, and perinatal promoters. MEF-2 binding is predicted from position weight matrices for the corresponding regions of the other two promoters, but the binding strength falls below the default limits at the most distal site for the embryonic promoter and at the most proximal site for the IIb promoter. Recapitulating the bioinformatic analysis, five o