Bidirectional Gene Organization
2002; Cell Press; Volume: 109; Issue: 7; Language: English
10.1016/s0092-8674(02)00758-4
ISSN 1097-4172
Authors: Noritaka Adachi, Michael R. Lieber
Topic(s): RNA and protein synthesis mechanisms
Abstract: After completion of the human genome sequence, much effort has been devoted to the identification of new genes and their functions. Significant biological information can also be gained from the analysis of genome organization. One of the major features of bacterial genomes is the arrangement of genes in operons. This organization ensures effective gene coregulation and is thought to facilitate replication of heavily transcribed areas owing to the co-orientation of both processes (Blattner et al. 1997). Another interesting aspect of genome organization is gene density. Both the bacterial and yeast genomes are densely organized. In Saccharomyces cerevisiae, at least 6274 genes are arranged in a 13 Mb genome (Kumar et al. 2002, Mewes et al. 1997), a mean density of one gene per ∼2 kb. In contrast, there are estimated to be about 30,000–40,000 human genes within the 3 Gb genome (Lander et al. 2001, Venter et al. 2001), with a calculated average spacing of ∼85 kb. Many of us have assumed that the human genome is dispersed relative to simpler organisms, notwithstanding well-known gene clusters, but in fact, the analysis of such questions is still in its early phases (Lander et al. 2001).
Rare exceptions involving extremely close proximity have been reported because they were unexpected. For example, at least twenty loci in which two genes are located closely in a bidirectionally divergent (head-to-head) fashion have been described; these loci include BRCA1/NBR2, DNA-PKcs/MCM4, ATM/NPAT, DHFR/MSH3, G6PD/NEMO, and Ku86(KARP-1)/TERP (Braastad et al. 2002, Connelly et al. 1998, Galgoczy et al. 2001, Platzer et al. 1997, Shimada et al. 1989, Xu et al. 1997). We have found an additional locus with such a divergent configuration, FEN1/C11orf10 (N.A. and M.R.L., unpublished observation).
Since the BRCA1, DNA-PKcs, ATM, MSH3, Ku86, and FEN1 genes are implicated in DNA repair, we used the genome browser of the University of California at Santa Cruz (http://genome.ucsc.edu/) to examine the organization of other human DNA repair genes (Ronen and Glickman 2001, Wood et al. 2001). Among 120 genes examined, 50 genes (42%) were arranged in a bidirectionally divergent configuration where both transcription start sites were less than 1 kb apart (Figure 1B and Supplemental Figure S1A available online at http://www.cell.com/cgi/content/full/109/5/807/DC1). Even more surprisingly, 40 of these 50 gene pairs were less than 300 bp apart, and in some cases, the two genes were overlapping. We omitted (as upstream gene candidates) any unspliced ESTs as well as any predicted genes with no corresponding spliced ESTs or homologs in other vertebrate organisms. Therefore, given the presence of untranslated RNA genes or undiscovered ESTs, these values might even be underestimates. (As information about each gene pair develops, some of these upstream gene candidates may change category or be determined not to encode protein.)
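The head-to-head criterion can be made concrete with a small sketch. The Python below is an illustration only, not the procedure used in this study (the loci were inspected in the UCSC genome browser). The `Gene` record, the `classify_pair` and `is_close_divergent` functions, and every coordinate are hypothetical. The sketch assumes both genes lie on the same chromosome, are described by a strand and start < end coordinates, and are not nested inside one another.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; start < end in chromosome coordinates."""
    name: str
    strand: str  # "+" or "-"
    start: int
    end: int


def classify_pair(a: Gene, b: Gene) -> Tuple[str, int]:
    """Return (orientation, gap_bp) for two neighboring genes.

    A negative gap means the two transcription units overlap.
    """
    left, right = sorted((a, b), key=lambda g: g.start)
    gap = right.start - left.end
    if left.strand == "-" and right.strand == "+":
        # Head-to-head: the gap is the distance between the two start sites,
        # because a minus-strand gene starts at its rightmost coordinate.
        orientation = "divergent"
    elif left.strand == "+" and right.strand == "-":
        orientation = "convergent"  # tail-to-tail
    else:
        orientation = "tandem"  # head-to-tail, same strand
    return orientation, gap


def is_close_divergent(a: Gene, b: Gene, max_gap: int = 1000) -> bool:
    """True if the pair is divergent with start sites under max_gap bp apart."""
    orientation, gap = classify_pair(a, b)
    return orientation == "divergent" and gap < max_gap


# Made-up coordinates, for illustration only.
gene_a = Gene("geneA", "-", 1000, 5000)
gene_b = Gene("geneB", "+", 5200, 9000)
print(classify_pair(gene_a, gene_b))
print(is_close_divergent(gene_a, gene_b, max_gap=300))
```

Setting `max_gap` to 1000 or 300 corresponds to the 1 kb and 300 bp cutoffs quoted in the text; overlapping pairs come out with a negative gap and therefore pass either cutoff.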
A common feature of these bidirectional loci is the presence of a CpG island between the genes (Figure 1A). In nearly all cases, a CpG island overlapped partially or entirely the first exon (or occasionally more than one exon) of each gene. The only exception was the PARP2 gene; however, its adjacent gene encodes an RNase P RNA subunit, which is transcribed by RNA polymerase III rather than polymerase II (Ame et al. 2001). Thus, essentially all RNA polymerase II-transcribed genes associated with a bidirectional locus have a CpG island between them, in agreement with the fact that CpG islands are commonly found in housekeeping genes (Gardiner-Garden and Frommer 1987). The size of the CpG island, as assessed by the number of CpG dinucleotides, is not a determinant of bidirectionality, however.
We also examined non-DNA repair genes with housekeeping functions, including those involved in nuclear processes, such as DNA replication or cell cycle regulation, as well as those involved in metabolic pathways, such as nucleotide, lipid, amino acid, or carbohydrate metabolism (Supplemental Figure S1B and S1C). Among 170 such genes examined, 37 genes (22%) had an upstream divergent transcription unit. Remarkably, 7 out of 14 genes (50%) implicated in the initiation of DNA replication and 7 out of 17 citric acid cycle genes (41%) were arranged in a divergent fashion.
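The percentages quoted for the functional gene classes follow directly from the reported counts. As a quick arithmetic check (a sketch only; the dictionary labels and the `percent` helper are names chosen here, while the counts are taken from the text), the figures can be recomputed as follows. Pooling the repair and non-repair classes gives 87 of 290 genes.

```python
# (divergent genes within 1 kb, genes examined), as reported in the text
counts = {
    "DNA repair": (50, 120),
    "other housekeeping": (37, 170),
    "DNA replication initiation": (7, 14),
    "citric acid cycle": (7, 17),
}


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


for label, (divergent, examined) in counts.items():
    print(f"{label}: {divergent}/{examined} = {percent(divergent, examined)}%")

# Pool the two non-overlapping top-level classes (repair and non-repair).
pooled_divergent = counts["DNA repair"][0] + counts["other housekeeping"][0]
pooled_examined = counts["DNA repair"][1] + counts["other housekeeping"][1]
print(f"pooled: {pooled_divergent}/{pooled_examined} = "
      f"{percent(pooled_divergent, pooled_examined)}%")
```

The replication-initiation and citric acid cycle genes are subsets of the 170 housekeeping genes, so they are not added to the pooled total.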
In total, we examined 290 genes with housekeeping functions (both repair and non-repair), and 30% (87 genes) corresponded to divergent bidirectional transcription units within 1 kb of each other (Figure 1B). Sixty-six of these 87 genes (76%) had a proximity within 0.3 kb.
In order to evaluate large regions of the genome rather than functional classes of genes, we examined all of the genes on human chromosomes 21 and 22. Among 144 known genes located on chromosome 21, 31 genes (22%) were arranged divergently and within 1 kb. Likewise, among 319 known genes on chromosome 22, 56 genes (18%) were arranged divergently and within 1 kb (see Figure 1B legend regarding calculation of the percentage of bidirectional genes). We therefore conclude that closely located genes arranged in a divergent configuration are a common organizational motif of the human genome. Chromosome 21 has one of the lowest gene densities, whereas chromosome 22 is a gene-rich chromosome (Wright et al. 2001). This indicates that the divergent (bidirectional) feature has no correlation with gene density per se.
Another analysis that considered only known genes concluded that the average intergenic distance for divergent genes was 73 kb on chromosome 21 and 20 kb on chromosome 22 (Chen et al. 2002). The differences between the analyses are as follows. First, we included spliced ESTs, mRNAs, and hypothetical genes with conserved vertebrate homologs.
This roughly doubles the number of transcription units considered and is consistent with similar estimates made based on the entire human genome (Wright et al. 2001). Second, database updates have extended the 5′UTR for many genes, bringing them closer. Third, the mean intergenic distance would represent the average of the close gene pairs described here and the larger number of more distantly spaced genes. Hence, the numerical mean intergenic distance overlooks the interesting subset of close gene pairs (<1 kb, and, particularly, <0.3 kb) highlighted here.
We find that subsets of extremely close (<0.3 kb) convergent gene pairs and close tandem (two genes in the same direction) gene pairs occurred at 70% and 0% of the frequency of divergent gene pairs, respectively, in a randomly chosen 7 Mb segment of human chromosome 22 (base position 32,500,000 to 39,500,000). In the same region, convergent pairs and tandem gene pairs within 1 kb of each other occurred at a frequency of 82% and 64%, respectively, of the divergent frequency. The divergent configuration is the only one where the two promoters may overlap, potentially utilizing common promoter elements, involving two very closely positioned transcription assembly complexes, and sharing the one or two nucleosomes of chromatin structure that cover the <300 bp shared region.
The function and physiologic consequences of such a significant fraction of the genes of the human genome (especially genes involved in DNA repair or replication) being organized as bidirectional gene pairs are currently unclear. One possibility is that it might permit two genes to share one CpG island for purposes of coordinate expression.
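The earlier point that a mean intergenic distance can hide a subset of very close pairs is easy to see with numbers. The sketch below uses invented distances (they are not data from this study or from Chen et al. 2002) and a hypothetical helper, `count_closer_than`, to contrast the mean with the count of pairs under the 1 kb and 0.3 kb cutoffs.

```python
from statistics import mean

# Invented intergenic distances in bp for ten divergent gene pairs.
distances_bp = [150, 250, 800, 40_000, 90_000, 120_000, 60_000, 200, 75_000, 110_000]


def count_closer_than(distances, cutoff_bp):
    """Number of pairs whose intergenic distance is below cutoff_bp."""
    return sum(1 for d in distances if d < cutoff_bp)


mean_bp = mean(distances_bp)
within_1kb = count_closer_than(distances_bp, 1_000)
within_300bp = count_closer_than(distances_bp, 300)

print(f"mean distance: {mean_bp / 1000:.1f} kb")
print(f"pairs within 1 kb: {within_1kb} of {len(distances_bp)}")
print(f"pairs within 0.3 kb: {within_300bp} of {len(distances_bp)}")
```

In this made-up set the mean is close to 50 kb even though four of the ten pairs lie within 1 kb, which is why a distance histogram or a cutoff-based count is more informative than the mean for this question.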
At least in some bidirectional loci, the expression of two divergent genes is coregulated, and a promoter with bidirectional activity is often observed (Platzer et al. 1997, Shimada et al. 1989). This function would be equivalent to that of a regulon. Though in many cases the two genes do not have an obvious functional relationship, some of these genes may be coordinately regulated for certain cellular responses.
A second possible function of bidirectional loci may relate to the mechanism of CpG island promoters. Little is known about how CpG island promoters function, in contrast to tissue-specific promoters. It is conceivable that a substantial fraction of CpG island promoters function in a manner that lends itself to bidirectional transcription.
Whatever the biological significance, in gene knockout studies that utilize mouse ES or human somatic cells, such close proximity must be kept in mind to ensure that only one gene is being disrupted. Specifically, disrupting only the promoter region and/or the first exon may be more challenging than previously thought. Similarly, in human diseases, when mutations occur in promoter regions, we must consider the possibility that two genes are affected in some of these cases rather than only one.
Supplemental Table S1. Lists of the Genes Examined
(A) DNA Repair Genes
(B) Non-DNA-Repair Nuclear Process Genes
(C) Genes Encoding Metabolic Pathway Proteins
Shown are the bidirectional status, the number of CpG dinucleotides (# of CpG), human chromosomal location (Chr.), and the name of the adjacent gene. The bidirectional status is divided into six categories: A, two genes that are overlapping. For designations B, C, D, E, and x, the distance between the two genes is 10 kb, respectively. In categories A, B, C, D, and E, only the divergent loci are included; hence, any loci where two genes are arranged either in tandem (head-to-tail) or convergently (tail-to-tail) are included in category x. Rows of genes belonging to categories A, B, or C are shaded. As for CpG islands, "No" indicates that no canonical CpG island is found around exon 1 of the gene. When found, the number of CpG dinucleotides in the CpG island is shown. As for an adjacent gene, an accession number is given when it is an unknown gene (mRNA, spliced EST, or a predicted gene; priority in this order). In cases where more than two mRNAs or spliced ESTs have been reported, the accession number of a representative clone is listed. Because of their uncertainty, unspliced ESTs are not regarded as genes. Predicted genes (with the use of an FGENESH program) are listed only when their homolog(s) are found in other vertebrate genomes; hence, other predicted genes are not regarded as bona fide genes. We initially used the University of California at Santa Cruz (http://genome.ucsc.edu/; version 8) web site for this analysis. The most recent version (version 12) has differences in some genes, illustrating that some small percentage of the genes counted or not counted as close to one another will change status as more information about each locus is gathered.
Hence, any one gene pair may or may not remain in the designated distance category, as refinement of the human genome continues.
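References
Ame J.C., Schreiber V., Fraulob V., Dolle P., de Murcia G., Niedergang C.P. A bidirectional promoter connects the poly(ADP-ribose) polymerase 2 (PARP-2) gene to the gene for RNase P RNA. Structure and expression of the mouse PARP-2 gene. J. Biol. Chem. 2001; 276: 11092-11099.
Blattner F.R., Plunkett G., Bloch C., Perna N., Burland V., Riley M., Collado-Vides J., Glasner J., Rode C., Mayhew G., et al. The complete genome sequence of Escherichia coli. Science. 1997; 277: 1453-1474.
Braastad C.D., Leguia M., Hendrickson E.A. Ku86 autoantigen related protein-1 transcription initiates from a CpG island and is induced by p53 through a nearby response element. Nucleic Acids Res. 2002; 30: 1713-1724.
Chen C., Gentles A.J., Jurka J., Karlin S. Genes, pseudogenes, and Alu sequence organization across human chromosomes 21 and 22. Proc. Natl. Acad. Sci. USA. 2002; 99: 2930-2935.
Connelly M.A., Zhang H., Kieleczawa J., Anderson C.W. The promoters for human DNA-PKcs (PRKDC) and MCM4 divergently transcribed genes located chromosome 8 band q11. Genomics. 1998; 47: 71-83.
Galgoczy P., Rosenthal A., Platzer M. Human-mouse comparative sequence analysis of the NEMO gene reveals an alternative promoter within the neighboring G6PD gene. Gene. 2001; 271: 93-98.
Gardiner-Garden M., Frommer M.J. CpG islands in vertebrate genomes. J. Mol. Biol. 1987; 196: 261-282.
Kumar A., Harrison P.M., Cheung K.H., Lan N., Echols N., Bertone P., Miller P., Gerstein M.B., Snyder M. An integrated approach for finding overlooked genes in yeast. Nat. Biotech. 2002; 20: 58-63.
Lander E.S., Linton L.M., Birren B., Nusbaum C., Zody M.C., Baldwin J., Devon K., Dewar K., Doyle M., FitzHugh W., et al. Initial sequencing and analysis of the human genome. Nature. 2001; 409: 860-921.
Mewes H.W., Albermann K., Bahr M., Frishman D., Gleissner A., Hani J., Heumann K., Kleine K., Maierl A., Oliver S.G., et al. Overview of the yeast genome. Nature. 1997; 387: 7-65.
Platzer M., Rotman G., Bauer D., Uziel T., Savitsky K., Bar-Shira A., Gilad S., Shiloh Y., Rosenthal A. Ataxia-telangiectasia locus sequence analysis of 184 kb of human genomic DNA containing the entire ATM gene. Genome Res. 1997; 7: 592-605.
Ronen A., Glickman B.W. Human DNA repair genes. Environ. Mol. Mutagen. 2001; 37: 241-283.
Shimada T., Fujii H., Linn H.J. A 165-base pair sequence between the dihydrofolate reductase gene and the divergently transcribed upstream gene is sufficient for bidirectional transcriptional activity. J. Biol. Chem. 1989; 264: 20171-20174.
Venter J.C., Adams M.D., Myers E.W., Li P.W., Mural R.J., Sutton G.G., Smith H.O., Yandell M., Evans C.A., Holt R.A., et al. The sequence of the human genome. Science. 2001; 291: 1304-1351.
Wood R.D., Mitchell M., Sgouros J., Lindahl T. Human DNA repair genes. Science. 2001; 291: 1284-1289.
Wright F.A., Lemon W., Zhao W., Sears R., Zhuo D., Wang J., Yang H., Baer T., Stredney D., Spitzner J., et al. A draft annotation and overview of the human genome. Genome Biol. 2001; 2: research0025.
Xu C.F., Chambers J.A., Solomon E.J. Complex regulation of the BRCA1 gene. J. Biol. Chem. 1997; 272: 20994-20997.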