Twenty Pairs of Sox
2002; Elsevier BV; Volume: 3; Issue: 2; Language: English
10.1016/s1534-5807(02)00223-x
ISSN 1878-1551
Authors: Goslik Schepers, Rohan D. Teasdale, Peter Koopman
Topic(s): Genomics and Chromatin Dynamics
Abstract: The genomics era is characterized by the rapid identification of genes, gene fragments, and gene paralogs within species, and orthologs between species. The highly conserved HMG box that defines the Sox family of developmental transcription factor genes (Bowles et al., 2000) has been exploited in many laboratories to identify approximately 30 vertebrate and over a dozen invertebrate Sox genes or gene fragments. However, the actual number of Sox genes in the mouse and human genomes has remained unknown. With the availability of complete drafts of these genome sequences, we can now determine the precise number of Sox genes, assign names, and identify orthologs. This in turn provides a basis for similar efforts in other model organisms as sequence data become available. In this analysis, we examined all published Sox sequences, and recent releases of the human and mouse genome sequence from the relevant public sequencing consortia (Mouse Genome Assembly v3, 2 May 2002, http://www.ensembl.org and Human Genome Assembly build 29, 5 April 2002, http://www.ncbi.nlm.nih.gov/genome/seq) and from Celera Genomics (Celera Discovery System, indexed 2 May 2002, http://www.celera.com) (Lander et al., 2001; Venter et al., 2001).
SOX proteins other than SRY were defined by the presence of the HMG domain signature sequence RPMNAFMVW (Bowles et al., 2000). Orthologous Sox genes were identified by sequence similarity and chromosomal location within regions of conserved synteny, determined by comparison of gene order. The mouse and human genomes were found to contain 20 orthologous pairs of Sox genes (Table 1). The paired Sox genes show identical genomic organization, with the exception of Sox6 and Sox13, which varied between mouse and human by the loss or gain of an intron in the untranslated region. No novel Sox genes were identified.

Table 1. Pairing of Mouse and Human Sox Genes

| Gene | Sox Group [a] | Major Known (or Deduced) Functions [b] | Species | Accession Number | Number of Exons | Chromosomal Location |
|---|---|---|---|---|---|---|
| Sry | A | Testis determination | Mouse | NM_0011564 | 1 | Y (3cM) |
| | | | Human | NM_003140 | | Yp11.3 |
| Sox1 | B1 | Lens development, (neural determination) | Mouse | NM_009233 | 1 | 8 (4cM) |
| | | | Human | NM_005986 | | 13q34 |
| Sox2 | B1 | Neural induction, (lens induction, pluripotency) | Mouse | NM_011443 | 1 | 3 (15cM) |
| | | | Human | BC013923 | | 3q26.3 |
| Sox3 | B1 | (Neural determination, lens induction) | Mouse | NM_009237 | 1 | X (24.3cM) |
| | | | Human | NM_005634 | | Xq27 |
| Sox4 | C | Heart, lymphocyte, thymocyte development | Mouse | NM_009238 | 1 | 13 (20cM) |
| | | | Human | NM_003107 | | 6q22.3 |
| Sox5 | D | Chondrogenesis | Mouse | NM_011444 | 15 [c] | 6 (69.5cM) |
| | | | Human | NM_006940 | | 12p11.1 |
| Sox6 | D | Chondrogenesis, (cardiac myogenesis) | Mouse | NM_011445 | 17 | 7 (55cM) |
| | | | Human | NM_033326 | 16 | 11p15.3 |
| Sox7 | F | (Development of vascular and many other tissues) | Mouse | NM_011446 | 2 | 14 (28cM) [d] |
| | | | Human | NM_031439 | | 8p22 |
| Sox8 | E | (Development of many tissues) | Mouse | AF191325 | 3 | 17 (8cM) |
| | | | Human | NM_014587 | | 16p13.3 |
| Sox9 | E | Chondrogenesis, sex determination | Mouse | BC024958 | 3 | 11 (69.5cM) |
| | | | Human | NM_000346 | | 17q25 |
| Sox10 | E | Neural crest specification | Mouse | AF047043 | 3 | 15 (46.5cM) |
| | | | Human | NM_006941 | | 22q13 |
| Sox11 | C | (Neuronal, glial maturation) | Mouse | NM_009234 | 1 | 12 (11cM) [d] |
| | | | Human | NM_003108 | | 2p25 |
| Sox12 [e] | C | (Development of many tissues) | Mouse | BF714412 [f] | 1 | 2 (86cM) [d] |
| | | | Human | NM_006943 | | 20p13 |
| Sox13 | D | (Development of arterial walls, pancreatic islets) | Mouse | AB006329 | 13 | 1 (70cM) [d] |
| | | | Human | NM_005686 | 14 | 1q31 |
| Sox14 | B2 | (Interneuron specification, limb development) | Mouse | AF193437 | 1 | 9 (53cM) |
| | | | Human | NM_004189 | | 3q22 |
| Sox15 [g] | G | (Myogenesis) | Mouse | AB014474 | 2 | 11 (39cM) |
| | | | Human | NM_006942 | | 17p13 |
| Sox17 | F | Endoderm specification | Mouse | NM_011441 | 3 | 1 (7cM) [d] |
| | | | Human | NM_022454 | | 8q11.2 |
| Sox18 | F | Vascular and hair follicle development | Mouse | NM_009236 | 2 | 2 (96cM) [d] |
| | | | Human | NM_018419 | | 20p13.3 |
| Sox21 | B2 | (CNS patterning) | Mouse | BE647677 [f] | 1 | 14 (50cM) [d] |
| | | | Human | NM_007084 | | 13q32 |
| Sox30 | H | (Male germ cell maturation) | Mouse | AV255326 | 5 [c] | 11 (20cM) [d] |
| | | | Human | NM_007017 | | 5q35 |

[a] Sox groupings as determined by Bowles et al., 2000.
[b] Functions demonstrated by human mutant or mouse knockout phenotype; other possible functions (in parentheses) deduced from expression, cell transfection, or other studies. See Bowles et al., 2000; Wegner, 1999, and references therein; Cohen-Barak et al., 2001; Hosking et al., 2001; Katoh, 2002; and Takash et al., 2001. Also, see Uwanogho, 2001 (GenBank accession number AY069926).
[c] Gene subject to alternative splicing; value given indicates total number of utilized exons.
[d] Chromosomal location determined by comparison with the closest mapped gene.
[e] Human ortholog previously named SOX22 (see text).
[f] Partially characterized gene that may contain additional exons.
[g] Human ortholog previously named SOX20 (see text).

We and others have previously noted that in Drosophila melanogaster and Caenorhabditis elegans, the number of Sox genes is relatively small (five and eight, respectively; Bowles et al., 2000; Crémazy et al., 2001) and that a single gene in these organisms typically corresponds to a group or subgroup of Sox genes in vertebrates. Further, it is conspicuous that 9 of the 20 human/mouse Sox genes are single exons and that these are distributed evenly throughout the genome in both species. These properties likely reflect expansion of this ancient gene family via nontandem duplication and retroposition (Ohno, 1970). We have found evidence of tandem duplication in only two cases, where fragments similar to parts of human SOX17 and -30 lie adjacent to these genes (see below).

Several human and mouse genes predicted in previous studies from partial PCR amplification of the HMG box are absent from the genome. These sequences probably represent amplification or sequencing errors and most likely correspond to some of the 20 bona fide Sox genes (Table 2). One of these fragments, originally designated human SOX29, shows significant sequence similarity to SOX5, but has a 2 base pair deletion in the HMG box, suggesting that it may correspond to a SOX5 pseudogene (Wunderle et al., 1996; Crémazy et al., 1998). We find that this gene lacks ESTs and introns, confirming that it is a pseudogene, which we name SOX5P (Table 2). Other fragments corresponding to parts of SOX2, -17, -20, and -30 can be found in the human genome, but these are short, lack an HMG box, and contain gaps, insertions, or in-frame stop codons, indicating that they are not segments of functional Sox genes (Table 2). No pseudogenes or pseudogene fragments were found in the mouse genome.

Table 2. Illegitimate Mouse and Human Sox Gene Fragments and Pseudogenes

| Recorded Fragment | Accession Number | Species | Likely Identity | Notes |
|---|---|---|---|---|
| PCR-Derived HMG Box Sequences Submitted to GenBank [a] | | | | |
| Sox16 | L29084 | Mouse | Sox15 | |
| SOX25 | AF032449 | Human | SOX21 | |
| SOX26 | AF032450 | Human | SOX20 | |
| SOX27 | AF032452 | Human | SOX20 | |
| SOX28 | AF032452 | Human | SOX14 | |
| SOX29 | AF032454 | Human | SOX5P | |
| SOX-Related Genomic Fragments [b] | | | | |
| SOX29 | NT_008046 (LOC138007) | Human | SOX5P | Pseudogene at 8q21.1 |
| Novel | NT_023726 (LOC206736) | Human | SOX2-related | 474 bp non-HMG-box fragment at 8q24.13, with in-frame stop codons and gaps |
| Novel | NT_008101 (LOC137755) | Human | SOX17-related | 234 bp non-HMG-box fragment adjacent to SOX17 (8q11.22), with gaps |
| Novel | NT_033899 (LOC220283) | Human | SOX20-related | 210 bp non-HMG-box fragment at 11q24.2, with gaps |
| Novel | NT_006788 (LOC206350) | Human | SOX30-related | 668 bp non-HMG-box fragment adjacent to SOX30 (5q34), with in-frame stop codons and insertions |

[a] Wunderle et al., 1996; Crémazy et al., 1998. See also Layfield et al., 1994 (GenBank accession number L29084).
[b] Genomic fragments analyzed using NCBI LocusLink (http://www.ncbi.nlm.nih.gov/LocusLink/index.html).

Sequence similarity between mouse Sox12 and human SOX22 has been reported previously (Jay et al., 1997; Bowles et al., 2000). The availability of the complete coding sequence reveals extensive non-HMG box sequence homology between these two genes. This homology, and the chromosomal location of both genes within regions of conserved synteny, confirm that Sox12 and SOX22 are orthologs. Similar observations indicate that SOX20 and Sox15 (Bowles et al., 2000; Hiraoka et al., 1998) also are orthologs. We therefore rename human SOX22 as SOX12 and human SOX20 as SOX15 (Table 1).

Our analysis suggests that no further nomenclature changes or additions will be required for the mouse and human Sox family. The current system of nomenclature, loosely based on the order of gene discovery, is firmly entrenched in the literature, and the likely confusion and noncompliance associated with a more systematic nomenclature revision in our view outweigh the potential benefits. Our recommendations have been endorsed by the HUGO Gene Nomenclature Committee (http://www.gene.ucl.ac.uk/nomenclature).

Genomic idiosyncrasies, notably pseudotetraploidy in Xenopus laevis and genome duplication in teleost fish, have hampered clear identification of Sox orthologs in some model organisms. Contentiously assigned full-length Sox genes isolated in such organisms are listed in Table 3, together with their closest mouse/human Sox homologs.
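The signature criterion used in this analysis to define SOX proteins (presence of the HMG domain sequence RPMNAFMVW) can be illustrated with a short script. This is only a minimal sketch under stated assumptions: it does exact string matching on protein sequences, the function names are hypothetical, and the two toy sequences are invented for illustration rather than taken from any database entry. It does not reproduce the similarity and synteny comparisons used to assign orthologs.

```python
from typing import Dict, List

# HMG domain signature sequence quoted in the text (Bowles et al., 2000).
SIGNATURE = "RPMNAFMVW"


def find_signature(protein: str, motif: str = SIGNATURE) -> List[int]:
    """Return the 0-based start position of every exact motif occurrence."""
    seq = protein.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Continue searching one residue further so overlapping hits are kept.
        start = seq.find(motif, start + 1)
    return positions


def has_signature(protein: str) -> bool:
    """True if the sequence contains the signature at least once."""
    return bool(find_signature(protein))


# Hypothetical toy sequences, invented for this example only.
toy_sequences: Dict[str, str] = {
    "toy_sox_like": "MSDDKVKRPMNAFMVWSRGQRR",
    "toy_other": "MSKKPRGKMSSYAFFVQTCRE",
}

hits = {name: find_signature(seq) for name, seq in toy_sequences.items()}

if __name__ == "__main__":
    for name, positions in hits.items():
        label = "signature present" if positions else "signature absent"
        print(name, label, positions)
```

An exact match is deliberately strict. As the text notes, SRY is handled separately from this criterion, and fragments such as SOX29 carry a deletion in the HMG box, so a real screen would need tolerant alignment rather than a literal substring test.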
Definitive nomenclature assignments are impossible in any species for which whole genome sequence has not been determined. We suggest that novel Sox genes identified in vertebrates be provisionally assigned the lowest available Sox number (currently 33), unless or until they can be confirmed as orthologs of existing mammalian genes.

Table 3. Contentiously Assigned Vertebrate Sox Genes

| Species [a] | Published Gene Name [b] | Accession Number | SOX Group | Closest Mammalian Homolog [c] |
|---|---|---|---|---|
| Human | HAF-1 | (deleted) | F | SOX17 |
| Human | HAF-2 | (deleted) | F | SOX18 |
| Mouse | SoxM/Sox21 | U66141 | E | Sox10 |
| Frog | SoxD | BAA32249 | I [d] | Sox31 [d] |
| Frog | SoxB1 | (deleted) | B1 | Sox3 |
| Frog | Sox12 | BAA09119 | D | Sox13 |
| Zebrafish | Sox19 | X79821 | B1 | Sox3 |
| Zebrafish | Sox31 | AJ404687 | B1 | Sox3 |
| Zebrafish | Sox25/Sox30 | AF101266 | B2 | Sox21 |
| Zebrafish | Sox32/226D7 | NM_131851/AB071895 | (non-Sox) | Casanova |
| Trout | SoxLZ | D61688 | D | Sox6 |
| Trout | SoxP1 | D83256 | E | Sox8 |
| Trout | Sox23 | BAA24402 | D | Sox13 |
| Trout | Sox24 | BAA24575 | C | Sox11 |

[a] Frog species, Xenopus laevis; trout species, Oncorhynchus mykiss.
[b] See GenBank entries and Stevens et al., 1996; Sakai et al., 1997; Bowles et al., 2000; Kikuchi et al., 2001; Sakaguchi et al., 2001; Hosking et al., 2001.
[c] Determined by BLAST and CLUSTALW analysis as being the closest mouse/human homolog.
[d] Xenopus Sox31 does not correspond to any of the 20 mouse/human Sox genes and is in a group (I: Bowles et al., 2000) that is not represented in these species.

In summary, our genomic analysis defines the extent of the Sox family of transcription factor genes in humans and mice, confirms gene homologies based on sequence, genomic organization, and chromosomal locations, and streamlines the nomenclature for vertebrate Sox genes. We hope that this will provide a useful framework for comparative and functional studies in a range of developmental model systems.

We thank Dr. Elspeth Bruford, HUGO Nomenclature Committee, for comments on the manuscript and helpful discussions. We apologize to colleagues whose work was not cited directly due to space constraints. P.K. is an Australian Research Council Professorial Research Fellow.
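References
Bowles J., Schepers G., Koopman P. Phylogeny of the SOX family of developmental transcription factors based on sequence and structural indicators. Dev. Biol. 2000; 227: 239-255.
Cohen-Barak G., Hagiwara N., Arlt M., Horton J., Brilliant M. Cloning, characterization and chromosome mapping of the human SOX6 gene. Gene. 2001; 265: 157-164.
Crémazy F., Berta P., Girard F. Genome-wide analysis of Sox genes in Drosophila melanogaster. Mech. Dev. 2001; 109: 371-375.
Crémazy F., Soullier S., Berta P., Jay P. Further complexity of the human SOX gene family revealed by the combined use of highly degenerate primers and nested PCR. FEBS Lett. 1998; 438: 311-314.
Hiraoka Y., Ogawa M., Sakai Y., Taniguchi K., Fujii T., Umezawa A., Hata J., Aiso S. Isolation and expression of a human SRY-related cDNA, hSOX20. Biochim. Biophys. Acta. 1998; 1396: 132-137.
Hosking B., Wyeth J., Pennisi D., Wang S., Koopman P., Muscat G. Cloning and functional analysis of the Sry-related HMG box gene, Sox18. Gene. 2001; 262: 239-247.
Jay P., Sahly I., Goze C., Taviaux S., Poulat F., Couly G., Abitbol M., Berta P. SOX22 is a new member of the SOX gene family, mainly expressed in human nervous tissue. Hum. Mol. Genet. 1997; 6: 1069-1077.
Katoh M. Molecular cloning and characterization of human SOX17. Int. J. Mol. Med. 2002; 9: 153-157.
Kikuchi Y., Agathon A., Alexander J., Thisse C., Waldron S., Yelon D., Thisse B., Stainier D.Y. casanova encodes a novel Sox-related protein necessary and sufficient for early endoderm formation in zebrafish. Genes Dev. 2001; 15: 1493-1505.
Lander E.S., Linton L.M., Birren B., Nusbaum C., Zody M.C., Baldwin J., Devon K., Dewar K., Doyle M., FitzHugh W., et al. Initial sequencing and analysis of the human genome. Nature. 2001; 409: 860-921.
Ohno S. Evolution by Gene Duplication. Springer-Verlag, Berlin; 1970.
Sakaguchi T., Kuroiwa A., Takeda H. A novel sox gene, 226D7, acts downstream of Nodal signaling to specify endoderm precursors in zebrafish. Mech. Dev. 2001; 107: 25-38.
Sakai Y., Hiraoka Y., Konishi M., Ogawa M., Aiso S. Isolation and characterization of Xenopus laevis Xsox-b1 cDNA. Arch. Biochem. Biophys. 1997; 346: 1-6.
Stevens S., Ordentlich P., Sen R., Kadesch T. HMG box-activating factors 1 and 2, two HMG box transcription factors that bind the human Ig heavy chain enhancer. J. Immunol. 1996; 157: 3491-3498.
Takash W., Canizares J., Bonneaud N., Poulat F., Mattei M.G., Jay P., Berta P. SOX7 transcription factor: sequence, chromosomal localisation, expression, transactivation and interference with Wnt signalling. Nucleic Acids Res. 2001; 29: 4274-4283.
Venter J.C., Adams M.D., Myers E.W., Li P.W., Mural R.J., Sutton G.G., Smith H.O., Yandell M., Evans C.A., Holt R.A., et al. The sequence of the human genome. Science. 2001; 291: 1304-1351.
Wegner M. From head to toes: the multiple facets of SOX proteins. Nucleic Acids Res. 1999; 27: 1409-1420.
Wunderle V.M., Critcher R., Ashworth A., Goodfellow P.N. Cloning and characterization of SOX5, a new member of the human SOX gene family. Genomics. 1996; 36: 354-358.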