A Human-specific Polymorphism in the Coding Region of the Aggrecan Gene
1997; Elsevier BV; Volume: 272; Issue: 21; Language: English
10.1074/jbc.272.21.13974
ISSN 1083-351X
Authors: Kurt Doege, Silvija N. Coulter, Lauren M. Meek, Kirstin Maslen, Jill G. Wood
Topic(s): Genetics and Neurodevelopmental Disorders
Abstract: Aggrecan, one of the major structural genes of cartilage, encodes a proteoglycan core protein composed of an extended central glycosaminoglycan-bearing domain, flanked by globular domains at each end. The central region consists of long stretches of repeating amino acids that serve as attachment sites for glycosaminoglycans such as chondroitin and keratan sulfate; the terminal globular domains interact with other cartilage components. The glycosaminoglycan attachment region is encoded in several species by a single large exon, within which are several different types of repeating sequences. Several species show within this exon a similar block of conserved repeats for attachment of chondroitin sulfate, but in humans this group of repeats is particularly well conserved. Examination of genomic DNA from a population of unrelated individuals by polymerase chain reaction or Southern blot assays shows that this block of repeat sequences exists in multiple allelic forms, which differ by the number of repeats at this site in each allele. Thirteen different alleles have been identified, with repeat numbers ranging from 13 to 33. This is an unusual example of an expressed variable number of tandem repeat polymorphism. Of the several species examined, this polymorphism is apparently restricted to humans. It results in individuals with aggrecan core proteins of differing length, bearing different numbers of potential attachment sites for chondroitin sulfate. The possibility exists for a molecular understanding of biological variation in cartilage functional properties.
Aggrecan is the major proteoglycan of cartilage (1: Wight T.N. Heinegard D.K. Hascall V.C. Hay E.D. Cell Biology of the Extracellular Matrix. Plenum Publishing Corp., New York, 1991: 45-78) and is expressed at high levels only in this tissue; low levels of aggrecan expression have been reported in notochord (2: Stirpe N.S. Goetinck P.F. Development. 1989; 107: 22-33; 3: Domowicz M. Li H. Hennig A. Henry J. Vertel B.M. Schwartz N.B. Dev. Biol. 1995; 171: 655-664) and calvaria (4: Wong M. Lawton T. Goetinck P.F. Kuhn J.L. Goldstein S.A. Bonadio J. J. Biol. Chem. 1992; 267: 5592-5598). Aggrecan is composed of two types of structural elements, an extended central core and three flanking globular domains. Two of the globular domains are at the amino terminus and share homology with link protein (5: Doege K. Sasaki M. Horigan E. Hassell J.R. Yamada Y. J. Biol. Chem. 1987; 262: 17757-17767); at least one of them (G1) binds strongly to hyaluronan and link protein, effectively localizing the molecule in the cartilage matrix. The third globular domain, G3, shares intriguing structural features with the selectin family of cell adhesion molecules (6: Tedder T.F. Steeber D.A. Chen A. Engel P. FASEB J. 1995; 9: 866-873) and may also provide important interactions with other cell or matrix components. The bulk of the molecule is made up of over 100 chondroitin sulfate chains attached to specific serine residues in the central portion of the core protein, the chondroitin sulfate attachment region, or CS domain. (Footnote 1: The abbreviations used are: CS, chondroitin sulfate; GAG, glycosaminoglycan; VNTR, variable number of tandem repeats; CEPH, Centre d'Etude du Polymorphisme Humain.) These specific serine residues occur adjacent to glycine residues, and there has been considerable discussion as to what other features define a consensus attachment signal for chondroitin sulfate (7: Ruoslahti E. Annu. Rev. Cell Biol. 1988; 4: 229-255); these features include a nearby acidic amino acid as well as a nonpolar residue. It has not been determined whether all the suitable serine-glycine pairs are equally likely to be substituted with chondroitin sulfate.
It is the tight packing of these highly hydrophilic, GAG-substituted molecules within the constraining collagen fibrillar network that provides the unique swelling pressure and resistance to compressive forces essential to cartilage function in articular joint surfaces. Analysis of the cDNA sequences for rat (5) and human aggrecan (8: Doege K.J. Sasaki M. Kimura T. Yamada Y. J. Biol. Chem. 1991; 266: 894-902) showed that the serine-glycine pairs occur in two distinct patterns of repeating sequences, designated the CS-1 and CS-2 repeat regions, and that the human cDNA contains a subregion of the CS-1 repeat that is remarkably conserved in 19 repeats of 19 amino acids each. Most of these repeats are identical at the nucleic acid sequence level. Several restriction fragment length polymorphisms in the human aggrecan gene were identified for use in genetic linkage studies (9: Finkelstein J. Doege K. Yamada Y. Pyeritz R. Graham J. Moeschler J. Pauli R. Hecht J. Francomano C. Am. J. Hum. Genet. 1991; 48: 97-102), and one, a HaeIII restriction fragment length polymorphism, was particularly polymorphic. The cDNA probe used in that case was pSA003, which encodes the CS domain including the highly repeated subregion. As the CS domain of aggrecan is encoded by a single large exon in the human gene (10: Doege K. Sasaki M. Yamada Y. Biochem. Soc. Trans. 1990; 18: 200-202; 11: Valhmu W.B. Palmer G.D. Rivers P.A. Ebara S. Cheng J.-F. Fischer S. Ratcliffe A. Biochem. J. 1995; 309: 535-542), and the highly repeated region within the CS domain is flanked by HaeIII sites, the possibility was suggested that this repeat region might constitute a rare VNTR within the structural coding region of a gene.
The work reported here confirms this hypothesis and prompts examination of the functional and biological consequences of this variation in a major component of cartilage.
DNA samples were obtained for the 40 large family founder couples from the CEPH (Centre d'Etude du Polymorphisme Humain) (12: Dausset J. Cann H. Cohen D. Lathrop M. Lalouel J.-M. White R. Genomics. 1990; 6: 575-577) collection, as well as for three of the extended families including grandparents and offspring. Where available, lymphoblast cell stocks for these individuals were obtained from the NIGMS mutant cell repository (maintained by the Coriell Institute for Medical Research, Camden, NJ) and expanded in culture as directed by the supplier. Cells were grown in RPMI 1640 media containing 15% fetal bovine serum (Sigma), 1% L-glutamine, and 1% penicillin/streptomycin. DNA was prepared from the cultured cells using the Puregene Genomic DNA Isolation Kit (Gentra Systems, Inc.). Non-CEPH DNA was also prepared by the Puregene Kit method from whole tissues (surgical discards, including human placentas), from other cultured cell lines, and from bovine liver. DNA for three extended CEPH families was provided by Dr. Michael Litt, Oregon Health Sciences University; several other DNA samples were kindly given by Mirta Machado, Shriners Hospital for Children, Portland.
The primers used in the human genomic PCR assay were located at positions 2888–2911 of the human aggrecan cDNA sequence (8) for the upstream sense primer (5′-TAGAGGGCTCTGCCTCTGGAGTTG-3′), and at positions 4115–4092 for the downstream antisense primer (5′-AGGTCCCCTACCGCAGAGGTAGAA-3′). The bovine genomic PCR reaction used primers from the two published cDNA fragments (13: Oldberg A. Antonsson P. Heinegard D. Biochem. J. 1987; 243: 255-259; 14: Antonsson P. Heinegard D. Oldberg A. J. Biol. Chem. 1989; 264: 16170-16173); the upstream primer was located at position 1304–1324 of the 5′ bovine cDNA (14) (5′-AGGTTGCCCTCTGGGGGTGAG-3′), and the downstream antisense primer at position 57–37 of the 3′ bovine cDNA fragment (13) (5′-GGGTGCTTCTCCGCTCAGGTC-3′). The 50-μl PCR reaction typically included 50 pmol of each of the sense and antisense primers, 100 ng of genomic DNA, 0.2 mM dNTPs, 0.1% Triton X-100, 2.5 units of Taq DNA polymerase, 10 mM Tris-HCl, pH 8.3, 1.5 mM MgCl2, and 50 mM potassium chloride. The annealing temperature for both the human and bovine reactions was 67 °C, and 29 cycles of amplification were carried out. PCR products were subcloned by ligation into the T-tailed vector pCR2.1 (Invitrogen).
For Southern analysis, 2 μg of genomic DNA was cut with HaeIII and fractionated by agarose gel electrophoresis. The DNA in the gel was denatured, neutralized, and transferred to derivatized nylon membrane (ZetaProbe GT, Bio-Rad) by capillary action using standard methods (15: Sambrook J. Fritsch E.F. Maniatis T. Molecular Cloning: A Laboratory Manual. 2nd Ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1989: 9.31-9.58). The membrane was baked for 2 h at 80 °C in a vacuum oven, pre-washed at 65 °C for 1 h in 0.1 × SSC (SSC: 0.15 M sodium chloride, 0.015 M sodium citrate, pH 7.0), 0.5% SDS, then prehybridized and hybridized in Church buffer (0.5 M sodium phosphate, pH 7.2, 7% SDS, 1 mM EDTA, 1% bovine serum albumin) at 65 °C for 16 h. The probe used was a HaeIII fragment of the cDNA clone pSA003 (8) encoding the human VNTR region. The blots were washed at 65 °C in 0.1 × SSC, 1% SDS, following a prewash at room temperature in 2 × SSC, 5% SDS.
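The primer coordinates and reaction composition given above lend themselves to a quick consistency check. The following is a minimal sketch, not part of the original methods: it confirms that each quoted primer sequence has the length implied by its quoted cDNA coordinates and converts the stated primer amount to a concentration. The dictionary keys and function names are illustrative assumptions.

```python
# Primer sequences and cDNA coordinates exactly as quoted in the text.
# The keys (e.g. "human_sense") are illustrative labels, not names from the paper.
PRIMERS = {
    "human_sense": ("TAGAGGGCTCTGCCTCTGGAGTTG", 2888, 2911),
    "human_antisense": ("AGGTCCCCTACCGCAGAGGTAGAA", 4115, 4092),
    "bovine_sense": ("AGGTTGCCCTCTGGGGGTGAG", 1304, 1324),
    "bovine_antisense": ("GGGTGCTTCTCCGCTCAGGTC", 57, 37),
}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def span_length(start, end):
    """Number of bases covered by an inclusive coordinate range (either orientation)."""
    return abs(end - start) + 1


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def primer_lengths_match(primers=PRIMERS):
    """Map each primer label to True if its length equals its coordinate span."""
    return {
        name: len(seq) == span_length(start, end)
        for name, (seq, start, end) in primers.items()
    }


def micromolar(pmol, volume_ul):
    """Concentration in micromolar; pmol per microliter equals micromol per liter."""
    return pmol / volume_ul


if __name__ == "__main__":
    print(primer_lengths_match())
    print(micromolar(50, 50), "uM of each primer in the 50-ul reaction")
```

All four primers pass the length check, and 50 pmol in 50 μl corresponds to 1 μM of each primer.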
DNA size markers were co-electrophoresed and transferred and probed in the same hybridization with 32P-labeled marker DNA. DNA probes were labeled using random primer extension with Klenow and [α-32P]dCTP according to the kit manufacturer (Amersham Corp.).
The two smallest alleles, 13 and 18, were sequenced after subcloning PCR-generated DNA into a T-tailed vector, using M13 sequencing primers flanking the cloning site. Cycle sequencing with Taq polymerase (ABI kit) was carried out on an ABI 383 automated DNA sequencer. Allele 18 was too large to sequence completely across the clone in this manner, but 15 of the 18 repeats were sequenced. The bovine VNTR product, approximately 1.6 kilobase pairs, required a nested deletion strategy, as the sequence was too repetitive for internal priming and too long to sequence across from the ends. Nested deletions in pUC18 were constructed using exonuclease III and mung bean nuclease (Stratagene kit) and sequenced as above.
A PCR assay was designed to examine the size of the highly repeated region in various individuals. This strategy made use of the fact that the repeat region was part of a large exon, so that total genomic DNA could be used for analysis. Primers were chosen at positions just outside the highly repeated domain, so that the size of the product was predicted to be 33 nucleotides greater than the repeat region, which in turn was determined by the number of repeats of the 57 nucleotides encoding each 19-amino acid unit. DNA samples were obtained from a number of sources, including common cell lines and surgical discards, but the major resource used for this study was the CEPH collection's 40 founder couples. These are the founders of large, well studied families for which lymphoblast cell cultures, as well as purified DNA in some cases, are available from the NIGMS cell repository. Upon screening DNA of 72 of these unrelated individuals, as well as 22 additional samples, a range of PCR product sizes was observed. Fig. 1 displays the first 10 of these alleles to be identified, which include the extremes of the range; three additional alleles (not shown) have since been observed, making a total of 13 alleles to date.
Several lines of evidence support the conclusion that these PCR products arise from a polymorphic repeat region in the CS domain of aggrecan. First, the bands differ in size by integral multiples of the 57-base pair repeat unit. In this respect, it was noted that the band derived from pSA003, the allele that was published as a cDNA sequence (8), was exactly one repeat larger than predicted. Apparently the large number of identical repeats in this sequence allowed the alignment to "slip" by one repeat during the computerized assembly of the random, sonicated fragment sequences in the original sequencing. This is a difficult problem to circumvent if a repeat region is long and the repeats are very similar. Second, the same pattern, adjusted for restriction site versus primer location, was obtained by genomic Southern analysis. Total genomic DNA was digested with HaeIII, electrophoretically fractionated, blotted, and probed with the CS-repeat cDNA (Fig. 1 B). This correspondence in pattern rules out a PCR artifact in the generation of the bands. Third, the PCR product bands can be digested appropriately with restriction enzymes (data not shown). Furthermore, upon extending the assay to several of the families of the CEPH founders (DNA kindly provided by Dr. Michael Litt, OHSU), the alleles followed a strict Mendelian assortment in all cases, in up to three generations (data not shown). Finally, as discussed below, two of the alleles have been cloned and sequenced and conclusively identified as variants of the published repeat sequence. The description and frequency of these alleles in the studied population are presented in Table I.
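The relationship between PCR product size and repeat number described above can be sketched in a few lines. This is an illustrative check under stated assumptions, not the authors' analysis: the product sizes are those tabulated in Table I, the repeat unit is 57 bp as stated, and the function names are hypothetical. Note that the tabulated sizes are all consistent with a flanking contribution of 34 bp, one more than the 33 nucleotides quoted in the text; the sketch reports the value implied by the table rather than assuming either figure.

```python
REPEAT_UNIT_BP = 57  # nucleotides per 19-amino acid repeat, as stated in the text

# Repeat number -> PCR product size (bp), transcribed from Table I.
TABLE_I_SIZES = {
    33: 1915, 32: 1858, 29: 1687, 28: 1630, 27: 1573, 26: 1516, 25: 1459,
    22: 1288, 21: 1231, 20: 1174, 19: 1117, 18: 1060, 13: 775,
}


def implied_flank_bp(repeats, product_bp, unit=REPEAT_UNIT_BP):
    """Bases in the product that are not accounted for by the repeats."""
    return product_bp - repeats * unit


def repeats_from_product(product_bp, flank_bp, unit=REPEAT_UNIT_BP):
    """Infer the repeat number from a product size; reject non-integral results."""
    repeats, remainder = divmod(product_bp - flank_bp, unit)
    if remainder:
        raise ValueError(f"{product_bp} bp is not flank + a whole number of repeats")
    return repeats


if __name__ == "__main__":
    flanks = {implied_flank_bp(n, size) for n, size in TABLE_I_SIZES.items()}
    print("flank sizes implied by Table I:", flanks)
    print("repeats in a 1573-bp product:", repeats_from_product(1573, 34))
```

Because every tabulated size yields the same flank, adjacent alleles differ by exact multiples of 57 bp, which is the first line of evidence cited above.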
The numbers of repeats in these 13 alleles range from 33 to 13; the allele cloned as pSA003 corresponded to 22 repeats in this assay, which includes the published 19 repeats (8), the missing "slipped" repeat, and two additional repeats resulting from a change in the defined start of a repeat (see below), making it one of the smaller and also less common forms of the gene. The most common alleles in the general population are those of 28, 27, or 26 repeats, which make up 87% of observed alleles and represent a normal distribution with a mean and mode of 27 repeats. The CEPH founders used in this study consist of 49 Utah, 17 French, 4 Venezuelan, and 2 Amish individuals; the allele distribution in the two largest groups, the Utah and French families, showed the same mean allele size, 27, but somewhat different rankings of the different alleles. Allele 26 showed the most divergence, as it was 16% in the Utah families and 35% in the French families, although this sample group is quite small. No data are yet available from other ethnic groups.
Table I. Distribution of human aggrecan VNTR alleles in a random population (n = 188 alleles)
Repeat number    PCR product size (bp)    Frequency (n)    Frequency (%)
33               1915                     1                0.5
32               1858                     3                1.6
29               1687                     4                2.1
28               1630                     45               23.9
27               1573                     84               44.7
26               1516                     34               18.1
25               1459                     4                2.1
22               1288                     7                3.7
21               1231                     1                0.5
20               1174                     1                0.5
19               1117                     1                0.5
18               1060                     2                1.1
13               775                      1                0.5
The generality of the VNTR phenomenon among species was examined by attempting to demonstrate its occurrence in the bovine system. The rat (5), mouse (16: Walcz E. Deak F. Erhardt P. Coulter S.N. Fulpo C. Horvath P. Doege K.J. Glant T.T. Genomics. 1994; 22: 364-371), and chicken (17: Li H. Schwartz N.B. Vertel B.M. J. Biol. Chem. 1993; 268: 23504-23511) aggrecan cDNA sequences have been reported, and although a recognizable variant of the human repeat region can be seen in all cases, the repeats within each species are poorly conserved as a group relative to the human repeats. Partial bovine cDNA sequences have been published (13, 14), but these non-overlapping clones do not include the CS domain sequence where the repeat region would be expected. The human and bovine sequences do, however, share related KS hexapeptide repeats (14), suggesting that the CS repeats might also be similar, at least in level of conservation. Accordingly, the bovine CS-encoding region was cloned by PCR amplification from genomic DNA, using primer sequences from the published sequence flanking the gap region. The sequence determined from this PCR clone indeed contains a series of conserved repeats similar in size and number to the human conserved region, but the degree of conservation is lower in the bovine case (Fig. 2). The rat repeat region sequence is presented for comparison and shows repeats that are conserved as a group even less well than the bovine. It is possibly due to this lower degree of sequence conservation that there has been no detectable polymorphism in approximately 20 bovine genomic DNA samples that have been examined by PCR (data not shown). These samples were obtained from different breeds and geographical regions in an effort to overcome the effects of agricultural inbreeding, but it may be that cattle from sufficiently isolated or distantly removed populations would still show some variation at this locus.
At least it can be said that the repeated region in bovine is not polymorphic for the practical purpose of serving as an experimental system. Concurrently with this work, a bovine aggrecan cDNA clone encoding the CS domain was obtained in similar fashion by another group and sequenced, and the repeat region in that clone appears virtually identical to that reported here (footnote 2: Dr. Tom Hering, Case Western Reserve University, personal communication). This sequence is now available in GenBank (accession number U76615).
The two shortest human alleles, 13 and 18, were cloned and sequenced, and the repeat structure was compared with the previously sequenced allele 22 (Fig. 3). This comparison confirms that these alleles result from varying numbers of repeats and not from interpolation of some other sequence or amplification of some other unrelated sequence. The fine structure of these repeats can be examined because the near-identical repeats do show some variation, particularly in the penultimate codon, which may encode either threonine or alanine (Fig. 2). Even where the amino acid sequences are identical, types of repeats can be distinguished by a third-base change in codon 17 of the 19-codon repeat, permitting four classes of repeats to be distinguished on the basis of these two adjacent codons: ACC ACT, ACC GCT, ACT GCT, and ACT ACT. On this basis, alleles 13 and 18 are rather similar, differing primarily by an expansion of the number of type III repeats in allele 18. It should be noted that three repeats in the middle of allele 18 could not be conclusively identified, due to the technical difficulties of sequencing such repeated regions; a deletion strategy will be required to sequence the longer alleles. Allele 22 shows several changes in the repeat pattern, including an expansion of type I repeats, a loss of type III repeats, and the introduction of a large body of type II repeats, not seen in the other two alleles.
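The classification of repeats by codons 17 and 18 can be illustrated with a short sketch. It is a hypothetical helper, not the authors' procedure: the repeat sequences themselves are not reproduced in this text, so the example uses synthetic placeholder repeats (filler codons written as NNN) that carry only the two diagnostic codons. Classes are labeled by their codon pair rather than by the type I to IV numbering, because the mapping between the numbering and the codon pairs is not spelled out here.

```python
from itertools import groupby

REPEAT_LENGTH_NT = 57  # 19 codons per repeat, as stated in the text

# The four codon-17/codon-18 combinations listed in the text.
KNOWN_PAIRS = {("ACC", "ACT"), ("ACC", "GCT"), ("ACT", "GCT"), ("ACT", "ACT")}


def split_codons(repeat):
    """Split one 57-nt repeat into its 19 codons."""
    if len(repeat) != REPEAT_LENGTH_NT:
        raise ValueError(f"expected {REPEAT_LENGTH_NT} nt, got {len(repeat)}")
    return [repeat[i:i + 3] for i in range(0, REPEAT_LENGTH_NT, 3)]


def classify_repeat(repeat):
    """Label a repeat by codons 17 and 18 (1-based), or 'unclassified'."""
    codons = split_codons(repeat.upper())
    pair = (codons[16], codons[17])
    return " ".join(pair) if pair in KNOWN_PAIRS else "unclassified"


def repeat_blocks(repeats):
    """Collapse consecutive repeats of the same class into (class, count) blocks."""
    labels = (classify_repeat(r) for r in repeats)
    return [(label, len(list(group))) for label, group in groupby(labels)]


def synthetic_repeat(codon17, codon18, filler="NNN"):
    """Placeholder repeat carrying only the two diagnostic codons (hypothetical data)."""
    return filler * 16 + codon17 + codon18 + filler


if __name__ == "__main__":
    allele = (
        [synthetic_repeat("ACC", "ACT")] * 3
        + [synthetic_repeat("ACT", "GCT")] * 2
    )
    print(repeat_blocks(allele))
```

Representing an allele as a list of (class, count) blocks makes expansions or losses of a block, such as those described for alleles 13, 18, and 22, straightforward to compare.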
Allele 22 harbors one repeat that was not originally sequenced and has still not been determined. This unknown repeat is arbitrarily placed between the blocks of II and III repeats because it is most likely to be one of these types. Additionally, there are a few other changes in these repeats, shown boxed in Fig. 3, that may have arisen either by complex shuffling of repeat types or by point mutation within a block of repeats.
VNTR polymorphisms, or hypervariable minisatellite regions, are extremely valuable tools in linkage analysis, providing high levels of heterozygosity (18: Jeffreys A.J. Wilson V. Thein S.L. Nature. 1985; 314: 67-73). These loci are polymorphic by virtue of different numbers of conserved repeat sequences, probably generated by unequal crossing over during mitosis or meiosis (19: Jeffreys A.J. Royle N.J. Wilson V. Wong Z. Nature. 1988; 332: 278-281). The polymorphic nature is correlated with degree of repeat conservation (18). These minisatellites are almost always associated with non-coding regions in the genome; the only exceptions previously reported are members of the mucin family (20: Gum Jr., J.R. Am. J. Respir. Cell Mol. Biol. 1992; 7: 557-564) and, recently, the p57KIP2 gene (21: Tokino T. Urano T. Furuhata T. Matsushima M. Miyatsu T. Sasaki S. Nakamura Y. Hum. Genet. 1996; 97: 625-631). Mucins share several features with proteoglycans, including long stretches of repetitive sequence used for carbohydrate attachment; there is, however, no similarity between the mucin and aggrecan sequences. One such mucin, the human epithelial mucin cloned from breast and pancreatic tumor cells (22: Gendler S.J. Lancaster C.A. Taylor-Papadimitriou J. Duhig T. Peat N. Burchell J. Pemberton L. Lalani E.-N. Wilson D. J. Biol. Chem. 1990; 265: 15286-15293), consists of 20 to 100 repeats of a 20-amino acid sequence.
This range in repeat number is much greater than that seen in the aggrecan expressed VNTR; the number of different alleles is also much higher for this mucin. The functional consequences of the length variation have not been explored for the mucins, and in fact the properties of mucins may be rather insensitive to variations in mucin core protein lengths. The p57KIP2 gene, an inhibitor of cyclin-dependent kinase, has a proline-alanine repeat domain of 40 hexanucleotide repeats, which are polymorphic in that 7–15% of individuals have 12-nucleotide deletions. The GC-rich character of these short repeats recalls the triplet repeat type of mutations (23: Caskey C.T. Pizzuti A. Fu Y.-H. Fenwick Jr., R.G. Nelson D. Science. 1992; 256: 784-789). There are numerous cases of variation in numbers of repeated sequences in related proteins, and perhaps not surprisingly these are often carbohydrate-bearing domains. These variants occur in different members of multigene families, as for the trout apo-polysialoglycoproteins (24: Sorimachi Y. Emori Y. Kawasaki H. Kitajima K. Inoue S. Suzuki K. Inoue Y. J. Biol. Chem. 1988; 263: 17678-17684), or as evolutionary variants in closely related species, as for the involucrin gene (25: Djian P. Green H. Proc. Natl. Acad. Sci. U. S. A. 1991; 88: 5321-5325). These examples may indicate tolerance for such core protein length variation but may also reflect compensatory changes in other tissue components. The aggrecan gene variation is unique in that the variation in repeats occurs at a polymorphic locus within a single population, as for mucins; yet unlike mucins, the molecule is a primary functional component of a stable tissue, interacts with several other components of that tissue, and may contribute to a subtle tissue architecture, in addition to its bulk properties. There are two major aspects that may be affected by this variation in aggrecan repeat length (Fig. 4).
Most obvious is the chondroitin sulfate content. Each repeat contains two possible attachment points for CS, so that the extreme alleles, 33 and 13, might differ by as many as 40 CS chains per core protein monomer. It is not known whether all serine-glycine pairs in the molecule are equally likely to receive CS chains, but if full substitution is assumed, these two alleles would carry 172 and 132 CS chains, respectively, representing a 30% variation. It is not clear what such a difference in GAG content might mean functionally for cartilage, assuming all other variables in this complex system are unchanged; it is of course possible that chain length or percentage of sites substituted may differ in these individuals and thus compensate for acceptor site number. Differences in sulfation produce profound changes in cartilage, as seen in the brachymorphic mouse (26: Orkin R.W. Pratt R.M. Martin G.R. Dev. Biol. 1976; 50: 82-94) and in the recent identification of the diastrophic dysplasia gene as a sulfate transporter (27: Hastbacka J. De la Chapelle A. Mahtani M.M. Clines G. Reeve-Daly M.P. Daly M. Hamilton B.A. Kusumi K. Trivedi B. Weaver A. Coloma A. Lovett M. Buckler A. Kaitila I. Lander E.S. Cell. 1994; 78: 1073-1087). The chondroitin sulfate chains of aggrecan undergo changes in size and number with aging, and these changes have been implicated in the onset of osteoarthritis (28: Hardingham T. Bayliss M. Semin. Arthritis Rheum. 1990; 20: 12-33). It is possible that individuals expressing extreme repeat number alleles for the aggrecan expressed VNTR possess cartilage with unusual physical properties, or cartilage that shows compensatory changes in aggrecan expression level, CS chain length, CS attachment site utilization, or sulfation pattern. The length of the core protein varies directly with repeat number, and this length variation also may lead to changes in cartilage function.
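The chain-count arithmetic above can be reproduced with a short sketch. It assumes, as the text does, full substitution of every attachment site and two sites per repeat; the figure of 106 non-repeat sites is not stated in the text but is what the quoted totals of 172 and 132 imply for both alleles. Function names are illustrative.

```python
SITES_PER_REPEAT = 2  # possible CS attachment points per repeat, as stated


def repeat_sites(repeats):
    """Potential CS attachment sites contributed by the repeat block."""
    return SITES_PER_REPEAT * repeats


def implied_other_sites(total_sites, repeats):
    """Sites outside the repeat block implied by a quoted total (derived, not stated)."""
    return total_sites - repeat_sites(repeats)


def relative_increase(larger, smaller):
    """Fractional increase of the larger count over the smaller one."""
    return (larger - smaller) / smaller


if __name__ == "__main__":
    print("difference in sites:", repeat_sites(33) - repeat_sites(13))
    print("sites outside repeats:", implied_other_sites(172, 33), implied_other_sites(132, 13))
    print("relative variation: {:.1%}".format(relative_increase(172, 132)))
```

The quoted 30% figure corresponds to the 40-chain difference expressed relative to the smaller allele (40/132).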
The CS-bearing part of aggrecan is thought to attain an extended rod-like structure, so that more repeats should correlate with a longer core protein. The range in the CS domain would be 1538 to 1158 amino acids (allele 33 to allele 13), of which 589 to 209 are in the CS repeat domain; this range predicts a CS region length difference of 33%. This difference in core protein length may affect some critical spacing or packing aspects of cartilage. Aggrecan is anchored at the amino terminus by the hyaluronan/link protein binding of the G1 domain, but the carboxyl-terminal G3 domain also is likely to be involved in binding to some component of cartilage. The G3 domain of aggrecan has been shown to have lectin-like activity (29: Halberg D.F. Proulx G. Doege K. Yamada Y. Drickamer K. J. Biol. Chem. 1988; 263: 9486-9490), and the similar domain of the related aggrecan family member versican has been reported to bind a variety of matrix molecules, most notably tenascin-R (30: Aspberg A. Binkert C. Ruoslahti E. Proc. Natl. Acad. Sci. U. S. A. 1995; 92: 10590-10594) and heparan sulfate (31: Ujita M. Shinomura T. Ito K. Kitagawa Y. Kimata K. J. Biol. Chem. 1994; 269: 27603-27609). While the GAG attachment domain of versican does not show the same type of conserved repeat sequence as aggrecan and is unlikely to vary in repeat number, it does undergo alternative splicing, producing molecules with varied spacing between G1 and G3 domains (32: Ito K. Shinomura T. Zako M. Ujita M. Kimata K. J. Biol. Chem. 1995; 270: 958-965; 33: Naso M.F. Zimmermann D.R. Iozzo R.V. J. Biol. Chem. 1994; 269: 32999-33008). Variation in spacing between binding domains might be expected to have disruptive effects on tissue architecture, if the ligands are relatively fixed in position.
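The core protein length figures quoted earlier in this passage (1538 versus 1158 amino acids for the CS domain, 589 versus 209 within the repeat block) can be checked in the same way. This is a sketch under the text's own numbers; the observation that the repeat-block lengths correspond to 31 and 11 units of 19 residues is derived here and is not stated in the source.

```python
RESIDUES_PER_REPEAT = 19  # amino acids per repeat, as stated

# Lengths quoted in the text for allele 33 and allele 13 (amino acids).
CS_DOMAIN = {33: 1538, 13: 1158}
REPEAT_BLOCK = {33: 589, 13: 209}


def length_difference(lengths):
    """Difference between the longest and shortest quoted lengths."""
    return max(lengths.values()) - min(lengths.values())


def whole_repeats(residues, unit=RESIDUES_PER_REPEAT):
    """Number of whole repeat units in a length; reject non-integral results."""
    count, remainder = divmod(residues, unit)
    if remainder:
        raise ValueError(f"{residues} residues is not a whole number of repeats")
    return count


if __name__ == "__main__":
    diff = length_difference(CS_DOMAIN)
    print("length difference:", diff, "residues =", whole_repeats(diff), "repeats")
    print("relative to allele 13: {:.1%}".format(diff / CS_DOMAIN[13]))
    print("repeat units in quoted block lengths:",
          {allele: whole_repeats(n) for allele, n in REPEAT_BLOCK.items()})
```

The 380-residue difference equals 20 repeats of 19 residues, and 380/1158 gives the 33% figure quoted above.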
Such disruption might be greater if two alleles are expressed together in the tissue; it is currently not known if both alleles of an individual heterozygous for aggrecan are expressed together and/or at an equal level in cartilage. The structure of the proteins expressed from these various polymorphic alleles has not been investigated; different alleles must be expressed in different individuals, however, as several alleles have been observed in the homozygous condition; these include alleles 28, 27, 26, and 22 (data not shown). The low frequency of most of the alleles (Table I) is probably the reason that homozygous individuals for the extreme forms of the gene have not been observed; it is also likely that other allelic forms exist in the population and would be observed in a larger sampling. We have analyzed the allelic distribution pattern for this gene in an independent population similar in size to that reported here, with very similar observed frequencies of these same alleles (footnote 3: W. E. Horton, Jr., R. Balakir, P. Precht, C. Plato, J. D. Tobin, M. Lethbridge-Cejku, L. Meek, and K. Doege, submitted for publication). There are numerous unanswered questions about the expression of such alleles, including whether both alleles in a heterozygote are expressed, whether both appear in equivalent amount and pattern in the matrix, as well as the questions regarding the glycosaminoglycan components raised above.
Since these alleles occur widely in the population, the effects of the polymorphisms are obviously tolerated, and any proposed consequences would fall by necessity within the range of functional parameters seen in the population. One such potential consequence of different alleles might be a subtle variation in cartilage function affecting the resistance of an individual's articular cartilage to the onset of joint degeneration or osteoarthritis. These conditions occur widely with aging but to different degrees of severity and ages of onset in the population.
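The remark that low allele frequency probably explains the absence of homozygotes for the extreme alleles can be made concrete with a rough estimate. The sketch below uses the allele counts from Table I and assumes Hardy-Weinberg proportions, which is an assumption introduced here for illustration and not a claim made in the text; the sample of 94 individuals follows from the 188 alleles counted.

```python
# Repeat number -> number of alleles observed, transcribed from Table I.
ALLELE_COUNTS = {
    33: 1, 32: 3, 29: 4, 28: 45, 27: 84, 26: 34, 25: 4,
    22: 7, 21: 1, 20: 1, 19: 1, 18: 2, 13: 1,
}


def allele_frequencies(counts):
    """Convert allele counts to frequencies that sum to 1."""
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_homozygotes(counts):
    """Expected homozygote counts under Hardy-Weinberg proportions (assumed).

    Each individual contributes two alleles, so the number of individuals
    is half the allele total; the expectation is individuals * p**2.
    """
    individuals = sum(counts.values()) / 2
    return {a: individuals * p ** 2 for a, p in allele_frequencies(counts).items()}


def combined_share(counts, alleles):
    """Fraction of all observed alleles accounted for by the given alleles."""
    freqs = allele_frequencies(counts)
    return sum(freqs[a] for a in alleles)


if __name__ == "__main__":
    for allele, expected in sorted(expected_homozygotes(ALLELE_COUNTS).items(), reverse=True):
        print(f"allele {allele}: expected homozygotes = {expected:.3f}")
    print("share of alleles 28, 27, 26: {:.1%}".format(
        combined_share(ALLELE_COUNTS, (28, 27, 26))))
```

Under this assumption, an allele seen once among 188 has an expected homozygote count of well under 0.01 in a sample of this size, so failing to observe such individuals is unsurprising.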
The etiology of these disorders is undoubtedly complex and is under intense investigation, but a genetic component is suspected in families with early onset osteoarthritis, and in some cases has been demonstrated (see Ref. 34: Jimenez S.A. Dharmavaram R.M. Ann. Rheum. Dis. 1994; 53: 789-797, and references therein). In such a widespread disease, the type of variation seen at the polymorphic aggrecan locus may provide an explanation for cartilage with a common but subtle reduction in functionality. Support for this hypothesis has recently been obtained in a population study linking one of the aggrecan VNTR alleles to bilateral hand osteoarthritis (footnote 3).
It is perhaps surprising that this type of aggrecan polymorphism only appears in humans, of the species so far examined. The occurrence of the polymorphism in other species has only been directly tested in cows, but the level of conservation of the repeats is insufficient in other species to support unequal crossing over and generation of different repeat numbers. The polymorphism depends on a high degree of sequence conservation, leading to the question of why the sequences are so highly conserved in humans alone of the species so far examined. The actual repeat sequence is somewhat different in the various species (Fig. 2). The human repeat is unique in conserving the two closely spaced serine-glycine pairs in the middle of the repeat; the bovine and rat repeats have lost one of the middle serine-glycine pairs in favor of an additional one at the beginning, which is uniformly a proline-glycine pair in human repeats. Until more is known about the utilization of particular serine-glycine pairs as CS acceptor sequences, as well as details of cartilage ultrastructure, it is difficult to speculate on the consequences of these different repeat structures.
Sequencing of the three shortest alleles confirms that they have arisen by some mechanism that expands or contracts blocks of repeated sequences, probably unequal exchange during meiosis or mitosis. There are several points of divergence among these three sequences, so the evolution has been complex, and these three alleles are probably distantly related in the allelic "family." The other alleles require sequencing to clarify the relationship. It is also possible that there may be multiple alleles of the same repeat number that have arisen independently and show a different pattern of amplified repeats. This has not been observed and would require screening by extensive sequencing. The median of the 13 distinct allele sizes, 25 repeats, is smaller than the mean of 27 repeats; the smaller repeat number alleles are thus over-represented. The smaller repeat alleles are probably more stable than the larger repeat alleles, which may be more likely to recombine; in addition, the large alleles may be more difficult to detect using the PCR assay. Finally, the sample size may not be large enough to provide a true representation of allele frequencies.
We gratefully acknowledge Stephanie Willis for performance of automated DNA sequencing.
Reference(s)