Topology Models for 37 Saccharomyces cerevisiae Membrane Proteins Based on C-terminal Reporter Fusions and Predictions
2003; Elsevier BV; Volume: 278; Issue: 12; Language: English
10.1074/jbc.m300163200
ISSN: 1083-351X
Authors: Hyun Kim, Karin Melén, Gunnar von Heijne
Topic(s): RNA and protein synthesis mechanisms
Abstract: We provide experimentally based topology models for 37 integral membrane proteins from Saccharomyces cerevisiae. A C-terminal fusion to a dual Suc2/His4C topology reporter has been used to determine the location of the C terminus of each protein relative to the endoplasmic reticulum membrane, and this information is used in conjunction with theoretical topology prediction methods to arrive at a final topology model. We propose that this approach may be used to produce reliable topology models on a proteome-wide scale.
Abbreviations: HA, hemagglutinin; Endo H, endo-β-N-acetylglucosaminidase; SD−Ura, synthetic medium lacking uracil.
The topology of a membrane protein, i.e. a specification of its transmembrane segments and their in/out orientation relative to the membrane, is a basic structural characteristic that is a powerful guide to experimental studies when no three-dimensional structure is available. Most proteins for which experimentally derived topology models exist are from bacteria; only a handful of topologies are known for yeast membrane proteins. For example, the widely used MPtopo data base (1: Jayasinghe S., Hristova K., White S.H. Protein Sci. 2001; 10: 455-458) currently holds a total of 92 experimentally determined topology models, of which only 3 are for yeast proteins.
The preponderance of known topologies for bacterial proteins is mainly a result of the relative ease with which they can be determined experimentally by making series of C-terminally truncated versions of the target protein fused to topology reporters such as PhoA, LacZ, Bla, or green fluorescent protein (2: Manoil C. Methods Cell Biol. 1991; 34: 61-75; 3: Drew D., Sjöstrand D., Nilsson J., Urbig T., Chin C.N., de Gier J.W., von Heijne G. Proc. Natl. Acad. Sci. U. S. A. 2002; 99: 2690-2695). This approach often yields clear-cut results, and reliable topology models can be proposed. Although a couple of topology reporters have also been developed for the yeast Saccharomyces cerevisiae (4: Sengstag C. Methods Enzymol. 2000; 327: 175-190), they seem to yield less definitive results than can be obtained in bacterial systems (5: Green N., Walter P. Mol. Cell. Biol. 1992; 12: 276-282) and have not been much used.
We have recently shown that highly reliable topology predictions can be obtained for a subset of membrane proteins for which many different topology prediction methods (five in our case) give the same prediction (6: Nilsson J., Persson B., von Heijne G. FEBS Lett. 2000; 486: 267-269; 7: Nilsson J., Persson B., von Heijne G. Protein Sci. 2002; in press). When some limited experimental information (such as the in/out location of the C terminus of a protein) is available, consensus predictions are even more useful, and a combination of C-terminal reporter fusions and consensus predictions has been shown to make it possible to rapidly produce reliable topology models for Escherichia coli inner membrane proteins (3). We have further shown that this basic approach can be even more widely applied by using the experimentally determined location of the C terminus as a constraint on the topology predictions produced by the TMHMM program (8: Käll L., Sonnhammer E. FEBS Lett. 2002; 532: 415-418).
Considering that topology mapping based on reporter fusions to truncated target proteins may be problematic in yeast, it appeared to us that approaches based on a combination of C-terminal fusions to the full-length protein, which should be minimally disruptive to the structure of the proteins, and theoretical topology prediction should be particularly useful for yeast membrane proteins. In this pilot study, we report results for C-terminal reporter fusions to 40 S. cerevisiae membrane proteins. We find that only one of the 40 proteins that we cloned initially cannot be expressed using either of two vectors that we have tried, that the dual Suc2/His4C reporter (9: Deak R., Wolf D. J. Biol. Chem. 2001; 276: 10663-10669) that we have used yields consistent results for 37 of the 39 expressed proteins, and that the location of the C terminus as predicted by our consensus method (6) is correct for 31 of these 37 proteins. One of the two proteins for which the experimental results are inconsistent is a mitochondrial protein. We have also used the experimentally determined C-terminal locations to constrain the predictions from the TMHMM method, and we report TMHMM reliability scores for all proposed topology models. Our results suggest that large-scale topology mapping strategies where limited but reliable experimental information is combined with topology prediction will be successful in yeast.
All plasmids were constructed by homologous recombination (10: Oldenburg K.R., Vo K.T., Michaelis S., Paddon C. Nucleic Acids Res. 1997; 25: 451-452). Plasmid pJK90 (Kim et al., submitted for publication), which contains the OST4 gene fused to three hemagglutinin (HA) epitopes, a part of the SUC2 gene, and the HIS4 gene, was treated with SmaI to linearize the vector between the end of the TPI promoter and the start codon of OST4. The 5′-end homologous recombination region was selected to match the 3′-end of SmaI-digested pJK90, and the 3′-end homologous region was chosen to match the linker between the end of OST4 and the start of the HA sequence. Each homologous region comprised 35 nucleotides (5′-AGGTGGTTTGTTACGCATGCAAGCTTGATATCGAA-3′ and 5′-GATGGTCTAGAGGTGTAACCACTTGAGTTCTTAGG-3′). A gene of interest was amplified by PCR using genomic DNA as a template and two primers: a 5′-end primer complementing the start codon of the gene with the homologous region sequence, and a 3′-end primer complementing the end of the gene excluding the stop codon with the homologous region sequence. Genomic DNA was isolated as described (11: Breeden L.L. Methods Enzymol. 1997; 283: 332-341) from W303–1a (MATa, ade2, can1, his3, leu2, trp1, ura3) and from W303–1α (MATα, ade2, can1, his3, leu2, trp1, ura3). Yeast strain STY50 (MATa, his4–401, leu2, -3, and -112, trp1–1, ura3–52, HOL1–1, SUC2::LEU2) (12: Strahl-Bolsinger S., Scheinost A. J. Biol. Chem. 1999; 274: 9068-9075) was transformed with the linearized pJK90 vector and the PCR product carrying the gene of interest flanked by the homologous region sequences. Transformation was carried out by the lithium acetate protocol (13: Ito H., Fukuda Y., Murata K., Kimura A. J. Bacteriol. 1983; 153: 163-168). Transformants were selected on synthetic medium lacking uracil (SD−Ura).
Plasmids were isolated and verified by PCR analysis and DNA sequencing. Plasmids were named pJK92(gene name) using gene names from the Saccharomyces genome data base (14: Dwight S.S., Harris M.A., Dolinski K., Ball C.A., Binkley G., Christie K.R., Fisk D.G., Issel-Tarver L., Schroeder M., Sherlock G., Sethuraman A., Weng S., Botstein D., Cherry J.M. Nucleic Acids Res. 2002; 30: 69-72).
For construction of plasmids with an inducible Gal promoter, the fragment carrying the gene, HA epitope, SUC2, and HIS4C was amplified by PCR using pJK92(YEL059C) or pJK92(YJL028W) as a template. The two primers used in this PCR carried sequences homologous to the EcoRI-digested 424GALS vector (ATCC, Manassas, VA). The PCR product and the linearized 424GALS plasmid were transformed into strain STY50, and transformants were selected on −Trp plates. The correct construction of the plasmid was confirmed by yeast colony PCR.
Yeast transformants carrying TPI promoter plasmids were grown to OD600 0.8 to 1 in 10 ml of SD−Ura. Harvested cell pellets were washed with 5 ml of dH2O and left at −20 °C for at least 1 h. Frozen cells were resuspended in 200 μl of SDS sample buffer (50 mM Tris-HCl, pH 6.8, 10% glycerol, 2% SDS, 5% β-mercaptoethanol, 0.5 mM EDTA, 1 mM phenylmethylsulfonyl fluoride, protease inhibitor mixture (Roche Molecular Biochemicals), 0.0025% bromphenol blue), incubated at 60 °C for 10 to 15 min, and centrifuged for 10 min at 13,000 rpm in an Eppendorf microfuge. Soluble fractions were transferred to new tubes and subjected to Endo H digestion. Transformants carrying the GALS promoter were grown to OD600 1 to 2 in 5 ml of −Trp media. Cells were harvested by centrifugation, diluted 4-fold with −Trp media supplemented with galactose instead of glucose as carbon source, and grown for 5 h at 30 °C. Cell lysates were prepared as described above.
Whole-cell lysates were supplemented with potassium acetate, pH 5.6, to a final concentration of 80 mM, and 2 μl of Endo H (1 unit/200 μl, Roche Molecular Biochemicals) was added. Samples were incubated at 37 °C for 1 to 2 h. Mock samples were treated and incubated in the same way but without Endo H. Solubilized proteins were separated on 7.5% SDS-polyacrylamide gels, transferred onto nitrocellulose membranes, and probed with anti-HA antibody (Babco, Richmond, CA).
Transformants carrying each fusion construct were streaked on SD−Ura medium lacking histidine but containing 6 mM histidinol. Plates were incubated at 30 °C for 3 to 4 days.
All predicted S. cerevisiae open reading frames (15: Goffeau A. et al. Nature. 1997; 387 (suppl.): 1-105) were downloaded from genome-ftp.stanford.edu (version June 29, 2001). TMHMM1.0 (16: Krogh A., Larsson B., von Heijne G., Sonnhammer E. J. Mol. Biol. 2001; 305: 567-580) was used to identify putative membrane proteins with a minimum of two predicted transmembrane helices. From this set, 55 proteins were selected for which five different topology prediction methods, TMHMM1.0, HMMTOP2.0 (17: Tusnady G.E., Simon I. J. Mol. Biol. 1998; 283: 489-506; 18: Tusnady G.E., Simon I. Bioinformatics. 2001; 17: 849-850), MEMSAT1.8 (19: Jones D.T., Taylor W.R., Thornton J.M. Biochemistry. 1994; 33: 3038-3049), PHD2.1 (20: Rost B., Fariselli P., Casadio R. Protein Sci. 1996; 5: 1704-1718), and TOPPRED1.0 (21: von Heijne G. J. Mol. Biol. 1992; 225: 487-494; 22: Claros M.G., von Heijne G. Comput. Appl. Sci. 1994; 10: 685-686), all gave the same predicted topology. Three genes carrying introns (YDR376W, YML052W, YMR292W) were removed from this set, as were seven genes annotated as questionable open reading frames (YCL023C, YDR526C, YFL032W, YGL024W, YGL204C, YGR228W, YNL266W). A gene encoding a known mitochondrial protein (Q0275) was also excluded. The remaining 44 genes were cloned into the expression vectors described above. The 37 proteins for which the location of the C terminus could be determined experimentally (Table I) were further analyzed using a new version of TMHMM (8) that calculates a reliability score for the predicted topology and also allows any part of the topology to be fixed to a given location a priori.
The experimentally determined C-terminal locations were used as constraints in these predictions.

Table I. Summary of the results for the 39 proteins analyzed in this study

| Open reading frame | Length | Predicted or known function | Expression | Glyco | Growth | C-terminal | Consensus | Score | Accuracy | TMHMM (C) | Score (C) | Accuracy (C) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YAL007C | 215 | p24 protein involved in membrane trafficking | Medium | No | Yes | In | 2, Nin-Cin | 0.44 | 0.55 | 2, Nin-Cin | 0.58 | 0.71 |
| YBR210W | 142 | Strong similarity to Drosophila melanogaster cornichon protein | High | Yes | No | Out | 3, Nin-Cout | 0.98 | 0.99 | 3, Nin-Cout | 0.98 | 0.99 |
| YDL212W | 210 | Chaperone | Medium | No | Yes | In | 4, Nin-Cin | 0.98 | 0.98 | 4, Nin-Cin | 0.99 | 0.99 |
| YDR090C | 310 | Weak similarity to YRO2 | Low | Yes | No | Out | 7, Nout-Cin | 0.34 | 0.47 | 7, Nin-Cout | 0.65 | 0.76 |
| YDR182W | 491 | Cell division control protein | Low | Yes | No | Out | 3, Nin-Cout | 1.00 | 1.00 | 3, Nin-Cout | 1.00 | 1.00 |
| YDR438W | 370 | Strong similarity to hypothetical protein YML018c | Medium | No | Yes | In | 10, Nin-Cin | 0.42 | 0.53 | 10, Nin-Cin | 0.42 | 0.59 |
| YDR525W-A | 79 | Similarity to PMP3/SNA1 | Medium | No | Yes | In | 2, Nout-Cout | 0.42 | 0.54 | 2, Nin-Cin | 0.76 | 0.83 |
| YEL059W | 102 | Hypothetical protein | Medium | Yes | N/D | Out | 2, Nout-Cout | 0.84 | 0.87 | 2, Nout-Cout | 0.88 | 0.91 |
| YER056C | 533 | Purine-cytosine permease | Medium | No | Yes | In | 12, Nin-Cin | 0.05 | 0.24 | 12, Nin-Cin | 0.08 | 0.35 |
| YER119C | 448 | Weak similarity to Erwinia herbicola tyrosine permease | Medium | Yes | No | Out | 11, Nin-Cout | 0.47 | 0.58 | 11, Nin-Cout | 0.48 | 0.64 |
| YER185W | 303 | Strong similarity to Rtm1p | Medium | No | No | ? | 7, Nout-Cin | 0.86 | 0.89 | | | |
| YGR055W | 574 | High affinity methionine permease | Medium | No | Yes | In | 12, Nin-Cin | 0.20 | 0.36 | 12, Nin-Cin | 0.20 | 0.44 |
| YGR105W | 77 | ATPase assembly integral membrane protein | High | No | Yes | In | 2, Nin-Cin | 0.93 | 0.94 | 2, Nin-Cin | 0.97 | 0.98 |
| YGR121C | 492 | Ammonia permease | Low | No | Yes | In | 11, Nout-Cin | 0.37 | 0.50 | 11, Nout-Cin | 0.45 | 0.62 |
| YGR149W | 432 | Similarity to hypothetical protein SPBC776.05 Schizosaccharomyces pombe | Medium | No | Yes | In | 8, Nin-Cin | 0.34 | 0.47 | 8, Nin-Cin | 0.55 | 0.69 |
| YGR213C | 317 | Involved in 7-aminocholesterol resistance | Medium | No | Yes | In | 7, Nout-Cin | 0.98 | 0.99 | 7, Nout-Cin | 0.99 | 0.99 |
| YGR290W | 147 | Hypothetical protein | High | Yes | Yes | Out | 2, Nin-Cin | 0.91 | 0.93 | 2, Nout-Cout | 0.63 | 0.74 |
| YHR026W | 213 | H+-ATPase 23 kDa subunit | Medium | Yes | No | Out | 5, Nin-Cout | 0.92 | 0.93 | 5, Nin-Cout | 0.99 | 0.99 |
| YHR140W | 239 | Hypothetical protein | Medium | No | Yes | In | 6, Nin-Cin | 0.91 | 0.93 | 6, Nin-Cin | 0.91 | 0.94 |
| YJL170C | 183 | Weak similarity to Helicobacter pylori endonuclease III | Medium | Yes | No | In | 2, Nin-Cin | 0.79 | 0.83 | 1, Nin-Cout | 0.30 | 0.51 |
| YKL119C | 215 | H+-ATPase assembly protein | Low | No | Yes | In | 2, Nin-Cin | 0.98 | 0.98 | 2, Nin-Cin | 0.99 | 1.00 |
| YKR044W | 443 | Hypothetical protein | Medium | No | Yes | In | 2, Nin-Cin | 0.90 | 0.92 | 2, Nin-Cin | 0.91 | 0.94 |
| YKR065C | 197 | Similarity to hypothetical protein Schizosaccharomyces pombe | Medium | No | No | ? | 2, Nin-Cin | 0.99 | 0.99 | | | |
| YLL028W | 586 | Polyamine transport protein | Medium | No | Yes | In | 12, Nin-Cin | 0.20 | 0.36 | 12, Nin-Cin | 0.20 | 0.44 |
| YLR046C | 270 | Strong similarity to Rta1p and Rtm1p protein | Medium | No | Yes | In | 6, Nin-Cin | 0.51 | 0.60 | 6, Nin-Cin | 0.59 | 0.71 |
| YLR311C | 115 | Weak similarity to Sauroleishmania tarentolae cryptogene protein G4 | Medium | Yes | No | Out | 2, Nin-Cin | 0.82 | 0.85 | 3, Nin-Cout | >0.71 | 0.79 |
| YLR404W | 285 | Hypothetical protein | Medium | No | Yes | In | 2, Nin-Cin | 0.81 | 0.85 | 2, Nin-Cin | 0.99 | 0.99 |
| YLR443W | 448 | Involved in cell wall biogenesis and architecture | Medium | No | Yes | In | 4, Nin-Cin | 0.52 | 0.62 | 4, Nin-Cin | 0.75 | 0.82 |
| YMR040W | 160 | Strong similarity to Yet1p | Medium | No | Yes | In | 3, Nout-Cin | 0.96 | 0.97 | 3, Nout-Cin | 0.97 | 0.98 |
| YMR148W | 148 | Hypothetical protein | Medium | No | Yes | In | 2, Nin-Cin | 0.54 | 0.63 | 2, Nin-Cin | 0.63 | 0.74 |
| YNL194C | 301 | Strong similarity to YDL222c and similarity to Sur7p | Medium | Yes | No | Out | 4, Nin-Cin | 0.90 | 0.92 | 4, Nout-Cout | 0.58 | 0.70 |
| YNR002C | 282 | Strong similarity to Yarrowia lipolytica GPR1 | Medium | Yes | No | Out | 6, Nout-Cout | 0.23 | 0.38 | 6, Nout-Cout | 0.32 | 0.53 |
| YNR062C | 327 | Weak similarity to Hemophilus influenzae lctP homolog | Medium | No | Yes | In | 8, Nin-Cin | 0.88 | 0.90 | 8, Nin-Cin | 0.88 | 0.92 |
| YOL079W | 132 | Similarity to NADH dehydrogenases | High | Yes | No | Out | 3, Nin-Cout | 0.34 | 0.48 | 3, Nin-Cout | 0.37 | 0.56 |
| YOL101C | 312 | Similarity to YOL002c and YDR492w | Low | Yes | No | Out | 7, Nin-Cout | 0.82 | 0.86 | 7, Nin-Cout | 0.83 | 0.88 |
| YOR376W | 122 | Hypothetical protein | Medium | No | Yes | In | 2, Nin-Cin | 0.72 | 0.78 | 2, Nin-Cin | 0.89 | 0.92 |
| YPL264C | 353 | Strong similarity to YMR253c | Medium | No | Yes | In | 10, Nin-Cin | 0.48 | 0.59 | 10, Nin-Cin | 0.49 | 0.64 |
| YPR071W | 211 | Strong similarity to YIL029c | Medium | No | Yes | In | 4, Nin-Cin | 0.78 | 0.83 | 4, Nin-Cin | 0.92 | 0.94 |
| YPR192W | 305 | Similarity to water channel proteins | Medium | No | Yes | In | 6, Nin-Cin | 0.93 | 0.95 | 6, Nin-Cin | 0.94 | 0.96 |

Gene names in column 1 are from the Saccharomyces genome data base (14). Column 2 gives the length of the wild-type protein; column 3 gives a summary of the annotation found in the Saccharomyces genome data base and in the MIPS database (23: Mewes H.W., Frishman D., Guldener U., Mannhaupt G., Mayer K., Mokrejs M., Morgenstern B., Munsterkotter M., Rudd S., Weil B. Nucleic Acids Res. 2002; 30: 31-34); column 4 gives a qualitative measure of expression levels as judged visually from Western blots; column 5 shows whether or not the fusion protein is glycosylated; column 6 shows whether a his4 strain transformed with a plasmid carrying the fusion protein can or cannot grow on histidinol; column 7 gives the location of the C-terminal end of the wild-type protein as deduced from the glycosylation and growth phenotypes; column 8 gives the consensus topology predicted by the five methods used here; column 9 gives the TMHMM S3-score (8) calculated for the topology in column 8; column 10 gives the expected accuracy (8) calculated from the score in column 9; column 11 gives the topology predicted by TMHMM after inclusion of the experimentally determined location of the C terminus (column 7); and columns 12 and 13 give the TMHMM S3-score and the expected accuracy for the topology prediction in column 11. The detailed TMHMM outputs corresponding to the predictions given in column 11 are provided as an Electronic Supplement.
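The caption notes that the expected accuracy (column 10) is calculated from the S3-score (column 9); the mapping itself is defined in ref. 8 and is not reproduced here. As a rough illustration only, the sketch below fits an ordinary least-squares line to a few (Score, Accuracy) pairs copied from columns 9 and 10 of Table I. The function name `fit_line` and the choice of rows are ours, and the fitted coefficients are an empirical summary of these rows, not the published formula.

```python
from typing import List, Tuple

# (Score, Accuracy) pairs copied from columns 9 and 10 of Table I for
# YER056C, YGR055W, YDR090C, YAL007C, YEL059W, YGR105W and YDR182W.
PAIRS: List[Tuple[float, float]] = [
    (0.05, 0.24), (0.20, 0.36), (0.34, 0.47), (0.44, 0.55),
    (0.84, 0.87), (0.93, 0.94), (1.00, 1.00),
]


def fit_line(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept). Illustrative helper, not part of TMHMM.
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(PAIRS)
print(f"accuracy ~ {intercept:.2f} + {slope:.2f} * score")
```

For these rows the fit comes out close to a slope of 0.8 and an intercept of 0.2, which is in line with the roughly linear score-to-accuracy relationship reported for TMHMM. The constrained columns (12 and 13) follow a different line and would need a separate fit.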
To select S. cerevisiae membrane proteins for this study, we first searched all predicted open reading frames in the yeast genome (15) for membrane proteins for which five prediction methods (TOPPRED, TMHMM, HMMTOP, MEMSAT, and PHD) all give the same predicted topology. From our previous work (6, 7), we anticipated that the predicted topologies should be correct for a high proportion of these proteins.
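The consensus criterion described above can be sketched in a few lines. This is a minimal illustration, assuming that two predictions count as "the same" when they agree in helix count and in the in/out location of both termini; the source does not spell out the comparison. The type alias, function name, and example values are hypothetical.

```python
from typing import Dict, Optional, Tuple

# A topology summarised as (number of transmembrane helices,
# N-terminal side, C-terminal side), e.g. (3, "in", "out").
Topology = Tuple[int, str, str]


def consensus_topology(
    predictions: Dict[str, Topology], min_helices: int = 2
) -> Optional[Topology]:
    """Return the shared topology if all methods agree, else None.

    Proteins with fewer than `min_helices` predicted helices are
    rejected, mirroring the two-helix minimum used in the study.
    """
    if not predictions:
        return None
    distinct = set(predictions.values())
    if len(distinct) != 1:
        return None  # at least one method disagrees
    topology = next(iter(distinct))
    if topology[0] < min_helices:
        return None  # possible secreted protein with a signal peptide
    return topology


# Hypothetical predictions for one protein from the five methods.
example = {
    "TMHMM": (3, "in", "out"),
    "HMMTOP": (3, "in", "out"),
    "MEMSAT": (3, "in", "out"),
    "PHD": (3, "in", "out"),
    "TOPPRED": (3, "in", "out"),
}
print(consensus_topology(example))
```

Comparing only the summary (helix count plus orientation) keeps the check simple; a stricter variant could also require the predicted helix positions to overlap.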
We further required that the TMHMM method predict at least two transmembrane helices in each protein, because currently available bioinformatics tools cannot reliably distinguish between N-terminal signal-anchor sequences and cleavable signal peptides and thus may mistakenly identify secreted proteins as single-spanning, N-terminally anchored membrane proteins. This initial screen produced a list of 55 proteins. Three genes carrying introns (YDR376W, YML052W, YMR292W) were excluded from the original list, as were seven genes annotated as questionable open reading frames (YCL023C, YDR526C, YFL032W, YGL024W, YGL204C, YGR228W, YNL266W). A gene encoding a known mitochondrial protein (Q0275) was excluded because the glycosylation assay cannot be used for mitochondrial proteins. Five additional proteins could not be analyzed: YNL323W was not amplified by PCR, the cloned sequence of YOL137W turned out to be different from the expected sequence, the cloned sequences of YDL196W and YNL101W contained frameshifts relative to the published sequences, and protein expression of YJL028W was not detected. We successfully made and expressed C-terminal reporter fusions to the remaining 39 proteins (Table I).
For this study, we chose a 125-kDa dual Suc2/His4C topology reporter (4) to determine the location of a protein's C terminus in either the cytosol or the endoplasmic reticulum lumen (9, 12). The histidinol dehydrogenase activity of the His4C moiety converts histidinol to histidine only when it is localized in the cytosol.
Thus, only cells expressing fusion proteins with the reporter domain in the cytosol can grow on histidine-free media supplemented with histidinol. The part of the SUC2 gene that is present in the reporter encodes a segment of invertase containing eight N-glycosylation acceptor sites. When this domain is localized in the lumen of the endoplasmic reticulum, the fusion protein becomes heavily glycosylated.
The cytosolic/non-cytosolic location of the C terminus of each of the 39 Suc2/His4C fusion proteins was determined by Endo H treatment (to identify a glycosylated, lumenally oriented reporter; Fig. 1) and growth on histidine-free media containing histidinol (to identify a cytosolically oriented reporter; Fig. 2). We did not observe any general growth defects of the yeast transformants carrying these fusion constructs, indicating that the addition of the reporter domain to the target proteins had no obvious harmful effects.
Figure 2. Growth of his4 cells expressing the 39 fusion proteins analyzed in this study on a medium lacking histidine but including histidinol. Strains that grow well are indicated by underlining.
For 35 of the 39 fusion proteins, the results from the glycosylation and histidinol growth assays were entirely consistent (Table I). Some of these proteins are known to be localized to the membranes of secretory organelles and the plasma membrane, but most have no known localization or function annotated in the Saccharomyces genome data base (14) or in MIPS (23).
Because these 35 fusion proteins were either glycosylated or had histidinol dehydrogenase activity, it is reasonable to assume that their natural locations are in the membranes along the secretory pathway, although we cannot completely rule out that some of the unglycosylated proteins are located in mitochondria with their C terminus facing either the cytosol or the intermembrane space.
The initial results from the glycosylation and histidinol growth assays were inconsistent for four proteins. Growth on histidinol was seen for YGR290W, despite the fact that it was efficiently glycosylated. A small amount of unglycosylated protein, possibly representing molecules where the reporter is cytosolically oriented, was evident, however, and given the high level of expression seen for this protein this may be enough to allow growth on histidinol. We thus conclude that YGR290W has its C terminus in the endoplasmic reticulum lumen. We further found that YEL059W was not expressed from the constitutive TPI promoter. However, a low level of expression was seen when the inducible Gal promoter was used (Fig. 2). The fusion protein was sensitive to Endo H digestion, indicating that the C terminus of the protein was glycosylated and located in the lumen of the endoplasmic reticulum. Because YEL059W was only expressed from the inducible Gal promoter, growth on histidinol could not be assayed. Finally, in the case of YKR065C and YER185W, the fusion proteins were expressed, but neither became glycosylated nor allowed growth on histidinol. We considered that a possible explanation for this observation might be that the proteins are localized to mitochondria with their C termini in the matrix space. YKR065C is strongly predicted to have an N-terminal mitochondrial targeting peptide both by TargetP (24: Emanuelsson O., Nielsen H., Brunak S., von Heijne G. J. Mol. Biol. 2000; 300: 1005-1016) and by a predictor specifically developed for yeast proteins (25: Drawid A., Gerstein M. J. Mol. Biol. 2000; 301: 1059-1075). Indeed, YKR065C has recently been identified as a mitochondrial inner membrane protein with a cleavable, matrix-targeting presequence (N. Pfanner, personal communication). The location of YER185W is so far unknown.
We also roughly estimated the relative expression levels based on Western blotting with HA antibodies (Table I). It appears that small fusion proteins with few transmembrane helices tend to be better expressed than large proteins, but the correlation is not very strong.
As shown in Table I, the consensus predictions for the location of the C termini of the 37 proteins for which this could be deduced from the fusion protein data matched the experimental results in 31 cases (84%). Thus, for six proteins the consensus prediction does not yield a good topology model. As a more direct way of integrating the experimental results into the final predictions, we took advantage of a recent improvement to the TMHMM program that allows predictions to be constrained by experimental information (8). This new version of TMHMM also calculates a reliability score for each prediction that correlates strongly with prediction accuracy. The TMHMM results with inclusion of the experimentally determined C-terminal location are shown in Table I, together with the corresponding reliability score and the estimated probability that the prediction is correct (the "expected accuracy").
In contrast to the situation for bacterial inner membrane proteins, experimentally derived topology models are available for only a handful of yeast membrane proteins.
This lack of data is aggravated by the fact that theoretical topology prediction methods seem to perform less well on yeast membrane proteins than on both bacterial and mammalian ones (8; K. Melén, A. Krogh, and G. von Heijne (2003) J. Mol. Biol., in press). In this study, we have applied and extended a strategy initially proposed for bacterial inner membrane proteins (3) to a set of 39 predicted membrane proteins from the yeast S. cerevisiae. The strategy is based on the premise that reliable topology models can be produced rapidly by combining limited experimental information with topology predictions. The experimental information generated is the cytosolic/non-cytosolic location of the C terminus of the target proteins, and we show that this can be easily and reliably obtained by fusion of the full-length protein to a C-terminal, dual topology reporter (9) composed of a hemagglutinin tag for immunodetection, a part of Suc2p that contains eight acceptor sites for N-linked glycosylation, and the His4p enzyme that converts histidinol to histidine. Because N-linked glycosylation can only be carried out in the endoplasmic reticulum lumen and histidinol cannot be transported into this compartment, the cytoplasmic/non-cytoplasmic location of the reporter (and thus of the C terminus of the target protein) is easily assayed by checking whether Endo H can digest any N-linked glycans and whether his4 cells expressing the target protein-reporter fusion can grow on histidinol-containing media lacking histidine.
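The way the two assays combine into a single call can be written as a small decision function. This is a sketch under our own assumptions: the function name and the use of `None` for an assay that was not done (N/D in Table I) are hypothetical, and any conflicting or doubly negative result is simply flagged with "?" for manual inspection rather than resolved automatically.

```python
from typing import Optional


def infer_c_terminal_location(
    glycosylated: bool, grows_on_histidinol: Optional[bool]
) -> str:
    """Combine the Endo H and histidinol assays into In / Out / ?.

    "Out" means the reporter is in the endoplasmic reticulum lumen,
    "In" means it is in the cytosol. Pass None for the growth assay
    when it could not be performed.
    """
    if glycosylated and grows_on_histidinol is not True:
        return "Out"  # glycosylated reporter, no conflicting growth
    if grows_on_histidinol is True and not glycosylated:
        return "In"  # active His4C in the cytosol, no glycans
    return "?"  # both positive or neither positive


# Patterns taken from Table I.
print(infer_c_terminal_location(True, False))   # e.g. YBR210W
print(infer_c_terminal_location(False, True))   # e.g. YDL212W
print(infer_c_terminal_location(True, None))    # e.g. YEL059W, growth N/D
print(infer_c_terminal_location(False, False))  # e.g. YER185W
```

A protein such as YGR290W, which is both glycosylated and able to grow on histidinol, comes back as "?" here; in the study that case was resolved by inspecting the relative amounts of glycosylated and unglycosylated protein, which a boolean rule cannot capture.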
38 of the 39 fusion proteins could be expressed from the constitutive TPI promoter in amounts sufficient for analysis; one protein could be expressed only from the inducible Gal4 promoter (which precludes use of the histidinol growth assay). Only one protein that was included in our initial set could not be expressed from either promoter. It thus appears that nearly all membrane proteins in S. cerevisiae can be analyzed by our procedure. The results from the two topology assays were entirely consistent for 35 of the 39 proteins, and the location of the reporter could be reliably inferred also for 2 of the remaining 4. In only two cases did we fail to observe both glycosylation and growth on histidinol; a possible explanation is that these proteins are not targeted to the endoplasmic reticulum but rather are imported into the mitochondrial inner membrane and have their C termini in the matrix space. In fact, one of the two proteins (YKR065C) has recently been identified as being located in the inner mitochondrial membrane (N. Pfanner, personal communication), although its membrane topology is not yet known.
When combined with theoretical predictions, the experimentally mapped C-terminal locations allow us to propose what we consider to be reliable topology models for 37 yeast proteins for which no such information was previously available. To this end, we have used two approaches. First, all our target proteins were selected from the full set of predicted yeast open reading frames in such a way that five different topology prediction methods all gave the same prediction for each protein. We have previously shown that such consensus predictions are highly reliable for bacterial inner membrane proteins (6 Nilsson J. Persson B. von Heijne G. FEBS Lett. 2000; 486: 267-269, 7 Nilsson J. Persson B. von Heijne G. Protein Sci. 2002; in press).
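A minimal sketch of the consensus idea, under stated assumptions: a topology is reduced here to a helix count plus the side of the C terminus, and "the same prediction" is taken to mean that these two values are identical across methods (helix positions are ignored). The `Topology` type, the function names, and the example protein are hypothetical. The parity rule in `n_term` follows from the fact that each transmembrane helix crosses the membrane once.

```python
from typing import NamedTuple, Optional, Sequence


class Topology(NamedTuple):
    n_helices: int  # number of predicted transmembrane helices
    c_term: str     # "in" (cytosolic) or "out" (non-cytosolic)


def n_term(topology: Topology) -> str:
    """Each membrane crossing flips the side, so an even helix count
    places the N terminus on the same side as the C terminus."""
    if topology.n_helices % 2 == 0:
        return topology.c_term
    return "out" if topology.c_term == "in" else "in"


def consensus(predictions: Sequence[Topology]) -> Optional[Topology]:
    """Return the shared topology if every method agrees, else None."""
    if not predictions:
        return None
    first = predictions[0]
    return first if all(p == first for p in predictions) else None


def agrees_with_experiment(predicted: Topology, observed_c_term: str) -> bool:
    """Compare the predicted C-terminal side with the reporter result."""
    return predicted.c_term == observed_c_term


if __name__ == "__main__":
    # Hypothetical protein: five methods all call 10 helices, C terminus in.
    calls = [Topology(10, "in")] * 5
    model = consensus(calls)
    if model is not None:
        print(model, n_term(model), agrees_with_experiment(model, "in"))
```

A protein enters the target set only when `consensus` returns a topology; the reporter result then either confirms the predicted C-terminal side or flags the model for re-examination.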
The experimentally determined C-terminal location was the same as the predicted one for 31 of the 37 proteins, and we thus regard the topology models for these proteins as very likely to be correct. Second, we constrained the TMHMM predictions by fixing the C-terminal end of each protein to the experimentally determined location, because this is known to substantially increase the prediction accuracy (8 Käll L. Sonnhammer E. FEBS Lett. 2002; 532: 415-418). We further calculated a reliability score for each predicted topology, both without and with a fixed C-terminal location (Table I). Because there is an approximately linear relationship between the reliability score and the probability that a particular prediction is correct (8 Käll L. Sonnhammer E. FEBS Lett. 2002; 532: 415-418), such probability values (expected accuracy) were also calculated. Most of the 37 proteins have high scores compared with the score distribution calculated for all predicted S. cerevisiae membrane proteins (8 Käll L. Sonnhammer E. FEBS Lett. 2002; 532: 415-418) (data not shown). This was expected, because the proteins in our set were selected based on the requirement that five different topology prediction methods should all give the same predicted topology. We also note that the changes in the reliability score for the 37 proteins seen after the inclusion of the experimentally determined C-terminal location in the prediction have a distribution that is very similar to the one derived for the much larger set of bacterial inner membrane proteins analyzed previously (8 Käll L. Sonnhammer E. FEBS Lett. 2002; 532: 415-418) (data not shown).
It is interesting to compare the 37 proteins studied here with TMHMM predictions for the whole S. cerevisiae membrane proteome (16 Krogh A. Larsson B. von Heijne G. Sonnhammer E. J. Mol. Biol.
2001; 305: 567-580). The overall distribution of proteins with different numbers of transmembrane helices is roughly the same for the set of 37 proteins and the whole proteome, with peaks at 2 helices and 10–12 helices. Proteins with an even number of predicted transmembrane helices are 1.8-fold more numerous than proteins with an odd number of helices among the 37 proteins and are 1.7-fold more numerous in the whole proteome (excluding the single-spanning proteins). There are 1.8 times more proteins with Cin as compared with Cout orientation in our set (1.7 times more in the whole proteome) and 4.3 times more proteins that are predicted to have Nin as compared with Nout orientation in our set (1.6 times more in the whole proteome). The proteins analyzed here thus seem to represent a rough cross-section of the whole proteome, except that their topologies are easier to predict and have higher TMHMM reliability scores than the proteome as a whole.
In summary, we have shown that reliable topology models for S. cerevisiae membrane proteins can be produced on a reasonably large scale by a combination of C-terminal reporter fusion analysis and theoretical prediction. This approach not only reduces the experimental efforts required but also avoids the pitfalls inherent to fusions between a truncated target protein and topology reporters.
We thank Johan Nilsson (Stockholm Bioinformatics Center) for bioinformatics analyses.
References