A Bioinformatics Approach for Integrated Transcriptomic and Proteomic Comparative Analyses of Model and Non-sequenced Anopheline Vectors of Human Malaria Parasites
2012; Elsevier BV; Volume: 12; Issue: 1; Language: English
10.1074/mcp.m112.019596
ISSN: 1535-9484
Authors: Ceereena Ubaida-Mohien, David R. Colquhoun, Derrick Mathias, John G. Gibbons, Jennifer S. Armistead, Maria Carmen Rodríguez, Mario H. Rodríguez, Nathan Edwards, Jürgen Hartler, Gerhard Thallinger, David R. Graham, Jesús Martínez-Barnetche, Antonis Rokas, Rhoel R. Dinglasan
Topic(s): Invertebrate Immune Response Mechanisms
Abstract: Malaria morbidity and mortality caused by both Plasmodium falciparum and Plasmodium vivax extend well beyond the African continent, and although P. vivax causes between 80 and 300 million severe cases each year, vivax transmission remains poorly understood. Plasmodium parasites are transmitted by Anopheles mosquitoes, and the critical site of interaction between parasite and host is at the mosquito's luminal midgut brush border. Although the genome of the "model" African P. falciparum vector, Anopheles gambiae, has been sequenced, evolutionary divergence limits its utility as a reference across anophelines, especially non-sequenced P. vivax vectors such as Anopheles albimanus. Clearly, technologies and platforms that bridge this substantial scientific gap are required in order to provide public health scientists with key transcriptomic and proteomic information that could spur the development of novel interventions to combat this disease. To our knowledge, no approaches have been published that address this issue. To bolster our understanding of P. vivax–An. albimanus midgut interactions, we developed an integrated bioinformatic-hybrid RNA-Seq-LC-MS/MS approach involving An. albimanus transcriptome (15,764 contigs) and luminal midgut subproteome (9,445 proteins) assembly, which, when used with our custom Diptera protein database (685,078 sequences), facilitated a comparative proteomic analysis of the midgut brush borders of two important malaria vectors, An. gambiae and An. albimanus.

Malaria transmission entails the obligatory development of Plasmodium in Anopheles mosquitoes (Fig. 1A). Although the majority of the 900,000 malaria deaths per year (caused primarily by Plasmodium falciparum) occur in Africa, malaria morbidity and mortality extend to other continents. Outside of Africa, malaria is caused by both P. falciparum and P. vivax. In fact, P. vivax has the widest geographic distribution among human malaria parasites and is responsible for between 80 and 300 million severe clinical cases every year (1: Mueller I. Galinski M.R. Baird J.K. Carlton J.M. Kochar D.K. Alonso P.L. del Portillo H.A. Key gaps in the knowledge of Plasmodium vivax, a neglected human malaria parasite. Lancet Infect. Dis. 2009; 9: 555-566). Despite this substantial burden of disease, P. vivax has received less attention, in terms of research efforts and resources, than P. falciparum (1: Mueller I.
Galinski M.R. Baird J.K. Carlton J.M. Kochar D.K. Alonso P.L. del Portillo H.A. Key gaps in the knowledge of Plasmodium vivax, a neglected human malaria parasite. Lancet Infect. Dis. 2009; 9: 555-566; 2: Alonso P.L. Brown G. Arevalo-Herrera M. Binka F. Chitnis C. Collins F. Doumbo O.K. Greenwood B. Hall B.F. Levine M.M. Mendis K. Newman R.D. Plowe C.V. Rodriguez M.H. Sinden R. Slutsker L. Tanner M. A research agenda to underpin malaria eradication. PLoS Med. 2011; 8: e1000406). Anopheles albimanus is one of the primary P. vivax mosquito vectors in the Americas. Specialized P. vivax–An. albimanus genotypic interactions have been shown to occur in Mexico (3: Rodriguez M.H. Gonzalez-Ceron L. Hernandez J.E. Nettel J.A. Villarreal C. Kain K.C. Wirtz R.A. Different prevalences of Plasmodium vivax phenotypes VK210 and VK247 associated with the distribution of Anopheles albimanus and Anopheles pseudopunctipennis in Mexico. Am. J. Trop. Med. Hyg. 2000; 62: 122-127), with a distinct genetic P. vivax population mirroring the geographic dispersal of An. albimanus (4: Joy D.A. Gonzalez-Ceron L. Carlton J.M. Gueye A. Fay M. McCutchan T.F. Su X.Z. Local adaptation and vector-mediated population structure in Plasmodium vivax malaria. Mol. Biol. Evol. 2008; 25: 1245-1252). This degree of specificity underlying vector–parasite interactions suggests that genetic "compatibility" between parasite and mosquito, likely the outcome of co-evolutionary history, is an important factor in malaria epidemiology. An. albimanus is also a competent vector for P. falciparum (5: Collins W.E. Warren M. Skinner J.C. Richardson B.B. Kearse T.S. Infectivity of the Santa Lucia (El Salvador) strain of Plasmodium falciparum to different anophelines. J. Parasit.
1977; 63: 57-61; 6: Olano V.A. Carrillo M.P. Delavega P. Espinal C.A. Vector competence of Cartagena strain of Anopheles albimanus for Plasmodium falciparum and Plasmodium vivax. Trans. R. Soc. Trop. Med. Hyg. 1985; 79: 685-686). Thus, despite evidence of P. vivax–An. albimanus co-evolution at the population level, certain interactions between parasites and mosquitoes may be conserved across different Anopheles/Plasmodium species combinations that sustain malaria transmission. The prevailing notion that a subset of anophelines are "competent" malaria vectors (7: Collins W.E. McClure H. Strobert E. Skinner J.C. Richardson B.B. Roberts J.M. Galland G.G. Sullivan J. Morris C.L. Adams S.R. Experimental infection of Anopheles gambiae s.s., Anopheles freeborni and Anopheles stephensi with Plasmodium malariae and Plasmodium brasilianum. J. Am. Mosq. Control Assoc. 1993; 9: 68-71; 8: Kiszewski A. Mellinger A. Spielman A. Malaney P. Sachs S.E. Sachs J. A global index representing the stability of malaria transmission. Am. J. Trop. Med. Hyg. 2004; 70: 486-498) reinforces the hypothesis that there is a conserved set of molecules on the midgut epithelial brush border microvilli (BBMV) surface of anopheline malaria vectors that act as binding-ligands for Plasmodium ookinetes (9: Parish L.A. Colquhoun D.R. Mohien C.U. Lyashkov A.E. Graham D.R. Dinglasan R.R.
Ookinete-interacting proteins on the microvillar surface are partitioned into detergent resistant membranes of Anopheles gambiae midguts. J. Proteome Res. 2011; 10: 5150-5162) (Fig. 1A). Such mosquito ligands have been shown to be critical candidate targets for malaria transmission-blocking vaccines (TBVs) (10: Lavazec C. Boudin C. Lacroix R. Bonnet S. Diop A. Thiberge S. Boisson B. Tahar R. Bourgouin C. Carboxypeptidases B of Anopheles gambiae as targets for a Plasmodium falciparum transmission-blocking vaccine. Infect. Immun. 2007; 75: 1635-1642; 11: Mathias D.K. Plieskatt J.L. Armistead J.S. Bethony J.M. Abdul-Majid K.B. McMillan A. Angov E. Aryee M.J. Zhan B. Gillespie P. Keegan B. Jariwala A.R. Rezende W. Bottazzi M.E. Scorpio D.G. Hotez P.J. Dinglasan R.R. Expression, immunogenicity, histopathology, and potency of a mosquito-based malaria transmission-blocking recombinant vaccine. Infect. Immun. 2012; 80: 1606-1614; 12: Dinglasan R.R. Kalume D.E. Kanzok S.M. Ghosh A.K. Muratova O. Pandey A. Jacobs-Lorena M. Disruption of Plasmodium falciparum development by antibodies against a conserved mosquito midgut antigen. Proc. Natl. Acad. Sci. U.S.A. 2007; 104: 13461-13466; 13: Dinglasan R.R. Jacobs-Lorena M. Flipping the paradigm on malaria transmission-blocking vaccines. Trends Parasitol. 2008; 24: 364-370), and it has been hypothesized recently that a subset of the BBMV-associated glycoproteins can be clustered via detergent resistant membranes (DRMs) to form a receptor complex for the ookinete (9: Parish L.A. Colquhoun D.R. Mohien C.U. Lyashkov A.E. Graham D.R. Dinglasan R.R. Ookinete-interacting proteins on the microvillar surface are partitioned into detergent resistant membranes of Anopheles gambiae midguts. J. Proteome Res.
2011; 10: 5150-5162) (Fig. 1B). However, the current list of candidates is scant, and the remaining complement of TBV targets present on the BBMV, especially those that are not enriched in DRMs, remains unknown (Fig. 1B). Experiments that could validate these hypotheses are hampered by the dearth of molecular information available for non-model vectors.

The abbreviations used are: BBMV, brush border microvilli; DRM, detergent resistant membrane; LC-MS/MS, liquid chromatography tandem mass spectrometry; MS, mass spectrometry; RNA-Seq, next-generation RNA sequencing; TBV, transmission-blocking vaccine.

Although the genome of the African malaria vector, Anopheles gambiae, has been sequenced, evolutionary divergence limits its utility as a reference across anophelines (14: Holt R.A. Subramanian G.M. Halpern A. Sutton G.G. Charlab R. Nusskern D.R. Wincker P. Clark A.G. Ribeiro J.M. Wides R. Salzberg S.L. Loftus B. Yandell M. Majoros W.H. Rusch D.B. Lai Z. Kraft C.L. Abril J.F. Anthouard V. Arensburger P. Atkinson P.W. Baden H. de Berardinis V. Baldwin D. Benes V. Biedler J. Blass C. Bolanos R. Boscus D. Barnstead M. Cai S. Center A. Chaturverdi K. Christophides G.K. Chrystal M.A. Clamp M. Cravchik A. Curwen V. Dana A. Delcher A. Dew I. Evans C.A. Flanigan M. Grundschober-Freimoser A. Friedli L. Gu Z. Guan P. Guigo R. Hillenmeyer M.E. Hladun S.L. Hogan J.R. Hong Y.S. Hoover J. Jaillon O. Ke Z. Kodira C. Kokoza E. Koutsos A. Letunic I. Levitsky A. Liang Y. Lin J.J. Lobo N.F. Lopez J.R. Malek J.A. McIntosh T.C. Meister S. Miller J. Mobarry C. Mongin E. Murphy S.D. O'Brochta D.A. Pfannkoch C. Qi R. Regier M.A. Remington K. Shao H. Sharakhova M.V. Sitter C.D. Shetty J. Smith T.J. Strong R. Sun J. Thomasova D. Ton L.Q. Topalis P. Tu Z. Unger M.F. Walenz B. Wang A. Wang J. Wang M. Wang X. Woodford K.J. Wortman J.R. Wu M. Yao A.
Zdobnov E.M. Zhang H. Zhao Q. Zhao S. Zhu S.C. Zhimulev I. Coluzzi M. della Torre A. Roth C.W. Louis C. Kalush F. Mural R.J. Myers E.W. Adams M.D. Smith H.O. Broder S. Gardner M.J. Fraser C.M. Birney E. Bork P. Brey P.T. Venter J.C. Weissenbach J. Kafatos F.C. Collins F.H. Hoffman S.L. The genome sequence of the malaria mosquito Anopheles gambiae. Science. 2002; 298: 129-149; 15: Hittinger C.T. Johnston M. Tossberg J.T. Rokas A. Leveraging skewed transcript abundance by RNA-Seq to increase the genomic depth of the tree of life. Proc. Natl. Acad. Sci. U.S.A. 2010; 107: 1476-1481; 16: Krzywinski J. Besansky N.J. Molecular systematics of Anopheles: from subgenera to subpopulations. Annu. Rev. Entomol. 2003; 48: 111-139; 17: Zdobnov E.M. von Mering C. Letunic I. Torrents D. Suyama M. Copley R.R. Christophides G.K. Thomasova D. Holt R.A. Subramanian G.M. Mueller H.M. Dimopoulos G. Law J.H. Wells M.A. Birney E. Charlab R. Halpern A.L. Kokoza E. Kraft C.L. Lai Z. Lewis S. Louis C. Barillas-Mury C. Nusskern D. Rubin G.M. Salzberg S.L. Sutton G.G. Topalis P. Wides R. Wincker P. Yandell M. Collins F.H. Ribeiro J. Gelbart W.M. Kafatos F.C. Bork P. Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science. 2002; 298: 149-159). Thus, our understanding of the similarities and differences among vivax–albimanus, falciparum–albimanus, and falciparum–gambiae interactions remains poor. Inspired by the renewed emphasis in the malaria community on advancing studies of P. vivax transmission biology, we developed a robust, hybrid sequencing workflow to produce high-quality, assembled transcriptomic data to drive comparative midgut proteomics analyses of the "model" P. falciparum mosquito vector, Anopheles gambiae, and a non-sequenced, predominantly P. vivax vector, An. albimanus.
This workflow bridges the transcriptomic and proteomic gap between these anopheline vectors, thereby enabling studies aimed at providing new biological insight into the vivax–anopheles dyad that could translate into the development of novel interventions.

Anopheles gambiae (Keele) (18: Hurd H. Taylor P. Adams D. Underhill A. Eggleston P. Measuring the costs of mosquito resistance to malaria infection. Evolution. 2005; 12: 2560-2572) mosquitoes (5 to 6 days old, n = 1,000) at Johns Hopkins University and An. albimanus (3 to 5 days old, n = 2,000) white striped colony at the National Institute of Public Health, Mexico, were used and maintained following standard conditions [see "Methods in Anopheles Research," available at the Malaria Research and Reference Reagent Resource Center (MR4) Web site]. Midguts from both species were dissected and stored frozen in PBS supplemented with protease inhibitor mixture (PIC). BBMVs were prepared following established protocols (19: Abdul-Rauf M. Ellar D.J. Isolation and characterization of brush border membrane vesicles from whole Aedes aegypti larvae. J. Invertebr. Pathol. 1999; 73: 45-51), with modifications (20: Hauser H. Howell K. Dawson R.M.C. Bowyer D.E. Rabbit small intestinal brush-border membrane preparation and lipid-composition. Biochim. Biophys. Acta. 1980; 602: 567-577). Four hundred midguts were washed and resuspended in 200 μl of microvilli buffer (50 mM mannitol, 20 mM Tris-HCl pH 7.4, 1 mM PMSF, 3 mM imidazole-HCl) with PIC (Sigma, St. Louis, MO) and processed as described elsewhere (9: Parish L.A. Colquhoun D.R. Mohien C.U. Lyashkov A.E. Graham D.R. Dinglasan R.R. Ookinete-interacting proteins on the microvillar surface are partitioned into detergent resistant membranes of Anopheles gambiae midguts. J. Proteome Res. 2011; 10: 5150-5162).
SDS-PAGE and APN1 Western blots were carried out as described elsewhere (9: Parish L.A. Colquhoun D.R. Mohien C.U. Lyashkov A.E. Graham D.R. Dinglasan R.R. Ookinete-interacting proteins on the microvillar surface are partitioned into detergent resistant membranes of Anopheles gambiae midguts. J. Proteome Res. 2011; 10: 5150-5162).

Aminopeptidase (APN) activity in An. gambiae BBMV was assayed with L-leucine p-nitroanilide as substrate in a 96-well plate in a final volume of 210 μl per well. The BBMV were diluted in APN buffer (10 mM Tris-HCl, pH 7.4, 150 mM NaCl) to a final concentration of 6 μg/ml, distributed in the wells, and incubated for 15 min at 37 °C. The APN substrate (2 mM in APN buffer) was then added, and the initial rate of free p-nitroanilide product formation at 405 nm (SpectraMax, Molecular Devices, Sunnyvale, CA, USA) was used to calculate the specific APN enzymatic activity.

Approximately 40 μg of BBMV proteins were resuspended in 8 M deionized urea and reduced and alkylated (in tris(2-carboxyethyl)phosphine and methyl methanethiosulfonate, respectively). Proteins were digested for 12 h at 30 °C using 1 μg LysC (Promega, Madison, WI). The resulting digests were diluted to 2 M urea/20 mM NH4HCO3 and digested overnight at 30 °C using 1 μg proteomics-grade trypsin (Promega). Digested, desalted, and dried peptides were separated using an Agilent 3100 OFFgel fractionator (Agilent, Santa Clara, CA). Samples were separated as described elsewhere (9: Parish L.A. Colquhoun D.R. Mohien C.U. Lyashkov A.E. Graham D.R. Dinglasan R.R. Ookinete-interacting proteins on the microvillar surface are partitioned into detergent resistant membranes of Anopheles gambiae midguts. J. Proteome Res.
2011; 10: 5150-5162), concentrated, and analyzed on an Agilent LC-MS system comprising a 1200 LC system coupled to a 6520 Q-TOF via an HPLC Chip (160 nL, 300 Å C18 150 mm column) Cube interface, using previously described parameters (9: Parish L.A. Colquhoun D.R. Mohien C.U. Lyashkov A.E. Graham D.R. Dinglasan R.R. Ookinete-interacting proteins on the microvillar surface are partitioned into detergent resistant membranes of Anopheles gambiae midguts. J. Proteome Res. 2011; 10: 5150-5162).

Total RNA from 50 An. albimanus midguts was extracted using TRIzol (Invitrogen), DNase treated and cleaned with an RNeasy column (Qiagen, Valencia, CA, USA), and quality checked using a Bioanalyzer (Agilent Technologies). The mRNA libraries were constructed and sequenced, as described elsewhere (15: Hittinger C.T. Johnston M. Tossberg J.T. Rokas A. Leveraging skewed transcript abundance by RNA-Seq to increase the genomic depth of the tree of life. Proc. Natl. Acad. Sci. U.S.A. 2010; 107: 1476-1481; 21: Gibbons J.G. Beauvais A. Beau R. McGary K.L. Latge J.P. Rokas A. Global transcriptome changes underlying colony growth in the opportunistic human pathogen Aspergillus fumigatus. Eukaryot. Cell. 2012; 11: 68-78; 22: Gibbons J.G. Janson E.M. Hittinger C.T. Johnston M. Abbot P. Rokas A. Benchmarking next-generation transcriptome sequencing for functional and evolutionary genomics. Mol. Biol. Evol. 2009; 26: 2731-2744), on a single lane of an Illumina HiSeq 2000, which generated ∼210 million 101-base-pair paired-end reads. The library preparation for Roche 454 sequencing data is described in greater detail elsewhere (23: Martinez-Barnetche J. Gomez-Barreto R.E. Ovilla-Munoz M. Tellez-Sosa J. Garcia-Lopez D.E. Dinglasan R.R. Ubaida Mohien C. Maccallum R.M. Redmond S.N. Gibbons J.G. Rokas A.
Machado C.M. Cazares-Raga F. Gonzalez-Ceron L. Hernandez-Martinez S. Rodriguez-Lopez M.H. Transcriptome of the adult female malaria mosquito vector Anopheles albimanus. BMC Genomics. 2012; 13: 207). However, transcriptome assembly for all transcriptome data followed the same analysis process described below.

We assembled the ∼210 million 101-base-pair paired-end Illumina reads and ∼430,000 Roche 454 reads using Velvet (24: Zerbino D.R. Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008; 18: 821-829) and Oases (25: Schulz M.H. Zerbino D.R. Vingron M. Birney E. Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics. 2012; 28: 1086-1092). To optimize our parameters, we first assembled a subset of the Illumina and 454 reads using 13 different k-mer values and found that k-mer values in the 40s gave the largest average and median-sized contigs and the largest cumulative assembly. To overcome memory constraints, we divided the complete set of paired-end Illumina sequence reads equally into eight sets. We implemented the multiple k-mer assembly (26: Surget-Groba Y. Montoya-Burgos J.I. Optimization of de novo transcriptome assembly from next-generation sequencing data. Genome Res. 2010; 20: 1432-1440) independently for each read set, along with the 454 data, using three different k-mer values (43, 45, and 47). We merged and reassembled the resulting outputs of the 24 assemblies (eight read sets × three k-mer values) with Velvet and Oases at a k-mer value of 47, which produced the final transcriptome assembly. We filtered the assembly to retain genes that contained a single transcript, were longer than 300 base pairs, and had confidence scores of 1.0. As detailed in Ref. 23: Martinez-Barnetche J. Gomez-Barreto R.E. Ovilla-Munoz M. Tellez-Sosa J.
Garcia-Lopez D.E. Dinglasan R.R. Ubaida Mohien C. Maccallum R.M. Redmond S.N. Gibbons J.G. Rokas A. Machado C.M. Cazares-Raga F. Gonzalez-Ceron L. Hernandez-Martinez S. Rodriguez-Lopez M.H. Transcriptome of the adult female malaria mosquito vector Anopheles albimanus. BMC Genomics. 2012; 13: 207, briefly, An. albimanus midgut, abdominal cuticle, and dorsal vessel preparations were used to generate a set of 15,764 transcripts, ∼92% of which (15,441) mapped to the An. darlingi genome (11,430 predicted protein coding genes) and ∼57% of which (9,684) mapped to the An. gambiae genome (∼13,320 predicted protein coding genes). Given these data, we argue that the 16,699 proteins (our transcript set (15,764) plus the salivary gland Sanger contigs (935)) predicted from the current An. albimanus transcriptome should represent a virtually complete transcriptome. We should emphasize here that because the transcriptome was assembled de novo, the data represent an independent sampling of the predicted protein sequences. With respect to proteome coverage, we find that of the 15,764 transcripts, 14,887 transcripts (∼94%) mapped to An. gambiae and 8,480 transcripts (∼54%) mapped to An. darlingi (NCBInr, August 11, 2011). The An. albimanus transcriptome dataset is available through VectorBase (23: Martinez-Barnetche J. Gomez-Barreto R.E. Ovilla-Munoz M. Tellez-Sosa J. Garcia-Lopez D.E. Dinglasan R.R. Ubaida Mohien C. Maccallum R.M. Redmond S.N. Gibbons J.G. Rokas A. Machado C.M. Cazares-Raga F. Gonzalez-Ceron L. Hernandez-Martinez S. Rodriguez-Lopez M.H. Transcriptome of the adult female malaria mosquito vector Anopheles albimanus. BMC Genomics. 2012; 13: 207).

MS raw files for each sample were converted to mzXML format using Trapper 4.3.0 (Institute for Systems Biology, Seattle, WA).
The peaklists from all OFFgel fractions for each replicate (12 to 24 fractions per replicate) were merged into a single mzXML data format file; three biological replicates from An. gambiae and three from An. albimanus, for a total of six mzXML files, contained ∼1,000,000 spectra. Data were uploaded and searched using the PepArML metasearch engine (27: Edwards N. Wu X. Tseng C.W. An unsupervised, model-free, machine-learning combiner for peptide identifications from tandem mass spectra. Clin. Proteomics. 2009; 5: 23-36), which automatically conducts target and decoy searches using one or more of Mascot 2.2 (28: Perkins D.N. Pappin D.J.C. Creasy D.M. Cottrell J.S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999; 20: 3551-3567), OMSSA 2.1.1 (29: Geer L.Y. Markey S.P. Kowalak J.A. Wagner L. Xu M. Maynard D.M. Yang X.Y. Shi W.Y. Bryant S.H. Open mass spectrometry search algorithm. J. Proteome Res. 2004; 3: 958-964), and Tandem 2010.01.01.4 (30: Craig R. Beavis R.C. TANDEM: matching proteins with tandem mass spectra. Bioinformatics. 2004; 20: 1466-1467) with native, K-score 2010.01.01.4 (31: MacLean B. Eng J.K. Beavis R.C. McIntosh M. General framework for developing and evaluating database scoring algorithms using the TANDEM search engine. Bioinformatics. 2006; 22: 2830-2832), and S-score 2010.01.01.4 pluggable scoring modules, MyriMatch 1.5.8 (32: Tabb D.L. Fernando C.G. Chambers M.C. MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J. Proteome Res. 2007; 6: 654-661), and Inspect 20110313 (33: Tanner S. Shu H. Frank A. Wang L.C. Zandi E. Mumby M. Pevzner P.A. Bafna V.
InsPecT: identification of posttranslationally modified peptides from tandem mass spectra. Anal. Chem. 2005; 77: 4626-4639) with MS-GF spectral probability scoring (34: Kim S. Gupta N. Pevzner P.A. Spectral probabilities and generating functions of tandem mass spectra: a strike against decoy databases. J. Proteome Res. 2008; 7: 3354-3363). It also combines the search results using an unsupervised machine-learning strategy and estimates peptide identification false discovery rates using identifications from the reversed decoy searches (35: Elias J.E. Gygi S.P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods. 2007; 4: 207-214). The data were searched against an in-house Diptera FASTA database generated by extracting annotated sequences from the NCBInr (March 2011), with ∼685,078 entries from the Diptera order. The PepArML search was performed with variable modifications of methylmethanethiosulfonate and oxidized methionine, mass tolerances of 30 ppm and 20 ppm for precursor and fragment ions, respectively, and one missed cleavage. Search engines Mascot, OMSSA, and Tandem (native scoring) were used. The results were parsed into the MASPECTRAS 2 data analysis system with data filters of 5% peptide FDR and two peptides per protein minimum (36: Mohien C.U. Hartler J. Breitwieser F. Rix U. Rix L.R. Winter G.E. Thallinger G.G. Bennett K.L. Superti-Furga G. Trajanoski Z. Colinge J. MASPECTRAS 2: an integration and analysis platform for proteomic data. Proteomics. 2010; 10: 2719-2722).
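The two filters named above (a 5% peptide FDR estimated from decoy hits, and a minimum of two peptides per protein) can be illustrated with a minimal Python sketch. This is not the PepArML combiner or the MASPECTRAS 2 implementation: it assumes a single score per peptide-spectrum match where higher is better, estimates FDR simply as decoys/targets among the accepted matches, and uses a hypothetical function name (`filter_psms`) and tuple layout chosen for illustration only.

```python
from collections import defaultdict


def filter_psms(psms, max_fdr=0.05, min_peptides=2):
    """Apply a decoy-based FDR cutoff, then a peptides-per-protein minimum.

    psms: iterable of (score, peptide, protein, is_decoy) tuples
          (hypothetical layout; higher score means a better match).
    Returns {protein: set of accepted peptides}.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    targets = decoys = 0
    cutoff = 0  # number of top-ranked matches accepted
    for rank, (_, _, _, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR at this rank: decoy hits / target hits.
        if targets and decoys / targets <= max_fdr:
            cutoff = rank  # keep the deepest rank that still satisfies the FDR

    peptides_by_protein = defaultdict(set)
    for _, peptide, protein, is_decoy in ranked[:cutoff]:
        if not is_decoy:
            peptides_by_protein[protein].add(peptide)

    # Require at least `min_peptides` distinct peptides per protein.
    return {
        protein: peptides
        for protein, peptides in peptides_by_protein.items()
        if len(peptides) >= min_peptides
    }


example = [
    (10.0, "PEPA", "P1", False),
    (9.0, "PEPB", "P1", False),
    (8.0, "PEPC", "P2", False),
    (7.0, "XXXX", "D1", True),
    (6.0, "PEPD", "P2", False),
]
accepted = filter_psms(example)
```

In this toy input the decoy at rank four pushes the estimated FDR above 5%, so only the first three matches are accepted, and P2 is then dropped because it retains a single peptide.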
The metasearch peptide identifications were combined, and proteins were clustered together if they shared peptide identifications, as this indicates substantial sequence similarity and functional homology; the leader of each group of proteins was considered a unique protein identification and a definite protein entry (36: Mohien C.U. Hartler J. Breitwieser F. Rix U. Rix L.R. Winter G.E. Thallinger G.G. Bennett K.L. Superti-Furga G. Trajanoski Z. Colinge J. MASPECTRAS 2: an integration and analysis platform for proteomic data. Proteomics. 2010; 10: 2719-2722). Throughout this paper, we report only the unique identifications, specifically, the leader proteins of each protein group.

The resultant contigs from the assembled transcript data from Illumina and 454 pyrosequencing were translated into open reading frames using EMBOSS Transeq (http://www.ebi.ac.uk). The sequences were trimmed at the stop codon (TAA, TAG, and TGA) so as to (1) select the longest reading frame and (2) discard sequences that were less than 50 amino acids. All RNA-Seq transcripts that met these criteria from all six frames were selected to create a FASTA database (Anopheles protein database), with a randomly generated unique identifier and the contig name assigned to each entry. Because the transcriptome is unannotated and transcript sequences are often less well characterized, resulting in unexpected cleavage sites, we considered the analysis as requiring greater search sensitivity, so the method was changed as follows: the individual MS raw files from An. albimanus and An. gambiae were searched against the transcript translated database with a semi-tryptic search. To maximize search sensitivity, all PepArML search engines were used: Mascot; OMSSA; X!Tandem with native, K-score, and S-score plugins; MyriMatch; and InsPecT with MS-GF.
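The contig translation and filtering described above can be sketched in plain Python. The sketch is an illustration under stated assumptions, not the EMBOSS Transeq workflow the authors ran: it uses the standard genetic code, reads "trimmed at the stop codon" as keeping the longest stop-free segment in each of the six frames, and the function names (`translate`, `six_frame_orfs`) are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; '*' marks a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(contig, min_len=50):
    """Longest stop-free peptide per frame, kept only if >= min_len residues.

    Returns a dict keyed by frame label ('+1'..'+3', '-1'..'-3').
    """
    contig = contig.upper()
    orfs = {}
    for strand, seq in (("+", contig), ("-", reverse_complement(contig))):
        for offset in range(3):
            protein = translate(seq[offset:])
            longest = max(protein.split("*"), key=len)  # trim at stop codons
            if len(longest) >= min_len:
                orfs[f"{strand}{offset + 1}"] = longest
    return orfs


# Toy contig: ATG + 60 alanine codons + stop + a short tail.
toy_contig = "ATG" + "GCT" * 60 + "TAA" + "GCT" * 10
toy_orfs = six_frame_orfs(toy_contig)
```

Each surviving peptide would then be written as one FASTA entry carrying its own identifier and the contig name, as the text describes.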
Amazon Web Services Elastic Compute Cloud (EC2) resources were used to supplement the local PepArML computing resources to carry out this computationally intensive search on a total of 947,147 spectra. The remainder of the analysis was completed as described in a later section (Fig. 2). The data analysis system meets all standards regarding the Minimum Information About a Proteomics Experiment (MIAPE), and our data, including the Diptera FASTA database, have been deposited in the ProteomeXchange via the PRIDE partner repository with the dataset identifier PXD000062.

Transcript sources were annotated by Blast2GO (37: Conesa A. Gotz S. Garcia-Gomez J.M. Terol J. Talon M. Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005; 21: 3674-3676); specifically, the Gene Ontology (GO) database was searched by BLAST homology for annotations. NCBInr was used for the hom