WheatNet: a Genome-Scale Functional Network for Hexaploid Bread Wheat, Triticum aestivum
2017; Elsevier BV; Volume: 10; Issue: 8; Language: English
10.1016/j.molp.2017.04.006
ISSN: 1674-2052
Authors: Tak Lee, Sohyun Hwang, Chan Yeong Kim, Hongseok Shim, Hyojin Kim, Pamela C. Ronald, Edward M. Marcotte, Insuk Lee
Topic(s): Chromosomal and Genetic Variations
Abstract: Gene networks provide a system-level overview of genetic organization and enable the dissection of functional modules underlying complex traits. Integration of diverse genomics data within a Bayesian statistics framework has been successfully applied to the construction of genome-scale functional networks for major crop species such as rice (Lee et al., 2011), soybean (Kim et al., 2017), and tomato (Kim et al., 2016), and their predictive power for gene-to-trait associations has been demonstrated. However, such a predictive gene network is not yet available for bread wheat, Triticum aestivum, an important staple food crop accounting for approximately 20% of the world's daily food consumption. Bread wheat also serves as a model for studying polyploidy in plants. Functional genomics studies on bread wheat have lagged behind those on other crops partly because of its large genome (∼17 Gb) and its polyploid nature, which complicates genetic analysis.
However, recent advances in wheat research have considerably improved genome assembly and gene models (International Wheat Genome Sequencing Consortium, 2014). Furthermore, the discovery and application of genome editing (Upadhyay et al., 2013) and TILLING technologies (Uauy et al., 2009) have enabled targeted mutagenesis in wheat protoplasts and whole plants, setting the stage for the application of reverse genetics approaches to the functional characterization of wheat genes. Here, we present WheatNet, a genome-scale functional gene network for T. aestivum, and an associated web server (www.inetbio.org/wheatnet), which provides network information and generates network-based functional hypotheses. WheatNet was constructed by integrating 20 distinct genomics datasets (Supplemental Table 1), including 156 000 wheat-specific co-expression links mined from 1929 DNA microarray datasets (Supplemental Table 2). A unique feature of WheatNet compared with previously constructed crop functional networks is that each network node represents either a single gene or a group of genes, which reduces network complexity.
The allopolyploid wheat genome contains three homeologous chromosome sets (A, B, and D) that originate from three closely related species: Triticum urartu, Aegilops speltoides, and Aegilops tauschii, respectively (International Wheat Genome Sequencing Consortium, 2014). Therefore, the wheat genome contains many homeologous genes across the three ancestral chromosome sets. Because homeologs are likely to have redundant functions, collapsing homeologs into a single network node would facilitate network analysis by reducing network complexity. Unfortunately, comprehensive definitions of wheat homeologous relationships are not yet available. Therefore, we computationally partitioned "gene groups" mimicking homeologous genes by clustering 99 386 wheat genes, resulting in 20 248 gene groups comprising 63 401 genes, plus 35 985 individual genes. WheatNet was thus constructed using 56 233 nodes (20 248 gene groups and 35 985 individual genes); the final network has 30 230 nodes (13 430 gene groups and 16 800 individual genes) and 567 000 edges, integrating 20 sources of functional evidence linking pairs of genes (Supplemental Methods). The edge information of the integrated WheatNet and of all 20 component networks is available for download. To assess WheatNet, we used biological process annotations by agriGO (Du et al., 2010), which are moderately distinct from the dataset used for network training (∼38% of gene pairs that share agriGO annotations overlap the training data) and are one of the few other large-scale wheat annotation sets available for testing. To help reduce bias, we excluded agriGO terms that annotate more than 300 wheat genes.
Next, we measured the accuracy of functional gene pairs inferred by WheatNet, and of gene pairs expected by random chance, as the proportion of gene pairs that share agriGO annotations at different levels of coverage of the coding genome. WheatNet performed strongly: a network covering approximately 20% of all genes maps functional gene pairs with about 40% accuracy (Supplemental Figure 1). The quality of WheatNet was further evaluated by the degree of connectivity among genes involved in a particular biological process. Because genes for the same complex trait are more likely to be functionally coupled, high connectivity among known genes for a trait would support the quality of a functional network. We tested network connectivity for a group of genes based on two measures: (1) the number of edges among gene members (i.e., within-group edge count) and (2) the number of network neighbors that overlap among group members (i.e., network neighbor overlap). We used genes for two complex traits derived from proteomics studies: 45 genes with differential protein expression after Blumeria graminis f. sp. tritici infection (Mandal et al., 2014) and 17 genes with differential protein expression under drought conditions (Cheng et al., 2015). The significance of network connectivity was assessed against a null distribution from 1000 random gene sets of the same size. We found that the connectivity among each trait's genes was significantly higher than expected by random chance (Figure 1A and 1B).
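The two connectivity measures and the random-set null distribution described above can be sketched as follows. This is a minimal illustration on a toy network, not the authors' implementation: the function names are hypothetical, "network neighbor overlap" is assumed here to mean the number of nodes adjacent to at least two group members, and the add-one correction in the empirical P value is likewise an assumption.

```python
import random


def build_neighbors(edges):
    """Map every node to the set of its network neighbors."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def within_group_edge_count(edges, group):
    """Number of edges whose two endpoints are both group members."""
    members = set(group)
    return sum(1 for a, b in edges if a in members and b in members)


def neighbor_overlap(neighbors, group):
    """Number of nodes adjacent to at least two group members (assumed definition)."""
    seen = {}
    for gene in set(group):
        for nb in neighbors.get(gene, ()):
            seen[nb] = seen.get(nb, 0) + 1
    return sum(1 for count in seen.values() if count >= 2)


def empirical_p_value(statistic, group, all_genes, n_random=1000, seed=0):
    """Fraction of same-size random gene sets scoring at least the observed value."""
    rng = random.Random(seed)
    pool = sorted(all_genes)
    observed = statistic(group)
    hits = sum(
        1
        for _ in range(n_random)
        if statistic(rng.sample(pool, len(group))) >= observed
    )
    # Add-one correction (an assumption) so the estimate is never exactly zero.
    return (hits + 1) / (n_random + 1)


# Toy network: g1, g2, g3 form a tightly connected "trait" module.
edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g3", "g4"), ("g5", "g6")]
neighbors = build_neighbors(edges)
trait_genes = ["g1", "g2", "g3"]

p_edges = empirical_p_value(
    lambda g: within_group_edge_count(edges, g), trait_genes, neighbors
)
print(within_group_edge_count(edges, trait_genes), p_edges)
```

On real data the gene pool would be all network nodes and the group would be the 45 or 17 trait genes; the toy network only demonstrates the mechanics.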
We consistently observed network communities of genes for both traits (Figure 1C and 1D). These results suggest that WheatNet can be used to predict additional genes involved in a given trait. The WheatNet web server provides two options for prioritizing genes for wheat traits: (1) direct neighbors in the gene network and (2) context-associated hubs (CAHs). In the first approach, a user submits genes known to be involved in a trait, which guide the network search for new candidate genes. New genes are then ranked by the strength of evidence connecting them to the "guide genes," measured for each candidate gene as the sum of network edge scores from that gene to the guide genes. The result page provides the ranked list of candidates and a visualization of the local guide gene network (Figure 1E). To provide functional clues for candidate genes, WheatNet provides available wheat and Arabidopsis gene annotations from the Gene Ontology biological process (GOBP) (Supplemental Methods). In the second approach, users exploit gene expression data related to a trait of interest. Gene expression profiles are one of the most common types of genomic data, and differential expression analysis yields many genes that are potentially associated with given traits such as abiotic and biotic stresses. However, many genes that are associated with stress conditions are not differentially expressed. Hypothesizing that a gene connected to many differentially expressed genes (DEGs) under stress (i.e., a CAH) is likely to be responsible for responses to that stress condition, we prioritized genes by their connections to the context-associated DEGs. To conduct CAH prioritization, we first defined a subnetwork that comprises a hub gene and all of its network neighbors in WheatNet. For gene prioritization, we considered only subnetworks whose hub genes have at least 50 network neighbors.
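The direct-neighbor ranking described above (each candidate scored by the sum of edge scores linking it to the guide genes) can be sketched as below. This is a hedged illustration only: the function name, the toy edge scores, and the choice to exclude guide genes from the candidate list are assumptions, not details taken from the WheatNet server.

```python
from collections import defaultdict


def rank_candidates(weighted_edges, guide_genes):
    """Rank non-guide genes by the summed scores of their edges to guide genes."""
    guides = set(guide_genes)
    scores = defaultdict(float)
    for a, b, score in weighted_edges:
        # Only edges joining one guide gene and one non-guide gene contribute.
        if a in guides and b not in guides:
            scores[b] += score
        elif b in guides and a not in guides:
            scores[a] += score
    # Highest total first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical edges: (gene, gene, edge score).
toy_edges = [
    ("guide1", "cand1", 2.0),
    ("guide2", "cand1", 1.5),
    ("guide1", "cand2", 3.0),
    ("cand2", "other", 4.0),    # no guide gene involved, so ignored
    ("guide1", "guide2", 5.0),  # guide-to-guide edge, so ignored
]
ranking = rank_candidates(toy_edges, ["guide1", "guide2"])
print(ranking)
```

Here cand1 outranks cand2 because its two moderate links to guide genes sum to more than cand2's single stronger link.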
Assuming that DEGs are representative genes for a relevant biological context, we prioritized hub genes based on the enrichment of their network neighbors for the DEGs, measured using Fisher's exact test. Hub genes whose network neighbors are significantly enriched for the DEGs (P < 0.01) are considered CAHs and are presented as candidate genes for the context-associated trait. As in the network direct neighborhood search, all candidate genes are annotated with GOBP terms for wheat genes and for Arabidopsis orthologs. In addition, users can access a network view of a CAH and its connected DEGs by clicking each candidate gene (Figure 1F). The WheatNet predictions by each of the network-based gene prioritization methods were validated as follows. For the network direct neighborhood method, we evaluated the new candidate genes for drought stress response that were predicted by submitting 17 genes with differential protein expression under drought conditions (Cheng et al., 2015) as guide genes. We hypothesized that novel candidate genes for drought response are also likely to be expressed differentially under drought conditions. Thus, we investigated the enrichment of candidate drought response genes for DEGs under drought conditions. We generated a set of 2346 DEGs under drought conditions, defined as genes that showed more than 4-fold changes in expression levels at P < 0.01 (SRP045409 of NCBI Sequence Read Archive) (Liu et al., 2015).
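The context-associated hub (CAH) prioritization described above can be sketched as follows. This is a minimal illustration under stated assumptions: the test is taken to be one-sided (enrichment only) and is computed as a hypergeometric upper tail, the background is assumed to be all network nodes, and the function names are hypothetical. The thresholds of 50 neighbors and P < 0.01 come from the text.

```python
from math import comb


def fisher_upper_tail(k, n_neighbors, n_degs, n_total):
    """One-sided Fisher's exact test: P(overlap >= k) under the hypergeometric null."""
    max_k = min(n_neighbors, n_degs)
    numerator = sum(
        comb(n_degs, i) * comb(n_total - n_degs, n_neighbors - i)
        for i in range(k, max_k + 1)
    )
    return numerator / comb(n_total, n_neighbors)


def context_associated_hubs(neighbors, degs, min_neighbors=50, alpha=0.01):
    """Return (hub, P) pairs for hubs whose neighbors are enriched for DEGs."""
    background = set(neighbors)            # assumed background: all network nodes
    degs_in_network = set(degs) & background
    hubs = []
    for hub, nbrs in neighbors.items():
        if len(nbrs) < min_neighbors:
            continue                       # only well-connected hubs are tested
        overlap = len(nbrs & degs_in_network)
        p = fisher_upper_tail(
            overlap, len(nbrs), len(degs_in_network), len(background)
        )
        if p < alpha:
            hubs.append((hub, p))
    return sorted(hubs, key=lambda item: item[1])


# Toy network of 20 nodes: hub "h" is linked to six genes, all of them DEGs.
deg_genes = {f"n{i}" for i in range(1, 7)}
toy_neighbors = {"h": set(deg_genes)}
toy_neighbors.update({gene: {"h"} for gene in deg_genes})
toy_neighbors.update({f"o{i}": set() for i in range(1, 14)})

# The neighbor threshold is lowered only because the toy network is tiny.
cahs = context_associated_hubs(toy_neighbors, deg_genes, min_neighbors=5)
print(cahs)
```

Exact integer arithmetic via `math.comb` avoids a dependency on a statistics library; for genome-scale inputs a library routine for Fisher's exact test would be the more practical choice.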
We found 15 drought-condition DEGs among the top 50 candidate genes from the network direct neighborhood method, which indicates more than 7-fold enrichment over predictions by random chance (15/50 = 0.3 by WheatNet versus 2346/56 233 = 0.042 by random chance). For the CAH method, we evaluated the candidate genes for Fusarium graminearum infection response that were predicted by submitting 837 DEGs after infection with F. graminearum (GEO: GSE54551 in NCBI Gene Expression Omnibus database) (Wojcik et al., 2015) as user input data. We found that the top 100 candidates by CAHs were significantly enriched for GOBP annotations relevant to fungal infection based on Arabidopsis orthologs: "response to chitin" (GO:0010200, P = 9.72 × 10^-31), "regulation of plant-type hypersensitive response" (GO:0010363, P = 8.20 × 10^-21), "defense response to fungus" (GO:0050832, P = 1.73 × 10^-20), "response to fungus" (GO:0009620, P = 1.03 × 10^-8), and "detection of biotic stimulus" (GO:0009595, P = 3.43 × 10^-5). These results indicate that WheatNet can effectively prioritize novel candidate genes for complex traits, including those governing abiotic and biotic stress responses, using multiple network-based methods, each of which requires only the submission of input data to the web server. WheatNet complements other types of knowledge mining systems (Hassani-Pak et al., 2016) and provides a useful resource for systems biology and predictive genetics analysis of wheat.
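The fold enrichment reported above for the drought validation can be reproduced with simple arithmetic. The snippet below is only a check of the published numbers (15 DEGs among the top 50 candidates; 2346 DEGs against 56 233 network nodes); the function name is hypothetical.

```python
def fold_enrichment(hits, n_selected, n_positive, n_background):
    """Observed hit rate among selected items divided by the background rate."""
    observed_rate = hits / n_selected
    expected_rate = n_positive / n_background
    return observed_rate / expected_rate


# Numbers quoted in the text for the drought-response validation.
drought_fold = fold_enrichment(15, 50, 2346, 56233)
print(f"observed = {15 / 50:.3f}, expected = {2346 / 56233:.3f}, "
      f"fold = {drought_fold:.2f}")
```

The ratio comes out slightly above 7, consistent with the "more than 7-fold" statement.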
This work was supported by grants from the National Research Foundation of Korea (2012M3A9B4028641, 2012M3A9C7050151, and 2015R1A2A1A15055859) to I.L., by a grant from NSF (1237975) to P.C.R. and E.M.M., and by a grant from the Welch Foundation (F1515) to E.M.M. The work conducted by the U.S. Department of Energy Joint Genome Institute was supported by the Office of Science of the U.S. Department of Energy under contract no. DE-AC02-05CH11231.
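References
Cheng Z., Dong K., Ge P., Bian Y., Dong L., Deng X., Li X., Yan Y. Identification of leaf proteins differentially accumulated between wheat cultivars distinct in their levels of drought tolerance. PLoS One. 2015; 10: e0125302
Du Z., Zhou X., Ling Y., Zhang Z., Su Z. agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res. 2010; 38: W64-W70
Hassani-Pak K., Castellote M., Esch M., Hindle M., Lysenko A., Taubert J., Rawlings C. Developing integrated crop knowledge networks to advance candidate gene discovery. Appl. Transl. Genom. 2016; 11: 18-26
International Wheat Genome Sequencing Consortium. A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science. 2014; 345: 1251788
Kim E., Hwang S., Lee I. SoyNet: a database of co-functional networks for soybean Glycine max. Nucleic Acids Res. 2017; 45: D1082-D1089
Kim H., Kim B.S., Shim J.E., Hwang S., Yang S., Kim E., Iyer-Pascuzzi A.S., Lee I. TomatoNet: a genome-wide co-functional network for unveiling complex traits of tomato, a model crop for fleshy fruits. Mol. Plant. 2016; 10: 652-655
Lee I., Seo Y.S., Coltrane D., Hwang S., Oh T., Marcotte E.M., Ronald P.C. Genetic dissection of the biotic stress response using a genome-scale gene network for rice. Proc. Natl. Acad. Sci. USA. 2011; 108: 18548-18553
Liu Z., Xin M., Qin J., Peng H., Ni Z., Yao Y., Sun Q. Temporal transcriptome profiling reveals expression partitioning of homeologous genes contributing to heat and drought acclimation in wheat (Triticum aestivum L.). BMC Plant Biol. 2015; 15: 152
Mandal M.S., Fu Y., Zhang S., Ji W. Proteomic analysis of the defense response of wheat to the powdery mildew fungus, Blumeria graminis f. sp. tritici. Protein J. 2014; 33: 513-524
Uauy C., Paraiso F., Colasuonno P., Tran R.K., Tsai H., Berardi S., Comai L., Dubcovsky J. A modified TILLING approach to detect induced mutations in tetraploid and hexaploid wheat. BMC Plant Biol. 2009; 9: 115
Upadhyay S.K., Kumar J., Alok A., Tuli R. RNA-guided genome editing for target gene mutations in wheat. G3 (Bethesda). 2013; 3: 2233-2238
Wojcik P.I., Ouellet T., Balcerzak M., Dzwinel W. Identification of biomarker genes for resistance to a pathogen by a novel method for meta-analysis of single-channel microarray datasets. J. Bioinform. Comput. Biol. 2015; 13: 1550013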