Artigo Acesso aberto Revisado por pares

Widespread inter‐individual gene expression variability in Arabidopsis thaliana

2019; Springer Nature; Volume: 15; Issue: 1 Linguagem: Inglês

10.15252/msb.20188591

ISSN

1744-4292

Autores

Sandra Cortijo, Zeynep Gökçe Gayretli Aydın, Sebastian E. Ahnert, James Locke,

Tópico(s)

Plant Gene Expression Analysis

Resumo

Article24 January 2019Open Access Transparent process Widespread inter-individual gene expression variability in Arabidopsis thaliana Sandra Cortijo Sandra Cortijo The Sainsbury Laboratory, University of Cambridge, Cambridge, UK Search for more papers by this author Zeynep Aydin Zeynep Aydin The Sainsbury Laboratory, University of Cambridge, Cambridge, UK Search for more papers by this author Sebastian Ahnert Sebastian Ahnert The Sainsbury Laboratory, University of Cambridge, Cambridge, UK Search for more papers by this author James CW Locke Corresponding Author James CW Locke [email protected] orcid.org/0000-0003-0670-1943 The Sainsbury Laboratory, University of Cambridge, Cambridge, UK Search for more papers by this author Sandra Cortijo Sandra Cortijo The Sainsbury Laboratory, University of Cambridge, Cambridge, UK Search for more papers by this author Zeynep Aydin Zeynep Aydin The Sainsbury Laboratory, University of Cambridge, Cambridge, UK Search for more papers by this author Sebastian Ahnert Sebastian Ahnert The Sainsbury Laboratory, University of Cambridge, Cambridge, UK Search for more papers by this author James CW Locke Corresponding Author James CW Locke [email protected] orcid.org/0000-0003-0670-1943 The Sainsbury Laboratory, University of Cambridge, Cambridge, UK Search for more papers by this author Author Information Sandra Cortijo1, Zeynep Aydin1, Sebastian Ahnert1 and James CW Locke *,1 1The Sainsbury Laboratory, University of Cambridge, Cambridge, UK *Corresponding author. Tel: +44 1223 761110; E-mail: [email protected] Molecular Systems Biology (2019)15:e8591https://doi.org/10.15252/msb.20188591 PDFDownload PDF of article text and main figures. Peer ReviewDownload a summary of the editorial decision process including editorial decision letters, reviewer comments and author responses to feedback. ToolsAdd to favoritesDownload CitationsTrack CitationsPermissions ShareFacebookTwitterLinked InMendeleyWechatReddit Figures & Info Abstract A fundamental question in biology is how gene expression is regulated to give rise to a phenotype. However, transcriptional variability is rarely considered although it could influence the relationship between genotype and phenotype. It is known in unicellular organisms that gene expression is often noisy rather than uniform, and this has been proposed to be beneficial when environmental conditions are unpredictable. However, little is known about inter-individual transcriptional variability in multicellular organisms. Using transcriptomic approaches, we analysed gene expression variability between individual Arabidopsis thaliana plants growing in identical conditions over a 24-h time course. We identified hundreds of genes that exhibit high inter-individual variability and found that many are involved in environmental responses, with different classes of genes variable between the day and night. We also identified factors that might facilitate gene expression variability, such as gene length, the number of transcription factors regulating the genes and the chromatin environment. These results shed new light on the impact of transcriptional variability in gene expression regulation in plants. Synopsis RNA-seq on individual seedlings is used to detect inter-individual gene expression variability in Arabidopsis thaliana. 9% of the transcriptome is identified as highly variable for at least one time point during the day/night cycle. Two sets of highly variable genes were identified, either mainly variable during the day or night. Highly variable genes tend to be involved in the response to the environment. Highly variable genes tend to be shorter, to be targeted by a higher number of transcription factors and to be characterised by a "closed" chromatin environment. Gene expression variability for individual genes of interest can be viewed on https://jlgroup.shinyapps.io/AraNoisy/. Introduction Gene expression in individual cells is often noisy and dynamic. Genetically identical cells under the same environment can display widely different expression levels of key genes (Ko, 1992; Fiering et al, 2000; Martins & Locke, 2015). Noise in gene expression has been shown to have a significant impact on the design and function of genetic circuits in unicellular organisms (Elowitz et al, 2002; Eldar & Elowitz, 2010). It has also been observed in multiple pathways in mammalian cells (Yin et al, 2009; Mantsoki et al, 2016; Riddle et al, 2018), in Drosophila cells (Pare et al, 2009) and between individuals in Drosophila (Lin et al, 2016). However, gene expression variability has mostly been analysed for a few individual genes in plants at a single-cell resolution (Angel et al, 2015; Araujo et al, 2017; Meyer et al, 2017; Gould et al, 2018). Several studies suggest that transcriptional variability between cells can be exploited during development in multiple organisms (Wernet et al, 2006; Chang et al, 2008; Pare et al, 2009; Meyer et al, 2017). On the other hand, the identification of mutants in which transcriptional and/or phenotypic variability is increased indicates that variability is at least partly buffered or controlled (Rutherford & Lindquist, 1998; Queitsch et al, 2002; Raj et al, 2010; Folta et al, 2014; Schaefer et al, 2017). It is not known at a genome-wide scale to what extent gene expression can be variable during plant development or between identical plants. Plants are a promising system to examine the global properties of noise in gene expression, as phenotypic variability, also referred to as phenotypic instability, has been observed in multiple areas of plant growth and development. Inter-individual phenotypic variability has also been observed in isogenic populations of other organisms (Zhang et al, 2016; Roman et al, 2018), but the model plant Arabidopsis thaliana presents the advantage of being an inbreeding species where heterozygosity is extremely low (Abbott & Gomes, 1989). Phenotypic variability in plants can occur both within and between individuals that are growing in the same conditions. High levels of phenotypic variability have been described for seed germination time (Simons & Johnston, 2006; Venable, 2007; Mitchell et al, 2017), patterning of lateral roots (Forde, 2009) as well as for floral and foliar development (Paxman, 1956; Sakai & Shimamoto, 1965). It is not known whether such inter-individual phenotypic variability originates from responses to microenvironmental perturbations, or from stochastic factors at the cellular level or from both. Differences in the level of inter-individual variability have been observed between natural accessions, in recombinant inbred lines and also in mutants for many traits such as growth, hypocotyl length, leaf and flower number, plant height and plant defence metabolism (Hall et al, 2007; Jimenez-Gomez et al, 2011; Folta et al, 2014; Hong et al, 2016; Schaefer et al, 2017). Jimenez-Gomez and colleagues also identified QTLs explaining differences in the level of expression variability between pools of plants in Arabidopsis thaliana (Jimenez-Gomez et al, 2011). This suggests that such variability can be controlled or buffered by genetic factors. However, the molecular mechanisms underlying such inter-individual phenotypic variability are still poorly understood. In this work, we analyse gene expression variability between multicellular individuals using the model plant Arabidopsis thaliana, with the emphasis on three questions. Firstly, what is the global extent of gene expression variability between individuals? In order to better understand how gene expression variability is controlled and its role in plant physiological and developmental responses, we first need to identify genome-wide the genes that are highly variable between individuals. Secondly, does inter-individual expression variability change through the diurnal cycle? It is known that the diurnal cycle influences expression level of up to 36% of the transcriptome (Michael & McClung, 2003; Covington et al, 2008). However, little is known about the impact of the diurnal cycle on gene expression variability. Thirdly, what factors can regulate this inter-individual expression variability? Using single seedling RNA-seq, we identified hundreds of genes that are highly variable between individuals in Arabidopsis and show that the level of variability changes throughout the diurnal cycle. To ensure accessibility and reusability of our data, we created an interactive web application, in which the inter-individual gene expression variability through a diurnal cycle can be observed for individual genes (https://jlgroup.shinyapps.io/aranoisy/). This tool will help researchers to take into consideration any inter-individual gene expression variability in their genes of interest. Moreover, we show that highly variable genes (HVGs) are enriched for environmentally responsive genes and characterised by a combination of specific genomic and epigenomic features. We have revealed both the level and potential mechanism behind gene expression variability between individuals in Arabidopsis, allowing understanding of a previously unexplored aspect of gene regulation during plant development. Results Widespread expression variability in Arabidopsis seedlings through the day and night In order to measure transcriptional variability between individuals, we generated transcriptomes for single Arabidopsis thaliana seedlings at multiple time-points over a full day/night cycle (Fig 1A). To minimise any variability caused by external factors, these seedlings originated from the same mother plant, germinated at the same time and were grown in the same plate under controlled conditions (see Materials and Methods for more details). To analyse how transcriptional variability is influenced by diurnal cycles, we harvested seedlings every 2 h across a 24-h period (Fig 1A). ZT2 to ZT12 corresponding to the time-points harvested during the day, and ZT14 to ZT24 to the time-points harvested during the night, ZT12 and ZT24 being, respectively, harvested just a few minutes before dusk and dawn. In total, 168 transcriptomes have been analysed, that is, of 14 individual seedlings for each of the 12 time-points. We observed very similar mean expression profiles to already published bulk level diurnal profiles (Mockler et al, 2007; Appendix Fig S1B), indicating that known diurnal expression patterns are reproduced in our experiment. Figure 1. Widespread expression variability in Arabidopsis seedlings through the day and night Experimental set-up to identify transcriptional variability between seedlings during the day and night. RNA-seq was performed on individual seedlings, for a total of 14 seedlings at each time-point. Seedlings were harvested at 12 time-points, every 2 h across a 24-h period. Seedlings of different colours represent different transcriptional states and thus inter-individual expression variability we aim to identify. Identification of highly variable genes (HVGs) for the time-point ZT2. The red line shows the trend for the global relation between CV2 (variance/mean2) and mean expression, which is defined using all genes (minus small and lowly expressed genes, see Materials and Methods for more detail) and used to identify HVGs (blue points). For each gene, a corrected CV2 is calculated: log2(CV2/trend). Expression profiles (top) in the 14 seedlings over a 24-h time course (with 12 time-points) for a non-variable gene (left), a highly variable gene (middle) and a gene with the level of variability changing across the 24 h (right). Each dot is the mean normalised expression level for a single seedling. Variability profiles (bottom) of the log2(CV2/trend) for the same genes are also shown. Blue stars indicate time-points for which the gene is identified as being highly variable. Download figure Download PowerPoint Before identifying highly variable genes, we tested whether the level of variability observed in our data could be explained by experimental error or be due to the RNA-seq method. In order to validate the profiles for gene expression variability during the time course, we performed a full time course replicate and examined the variability between seedlings for 10 genes by RT–qPCR (Appendix Fig S1C and G). We observed very similar expression profiles for these genes (Appendix Fig S1G). We also found a positive correlation of 0.77 (Spearman correlation, P = 0.0092) between the average CV2 of the entire time course for each of these 10 genes measured by RNA-seq or RT–qPCR (Appendix Fig S1C), indicating that genes have a similar level of variability in both experiments. We also obtained very similar CV2 at ZT24 when using independent mapping programs (salmon compared to TopHat with a Spearman correlation of 0.85, P < 2.2e-16, or hisat2 compared to TopHat with a Spearman correlation of 0.9, P < 2.2e-16, Appendix Fig S1E and F). Having validated the measured inter-individual variability, we identified highly variable genes (HVGs) from our RNA-seq data set using a previously described method (Brennecke et al, 2013). In this method, the square coefficient of variation [CV2 = variance/(average2)] of each gene is compared to its average expression level (Fig 1B). In order to avoid biases caused by technical noise, which is likely to be higher at lower expression levels, we only selected the HVGs if they were significantly more variable than the background trend in CV2 (see Materials and Methods for more detail). We also calculated a corrected CV2 for each gene [log2(CV2/trend)] which corrected for the observed negative trend between CV2 and expression level, and used it for further analyses of gene expression variability. Genes with a negative log2(CV2/trend) are less variable than the trend, while genes with a positive log2(CV2/trend) are more variable than the trend. To test whether there were transcriptome-wide trends in the level of variability across the day, we verified that the global trends of the CV2 against the average normalised expression measured for each time-point are in the same range (Appendix Fig S2A). We also observed that there was no obvious bias in the distribution of CV2 against the average normalised expression at the different time-points (Appendix Fig S2B). We observed that while some genes are never classed as a HVG (Fig 1C left and Appendix Fig S2D), others are selected as a HVG for the entire time course (Fig 1C middle and Appendix Fig S2E), or as a HVG for only some time-points (Fig 1C right and Appendix Fig S2F), indicating a broad range of variability profiles during the diurnal cycle. Expression level in individual seedlings and profiles of the log2(CV2/trend) during the time course for individual genes can be viewed at https://jlgroup.shinyapps.io/AraNoisy/ (see Appendix Fig S2H for more detail about how to use the web application). In total, we identified between 257 and 716 HVGs at each time-point, with more HVGs identified during the night (Fig 2A). We also generated two other reference gene lists to compare to the HVGs: 1,000 randomly selected sets of genes for each time-point, with the number of random genes selected matching the number of HVGs for each time-point (Appendix Fig S3B), and the least 1,000 variable genes for each time-point (LVGs, for lowly variable genes, see Appendix Fig S3A). We see that HVGs are at least three times and on average 9.3 times more variable than the global trend, while LVGs are at least 4.8 times and on average 8.9 times less variable than the global trend (Appendix Fig S2C). Random genes span a wide range of variability including values as low as for LVGs and as high as for HVGs. All these results taken together reveal a widespread level of inter-individual variability in gene expression throughout the entire time course, with 8.7% of the analysed transcriptome identified as highly variable in at least one time-point (1,358 HVGs). Moreover, as we describe in more detail below, some genes can show very different levels of variability during the time course. Figure 2. Structure of noisiness shows partitioning between day and night Number of genes identified as being highly variable for each time-point. These genes are separated between those that are also selected as highly variable in at least one other time-point (dark blue) and those highly variable in only one time-point (light blue). The top bar indicates time-points harvested during the day (orange), just before dusk (red), during the night (black) and just before dawn (blue). Distribution of the number of time-points at which genes are identified as highly variable (blue) or lowly variable (green). The distribution of average (black) and 95% confidence interval (dotted grey) for the thousand random sets are also represented and are so close that they are superimposed and cannot be differentiated in the figure. Heatmap of the percentage of HVGs shared between time-points. Red indicates a high percentage of HVGs in common between two time-points. The top and side bars indicate time-points harvested during the day (orange), just before dusk (red), during the night (black) and just before dawn (blue). Hierarchical clustering of HVGs based on the log2(CV2/trend) at each time-point. The result is represented as a heatmap where yellow indicates a high log2(CV2/trend). The genes were separated into four clusters, indicated by the side coloured bar. The top bar indicates time-points harvested during the day (orange), just before dusk (red), during the night (black) and just before dawn (blue). See Appendix Fig S3G for heatmaps of the log2(CV2/trend) with the same colour cut-offs for HVGs, LVGs and random genes. Heatmap of the mean normalised expression level for the genes in Fig 2D, keeping the same clustering organisation. The top bar indicates time-points harvested during the day (orange), just before dusk (red), during the night (black) and just before dawn (blue). Download figure Download PowerPoint Structure of noisy genes shows partitioning between day and night We next examined the structure of our measured variability in more detail. First, we examined whether HVGs identified for each time-point were scored as variable for multiple time-points or for this time-point only. The vast majority (~93%) of the HVGs identified for each time-point are also identified as highly variable in another time-point (Fig 2A, dark blue). In comparison, an average of ~80% of LVGs and ~30% of random genes selected for each time-point is also observed in another time-point (Appendix Fig S3A and B). Since many genes are identified in more than one time-point, we then defined in how many time-points they are identified. We performed this analysis on all the HVGs (1,358 HVGs), LVGs (5,727) and on a thousand sets of random genes selected for each time-point (same number as HVGs for the time-point). In total, 30% of all the 1,358 HVGs are identified in only one time-point while the others are shared with other time-points, up to all of the 12 time-points for 40 genes (Fig 2B). LVGs and random genes are identified in a lower number of time-points, with 46% of all LVGs and on average 85% of all random genes that are specific for one time-point, while no genes are observed as lowly variable or random in all 12 time-points (Fig 2B). These results show that the number of HVGs shared between time-points is higher than what is observed for LVGs and random genes. It indicates that genes can be highly variable for multiple time-points and potentially show profiles in gene expression variability. Having observed that HVGs are shared between time-points, we then tested for any structure in the gene expression variability across the diurnal cycle. We calculated the percentage of HVGs that are shared between any two time-points and can see a clear separation of day and night time-points (Fig 2C). ZT12, which was harvested just a few minutes before dusk, behaves more like a night than a day time-point as it shares a higher proportion of HVGs with night time-points. When excluding ZT12, the percentage of HVGs that are shared between two time-points of the day (~55% on average) and two time-points of the night (~60%) is higher than between one time-point of the day and one time-point of the night (~35%). When doing the same analysis for LVGs, we observed that the percentage of genes that are shared between two time-points of the day (~18.5%) and two time-points of the night (~20.8%) is very similar to the percentage of genes shared between one time-point of the day and one time-point of the night (~17%, Appendix Fig S3C). We could not find any difference in the percentage of genes that are shared between two time-points in these three categories for the set of random genes analysed (Appendix Fig S3D). This result indicates a structure of the HVGs, but not of LVGs and random genes, in the time course, with a separation between day and night. In order to identify profiles of inter-individual variability across the time course, we performed hierarchical clustering of all 1,358 HVGs based on their log2(CV2/trend) at each time-point. We identified four clusters of variability patterns across the time course (Fig 2D, Appendix Fig S3G). Profiles of genes representative of each cluster can be seen in Appendix Fig S3H. Two clusters (543 genes, clusters 1 and 2) are composed of genes that are variable during the day and the night. One cluster (200 genes, cluster 3) is composed of genes that are highly variable mainly during the day, while another one (615 genes, cluster 4) is composed of genes highly variable mainly during the night. This observation is specific for HVGs, as we cannot observe such marked structure of variability profiles for LVGs and a set of random genes (Appendix Fig S3E–G). All these results show a clear structure in gene expression variability between seedlings during the time course, with different sets of genes being variable during the day or during the night, further suggesting regulation of the level of variability. Highly variable genes are enriched in environmentally responsive genes We next examined the function of the HVGs and LVGs. To do so, we first analysed Gene Ontology (GO) enrichment for all 1,358 HVGs. We identified enrichment for several processes involved in the response to biotic and abiotic stresses as well as in the response to endogenous and exogenous signals (Table EV3). This is not the case for the 5,727 LVGs, for which we found enriched GOs involved in primary metabolism (Table EV4), or for the random genes, for which no GO term was enriched. Interestingly, different GOs are enriched in the clusters identified in Fig 2D based on the log2(CV2/trend) of HVGs along the time course (Table EV3). For example, the response to cold is enriched in clusters 1 and 3, containing genes highly variable during the day, while nitrate assimilation is only enriched in cluster 4, which contains genes highly variable during the night (Table EV3). This result suggests that some GOs might be variable at specific times of the diurnal cycle. In order to test this, we analysed GO enrichment for the HVGs identified at each time-point and clustered the GOs based on the log10(FDR) of their enrichment at the different time-points (Fig 3A). While some GOs such as lipid transport and defence response to fungus are enriched in HVGs throughout the entire time course, we also identified GOs that are enriched only for a subset of the time course. This is the case for the response to toxic substance, reactive oxygen species metabolic process and response to iron ion that are more enriched during the night, or the response to water deprivation and to cold that are more enriched during the day (Fig 3A). We also analysed GO enrichment for the LVGs at each time-point and do not observe such enrichment of GOs preferentially during the day or night (Fig 3B, Table EV4). We also observed that HVGs tend to be expressed with a higher tissue specificity compared with LVGs and random genes (Appendix Fig S4A, Wilcoxon text P < 2.2e−16 between HVG and LVG, Wilcoxon text P < 2.2e−16 between HVG and the thousand sets of random genes). Most HVGs are still expressed in more than one tissue (Appendix Fig S4B). This higher tissue specificity is in agreement with the enrichment of many GOs associated with tissue-specific functions in HVGs. Figure 3. HVGs are enriched for stress responses GO enrichment for genes selected as highly variable for each time-point. GOs that are enriched in at least one time-point are represented. Hierarchical clustering of the GO is performed on the log10(FDR) for GO enrichment in the HVGs. The result is presented as a heatmap with significantly enriched GO in yellow. The top bar indicates time-points harvested during the day (orange), just before dusk (red), during the night (black) and just before dawn (blue). GO enrichment for genes selected as lowly variable for each time-point. GOs that are enriched in at least one time-point are represented. Hierarchical clustering of the GO is performed on the log10(FDR) for GO enrichment in the LVGs. The result is presented as a heatmap with significantly enriched GO in yellow. The top bar indicates time-points harvested during the day (orange), just before dusk (red), during the night (black) and just before dawn (blue). Number of transcription factors (TF) in each TF family with enriched targets in the HVGs compared with the total number of TFs in each family included in the DAP-seq data. Families in green have a significantly higher number of TFs with enriched targets in the HVGs than in the entire data set. Families in red have a significantly lower number of TFs with enriched targets in the HVGs than in the entire data set (based on a Fisher's exact test for which P-values are included in figure). Download figure Download PowerPoint In order to support these GO enrichment results, we also examined the transcription factors binding to the HVGs and LVGs, using available data generated by DNA affinity purification coupled with sequencing (DAP-seq), which provides the list of in vitro targets for 529 TFs (O'Malley et al, 2016). We identified 60 TFs with enriched targets in the HVGs, 5 TFs with enriched targets in the LVGs and only one TF with enriched targets in the random genes. Out of the 60 TFs with enriched targets in the HVGs, only 7 are themselves HVGs. 1,106 out of 1,358 HVGs are potential targets of at least one of these 7 TFs. However, 23,301 genes in total are potential targets of at least one of these 7 TFs, so only a small fraction of these potential targets are HVGs (Table EV5). Moreover, DAP-seq data being derived from in vitro interaction, it only provides a list of potential targets and further experiments such as ChIP-seq would be required to obtain the list of genes regulated by these TFs in our conditions. When deriving gene regulatory networks from the DAP-seq data for HVGs and these TFs, we observed a high level of regulation of these 60 TFs by other TFs of this same list and that most HVGs are targeted by a combination of highly variable and non-highly variable TFs (Appendix Fig S4C–E and Table EV6). These results suggest that while the high level of variability could potentially partly be explained by TFs, other factors are also probably involved. These 60 TFs with enriched targets in the HVGs are mainly part of the NAC-, bZIP- and MYB-related families. ZF-HD (4 TFs with enriched targets in the HVGs) and bZIP (12 TFs with enriched targets in the HVGs) families are significantly more represented in this set than expected based on the entire DAP-seq data set (Fig 3C). bZIP TFs regulate multiple processes including pathogen defence, light and stress signalling, seed maturation and flower development (Jakoby et al, 2002). These are in agreement with the enriched GOs identified for HVGs, involved in responses to the environment as well as biotic and abiotic stresses. These results show that HVGs are enriched for genes involved in the response to environment and stress and are targeted by TF families involved in environmental responses, while LVGs are enriched in DNA, RNA and protein metabolism. Moreover, the clear pattern of enrichment for some function either during the day or the night further suggests that variability between seedlings across the day and night is functional and might be controlled. Gene expression profiles and variability are not correlated for the majority of the HVGs Having identified that HVGs are enriched in stress-responsive genes and that variability is structured during a diurnal cycle, we next asked what factors could be involved in modulating variability in expression? To test whether expression levels could modulate variability, we analysed the expression level of HVGs, LVGs and random genes at each time-point (Appendix Fig S5A). We observed that expression levels of HVGs are slightly higher than for a thousand sets of random genes and LVGs. In order to define whether expression profiles could influence changes in variability during the time course, we used the same hierarchical clustering of HVGs based on their log2(CV2/trend) (Fig 2D) to represent their mean normalised expression levels (Fig 2E). We observed that genes in cluster 4, which are more variable during the night, have a peak of expression at the end of the night. Genes in cluster 3, that are more variable during the day, are also slightly more expressed during the day. However, at a global scale, we cannot see a general link between profiles of gene expression level and variability for all HVGs (Fig 2E). To go further, we analysed the correlation between the profiles of mean normalised expression and of the log2(CV2/trend) profile for each of the 1,358 HVGs, 5,727 LVGs as well as for a thousand sets of random genes of the same nu

Referência(s)