Review; Open access; Peer reviewed

The cell as a bag of RNA

2021; Elsevier BV; Volume: 37; Issue: 12; Language: English

10.1016/j.tig.2021.08.003

ISSN

1362-4555

Authors

Stephen R. Quake

Topic(s)

Gene Regulatory Network Analysis

Abstract

Single-cell transcriptomics techniques enable one to abstract the cell as if it were a 'bag of RNA'. Although this simplification neglects many important structural aspects of the cell, it enables profound insight into the nature of cell state and cell identity. Cell atlases derived by these techniques are companions to the genome which help explain cellular phenotype.

Genomic sequencing has provided insight into the genetic characterization of many organisms, and we are now seeing sequencing technologies turned towards phenotypic characterization of cells, tissues, and whole organisms. In particular, single-cell transcriptomic techniques are revolutionizing certain aspects of cell biology and enabling fundamental discoveries about cellular diversity, cell state, and cell type identity. I argue here that much of this progress depends on abstracting one's view of the cell to regard it as a 'bag of RNA'.

It may seem ridiculously naïve to characterize something with the complexity and intricacy of a cell as a 'bag of RNA'. I will argue that in fact it is a powerful and productive abstraction that is helping revolutionize our understanding of cell biology. The inspiration for this view comes from the field of biochemistry, which used a similar abstraction ('the cell is a bag of enzymes') to motivate an experimental program that led to many important discoveries about how the cell works [1. Alberts B. Biochemical Conceptions of the Cell: From bag of enzymes to chemical factory. Cell. 1998; 92: 291-294; 2. Kyne C., Crowley P.B. Grasping the nature of the cell interior: from physiological chemistry to chemical biology. FEBS J. 2016; 283: 3016-3028]. It is of course manifestly incorrect to assert that the cell is a bag of enzymes, because indeed even bacterial cells depend on various structures, organelles, and other highly stereotyped forms of protein localization to accomplish the basic processes of life [3. Rudner D.Z., Losick R. Protein subcellular localization in bacteria. Cold Spring Harb. Perspect. Biol. 2010; 2: a000307; 4. Murat D. et al. Cell biology of prokaryotic organelles. Cold Spring Harb. Perspect. Biol. 2010; 2: 1-19]. However, the idea of purifying enzymes and studying their activity in test tubes provided enormous insight into what was happening in the cell – including metabolism, transcription, translation, and replication [5. Berg J.M. et al. Biochemistry. 9th edn. MacMillan, 2019]. It is not clear that those discoveries would have been possible if one insisted on only studying intact cells in all their complexity.

As a gedanken experiment in understanding questions of cellular identity and cellular state, imagine that you could create a census of all the molecules in a cell. This would be a list of each type of molecule and how many were found in the cell – proteins, lipids, metabolites, transcripts, etc.
It is a coarse-grained description of the cell because we have thrown out all of the spatial coordinates of the molecules and therefore do not know where they are located in the cell, but most people would agree that this enumeration would capture a large amount of information about the state of the cell at a given point in time, and it would almost certainly provide definitive information about cell type. Unfortunately, despite exciting recent progress in single-cell 'omics' of various sorts [6. Stuart T., Satija R. Integrative single-cell analysis. Nat. Rev. Genet. 2019; 20: 257-272; 7. Li Z. et al. Single-cell lipidomics with high structural specificity by mass spectrometry. Nat. Commun. 2021; 12: 2869; 8. Duncan K.D. et al. Advances in mass spectrometry based single-cell metabolomics. Analyst. 2019; 144: 782-793; 9. Brunner A.-D. et al. Ultra-high sensitivity mass spectrometry quantifies single-cell proteome changes upon perturbation. bioRxiv. Published online February 8, 2020. https://doi.org/10.1101/2020.12.22.423933], this sort of experiment is currently technically impossible to perform on individual cells for most molecular species. However, thanks to advances in nucleic acid sequencing and microfluidic technologies over the past two decades, it is now technically feasible to create a census of all of the RNA molecules in a cell [10. Wu A.R. et al. Single-cell transcriptional analysis. Annu. Rev. Anal. Chem. 2017; 10: 439-462; 11. Aldridge S., Teichmann S.A. Single cell transcriptomics comes of age. Nat. Commun. 2020; 11: 4307].
Most work to date has focused on mRNA, and that is where I focus this discussion, but it is important to point out that a growing literature now shows how to expand these experiments to include other species of RNA found in the cell [12. Isakova A. et al. Single cell profiling of total RNA using Smart-seq-total. bioRxiv. Published online June 3, 2020. https://doi.org/10.1101/2020.06.02.131060].

What is the value of enumerating all of the mRNA molecules in a given single cell? In short, it gives a high-dimensional description of the state and identity of the cell. There are roughly 20 000 genes in the mouse and human genomes, so a transcriptomic characterization of a single cell embeds it in a 20 000-dimensional space – which is a lot of information to have. This is a rich vector space in which each dimension corresponds to a particular gene in the genome and each component of the vector in that dimension is the number of transcripts of that gene that are expressed within the cell. These measurements give a rich molecular characterization and definition of cell type and provide fodder for interesting mathematical and computational manipulations to extract new functional insights from these data. Typically, cell types have been previously defined in the literature by, at best, a handful of genes, while the single-cell transcriptomic approach gives a more or less complete characterization in terms of which genes and their splice variants are expressed, upregulated, or downregulated in a given cell type [13. Teschendorff A.E., Feinberg A.P. Statistical mechanics meets single-cell biology. Nat. Rev. Genet. 2021; 22: 459-479].
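
The vector-space picture described above can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the five-gene panel stands in for the roughly 20 000 genes of a real genome, the counts are invented, and cosine similarity is only one of many possible ways to compare two cells; it is not presented as the method used in the studies cited here.

```python
import math

# Hypothetical five-gene panel standing in for the ~20 000 genes of a real genome.
GENES = ["Actb", "Gapdh", "Ins1", "Gcg", "Snap25"]


def to_vector(census):
    """Turn a {gene: transcript count} census into a vector in fixed gene order."""
    return [census.get(gene, 0) for gene in GENES]


def cosine_similarity(a, b):
    """Cosine of the angle between two expression vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented counts, chosen only to make the geometry visible.
cell_a = to_vector({"Actb": 120, "Gapdh": 80, "Ins1": 900})
cell_b = to_vector({"Actb": 100, "Gapdh": 90, "Ins1": 850})
cell_c = to_vector({"Actb": 110, "Gapdh": 85, "Gcg": 700})

print(cosine_similarity(cell_a, cell_b))  # close to 1: similar profiles
print(cosine_similarity(cell_a, cell_c))  # much smaller: different marker gene
```

In this toy example the two cells that share a dominant transcript point in nearly the same direction, while the third does not, even though all three express similar levels of the two housekeeping-like genes.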
Moreover, this molecular characterization reminds us that it can be misleading to talk about the cell as if all cells are the same; while there are certainly highly conserved structures and mechanisms shared by all cell types, there are also important – even dramatic – differences between cell types to be mindful of.

One important caveat to discuss is that cell types are traditionally defined in terms of protein expression, whereas now we use mRNA as a proxy for protein. As has been well documented through the years, it is not a perfect proxy [14. Edfors F. et al. Gene-specific correlation of RNA and protein levels in human cells and tissues. Mol. Syst. Biol. 2016; 12: 883; 15. Vogel C., Marcotte E.M. Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat. Rev. Genet. 2012; 13: 227-232]. Thus, while it is natural to ask whether we are making errors by switching from protein to RNA, these same studies have shown that for the vast majority of genes, at any given time point, RNA expression is correlated with protein expression. This is true even at the single-cell level, especially for eukaryotic cells [16. Darmanis S. et al. Simultaneous multiplexed measurement of RNA and proteins in single cells. Cell Rep. 2016; 14: 380-389] (in bacteria it can be a bit more complex due to shorter RNA lifetimes, but the time-dependent correlation function is still nontrivial [17. Taniguchi Y. et al. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science. 2010; 329: 533-538]). However, the correlation is not perfect and indeed for some genes not strong at all.
Moreover, other factors such as post-translational modifications, especially those affecting protein turnover, protein folding efficiency, and protein trafficking (the cell is not a bag at the end of the day) can affect the cell's state, but perhaps not its identity. When one is working in a high-dimensional space (remember, we have 20 000 dimensions to play with) it turns out that many 'sins' are forgiven, in the sense that questions of cell state and identity tend to be overconstrained by so many variables, and not infrequently, it is simply the presence or absence of the transcript and protein that is determinative.

How well can you measure RNA from a single cell, and what is the experimental precision and systematic error? These are complex questions that could occupy a full-length review of their own; here, I point out a few summary observations. Sequencing is a complex process that introduces distortion and bias at many intermediate steps along the way. There are two ways to deal with this. The first is to introduce unique molecular barcodes when RNA is first reverse-transcribed into cDNA; this allows one to track unique molecules through the whole process [18. Klein A.M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell. 2015; 161: 1187-1201; 19. Macosko E.Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell. 2015; 161: 1202-1214]. The second is to introduce National Institute of Standards and Technology (NIST)-traceable RNA standards in each single-cell experiment; this enables one to measure and correct for distortion in the process [20. Wu A.R. et al. Quantitative assessment of single-cell RNA-sequencing methods. Nat. Methods. 2014; 11: 41-46].
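
The first of these two strategies, tagging each molecule with a unique barcode before amplification, can be illustrated with a minimal sketch. It assumes reads have already been reduced to (cell barcode, molecular barcode, gene) triples, and it treats every distinct molecular barcode as one original molecule. The function name `count_molecules` and the example reads are hypothetical, and the sketch deliberately ignores sequencing errors within the barcodes, which practical pipelines need to handle.

```python
from collections import defaultdict


def count_molecules(reads):
    """Collapse amplified reads into molecule counts.

    reads: iterable of (cell_barcode, molecular_barcode, gene) tuples, one per read.
    Returns {cell_barcode: {gene: number of distinct molecular barcodes}}.
    """
    seen = defaultdict(set)
    for cell, barcode, gene in reads:
        # Reads sharing cell, gene, and molecular barcode are copies of one molecule.
        seen[(cell, gene)].add(barcode)

    counts = defaultdict(dict)
    for (cell, gene), barcodes in seen.items():
        counts[cell][gene] = len(barcodes)
    return dict(counts)


# Invented reads: the first molecule was amplified three times, the second once.
example_reads = [
    ("cell1", "AACG", "Actb"),
    ("cell1", "AACG", "Actb"),
    ("cell1", "AACG", "Actb"),
    ("cell1", "TTGC", "Actb"),
    ("cell1", "AACG", "Gapdh"),
    ("cell2", "GGTA", "Actb"),
]

print(count_molecules(example_reads))
```

Four reads of one gene in the first cell collapse to two molecules, which is the sense in which the barcodes remove amplification distortion from the final counts.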
Each of these approaches has strengths and weaknesses, and both are useful in certain situations. In terms of sensitivity, depending on which technical approach is used, one can see quantum efficiencies as high as 50% (i.e., for any given mRNA molecule in the cell there is a 50% chance it will be detected) and as low as a few percent. Again, there is a tradeoff in that the high-sensitivity methods tend to be lower throughput in terms of number of cells per experiment, and higher cost per cell.

Another natural question is whether cell types measured by single-cell transcriptomics correspond to those measured by traditional physiological methods. It is not clear at all, a priori, whether these abstract 20 000-dimensional characterizations based on a single type of nucleic acid correspond to the known physiology of a given cell type. A number of studies have sought to address this question and have established an important intellectual foundation to the field by showing that, almost without exception, there is good correspondence. As a few examples, neurons from slices of the mouse brain were characterized by patch-clamp electrophysiology, and then their contents micropipetted out and sequenced [21. Földy C. et al. Single-cell RNAseq reveals cell adhesion molecule profiles in electrophysiologically defined neurons. Proc. Natl. Acad. Sci. U. S. A. 2016; 113: E5222-E5231; 22. Fuzik J. et al. Integration of electrophysiological recordings with single-cell RNA-seq data identifies neuronal subtypes. Nat. Biotechnol. 2016; 34: 175-183]. This provided a way to directly compare individual cells both physiologically and transcriptomically [23. Zeng H., Sanes J.R. Neuronal cell-type classification: challenges, opportunities and the path forward. Nat. Rev. Neurosci. 2017; 18: 530-546].
A similar approach was used on endocrine (islet) cells from the human pancreas, with similarly positive results [24. Camunas-Soler J. et al. Patch-Seq links single-cell transcriptomes to human islet dysfunction in diabetes. Cell Metab. 2020; 31: 1017-1031.e4]. One of the more stringent tests was performed in the fly olfactory system, where the physically adjacent glomeruli used to process smells from olfactory neurons are viewed as physiologically distinct cell types that differ only in the odor they process. In this case the cells defined by single-cell transcriptomics not only corresponded with their physiological cell type definition, but also provided novel insight into their molecular definition and 'wiring patterns' [25. Li H. et al. Classifying Drosophila olfactory projection neuron subtypes by single-cell RNA sequencing. Cell. 2017; 171: 1206-1220.e22].

One wonders how many cell types there are in the human body. Textbooks put the number at around 300, but are there more to be discovered? As the single-cell transcriptomic field began, estimates were all over the map. Were there tens of thousands or even millions of new cell types to discover? I made a simple, practical estimate of the upper limit of new cell types, based on the observation that biologists are good at observing, and in the past 200 years of cell biology, it seemed impossible to me that they missed more than 95% of what is there. Which is to say that if biologists had discovered only 5% of cell types in the human body, then the upper limit of cell types to discover is somewhere around 6000 (i.e., 300/0.05). The now substantial literature on single-cell transcriptomic analysis of human tissues and organs suggests that my calculation was in fact an overestimate.
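
The back-of-the-envelope estimate above is simple enough to write down as a one-line calculation. The sketch below restates it so that the sensitivity to the assumed discovery fraction is easy to see; the function name is hypothetical, and the alternative fractions are illustrative values rather than figures from the literature.

```python
def upper_limit_cell_types(known_types, fraction_discovered):
    """Upper bound on total cell types if only a given fraction is already known."""
    if not 0 < fraction_discovered <= 1:
        raise ValueError("fraction_discovered must be in (0, 1]")
    return known_types / fraction_discovered


# The text's assumption: about 300 known types, at least 5% already discovered.
print(round(upper_limit_cell_types(300, 0.05)))

# Illustrative alternatives: a larger discovered fraction lowers the bound quickly.
for fraction in (0.25, 0.5, 1.0):
    print(fraction, round(upper_limit_cell_types(300, fraction)))
```

Because the bound scales inversely with the discovered fraction, even a modestly more generous view of what two centuries of cell biology have found brings the ceiling down from thousands to hundreds.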
I am hard pressed to find a paper that states that the authors doubled the number of cell types in a given tissue; more typically there are only a few new cell types relative to the ones that were known previously. The only exception to this is the brain, where it appears there are many more new cell types to be discovered than in other organs.

Another complicating factor in the debate about how many cell types there are relates to the question of whether all cell types can be defined as discrete entities, or whether there is also continuous variation between cell types. To me, this is an interesting open problem and rests on the uncomfortable truth that the field has not arrived at a consensus definition of cell type, and neither has it truly distinguished between cell type and cell state. Most people have an intuitive sense that cell state is more transitory than cell type, and that cell types must have some sort of temporal stability which goes on for hours if not days. It is clear from many experiments that during differentiation, activation, or transformation, cells move along a continuum of states as they transition from one cell type to another; this has been shown both in naturally occurring tissues and in artificially engineered situations such as cellular reprogramming [26. Treutlein B. et al. Dissecting direct reprogramming from fibroblast to neuron using single-cell RNA-seq. Nature. 2016; 534: 391-395]. It has also been shown that cells with static identities can have a continuum of identities, often linked to spatial distribution [27. Stanley G. et al. Discrete and continuous cell identities of the adult murine striatum. bioRxiv. Published online March 27, 2019. https://doi.org/10.1101/591396].
Related to that, the question of spatial relationships between these cell types is now accessible via a variety of new techniques, which interrogate a subset of transcripts within a cell while preserving the tissue structure, also known as spatial transcriptomics [28. Waylen L.N. et al. From whole-mount to single-cell spatial assessment of gene expression in 3D. Commun. Biol. 2020; 3: 1-11]. These are but a few of the many outstanding questions for the field (see Outstanding questions for more).

There has been a long tradition of creating cell atlases in biology – from Ramón y Cajal's drawing of cell types of the brain as observed by optical microscopy [29. Ramón y Cajal S. Histologie Du Système Nerveux de L'homme & Des Vertébrés. A. Maloine, 1909] to Donald Fawcett's 'Atlas of Fine Structure' [30. Fawcett D.W. The Cell, Its Organelles and Inclusions. An Atlas of Fine Structure. W.B. Saunders and Co., 1967], which characterized the structures of the cell by electron microscopy. The community has now embarked on creating a new version of a cell atlas, one that incorporates the full power of molecular definition of cell type, which was previously lacking for large numbers of cell types. Defining a cell type by its name and a gene or two, whose expression is associated with it, leaves a far from complete understanding of identity or state. This is the power of single-cell transcriptomics, which provides vast, multidimensional information regarding a cell's identity and physiology.

My colleagues and I have spent a substantial amount of effort creating whole organism cell atlases – of flies [31. Li H. et al. Fly Cell Atlas: a single-cell transcriptomic atlas of the adult fruit fly. bioRxiv. Published online July 5, 2021. https://doi.org/10.1101/2021.07.04.451050], mice [32. Tabula Muris Consortium et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature. 2018; 562: 367-372; 33. The Tabula Muris Consortium. A single-cell transcriptomic atlas characterizes ageing tissues in the mouse. Nature. 2020; 583: 590-595], lemurs (https://tabula-microcebus.ds.czbiohub.org/about) and humans [34. Tabula Sapiens Consortium. The Tabula Sapiens: a single cell transcriptomic atlas of multiple organs from individual human donors. bioRxiv. Published online July 20, 2021. https://doi.org/10.1101/2021.07.19.452956] – as have others in the community [35. Plass M. et al. Cell type atlas and lineage tree of a whole complex animal by single-cell transcriptomics. Science. 2018; 360: eaaq1723; 36. La Manno G. et al. Molecular architecture of the developing mouse brain. Nature. 2021; 596: 92-96; 37. Eraslan G. et al. Single-nucleus cross-tissue molecular reference maps to decipher disease gene function. bioRxiv. Published online July 19, 2021. https://doi.org/10.1101/2021.07.19.452954; 38. Domínguez C.C. et al. Cross-tissue immune cell analysis reveals tissue-specific adaptations and clonal architecture across the human body. bioRxiv. Published online April 28, 2021. https://doi.org/10.1101/2021.04.28.441762]. These atlases provide a wealth of insight into the diversity of cell types in these organisms, how related cell types across different tissues are similar or different, how cells change with age, and what the evolutionary relationships are between different cell types across organisms. For example, the Tabula Muris Senis has helped us understand at a global level how gene expression changes with age, which cell types are most affected, and which of these aspects are reversible via rejuvenation approaches such as parabiosis [33; 39. Pálovics R. et al. Molecular hallmarks of heterochronic parabiosis at single cell resolution. bioRxiv. Published online November 8, 2020. https://doi.org/10.1101/2020.11.06.367078]. As another example, the Tabula Drosophila has helped reveal how gene expression depends on sex, and has identified which cell types have the highest levels of sex-biased gene expression – this is a launching pad to help understand how these differences meet the distinct developmental and physiological needs of males and females [31].

I view these atlases as phenotypic companions to the genome: the genome is most accurately described as the parts list but not the blueprint of the organism. Not every cell uses the genomic parts list in the same way or even the same parts; it is the cell atlases which provide the true blueprint of how the parts are used in different tissues, organs and cell types.
These resources will be essential to fully understand organismal development and physiology.

Outstanding questions

The difference between cell state and cell type identity is currently not well defined – most people would ascribe a time scale in the sense that cell type identities are stable over longer periods of time than cell states, which can be viewed as more transient, but the field has not converged on a rigorous distinction.

Similarly, there has been an assumption throughout most of the literature that cell states are discrete, while there is evidence that, at least in some tissues and cell types, there is continuous variation in identity.

How well can cell state be characterized with molecules other than RNA, such as proteins, lipids, or metabolites?

Correlations between cell type and spatial location within the tissue remain largely unexplored.

What are the evolutionary relationships between cell types across both closely and distantly related organisms?

What are the minimal number and types of genes required to specify cell types?

What are the regulatory landscapes which specify cell types, and can the global knowledge of cell type-specific transcription factors be used to more efficiently reprogram a given cell type to any other cell type of interest?

What is the epigenetic landscape of a given cell type and how does that change with state and identity?

How are changes in the genome over time reflected in cell state?

No interests are declared.
