Review · Open access · Peer-reviewed

Employing core regulatory circuits to define cell identity

2021; Springer Nature; Volume: 40; Issue: 10; Language: English

10.15252/embj.2020106785

ISSN

1460-2075

Authors

Nathalia Almeida, Matthew Wai Heng Chung, Elena M. Drudi, Elise N. Engquist, Eva Hamrud, Abigail Isaacson, Victoria Tsang, Fiona M. Watt, Francesca M. Spagnoli

Topic(s)

CRISPR and Genetic Engineering

Summary

Review | 2 May 2021 | Open Access

Nathalia Almeida1,† (orcid.org/0000-0003-3145-186X), Matthew W H Chung1,†, Elena M Drudi1,†, Elise N Engquist1,†, Eva Hamrud1,†, Abigail Isaacson1,†, Victoria S K Tsang1,†, Fiona M Watt*,1 (orcid.org/0000-0001-9151-5154) and Francesca M Spagnoli*,1 (orcid.org/0000-0001-7094-8188)

1 Centre for Stem Cells and Regenerative Medicine, Guy's Hospital, King's College London, London, UK
† These authors contributed equally to this work
* Corresponding author. Tel: +44 20 7188 5608; E-mail: [email protected]
* Corresponding author. Tel: +44 20 7188 4520; E-mail: [email protected]

EMBO J (2021) 40: e106785
https://doi.org/10.15252/embj.2020106785

Abstract

The interplay between extrinsic signaling and downstream gene networks controls the establishment of cell identity during development and its maintenance in adult life. Advances in next-generation sequencing and single-cell technologies have revealed additional layers of complexity in cell identity. Here, we review our current understanding of transcription factor (TF) networks as key determinants of cell identity. We discuss the concept of the core regulatory circuit as a set of TFs and interacting factors that together define the gene expression profile of the cell. We propose the core regulatory circuit as a comprehensive conceptual framework for defining cellular identity and discuss its connections to cell function in different contexts.

Introduction

The nature of cell identity is a central problem in biology. Accurate identification of cell types deserves significant attention due to its impact on many areas of research and clinical applications, including regenerative medicine. Cell identities are influenced by external stimuli, such as signaling molecules, growth factors, and intercellular communication, which in turn affect downstream gene expression and jointly dictate cell phenotype and function(s) (Holmberg & Perlmann, 2012; Wagner et al, 2016). Even though these distinct facets of a cell's identity are interdependent, they are often considered separately. Nevertheless, the cell's phenotype and functional characteristics ultimately represent the readout of a specific gene-expression program. Typically, a small number of transcription factors (TFs), which show a lineage-restricted expression pattern, are considered sufficient to establish gene expression programs that define the identity of a cell (Holmberg & Perlmann, 2012; Zaret & Mango, 2016).
Often, these TFs have the ability to bind to inaccessible nucleosomal DNA, acting as "pioneer" TFs (Zaret & Carroll, 2011; Zaret & Mango, 2016).

The concept that differentiated cell identity is established and continuously maintained by a set of TFs was proposed several decades ago (Blau & Baltimore, 1991). This was supported by pioneering studies with cell hybrids and heterokaryons, in which terminally differentiated cells could be successfully reprogrammed into muscle cells by cell fusion (Weiss & Green, 1967; Blau et al, 1983; Pomerantz et al, 2009), and later by gain-of-function approaches based on key TFs (Davis et al, 1987). While these experiments established that cell identity is actively maintained by TFs, it was only in 2008 that Hobert proposed the term terminal selector gene (TSG) (Hobert, 2008). A TSG was defined as a gene that specifies individual identities by directly controlling the expression of a set of downstream differentiation genes (also known as effector genes) via common cis-regulatory motifs (also known as terminal selector motifs) (Hobert, 2008). Though initially described within the context of neuron-specific lineage determination and maintenance in C. elegans (Etchberger et al, 2007), the existence of TSGs has been confirmed in a plethora of other cell types and also in vertebrate model systems (Hobert, 2008) (Box 1). Features of neuronal cell TSG expression that may well apply to other cell types are as follows: (i) the initiation and maintenance of TSG expression are independent events; (ii) the initiation may be the result of transient expression of distinct regulatory factors, either extrinsic signals or TFs; (iii) after initiation, TSGs autoregulate their expression, ensuring continuous expression and regulation of downstream targets (Hobert, 2008, 2011).

Box 1. Building the CRC of dopaminergic neurons

Efforts to classify neuronal identity have greatly contributed to our understanding of CRCs, with studies in C. elegans being the first to conceptualize various components of CRCs such as TSGs (Hobert, 2008). For example, PBX/CEH-20, part of the PBX TALE (three-amino-acid loop extension) homeodomain proteins (Selleri et al, 2019), was first identified to initiate and maintain the terminally differentiated state of dopaminergic (DA) neurons, thereby acting as a TSG (Doitsidou et al, 2013). It was later found that PBX factors, in particular Pbx1, have a conserved role in mouse midbrain DA neurons (Villaescusa et al, 2016). More recently, a genetic approach was used to specifically ablate Pbx1 expression in mouse DA neurons to achieve temporal control over its expression, confirming the involvement of Pbx1 in an evolutionarily conserved CRC (Remesal et al, 2020). This study not only confirmed the involvement of Pbx1 in the production of dopamine, but also showed that this TF is required for the expression of a broad range of olfactory bulb DA effector genes (Remesal et al, 2020). Such a genetic approach enabled the distinction between the late roles of Pbx1 in terminal differentiation and preservation of neuronal identity (Remesal et al, 2020) and its early activities in neuroblasts as well as in midbrain DA neuron specification (Grebbin et al, 2016; Villaescusa et al, 2016) (Box 1 Figure).

Box 1 Figure. The role of Pbx1 in the CRC of olfactory bulb DA neurons. Pbx1 is a TF that is continuously expressed from progenitor to mature neurons. Conditional knockout approaches were key for elucidating the role of Pbx1 not only in the specification of midbrain DA neurons (Villaescusa et al, 2016), but also specifically in the CRC of olfactory bulb DA neurons (Remesal et al, 2020). DA: dopaminergic.

The transcriptional characterization of cell populations can be facilitated by the prior knowledge of TFs that promote cell identity and unfold a CRC network.
For instance, the use of known DA lineage marker genes enabled Fernandes and colleagues to describe a previously unknown heterogeneity of DA neurons derived from human induced pluripotent stem cells (Fernandes et al, 2020). Using scRNA-seq to obtain an unsupervised clustering of the population of cells, the group identified six distinct cell types, two being neuron progenitor populations and four being subpopulations of DA neurons. Although these populations differed in expression of certain genes, all expressed typical DA lineage markers, including Pbx1. Additionally, the in vitro transcriptional data overlapped well with single-cell transcriptomic datasets of post-mortem substantia nigra, which validated the transcriptional heterogeneity found in subpopulations of human DA neurons. Given that DA neurons degenerate in individuals with Parkinson's disease, building the CRC network in DA neurons will not only enrich our understanding of this cell type, but also, potentially, contribute to the development of disease therapies (see section "Assigning functional relevance to CRCs").

In higher vertebrate species, acquisition of a differentiated cell identity seems to require more complex circuitries, whereby a larger panel of TFs act in a combinatorial manner (Fig 1A) (Holmberg & Perlmann, 2012). Target/effector genes are not all controlled by a similar cis-regulatory logic; instead, different combinations of lineage-specific TFs co-regulate different subsets of target genes in distinct ways. Thus, only when the complete set of TFs is co-expressed in a cell is the full repertoire of differentiation genes induced and maintained (Holmberg & Perlmann, 2012).

Davidson pioneered the concept of gene regulatory networks (GRNs) governing the development of body plan and organ formation in the embryo (Davidson & Erwin, 2006). TFs and transcriptional regulators are GRN components, and their target sites are the cis-regulatory DNA modules.
Because each module is regulated by multiple TFs and each TF interacts with multiple modules, it is possible to represent developmental patterns of gene expression as an interlocking network (Peter & Davidson, 2016). Beyond early embryonic processes, GRN circuit design has been applied to describe the transcriptional control of binary fate choices in stem cell differentiation, for example, in the hematopoietic lineage (Graf & Enver, 2009; Davidson, 2010; Xia & Yanai, 2019). Furthermore, seminal studies from embryonic stem cells (ESCs) have revealed that a small set of TFs, such as NANOG, SOX2, and OCT4, called core TFs, not only bind to their own loci, but also mutually regulate one another, thereby forming cross-regulated feed-forward loops that maintain pluripotency (Boyer et al, 2005). The core TFs and their interconnected auto-regulatory loops have been termed "core regulatory circuitry" (CRC) (Boyer et al, 2005; Young, 2011).

Figure 1. Cell identity is regulated by CRCs
(A) Conceptualizing different types of information (e.g., transcriptomics, epigenomics) in the flow of biological information from DNA to function in order to shape our knowledge of CRCs. Downstream processes (purple), such as gene and protein expression, are routinely measured using transcriptomics and proteomics. Further downstream of this is cellular phenotype, a more complex readout which is measured using various assays and microscopy techniques. Factors that influence the CRC of a cell (green) include intrinsic factors and extrinsic factors, such as epigenetic memory and the external environment, respectively. While the overall flow of information is unidirectional (from top to bottom), many factors influence each other in more complex ways.
(B) Model of cell identity being regulated by GRNs through development and CRCs in differentiated cells. We propose that CRCs define differentiated cell types and GRNs are temporally transient networks which drive cellular differentiation during development. GRNs adapt in response to external signals and other influences during development, resulting in a series of different developing cell states. Once cells become terminally differentiated, the TF network becomes more stable and can be defined as a CRC, which is autoregulating and activates the expression of the terminal effector gene battery. While GRNs and CRCs can be identified using similar methods, studies of GRNs additionally benefit from lineage tracing and pseudotime analysis to account for their temporal aspect.

Arendt et al (2016) further extended these models and introduced the concept of core regulatory complex (CoRC), whereby cell type-specific gene expression not only requires the activity of a specific combination of terminal selectors but also depends on their physical cooperativity. Based on such a model, the origin of a new cell type in evolution coincides with the occurrence of a unique CoRC, distinct from its evolutionary sister cell type (Arendt, 2008; Arendt et al, 2016). While the primary function of CRC or CoRC factors is to keep cells in a stable differentiated state, the notion of GRN describes a temporally hierarchical framework of gene expression that controls a differentiation process and adapts in response to external signals and other influences during development (Davidson, 2010; Marioni & Arendt, 2017) (Fig 1B). Since CRC factors provide the ultimate instructive "code" underlying the expression of the effector genes in differentiated cells, cellular identifiers, such as the functional output, cannot be considered separately.
Thus, the CRC concept might provide a standardized and comprehensive definition of a cell type, as the TF regulatory network that is necessary for the induction and maintenance of cell type-specific gene expression programs in differentiated cells. In this review, we discuss how CRC TFs can be employed to define cell identity in the context of differentiation strategies, which can benefit regenerative medicine.

Identifying a core regulatory circuit

Several efforts have been made to identify individual components of cell type-specific CRCs (Graf & Enver, 2009; Xia & Yanai, 2019). To date, most of our knowledge is based on the use of expression profiles of core TFs as a proxy for CRCs. However, to build the network, transcriptomic data need to be integrated with chromatin analyses in computational models for protein–protein, gene–protein, and regulatory element interactions (Fig 1A).

Transcriptomics: from population to single-cell analyses

Cell type-enriched sets of TFs, the main components of CRCs, are still primarily discovered by transcriptome analyses (Xia & Yanai, 2019) (Fig 2A). Over the last decades, the shift from bulk transcriptomics to single-cell or single-nucleus RNA-sequencing (scRNA-seq and snRNA-seq, respectively) has started to provide new insights into gene modules underlying individual cell types (Menon, 2018). Moreover, these approaches in genomics and transcriptomics at a single-cell resolution have led to repositories such as the Human Cell Atlas (HCA). The HCA is a global initiative that aims to create a comprehensive reference map of all human cell types based on their molecular profiles and their classical cellular descriptions (Regev et al, 2017). The purpose is to provide a unique identification of each cell type and a common framework for understanding biological processes in health and disease.
Single-cell atlases are already available for adult human tissues, including the lung, kidney, pancreas and liver, and sequencing of fetal tissues is also ongoing (https://data.humancellatlas.org/).

Figure 2. Methods employed to identify CRCs
(A) Single-cell transcriptomics allows the identification of cell populations or states (top). Putative CRC components for these cell identities can be identified by defining the TFs and downstream genes enriched in these cells (bottom).
(B) Epigenetic methods allow the identification of cis-regulatory elements that make up the CRC. Chromosome conformation capture (3C/HiC) identifies regions of DNA that are in close contact with each other, potentially including enhancer–promoter interactions (left). ATAC-/DNase- and ChIP-seq for histone modifications identify regions of open chromatin, which can be used to identify enhancers as well as promoters and actively transcribed genes (right).
(C) Computational methods are used in multiple aspects of CRC identification. Clustering of single-cell transcriptomics data allows discovery of previously unknown cell types, while pseudotime analysis helps identify transcriptional states when cell fate decisions along developmental trajectories are made (top). Several algorithms can make data-driven predictions of CRCs by analyzing TF co-expression and performing GRN inference (middle). Other relevant data supplied by users or deposited in databases can inform on CRC mechanisms (e.g., chromatin accessibility, promoter and enhancer states, TF-binding and protein–protein interactions) and be integrated to refine CRC predictions (bottom).

To date, transcriptome analyses have enabled the classification of specific mammalian brain cells in spatiotemporal and cell-type databases (Keil et al, 2018; Arlotta & Paşca, 2019).
The spatial transcriptome atlas of the adult human brain from the Allen Human Brain Atlas (AHBA), for example, comprises histological analysis and comprehensive microarray profiling of nearly 900 neuroanatomically precise microdissected sites of the brain in two individuals (Hawrylycz et al, 2012). More recently, in 2014, the U.S. National Institutes of Health funded the BRAIN Initiative Cell Census Consortium (BICCC) (Ecker et al, 2017). The initiative combines ten pilot projects spanning multiple approaches, including single-cell omics, and multiple species (mice, rats, zebrafish, and humans), with the final goal of classifying brain cell types based on integrated analysis of their molecular, anatomical, and physiological properties. The BICCC network works closely with the HCA to develop a comprehensive atlas of all cell types in the human body within a common coordinate framework (Regev et al, 2017). BICCC groups developed new technologies for profiling single neurons that identified new cell types or cell states in the nervous system (Tasic et al, 2016).

The availability of single-cell data has allowed the characterization of heterogeneous transcriptional profiles, context-dependent regulatory relationships, and functional interactomes with higher granularity (Aibar et al, 2017; Mohammadi et al, 2019). Kelley et al (2018) used scRNA-seq data to examine cell-type variations across brain regions in intact human tissue. This resulted in a robust strategy to define gene modules enriched in major neuronal subtypes, which they termed "core transcriptional identities" (Kelley et al, 2018; Menon, 2018).

Despite their many advantages, scRNA-seq techniques are susceptible to several influences, which can bias the results (Chen & Zhou, 2017; Keil et al, 2018).
Several technical factors can introduce variations in the sequencing data: cell dissociation and suspension preparation may introduce technical noise, and stress affecting cell viability could lead to alterations in the gene expression profiles (Ecker et al, 2017; Menon, 2018; Kelley et al, 2018). As some classes of cells are more fragile and prone to rupture than others, this will introduce bias in the populations captured. Other challenges include transcripts of short length or of low abundance in a single cell. The low amount of material may result in uneven RNA loss, leading to gene drop-out events which can be difficult to measure accurately (Chen & Zhou, 2017; Keil et al, 2018). In particular, Mawla and Huising have illustrated the limitations of pancreatic islet transcriptomics, where the impact of endocrine cells other than the insulin-producing β-cells, or of auxiliary cells, in the disruption of blood glucose homeostasis is often overlooked due to their lower abundance (Mawla & Huising, 2019). Although whole islet analysis is limited by the mixture of cells, which differ in abundance, Bramswig and Kaestner discussed the reliability of adding a sorting strategy to determine cell type-specific changes (Bramswig & Kaestner, 2014).

Another challenge of developing a comprehensive human cell atlas is that scRNA-seq requires fresh tissue and therefore relies on limited tissue donations collected either surgically or post-mortem (Ecker et al, 2017; Kelley et al, 2018). A valuable alternative is snRNA-seq, which can be applied to archived frozen samples and provides less biased cellular coverage (Bakken et al, 2018). In fact, Lake et al (2016) revealed 16 neuronal subtypes using nuclear RNA from single nuclei harvested from post-mortem tissue, demonstrating snRNA-seq as a promising method to analyze the human brain.
Similarly, snRNA-seq overcame the technical problems due to rapid enzymatic RNA degradation upon resection of pancreatic tissue, which historically have led to underrepresentation of exocrine cells and hampered comprehensive sequencing of human exocrine pancreatic cells (Tosti et al, 2021).

An additional challenge in single-cell transcriptomics is to classify cell variability, to define cell "types" and to distinguish them from transient cell "states". A consensus on whether a cell going through different states should still be considered the same cell type has not yet been achieved. Xia and Yanai proposed a "periodic table" approach to distinguish cell types from cell states. Typically, scRNA-seq analysis relies on unsupervised clustering algorithms based on the differential expression of genes to identify the cell types (Xia & Yanai, 2019). This uncovers modules of genes and provides an initial map of the relative proportions of different cell types (Regev et al, 2017; Menon, 2018). However, clustering based on differential gene expression might overlook the fact that cell states, such as the cell cycle or stress, are also captured (Kiselev et al, 2019). By contrast, by defining cell identity using the concept of CRCs, a given cell is expected to show a unique set of TFs regardless of its state, which would help to distinguish between cell types and cell states. Hence, Xia & Yanai propose a cell clustering approach that combines both differentially expressed genes and the expression profile of TFs (Xia & Yanai, 2019). This represents a practical approach for distinguishing cell states within the cluster of a given identity.

Epigenetic modifications and chromatin landscapes

Defining CRC factors and building a network requires elucidation of the relationships between the regulators of gene expression (TFs) and the target genes (effector genes).
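
To make the idea of a network linking regulators to target genes more concrete, the following minimal Python sketch treats regulatory interactions as directed edges (regulator, target) and extracts a candidate core circuit. It rests on several assumptions that are not part of the review: the edge list is a hypothetical toy example loosely modeled on the ESC core TFs mentioned earlier (it is not curated data), the names `EDGES`, `core_circuit`, `TF_X`, `effector_1` and `effector_2` are invented for illustration, and the pruning rule (keep TFs that regulate themselves and every other retained TF) is a deliberately simple reading of the CRC definition rather than a published algorithm.

```python
# Hypothetical toy edge list of (regulator, target) pairs.
# Illustrative only: not measured or curated regulatory data.
EDGES = [
    ("OCT4", "OCT4"), ("OCT4", "SOX2"), ("OCT4", "NANOG"),
    ("SOX2", "SOX2"), ("SOX2", "OCT4"), ("SOX2", "NANOG"),
    ("NANOG", "NANOG"), ("NANOG", "OCT4"), ("NANOG", "SOX2"),
    ("OCT4", "effector_1"), ("SOX2", "effector_1"),
    ("NANOG", "effector_2"),
    ("TF_X", "effector_2"),  # hypothetical regulator without autoregulation
]


def build_targets(edges):
    """Map each regulator to the set of genes it regulates."""
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)
    return targets


def core_circuit(edges):
    """Return (core TFs, effector genes) under a simplified CRC reading.

    Assumption: a core TF regulates itself and mutually regulates
    every other core TF. Candidates failing this are pruned greedily,
    which is a heuristic and not an exhaustive clique search.
    """
    targets = build_targets(edges)

    # Step 1: keep regulators with an autoregulatory edge.
    core = {tf for tf, tgts in targets.items() if tf in tgts}

    # Step 2: greedily remove TFs lacking mutual regulation with the rest.
    changed = True
    while changed:
        changed = False
        for tf in sorted(core):
            others = core - {tf}
            regulates_all = others <= targets[tf]
            regulated_by_all = all(tf in targets[o] for o in others)
            if not (regulates_all and regulated_by_all):
                core.discard(tf)
                changed = True
                break

    # Step 3: effector genes are non-core targets of the core TFs.
    effectors = set()
    for tf in core:
        effectors |= targets[tf] - core
    return core, effectors


if __name__ == "__main__":
    core, effectors = core_circuit(EDGES)
    print(sorted(core))       # ['NANOG', 'OCT4', 'SOX2']
    print(sorted(effectors))  # ['effector_1', 'effector_2']
```

In practice the edges would have to come from the kinds of experiments discussed in this section (for example TF binding and enhancer–gene assignments), and a real analysis would likely weight or filter them; the sketch only shows how autoregulation and mutual regulation can be read off a directed graph.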
TFs activate or inhibit the expression of genes by binding specific regulatory sequences, including promoters and enhancers (Spitz & Furlong, 2012). Identifying the enhancers that regulate genes of interest or are bound by key TFs is therefore crucial to understand the connections between the players in the CRC. As enhancers cannot be uniquely characterized by a particular sequence or feature (Coppola et al, 2016), they are identified by combining multiple approaches (Fig 2B). Coordinated experiments interrogating transcriptional responses and chromatin binding via chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) can offer insights into different levels of gene regulation, TF-binding motifs, DNA and chromatin modifications, and how each component is coupled to a functional output (Holmberg & Perlmann, 2012; Wilson & Filipp, 2018). Examples of CRCs in specific lineages are included in Box 1 and Box 2.

Box 2. A CRC view of the pancreas

Mist1 and Ptf1a, two TFs involved in the CRC of pancreatic acini, exemplify the way in which various technologies complement one another to inform our knowledge of CRCs. The function of Mist1 and Ptf1a in acinar tissue has been established thanks to mouse genetic studies (Krapp et al, 1996; Lemercier et al, 1997; Pin et al, 2001). Together, Mist1 and Ptf1a bind and drive the transcription of over a hundred downstream acinar genes through reiterated feed-forward regulatory loops (Jiang et al, 2016). However, the depth and nature of these TFs' involvement in acinar cell identity was not understood until more recently, when a combination of epigenetic and transcriptomic analyses revealed that they are part of a CRC (Jiang et al, 2016). ChIP-seq analysis revealed that Mist1 and Ptf1a share many target genes with highly juxtaposed binding sites.
Ptf1a drives expression of Mist1 through binding to its enhancer, thus generating a self-sustaining regulatory loop between the two factors capable of maintaining not only itself, but also expression of effector genes essential for acinar cell identity.

Within the endocrine compartment of the pancreas, loss-of-function experiments also uncovered the roles of potential CRC constituents [comprehensively reviewed in (Romer & Sussel, 2015)]. Specifically, the development of insulin-producing β-cells depends on several TFs such as Pdx1, Ngn3, and Nkx6.1 (Murtaugh, 2007; Best et al, 2008; Arda et al, 2013; Romer & Sussel, 2015; Jennings et al, 2015). While some of these developmentally crucial TFs are also members of the CRC governing terminal β-cell identity, additional TFs such as MafA and MafB are required to maintain the mature β-cell phenotype through regulation of downstream effector genes involved in β-cell function (Kataoka et al, 2002; Matsuoka et al, 2004; Nishimura et al, 2015; Zhu et al, 2017; Russell et al, 2020). scRNA-seq studies have unveiled a remarkable heterogeneity within mouse and human β-cells (Baron et al, 2016; Muraro et al, 2016; Segerstolpe et al, 2016; Xin et al, 2016; Lawlor et al, 2017; Mawla & Huising, 2019), which has further contributed to our understanding of these cell types.

Wang and colleagues have taken advantage of single-cell transcriptomic data to model the relationship between eight master TFs (Pdx1, Ptf1a, Nkx6.1, Sox9, Hes1, Arx, Ngn3, and Pax4) in the pancreatic cell lineage (Wang et al, 2020). An adaptive landscape was constructed in which states were annotated either as mature or progenitor cell types based on prior knowledge of the relationships between these factors (Wang et al, 2020). The model infers additional transition states along different pancreatic lineage trajectories as well as previously unrecognized progenitors characterized by distinct CRC systems (Wang et al, 2020).

Box 2 Figure. CRCs maintain distinct endocrine and exocrine cell types in the pancreas. The pancreas contains several highly specialized cell types with distinct physiological secretory roles; these unique cell identities are maintained by independent CRCs.
(A) In the acinar CRC, Rbpjl and Ptf1a drive expression not only of acinar terminal selector genes (orange arrows), but also of themselves and other CRC members (light green arrows). This is an example of the self-sustaining nature of CRCs.
(B) Numerous TFs guide the development and maturation of the insulin-secreting β-cells. Among these TFs, Ngn3 is extremely important during development but does not participate in the CRC of mature β-cells, while MafA and MafB are essential TSGs at later stages for β-cell functionality. Finally, some TFs, such as Pdx1, are important both in development and in the CRC governing long-term cell type maintenance.

The majority of enhancers, in order to influence gene expression, are located in proximity to their target gene's promoter. Pairs of genomic loci that are nearby in 3D space can be identified using chromosome conformation capture (3C) (Dekker et al, 2002) or Hi-C (Belton et al, 2012). More conveniently, the genome can be scanned for accessible chromatin regions. Accessibility can be assayed by DNase-seq (Boyle et al, 2008) or ATAC-seq (Buenrostro et al, 2015), which work by partial DNA digestion or transposases, respectively. As promoters and actively transcribed genes are also located in accessible chromatin regions, chromatin accessibility measurements need to be combined with other datasets to predict enhancers. For example, Thibodeau and colleagues were able to effectively predict

Reference(s)