Multiplying insights from perturbation experiments: predicting new perturbation combinations
2023; Springer Nature; Volume: 19; Issue: 6 Linguagem: Inglês
10.15252/msb.202311667
ISSN1744-4292
Autores Tópico(s)Meteorological Phenomena and Simulations
ResumoNews & Views11 May 2023Open Access Multiplying insights from perturbation experiments: predicting new perturbation combinations Joshua Welch Corresponding Author Joshua Welch [email protected] orcid.org/0000-0002-5869-2391 University of Michigan, Ann Arbor, MI, USA Search for more papers by this author Joshua Welch Corresponding Author Joshua Welch [email protected] orcid.org/0000-0002-5869-2391 University of Michigan, Ann Arbor, MI, USA Search for more papers by this author Author Information Joshua Welch *,1 1University of Michigan, Ann Arbor, MI, USA *Corresponding author. E-mail: [email protected] Molecular Systems Biology (2023)19:e11667https://doi.org/10.15252/msb.202311667 See also: Lotfollahi et al (June 2023) PDFDownload PDF of article text and main figures. ToolsAdd to favoritesDownload CitationsTrack CitationsPermissions Figures & Info Recent advances in multiplexed single-cell transcriptomics have allowed generating vast amounts of drug and genetic perturbation data. Experimentally exploring the combinatorial perturbation space is not feasible, emphasizing the need for computational methods that could predict the effect of perturbations. In their recent study, Theis and colleagues (Lotfollahi et al, 2023) present a new approach that uses deep generative models to predict the effects of new perturbations (such as drug or gene combinations that have not been experimentally measured) from high-throughput single perturbation experiments. Advances in experimental technologies have made high-throughput single-cell perturbation measurements increasingly easy to obtain. Activating, repressing, or editing specific genomic targets with CRISPR-based technologies allows focused genetic perturbations, and the effects of hundreds or thousands of perturbations on gene expression can be measured using Perturb-Seq (Dixit et al, 2016) or CROP-seq (Datlinger et al, 2017). Similarly, with techniques such as Sci-Plex (Srivatsan et al, 2020), it is now possible to measure gene expression after drug treatment, at single-cell resolution, across many different cell types. The cellular perturbation response is highly complex because perturbations can both directly affect gene targets and indirectly cause changes that propagate through gene regulatory interactions. Thus, understanding the consequences of perturbations requires considering the entire high-dimensional molecular state of a cell. This makes single-cell perturbation data a remarkable new resource for deciphering perturbation responses. However, the space of perturbation effects is far too large to measure exhaustively. For example, the number of possible drug-like molecules is estimated to exceed 1060. Similarly, there are more than 200 million combinations of single- and double-gene perturbations alone. The combinatorial explosion becomes more pronounced when considering multiple drug doses or multiple guide RNAs, multiple cell types, and multiple drug or gene combinations. In a seemingly unrelated trend, machine learning approaches for generating novel examples of high-dimensional data have rapidly advanced. In particular, using neural networks to train deep generative models has proven remarkably effective at generating text and image data. For example, deep generative models can draw pictures of new human faces that look like real people, even though no such person exists (Karras et al, 2019). Similar techniques have proven effective at generating new sentences and paragraphs that read just like prose from a human author (preprint: OpenAI, 2023). A particularly powerful property behind the success of these deep generative models is their ability to disentangle complex factors of variation in the training data. For example, a deep generative model trained on pictures of chairs can learn the effects of color, fabric type, and chair style on image pixel values. Disentangling such factors allows a remarkable result: Deep generative models can generalize beyond the training data by combining the underlying factors in ways not previously observed. Thus, we can combine concepts such as "armchair," "red," "avocado," and "gummy bear" in different ways to draw pictures of imaginary armchairs in the shape of an avocado or gummy bear (Fig 1A). Figure 1.(A) Deep generative models can combine concepts such as "armchair," "red," "avocado," and "gummy bear" in new ways to draw pictures of imaginary armchairs in the shape of an avocado or gummy bear. Images generated by DALL-E2. (B) Similarly, CPA can predict gene expression profiles for cells with unobserved gene combinations, drug dosages, or drug combinations based on the expression profiles obtained from single perturbations. Download figure Download PowerPoint In their recent work, Lotfollahi et al (2023) combine these two strands of research (single-cell perturbation measurements and deep generative models) to predict the effects of new perturbation combinations that have not yet been experimentally analyzed, such as drug dosages, cell types, or time points. The key contribution of the study is a deep generative model called compositional perturbation autoencoder (CPA). Similar to how deep generative models can draw pictures with combinations of attributes not seen in the training data, CPA can generate gene expression profiles of cells for new combinations of perturbations (Fig 1B). CPA is trained on large perturbation datasets from experiments such as Perturb-seq, CROP-seq, or sci-Plex. In an extensive set of evaluations, the authors showed that CPA can predict the effects of new gene activation combinations, new drug doses, and even new drug combinations. This effectively "multiplies" the utility of existing single-cell perturbation datasets, as the authors demonstrated by predicting gene expression profiles for the additional 98% of all possible two-gene combinations that were not measured in a Perturb-Seq dataset. In an impressive proof of principle, they used CPA to predict the gene expression profiles induced by unseen drug combinations. They then experimentally validated these results by creating a new single-cell dataset, which demonstrated the accuracy of the model predictions. While CPA is designed to predict new combinations of perturbations already seen in the training data, an extension of the model (called ChemCPA) uses molecule embeddings to predict unseen single-drug treatments. CPA opens several exciting future directions. First, while CPA makes predictions using neural networks rather than a directly interpretable model, these large-scale predictions provide an opportunity to elucidate mechanisms of gene regulation. Second, the authors note that although CPA is currently designed to predict the effects of perturbations on gene expression, the framework could be extended to incorporate additional types of single-cell data, such as epigenomic, proteomic, or spatial measurements. CPA predictions can also guide the design of perturbation experiments by nominating drugs or gene knockout or activation combinations that are predicted to have the most interesting or desirable effects. Finally, CPA holds promise for developing better therapeutics by predicting optimal drug combinations or personalized treatments. Disclosure and competing interest statement JW has no competing interests to declare. References Datlinger P, Rendeiro AF, Schmidl C, Krausgruber T, Traxler P, Klughammer J, Schuster LC, Kuchler A, Alpar D, Bock C (2017) Pooled CRISPR screening with single-cell transcriptome readout. Nat Methods 14: 297–301CrossrefCASPubMedWeb of Science®Google Scholar Dixit A, Parnas O, Li B, Chen J, Fulco CP, Jerby-Arnon L, Marjanovic ND, Dionne D, Burks T, Raychowdhury R et al (2016) Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167: 1853–1866.e17CrossrefCASPubMedWeb of Science®Google Scholar Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEEGoogle Scholar Lotfollahi M, Klimovskaia Susmelj A, De Donno C, Hetzel L, Ji Y, Ibarra Del Río I, Srivatsan S, Naghipourfar M, Daza RM, Martin B et al (2023) Predicting cellular responses to complex perturbations in high-throughput screens. Mol Syst Biol 19: e11517Wiley Online LibraryGoogle Scholar OpenAI (2023) GPT-4 technical report. arXiv [csCL] https://doi.org/10.48550/arXiv.2303.08774 [PREPRINT]CrossrefGoogle Scholar Srivatsan SR, McFaline-Figueroa JL, Ramani V, Saunders L, Cao J, Packer J, Pliner HA, Jackson DL, Daza RM, Christiansen L et al (2020) Massively multiplex chemical transcriptomics at single-cell resolution. Science 367: 45–51CrossrefCASPubMedWeb of Science®Google Scholar Previous ArticleNext Article Read MoreAbout the coverClose modalView large imageVolume 19,Issue 6,12 June 2023This month's cover highlights the article Predicting cellular responses to complex perturbations in high‐throughput screens by Fabian Theis and colleagues. The compositional perturbation autoencoder (CPA) is a deep learning model which predicts the transcriptomic responses of single cells to single or combinatorial treatments from drugs or genetic manipulations. (Cover concept by the authors; scientific illustration by SciStories) Volume 19Issue 612 June 2023In this issue FiguresReferencesRelatedDetailsLoading ...
Referência(s)