Article Open access Peer-reviewed

A cross entropy test allows quantitative statistical comparison of t-SNE and UMAP representations

2023; Elsevier BV; Volume: 3; Issue: 1; Language: English

10.1016/j.crmeth.2022.100390

ISSN

2667-2375

Authors

Carlos P. Roca, Oliver T. Burton, Julika Neumann, Samar Tareen, Carly E. Whyte, Václav Gergelits, Rafael Veiga, Stéphanie Humblet‐Baron, Adrian Liston

Topic(s)

Cell Image Analysis Techniques

Abstract

The advent of high-dimensional single-cell data has necessitated the development of dimensionality-reduction tools. t-Distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are the two most frequently used approaches, allowing clear visualization of complex single-cell datasets. Despite the need for quantitative comparison, t-SNE and UMAP have largely remained visualization tools due to the lack of robust statistical approaches. Here, we have derived a statistical test for evaluating the difference between dimensionality-reduced datasets using the Kolmogorov-Smirnov test on the distributions of cross entropy of single cells within each dataset. As the approach uses the inter-relationship of single cells for comparison, the resulting statistic is robust and capable of identifying true biological variation. Further, the test provides a valid distance between single-cell datasets, allowing the organization of multiple samples into a dendrogram for quantitative comparison of complex datasets. These results demonstrate the largely untapped potential of dimensionality-reduction tools for biomedical data analysis beyond visualization.
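The core idea in the abstract, comparing two embeddings through the distributions of per-cell cross entropy with a Kolmogorov-Smirnov statistic, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the row-wise similarity inputs `p_rows` (high-dimensional) and `q_rows` (low-dimensional), and the cross-entropy form H_i = -sum_j p_ij log q_ij are assumptions made for illustration; the article defines the exact quantities used for t-SNE and UMAP.

```python
import math
from bisect import bisect_right


def pointwise_cross_entropy(p_rows, q_rows, eps=1e-12):
    """Per-cell cross entropy H_i = -sum_j p_ij * log(q_ij).

    p_rows / q_rows are hypothetical inputs: one row of neighbor
    similarities per cell, in the original and the reduced space.
    """
    return [
        -sum(p * math.log(max(q, eps)) for p, q in zip(p_row, q_row))
        for p_row, q_row in zip(p_rows, q_rows)
    ]


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The maximum gap can only occur at an observed value.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Toy per-cell cross-entropy values for two samples (made-up numbers).
sample_a = [1.0, 2.0, 3.0]
sample_b = [4.0, 5.0, 6.0]
distance = ks_statistic(sample_a, sample_b)
```

Because the KS statistic lies between 0 and 1 and is symmetric, computing it for every pair of samples gives a distance matrix that could be passed to a hierarchical clustering routine to build the kind of dendrogram the abstract mentions. For a p-value, a library routine such as `scipy.stats.ks_2samp` could be used in place of the hand-written statistic.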

Reference(s)