Euclidean distance-optimized data transformation for cluster analysis in biomedical data (EDOtrans)
2022; BioMed Central; Volume: 23; Issue: 1; Language: English
DOI: 10.1186/s12859-022-04769-w
ISSN: 1471-2105
Topic(s): AI in cancer detection
Abstract: Data transformations are commonly used in bioinformatics data processing in the context of data projection and clustering. The most commonly used metric, the Euclidean distance, is not scale invariant. It is therefore occasionally inappropriate for complex variables, e.g., those with multimodal distributions, and may negatively affect the results of cluster analysis. Specifically, the Euclidean distance is defined as the square root of the sum of squared differences between data points, and the squaring function has the consequence that the value 1 implicitly defines a limit between distances within clusters (intra-cluster) and distances between clusters (inter-cluster).
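The two properties named in the abstract can be illustrated with a few lines of Python. This is only a minimal sketch of the general arithmetic, not the EDOtrans method itself: the helper names `euclidean` and `rescale_first`, the sample points, and the scaling factor 0.1 are all assumptions chosen for illustration. It shows that squaring shrinks differences below 1 and enlarges differences above 1, and that rescaling a single variable can change which point is nearest.

```python
import math


def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rescale_first(point, factor):
    """Hypothetical helper: multiply only the first variable by `factor`."""
    return (point[0] * factor,) + tuple(point[1:])


# Squaring treats the value 1 as a threshold:
# a difference below 1 becomes smaller, a difference above 1 becomes larger.
small_diff, large_diff = 0.5, 2.0
small_squared = small_diff ** 2  # 0.25, smaller than 0.5
large_squared = large_diff ** 2  # 4.0, larger than 2.0

# Lack of scale invariance: changing the unit of one variable
# changes the neighbourhood structure.
origin = (0.0, 0.0)
a = (3.0, 0.0)
b = (0.0, 2.0)

# In the original units, b is closer to the origin than a.
b_nearer_before = euclidean(origin, b) < euclidean(origin, a)

# After rescaling the first variable by 0.1 (assumed factor), a is closer.
a_scaled = rescale_first(a, 0.1)
b_scaled = rescale_first(b, 0.1)
a_nearer_after = euclidean(origin, a_scaled) < euclidean(origin, b_scaled)

print(small_squared, large_squared, b_nearer_before, a_nearer_after)
```

Because nearest-neighbour relations can flip under a change of units, any clustering that relies on Euclidean distance depends on how each variable is scaled, which is the motivation the abstract gives for a distance-aware transformation.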