Article, Open access, Peer reviewed

Estimating the Number of Clusters in a Data Set Via the Gap Statistic

2001; Oxford University Press; Volume: 63; Issue: 2; Language: English

10.1111/1467-9868.00293

ISSN

1467-9868

Authors

Robert Tibshirani, Guenther Walther, Trevor Hastie

Topic(s)

Data Management and Algorithms

Abstract

We propose a method (the ‘gap statistic’) for estimating the number of clusters (groups) in a set of data. The technique uses the output of any clustering algorithm (e.g. K-means or hierarchical), comparing the change in within-cluster dispersion with that expected under an appropriate reference null distribution. Some theory is developed for the proposal and a simulation study shows that the gap statistic usually outperforms other methods that have been proposed in the literature.
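
The comparison described in the abstract can be illustrated with a minimal sketch in plain Python. This record gives no formulas, so the details here are assumptions made for illustration, not the authors' implementation: dispersion is taken as the pooled within-cluster sum of squared distances to each cluster mean, the reference null distribution is uniform over the bounding box of the data, the clustering algorithm is a basic K-means, and the chosen k is the smallest one whose gap is at least the next gap minus its simulation error. All function and parameter names (`kmeans`, `log_dispersion`, `gap_statistic`, `n_refs`, `n_restarts`) are hypothetical.

```python
import math
import random


def kmeans(points, k, rng, n_iter=25):
    """Basic Lloyd's algorithm; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Move each centre to the mean of its members (empty clusters stay put).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def log_dispersion(points, k, rng, n_restarts=5):
    """log of the smallest within-cluster sum of squares over a few restarts.

    Assumes k is smaller than the number of distinct points, so the
    dispersion is strictly positive.
    """
    best = math.inf
    for _ in range(n_restarts):
        labels = kmeans(points, k, rng)
        w = 0.0
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                w += sum(math.dist(p, mean) ** 2 for p in members)
        best = min(best, w)
    return math.log(best)


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return (estimated k, list of gap values for k = 1..k_max)."""
    rng = random.Random(seed)
    lo = [min(c) for c in zip(*points)]
    hi = [max(c) for c in zip(*points)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_w = log_dispersion(points, k, rng)
        # Dispersion expected under the (assumed) uniform reference.
        ref = []
        for _ in range(n_refs):
            sample = [tuple(rng.uniform(a, b) for a, b in zip(lo, hi))
                      for _ in points]
            ref.append(log_dispersion(sample, k, rng))
        mean_ref = sum(ref) / n_refs
        sd = math.sqrt(sum((r - mean_ref) ** 2 for r in ref) / n_refs)
        gaps.append(mean_ref - log_w)
        errs.append(sd * math.sqrt(1 + 1 / n_refs))
    # Smallest k with gap(k) >= gap(k + 1) - err(k + 1); fall back to k_max.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, gaps
    return k_max, gaps
```

Points are passed as a list of equal-length tuples, for example `gap_statistic([(0.1, 0.2), (5.0, 5.1), ...], k_max=5)`. Because the abstract notes that the output of any clustering algorithm can be used, `kmeans` could be swapped for a hierarchical method without changing the rest of the sketch.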

Reference(s)