Book chapter, National Production

Towards an Efficient and Distributed DBSCAN Algorithm Using MapReduce

2015; Springer Science+Business Media; Language: English

10.1007/978-3-319-22348-3_5

ISSN

1865-1356

Authors

Ticiana L. Coelho da Silva, Antonio Cavalcante Araujo Neto, Régis Pires Magalhães, Victor A. E. Farias, José Antônio Fernandes de Macêdo, Javam C. Machado

Topic(s)

Cloud Computing and Resource Management

Abstract

Clustering is a major data mining technique that groups a set of objects so that objects in the same group are more similar to each other than to those in other groups. Among the several types of clustering, density-based algorithms are better suited to detecting clusters of varying density and shape. One of the most important density-based clustering algorithms is DBSCAN. Given the huge volume of data generated by the widespread diffusion of wireless technologies, and the complexity of big data analysis, new scalable algorithms are needed to process such data efficiently. In this chapter we are particularly interested in using traffic data to find congested areas in a city. For this purpose, we developed a new, efficient, distributed DBSCAN strategy that uses MapReduce to detect dense areas based on the input parameters. We conducted experiments using real traffic data from a Brazilian city, Fortaleza, and compared our approach with centralized and MapReduce-based approaches. Our preliminary results confirm that our approach is scalable and more efficient than the others. We also present an incremental version of DBSCAN based on the MapReduce version.
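
For readers unfamiliar with DBSCAN, the following is a minimal single-machine sketch of the classic algorithm, not the distributed MapReduce strategy the chapter describes. The names `eps` and `min_pts` are assumed labels for the usual input parameters (neighborhood radius and minimum number of neighbors); the abstract does not name them, and the brute-force neighbor search is chosen only for brevity.

```python
from math import dist

NOISE = -1  # label for points that belong to no dense region


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (itself included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    Illustrative only: a brute-force O(n^2) version of textbook DBSCAN.
    """
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels
```

In a traffic setting, each point could be a vehicle position, so that a cluster corresponds to a dense (possibly congested) area; that mapping is an illustration here, and the chapter itself should be consulted for the authors' actual data model.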

Reference(s)