
Towards an Efficient and Distributed DBSCAN Algorithm Using MapReduce
2015; Springer Science+Business Media; Language: English
DOI: 10.1007/978-3-319-22348-3_5
ISSN: 1865-1356
Authors: Ticiana L. Coelho da Silva, Antonio Cavalcante Araujo Neto, Régis Pires Magalhães, Victor A. E. Farias, José Antônio Fernandes de Macêdo, Javam C. Machado
Topic(s): Cloud Computing and Resource Management
Abstract: Clustering is a major data mining technique that groups a set of objects so that objects in the same group are more similar to each other than to those in other groups. Among the several types of clustering, density-based algorithms are better at detecting clusters of varied density and different shapes. One of the most important density-based clustering algorithms is DBSCAN. Because of the huge volume of data generated by the widespread diffusion of wireless technologies, and the complexity of big data analysis, new scalable algorithms are needed to process such data efficiently. In this chapter we are particularly interested in using traffic data to find congested areas in a city. For this purpose, we developed a new distributed and efficient strategy for the DBSCAN algorithm that uses MapReduce to detect dense areas based on the input parameters. We conducted experiments using real traffic data from a Brazilian city, Fortaleza, and compared our approach with centralized and other MapReduce-based approaches. Our preliminary results confirmed that our approach is scalable and more efficient than the others. We also present an incremental version of DBSCAN built on this MapReduce version.
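
For readers unfamiliar with DBSCAN, the following is a minimal sketch of the classic centralized algorithm, not the distributed MapReduce strategy the chapter proposes. The names `eps` and `min_pts` are assumed here as the usual DBSCAN input parameters (neighborhood radius and minimum number of neighbors); the abstract does not name them. The brute-force neighbor search is chosen only for brevity.

```python
from math import dist

NOISE = -1  # label for points that belong to no dense region


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels


# Hypothetical toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

In a traffic setting such as the one described, each point would be a vehicle position, and a cluster would correspond to a candidate congested area.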