Improving MapReduce Performance through Complexity and Performance Based Data Placement in Heterogeneous Hadoop Clusters
2013; Springer Science+Business Media; Language: English
10.1007/978-3-642-36071-8_8
ISSN: 1611-3349
Authors: Rajashekhar M. Arasanal, Daanish U. Rumani
Topic(s): Advanced Database Systems and Queries
Abstract: MapReduce has emerged as an important programming model for clusters with tens of thousands of nodes. Hadoop, an open-source implementation of MapReduce, may run on clusters whose nodes are heterogeneous in computing capacity for various reasons. It is therefore important for the data placement algorithms to partition the input and intermediate data according to the computing capacities of the nodes in the cluster. We propose several enhancements to the data placement algorithms in Hadoop so that the load is distributed evenly across the nodes. First, we propose two techniques to measure the computing capacities of the nodes. Second, we propose improvements to the input data distribution algorithm based on the map and reduce function complexities and the measured heterogeneity of the nodes. Finally, we evaluate the improvement in MapReduce performance.
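The idea of partitioning input data by node capacity can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify; it is a generic proportional split that assumes each node already has a measured capacity score, and it ignores map and reduce function complexity. The function name `proportional_block_counts` and the node names are hypothetical.

```python
def proportional_block_counts(capacities, total_blocks):
    """Split total_blocks across nodes in proportion to their capacity scores.

    capacities: dict mapping a node name to a positive relative capacity
    (assumed to come from some prior measurement step).
    Uses the largest-remainder method so the counts sum to total_blocks.
    """
    total_capacity = sum(capacities.values())
    if total_capacity <= 0:
        raise ValueError("capacities must sum to a positive value")

    # Ideal (fractional) share of blocks for each node.
    exact = {node: total_blocks * cap / total_capacity
             for node, cap in capacities.items()}

    # Start from the floor of each share.
    counts = {node: int(share) for node, share in exact.items()}

    # Hand the leftover blocks to the nodes with the largest fractional parts.
    leftover = total_blocks - sum(counts.values())
    by_remainder = sorted(capacities,
                          key=lambda node: exact[node] - counts[node],
                          reverse=True)
    for node in by_remainder[:leftover]:
        counts[node] += 1

    return counts


if __name__ == "__main__":
    # Hypothetical cluster: one node three times as fast as the slowest.
    print(proportional_block_counts({"fast": 3.0, "medium": 2.0, "slow": 1.0}, 12))
```

With these assumed scores, the faster node receives proportionally more blocks, so all nodes would be expected to finish their map work at roughly the same time if the scores reflect actual throughput.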