Article Open access Peer-reviewed

Profiling, what-if analysis, and cost-based optimization of MapReduce programs

2011; Association for Computing Machinery; Volume: 4; Issue: 11; Language: English

10.14778/3402707.3402746

ISSN

2150-8097

Authors

Herodotos Herodotou, Shivnath Babu

Topic(s)

Advanced Data Storage Technologies

Abstract

MapReduce has emerged as a viable competitor to database systems in big data analytics. MapReduce programs are being written for a wide variety of application domains including business data processing, text analysis, natural language processing, Web graph and social network analysis, and computational science. However, MapReduce systems lack a feature that has been key to the historical success of database systems, namely, cost-based optimization. A major challenge here is that, to the MapReduce system, a program consists of black-box map and reduce functions written in some programming language like C++, Java, Python, or Ruby. We introduce, to our knowledge, the first Cost-based Optimizer for simple to arbitrarily complex MapReduce programs. We focus on the optimization opportunities presented by the large space of configuration parameters for these programs. We also introduce a Profiler to collect detailed statistical information from unmodified MapReduce programs, and a What-if Engine for fine-grained cost estimation. All components have been prototyped for the popular Hadoop MapReduce system. The effectiveness of each component is demonstrated through a comprehensive evaluation using representative MapReduce programs from various application domains.
