A Comparison of ORC-Compress Performance with Big Data Workload on Virtualization

Article · Open access · Peer-reviewed


2016; Trans Tech Publications; Volume: 855; Language: English

10.4028/www.scientific.net/amm.855.153

ISSN

2297-8941

Authors

Kritwara Rattanaopas, Sureerat Kaewkeerat, Yanapat Chuchuen

Topic(s)

Graph Theory and Algorithms

Abstract

Big Data is widely used in many organizations today. Hive is an open-source data warehouse system for managing large data sets; it provides a SQL-like interface to Hadoop on top of the MapReduce framework. Big Data solutions are now starting to adopt HiveQL to improve the execution time of queries over relational data. In this paper, we investigate query execution time while comparing two compression codecs for ORC files: ZLIB and SNAPPY. The results show that ZLIB reduces storage by up to 87% compared with uncompressed (NONE) data, outperforming SNAPPY, which achieves a space saving of 79%. However, the key factor in reducing execution time is MapReduce parallelism: query execution time was lower when the number of mappers equaled the number of data nodes. For example, every query suite on a 6-node cluster (ZLIB/SNAPPY) with 250 million table rows had execution times quite similar to those on a 9-node cluster (ZLIB/SNAPPY) with 350 million table rows.
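The space-saving percentages above can be read as 1 − (compressed size / uncompressed size). This is an assumption, since the abstract does not state its formula. Under that reading, an 87% saving means the ZLIB file is about 13% of the NONE size, and a 79% saving means the SNAPPY file is about 21%. The minimal Python sketch below computes the metric. It uses the standard-library `zlib` module (DEFLATE, which ORC's ZLIB codec is also based on) on a hypothetical, synthetic set of rows. It does not reproduce ORC's columnar encoding or the paper's measurements, and SNAPPY is omitted because it is not in the Python standard library.

```python
import zlib


def space_saving(uncompressed_bytes: int, compressed_bytes: int) -> float:
    """Fraction of storage saved relative to the uncompressed (NONE) size.

    Assumed definition: 1 - compressed / uncompressed.
    """
    if uncompressed_bytes <= 0:
        raise ValueError("uncompressed size must be positive")
    return 1.0 - compressed_bytes / uncompressed_bytes


# Arithmetic check against the reported figure: an 87% saving leaves 13% of the size.
# The byte counts here are illustrative only, not measurements from the paper.
print(round(space_saving(1000, 130), 2))

# Hypothetical, highly repetitive row data standing in for a Hive table.
rows = "".join(
    f"{i % 100},store_{i % 7},2016-01-01\n" for i in range(10_000)
).encode("utf-8")

# Plain DEFLATE at level 6; ORC applies its codec per stream after columnar
# encoding, so real ORC ratios will differ from this row-oriented example.
compressed = zlib.compress(rows, 6)
print(f"synthetic ZLIB space saving: {space_saving(len(rows), len(compressed)):.1%}")
```

The exact ratio depends heavily on how repetitive the data is. It should be taken only as an illustration of the metric, not as a comparison with the paper's ZLIB/SNAPPY results.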
