Article Open access Peer-reviewed

Towards the optimization of a parallel streaming engine for telco applications

2014; Wiley; Volume: 18; Issue: 4; Language: English

10.1002/bltj.21652

ISSN

1538-7305

Authors

Bart Theeten, Ivan Bedini, Peter Cogan, Alessandra Sala, Tommaso Cucinotta

Topic(s)

Peer-to-Peer Network Technologies

Abstract

Parallel and distributed computing is becoming essential to process in real time the increasingly massive volume of data collected by telecommunications companies. Existing computational paradigms such as MapReduce (and its popular open-source implementation Hadoop) provide a scalable, fault tolerant mechanism for large scale batch computations. However, many applications in the telco ecosystem require a real time, incremental streaming approach to process data in real time and enable proactive care. Storm is a scalable, fault tolerant framework for the analysis of real time streaming data. In this paper we provide a motivation for the use of real time streaming analytics in the telco ecosystem. We perform an experimental investigation into the performance of Storm, focusing in particular on the impact of parameter configuration. This investigation reveals that optimal parameter choice is highly non-trivial and we use this as motivation to create a parameter configuration engine. As first steps towards the creation of this engine we provide a deep analysis of the inner workings of Storm and provide a set of models describing data flow cost, central processing unit (CPU) cost, and system management cost. © 2014 Alcatel-Lucent.

of functions for user implementation. However, the batch processing nature of MapReduce, which requires that the full dataset is available at the start of the analysis, may make it unsuitable for certain applications within the telco ecosystem. For instance, if a backend server is producing a continuous stream of log data, these logs may contain early indications of network issues which the telecom providers must address as quickly as possible to ensure quality of service to subscribers. Under the MapReduce paradigm, data would be aggregated over some time period τ, and provided to MapReduce for batch analysis. The

Reference(s)