Book chapter, peer-reviewed

Parallel Time Series Modeling - A Case Study of In-Database Big Data Analytics

2014; Springer Science+Business Media; Language: English

10.1007/978-3-319-13186-3_38

ISSN

1611-3349

Authors

Hai Qian, Shengwen Yang, Rahul Iyer, Xixuan Feng, M. Wellons, Caleb Welton

Topic(s)

Data Management and Algorithms

Abstract

MADlib is an open-source library for scalable in-database analytics. In this paper, we present our parallel design of time series analysis and our implementation of ARIMA modeling in MADlib's framework. The algorithms for fitting time series models are intrinsically sequential, since any calculation for a specific time $$t$$ depends on the result from the previous time step $$t-1$$. Our solution parallelizes this computation by splitting the data into $$n$$ chunks. Since the model fitting involves multiple iterations, we use the results from the previous iteration as the initial values for each chunk in the current iteration. Thus the computation for each chunk of data does not depend on the results from the previous chunk. We further improve performance by redistributing the original data such that each chunk can be loaded into memory, minimizing communication overhead. Experiments show that our parallel implementation achieves good speed-up when compared to a sequential version of the algorithm and R's default implementation in the "stats" package.
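
The chunking idea in the abstract can be illustrated with a minimal Python sketch. It is not MADlib's implementation: it assumes a plain ARMA(1,1) residual recursion $e_t = x_t - \phi x_{t-1} - \theta e_{t-1}$ with fixed coefficients, and the function names `arma11_residuals` and `chunked_pass` are hypothetical. Each chunk takes its starting residual from the previous iteration's output, so the chunks of one iteration no longer depend on each other and could be processed in parallel.

```python
from typing import List, Sequence


def arma11_residuals(x: Sequence[float], phi: float, theta: float,
                     x_prev: float = 0.0, e_prev: float = 0.0) -> List[float]:
    """Sequential recursion e_t = x_t - phi * x_{t-1} - theta * e_{t-1}."""
    out = []
    for x_t in x:
        e_t = x_t - phi * x_prev - theta * e_prev
        out.append(e_t)
        x_prev, e_prev = x_t, e_t
    return out


def chunked_pass(x: Sequence[float], phi: float, theta: float,
                 n_chunks: int, prev_resid: Sequence[float]) -> List[float]:
    """One iteration over all chunks, each seeded from the previous iteration.

    Assumes x is non-empty. The loop body touches only its own chunk plus
    boundary values that are already known, so the chunks are independent.
    """
    size = -(-len(x) // n_chunks)  # ceiling division
    resid: List[float] = []
    for start in range(0, len(x), size):
        # The boundary observation is known data; the boundary residual is
        # borrowed from the previous iteration instead of the previous chunk.
        x_prev = x[start - 1] if start > 0 else 0.0
        e_prev = prev_resid[start - 1] if start > 0 else 0.0
        resid.extend(
            arma11_residuals(x[start:start + size], phi, theta, x_prev, e_prev)
        )
    return resid


if __name__ == "__main__":
    series = [1.0, 0.5, -0.3, 0.8, 1.2, -0.7, 0.1, 0.4, -0.9, 0.6, 0.2, -0.1]
    resid = [0.0] * len(series)  # initial guess before the first iteration
    for _ in range(3):
        resid = chunked_pass(series, 0.6, 0.4, n_chunks=3, prev_resid=resid)
    print(resid)
```

With fixed coefficients, every pass makes one more chunk agree with the sequential recursion, so the chunked result matches it after as many passes as there are chunks. In an actual fit the coefficients change between iterations, so the borrowed boundary values are only approximate until the optimization settles.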

Reference(s)