Peer-reviewed article

F

2016; Association for Computing Machinery; Volume: 9; Issue: 13; Language: English

10.14778/3007263.3007312

ISSN

2150-8097

Authors

Dan Olteanu, Maximilian Schleich

Topic(s)

Data Stream Mining Techniques

Abstract

We demonstrate F, a system for building regression models over database views. At its core lies the observation that the computation and representation of materialized views, and in particular of joins, entail non-trivial redundancy that is not necessary for the efficient computation of aggregates used for building regression models. F avoids this redundancy by factorizing data and computation and can outperform the state-of-the-art systems MADlib, R, and Python StatsModels by orders of magnitude on real-world datasets. We illustrate how to incrementally build regression models over factorized views using both an in-memory implementation of F and its SQL encoding. We also showcase the effective use of F for model selection: F decouples the data-dependent computation step from the data-independent convergence of model parameters and performs the former only once to explore the entire model space.
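The redundancy argument in the abstract can be illustrated with a minimal sketch. It is not F's implementation. It assumes two hypothetical relations, R(k, x) and S(k, y), joined on k, and shows that the aggregates needed for a one-feature least-squares model can be assembled from per-key partial aggregates without materializing the join. All function and variable names are illustrative, and the closed-form solve in `fit_line` is a stand-in chosen for brevity, not the parameter-convergence step the abstract refers to.

```python
from collections import defaultdict

# Hypothetical toy relations: R(k, x) and S(k, y), joined on k.
R_ROWS = [(1, 1.0), (1, 2.0), (2, 3.0), (3, 9.0)]
S_ROWS = [(1, 2.0), (1, 4.0), (2, 5.0), (4, 7.0)]


def partial_aggregates(rows):
    """Per key: [count, sum of values, sum of squared values]."""
    agg = defaultdict(lambda: [0, 0.0, 0.0])
    for key, value in rows:
        entry = agg[key]
        entry[0] += 1
        entry[1] += value
        entry[2] += value * value
    return agg


def factorized_aggregates(r_rows, s_rows):
    """COUNT, SUM(x), SUM(y), SUM(x*x), SUM(x*y) over the join of R and S,
    combined per join key instead of per joined tuple."""
    r_agg = partial_aggregates(r_rows)
    s_agg = partial_aggregates(s_rows)
    n = sx = sy = sxx = sxy = 0.0
    for key in r_agg.keys() & s_agg.keys():
        r_cnt, r_sum, r_sq = r_agg[key]
        s_cnt, s_sum, _ = s_agg[key]
        n += r_cnt * s_cnt      # each R tuple pairs with each S tuple
        sx += r_sum * s_cnt     # every x repeats once per matching S tuple
        sy += r_cnt * s_sum     # every y repeats once per matching R tuple
        sxx += r_sq * s_cnt
        sxy += r_sum * s_sum    # sum over pairs factorizes into a product
    return n, sx, sy, sxx, sxy


def materialized_aggregates(r_rows, s_rows):
    """Same aggregates, computed over the explicit join for comparison."""
    n = sx = sy = sxx = sxy = 0.0
    for r_key, x in r_rows:
        for s_key, y in s_rows:
            if r_key == s_key:
                n += 1
                sx += x
                sy += y
                sxx += x * x
                sxy += x * y
    return n, sx, sy, sxx, sxy


def fit_line(n, sx, sy, sxx, sxy):
    """Closed-form least squares for y = slope * x + intercept.
    Uses only the aggregates, never the underlying tuples."""
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


if __name__ == "__main__":
    aggregates = factorized_aggregates(R_ROWS, S_ROWS)
    print(aggregates == materialized_aggregates(R_ROWS, S_ROWS))
    print(fit_line(*aggregates))
```

In this sketch the data is scanned only in `partial_aggregates`; `fit_line` works on five numbers regardless of the join size. That separation mirrors, in a very reduced form, the decoupling of data-dependent and data-independent steps described in the abstract.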

Reference(s)