F
2016; Association for Computing Machinery; Volume: 9; Issue: 13; Language: English
DOI: 10.14778/3007263.3007312
ISSN: 2150-8097
Authors: Dan Olteanu, Maximilian Schleich
Topic(s): Data Stream Mining Techniques
Abstract: We demonstrate F, a system for building regression models over database views. At its core lies the observation that the computation and representation of materialized views, and in particular of joins, entail non-trivial redundancy that is not necessary for the efficient computation of the aggregates used for building regression models. F avoids this redundancy by factorizing data and computation, and can outperform the state-of-the-art systems MADlib, R, and Python StatsModels by orders of magnitude on real-world datasets. We illustrate how to incrementally build regression models over factorized views using both an in-memory implementation of F and its SQL encoding. We also showcase the effective use of F for model selection: F decouples the data-dependent computation step from the data-independent convergence of model parameters, and performs the former only once to explore the entire model space.
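The two ideas in the abstract can be illustrated with a minimal sketch in plain Python. It is not F's implementation. The relations `R(k, x)` and `S(k, z, y)`, their contents, and all function names are assumptions made for illustration. The sketch computes the regression aggregates (the sums of products of `1, x, z, y`) over the join of `R` and `S` from per-key partial sums, without enumerating join tuples, and then runs gradient descent for several feature subsets using only those aggregates.

```python
def outer_sums(rows):
    """Per join key, sum the outer products u u^T of u = (1, *values)."""
    acc = {}
    for key, *vals in rows:
        u = (1.0, *vals)
        m = acc.setdefault(key, [[0.0] * len(u) for _ in u])
        for i, a in enumerate(u):
            for j, b in enumerate(u):
                m[i][j] += a * b
    return acc

def cofactor_factorized(R, S):
    """Sum of v v^T, v = (1, x, z, y), over R JOIN S, from per-key sums only."""
    A, B = outer_sums(R), outer_sums(S)
    # Each entry of v is r[ri] * s[si] with r = (1, x) and s = (1, z, y).
    pos = [(0, 0), (1, 0), (0, 1), (0, 2)]
    C = [[0.0] * 4 for _ in range(4)]
    for k in A.keys() & B.keys():  # keys without a join partner contribute nothing
        for i, (ri, si) in enumerate(pos):
            for j, (rj, sj) in enumerate(pos):
                # The sum over all pairs (r, s) splits into a product of two sums.
                C[i][j] += A[k][ri][rj] * B[k][si][sj]
    return C

def cofactor_materialized(R, S):
    """Same aggregates, computed the redundant way by enumerating the join."""
    C = [[0.0] * 4 for _ in range(4)]
    for k, x in R:
        for k2, z, y in S:
            if k == k2:
                v = (1.0, x, z, y)
                for i in range(4):
                    for j in range(4):
                        C[i][j] += v[i] * v[j]
    return C

def fit(C, features, label=3, steps=5000, lr=0.1):
    """Least-squares gradient descent that reads only the aggregates in C."""
    n = C[0][0]  # number of join tuples
    theta = [0.0] * len(features)
    for _ in range(steps):
        grad = [
            (sum(C[f][g] * t for g, t in zip(features, theta)) - C[f][label]) / n
            for f in features
        ]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

# Hypothetical toy data: R(k, x) and S(k, z, y).
R = [(1, 1.0), (1, 2.0), (2, 0.5), (3, 4.0)]
S = [(1, 0.0, 1.0), (1, 1.0, 3.0), (2, 2.0, 2.0), (2, 1.0, 0.5), (4, 9.0, 9.0)]

C = cofactor_factorized(R, S)          # data-dependent step, done once
C_check = cofactor_materialized(R, S)  # only for comparison

# Data-independent step: one model per feature subset (indices into v).
models = {feats: fit(C, feats) for feats in [(0, 1), (0, 2), (0, 1, 2)]}
```

In this sketch, `fit` never touches `R` or `S`. Every candidate model reuses the same aggregate matrix `C`, which mirrors the decoupling described in the abstract.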