Skyline Cardinality for Relational Processing
2004; Springer Science+Business Media; Linguagem: Inglês
10.1007/978-3-540-24627-5_7
ISSN1611-3349
Autores Tópico(s)Constraint Satisfaction and Optimization
ResumoThe skyline clause—also called the Pareto clause—recently has been proposed as an extension to SQL. It selects the tuples that are Pareto optimal with respect to a set of designated skyline attributes. This is the maximal vector problem in a relational context, but it represents a powerful extension to SQL which allows for the natural expression of on-line analytic processing (OLAP) queries and preferences in queries. Cardinality estimation of skyline sets is the focus in this work. A better understanding of skyline cardinality—and other properties of the skyline—is useful for better design of skyline algorithms, is necessary to extend a query optimizer's cost model to accommodate skyline queries, and helps to understand better how to use skyline effectively for OLAP and preference queries. Within a basic model with assumptions of sparseness of values on attributes' domains and statistical independence across attributes, we establish the expected skyline cardinality for skyline queries. While asymptotic bounds have been previously established, they are not widely known nor applied in skyline work. We show concrete estimates, as would be needed in a cost model, and consider the nature of the distribution of skyline. We next establish the effects on skyline cardinality as the constraints on our basic model are relaxed. Some of the results are quite counter-intuitive, and understanding these is critical to skyline's use in OLAP and preference queries. We consider when attributes' values repeat on their domains, and show the number of skyline is diminished. We consider the effects of having Zipfian distributions on the attributes' domains, and generalize the expectation for other distributions. Last, we consider the ramifications of correlation across the attributes.
Referência(s)