An Experimental Investigation of Complexity in Database Query Formulation Tasks
2013; Volume: 24; Issue: 3 Linguagem: Inglês
ISSN
2186-3679
AutoresGretchen Irwin Casterella, Leo R. Vijayasarathy,
Tópico(s)Information Retrieval and Search Behavior
Resumo1. INTRODUCTION Information Technology professionals and other knowledge workers rely on their ability to extract data from organizational databases to respond to business questions and support decision making. While there are many graphical user interface tools that allow end-users to summarize and view organizational data, structured language (SQL) is still the standard programming language for formulating ad hoc queries against relational databases (Allen & March, 2006). Query formulation with SQL is a skill that is in high demand and is taught in most introductory database courses. Query formulation can be a complex task because it often includes a high degree of requirements uncertainty (e.g., ambiguity in the request for information), multiple solution paths that produce the correct result, and a high degree of information overload when working with large data models (Bowen et al., 2009; Ashkanasy et al., 2007; Borthick et al., 2001; Campbell 1988). In this study, we investigate two factors that impact writing performance--the ambiguity in the information request and the complexity of the target solution. We examine performance in terms of the accuracy of the solution, the time taken to produce the solution, and the writer's confidence in the quality of his solution. The purpose of the study is to confirm the main effects of ambiguity and solution complexity on performance (as in Borthick et al., 2001) and to evaluate the interaction effects of ambiguity and complexity on performance. Our goal is to use these findings to better understand why some queries are more difficult to formulate than others, and to identify potential teaching strategies and techniques to facilitate students' acquisition of SQL skills. 2. PRIOR RESEARCH ON QUERY FORMULATION Reisner's (1981) classic model of the formulation process is shown in Figure 1. According to this model, the writer is given an information request (e.g., Find the of manager) and generates a mental query of an SQL SELECT statement. The template specifies the structural foundation for the query. The writer then maps elements from the information request into SQL components that can be inserted into the appropriate slots of the template. The mapping involves three transformational activities: (1) replacing words from the information request with elements from the data model (e.g., replacing the salary with the column SAL), (2) adding elements to the SELECT statement beyond what is in the information request (e.g., Smith's manager [right arrow] NAME = (SELECT MGR WHERE NAME = 'Smith'), and (3) ignoring terms from the information request that are not needed in the SELECT statement. This model of template-generation-plus-mapping provides a reasonable starting point for understanding the process of formulation and two sources of complexity in formulation tasks--structural complexity and (lexical) transformational complexity (Reisner, 1977). Structural complexity addresses questions about the template, such as whether the FROM clause specifies an inner or outer join, or whether a WHERE, GROUP BY, or HAVING clause is needed. [FIGURE 1 OMITTED] Transformational complexity stems from the complexity in the lexical mappings shown in Figure 1--the replacing, adding, and omitting of lexical elements from the natural language request to fill in the slots in the template. Transformational complexity increases as the gap between terms in the information request and elements in the data model increases, and is influenced by the degree of ambiguity in the information request (Borthick et al., 2001). For example, a request such as, Which customers placed online orders over $3,000 last July? could also be worded more precisely as, List the customer's name and account number, if the customer placed an order between July 1, 2012 and July 30, 2012 with an order total greater than 3000 and an online order flag equal to 1. …
Referência(s)