Fifth special issue on knowledge discovery and business intelligence
2020; Wiley; Volume: 37; Issue: 6; Language: English
10.1111/exsy.12628
ISSN 1468-0394
Topic(s): Data Quality and Management
Abstract: Artificial Intelligence (AI) is impacting our world. In the 1970s and 1980s, Expert Systems (ES) were AI systems built on explicit knowledge, often represented in symbolic form (e.g., using the Prolog language), that was extracted from human experts. Since then, AI has shifted, driven by three main phenomena (Darwiche, 2018): a data explosion, with the availability of several big data sources (e.g., social media, sensor data); growth in computational power, following the famous Moore's law, which states that computer processing capacity doubles every 2 years; and the rise of sophisticated statistical and optimization techniques, including deep learning. Thus, rather than being expert-driven, ES have become more data-driven, with the focus on developing “computerized systems that use AI techniques to solve a specific real-world domain application task” (Cortez, Moro, Rita, King, & Hall, 2018).
Aiming to foster the interaction between two key ES areas, Knowledge Discovery (KD) and Business Intelligence (BI), a series of “Knowledge Discovery and Business Intelligence” (KDBI) tracks was held at the EPIA conference on Artificial Intelligence, with a total of six editions from 2009 to 2019. Since 2011, the track has had a dedicated special issue published in Wiley's Expert Systems journal (EXSY) (Cortez & Santos, 2013, 2015, 2017, 2018). KD is the AI subfield that addresses the extraction of useful knowledge from raw data (Fayyad, Piatetsky-Shapiro, & Smyth, 1996), while BI, also known as Business Analytics, is an umbrella term that includes methods and tools (e.g., data warehousing, dashboards and analytics) to obtain actionable knowledge from data (Ain, Vaia, DeLone, & Waheed, 2019). This is the fifth special issue on Knowledge Discovery and Business Intelligence, and it includes extended versions of selected papers presented at the sixth KDBI thematic track of EPIA 2019, held in Vila Real, Portugal.
The track received a total of 17 paper submissions, of which 10 were accepted for presentation at the EPIA 2019 conference. The EXSY special issue involved two rounds of reviews for the selected papers, performed by the program committee members of the sixth KDBI track of EPIA 2019 and by EXSY journal expert reviewers. After the two rounds, six papers were accepted for the EXSY special issue.
The accepted KDBI special issue papers reflect current AI methodological and application challenges. For instance, image, sound, sensor and social media data are becoming commonplace, so there is a need to develop machine learning systems capable of processing such data and proving their value in real-world applications. Moreover, the data-driven models should be understandable to domain users, in what is termed Explainable AI (XAI). Finally, the extracted data-driven knowledge should be actionable, so that it better supports managerial decision-making. These challenges are addressed by the six papers published in this special issue, which are summarized below.
The first paper, entitled “Semantic Segmentation and Colorization of Grayscale Aerial Imagery with W-Net Models,” addresses the semantic segmentation of remotely sensed aerial images. Dias, Monteiro, Estima, Silva, and Martins (2020) propose the W-Net, a specialized deep learning architecture that simultaneously segments images and reconstructs the colour of the input images. Several experiments were conducted, with historical road maps and building footprints from the cities of Potsdam and Vaihingen, revealing competitive results when compared with the U-Net baseline model. Therefore, the adapted W-Net model can be used as a valuable image processing tool, alleviating the need for manual inspection of aerial images.
Another deep learning model, the Convolutional Neural Network (CNN), was adapted by Anjos, Marques, and Grilo (2020) to classify sibilant phonemes of European Portuguese. The paper, entitled “Sibilant consonants classification comparison with multi and single-class neural networks,” proposes a serious sound game that aims to help children improve their speech. To benchmark the classifiers, a total of 1,500 sounds were collected from 145 children at three Portuguese schools. Mel-frequency Cepstral Coefficients (MFCCs) were used to pre-process the sounds and generate the CNN inputs. The proposed CNN model outperformed two baseline models (a simpler Neural Network and a Support Vector Machine), reaching a classification quality that can potentially help in language therapy games.
In “From Mobility Data to Habits and Common Pathways,” Andrade, Cancela, and Gama (2020) analyse two real-world Global Positioning System (GPS) datasets of human trajectories from the cities of Beijing, China, and Porto, Portugal. In particular, a spatio-temporal clustering approach is proposed to discover common pathways across the users' daily habits. The authors found that humans tend to follow a regular schedule and frequent routes when moving between their preferred locations. The devised models can be used to predict the next locations of users or groups of users, which can be valuable in several real-world applications, including smart cities.
The fourth paper, “A Context-Aware Recommender Method based on Text and Opinion Mining,” by Sundermann et al. (2020), focuses on recommendation systems based on social media texts. In particular, the authors propose a Natural Language Processing (NLP) method, called Context-Aware Recommendation Method based on Text and Opinion Mining (CARM-TOM). The method was validated in several experiments with restaurant recommendation data retrieved from the Yelp Internet platform. The proposed CARM-TOM obtained competitive results when compared with a matrix factorization method and a state-of-the-art context extraction method. By automatically extracting contextual information, CARM-TOM has the potential to increase the performance of recommendation systems based on opinionated texts.
Areosa and Torgo (2020) discuss a “Visual Interpretation of Regression Error.” The paper is set within the hot topic of XAI and proposes several visualization tools to explain the relationship between input variables and the predictive performance of black box regression ML models. Several illustrations are provided for different real-world datasets and learning algorithms. The proposed visualizations can be used to increase the confidence of decision makers in the regression model predictions.
The sixth and last paper, “A Bi-objective Procedure to Deliver Actionable Knowledge in Sport Services,” by Pinheiro and Cavique (2020), focuses on a real-world application, aiming to increase the retention of customers in gyms and health clubs. The work presents a data-driven pipeline that is capable of providing actionable knowledge. The authors used data from a Lisbon sports facility that had 21,755 users. A Decision Tree model was then used to derive rules that predict user dropout. Next, several actionable attributes, which can be influenced by the sports facility managers, were identified, aiming to design actionable rules. Finally, several what-if scenarios for retention interventions (which use the actionable rules) were simulated, combining ML and causal inference, making it possible to estimate their business utility and costs. The overall data-driven actionable approach is thus of potential value to support the decisions of sport facility managers.
We would like to thank the other KDBI 2019 track (of EPIA) co-organizers, namely João Gama, Luís Cavique, Manuel Santos and Nuno Marques. Moreover, we wish to thank the authors who contributed papers to this special issue and the reviewers (from the KDBI 2019 program committee and the EXSY journal). The work of P. Cortez was supported by FCT - Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020.