Adaptive Identification of Hashtags for Real-Time Event Data Collection
2015; Springer Vienna; Language: English
DOI: 10.1007/978-3-319-14379-8_1
ISSN: 2190-5436
Authors: Xinyue Wang, Laurissa Tokarchuk, Félix Cuadrado, Stefan Poslad
Topic(s): Caching and Content Delivery
Abstract: The widespread use of microblogging services, such as Twitter, makes them a valuable tool for correlating people's personal opinions about popular public events. Researchers have capitalized on such tools to detect and monitor real-world events from this public, social perspective. Most Twitter event analysis approaches rely on event tweets collected through a set of predefined keywords. In this paper, we show that existing data collection approaches risk losing a significant amount of event-relevant information. We propose a refined adaptive crawling model that uses hashtags to detect emerging popular topics and monitors them to retrieve greater amounts of highly associated data for the events of interest. The proposed model expands the queries periodically by analyzing the traffic pattern of hashtags collected from a live Twitter stream. We evaluated this adaptive crawling model on a real-world event. Based on the theoretical analysis, we tuned the parameters and ran three crawlers, one baseline and two adaptive, during the 2013 Glastonbury music festival. Our analysis shows that adaptive crawling based on a Refined Keyword Adaptation algorithm outperforms the others: it collects the most comprehensive set of keywords while introducing the least noise.
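The periodic query expansion described in the abstract can be illustrated with a minimal sketch. This is not the paper's Refined Keyword Adaptation algorithm, whose details are not given here; it is a simple frequency-threshold stand-in. The function names, the `min_share` parameter, and the in-memory list of tweet texts are all assumptions made for illustration, and a real crawler would read from a live stream instead.

```python
import re
from collections import Counter

# Matches a hashtag and captures the tag without the leading '#'.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the lowercase hashtags (without '#') found in a tweet's text."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def expand_keywords(keywords, window_tweets, min_share=0.2):
    """Return the keyword set extended with hashtags frequent in this window.

    keywords      -- current lowercase query keywords
    window_tweets -- tweet texts collected during the latest time window
    min_share     -- hypothetical threshold: a hashtag is adopted if it appears
                     in at least this fraction of the keyword-matching tweets
    """
    # Keep only tweets that the current query would have matched.
    matching = [
        text for text in window_tweets
        if any(keyword in text.lower() for keyword in keywords)
    ]
    if not matching:
        return set(keywords)

    # Count each hashtag at most once per tweet.
    counts = Counter()
    for text in matching:
        counts.update(set(extract_hashtags(text)))

    threshold = min_share * len(matching)
    adopted = {tag for tag, n in counts.items() if n >= threshold}
    return set(keywords) | adopted


if __name__ == "__main__":
    window = [
        "Off to #glastonbury to see #rollingstones",
        "#glastonbury mud everywhere",
        "#rollingstones at glastonbury tonight",
        "unrelated #lunch post",
    ]
    print(sorted(expand_keywords({"glastonbury"}, window, min_share=0.5)))
```

Calling `expand_keywords` once per time window and feeding the returned set into the next query gives the periodic expansion loop. A threshold on relative frequency is only one possible way to limit noise from unrelated hashtags.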