News Sync: Enabling scenario-based news exploration
2011; Wiley; Volume: 48; Issue: 1; Language: English
10.1002/meet.2011.14504801078
ISSN: 0044-7870
Authors: V. G. Vinod Vydiswaran, Jeroen van den Eijkhof, Raman Chandrasekar, Ann Paradiso, Jim St. George
Topic(s): Video Analysis and Summarization
Proceedings of the American Society for Information Science and Technology, Volume 48, Issue 1, p. 1-10

V.G. Vinod Vydiswaran (vgvinodv@illinois.edu), University of Illinois, 201 N. Goodwin Ave, Urbana, IL 61801, USA
Jeroen van den Eijkhof (jeroen@uw.edu), Information School, Box 352840, University of Washington, Seattle, WA 98195, USA
Raman Chandrasekar (mickeyrc@hotmail.com), Evri.com, 71 Columbia Street, Suite 300, Seattle, WA 98104, USA
Ann Paradiso (annpar@microsoft.com), Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA
Jim St. George (jamessg@microsoft.com), Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA

First published: 11 January 2012. https://doi.org/10.1002/meet.2011.14504801078

Abstract

News consumption patterns are changing, but the tools to view news are dominated by portal and search approaches. In this paper, we suggest using elements of search, visualization, natural language processing, and machine learning to provide a more captivating, sticky news consumption experience. We propose a novel use-case-driven approach to presenting news and introduce News Sync, a system that addresses three specific news exploration scenarios where a user wants to catch up on news from a particular time period, keep in touch with news from specific locations, or follow the lives of celebrities. The news experience is enhanced by clustering news articles and allowing users to interact with and share stories of interest, and filter results on specific dimensions such as time, location, and key entities. User deployment studies suggest a distinct preference for an interface that supports exploration and visualization of news articles.

INTRODUCTION

The news landscape has undergone major changes with the advent of online media. While the readership of traditional newspapers has declined over the past few years, the consumption of news over the Internet has increased significantly.
A March 2010 survey of US Internet users (Gather, 2010) found that the Web/Internet is by far the most popular source of news (49%), compared to television (32%) and newspapers (9%). Further, as with other kinds of online information, the dominant mode of accessing news online is through search. According to the June 2010 Pew Research biennial news consumption survey (Pew Research Center for the People and the Press, 2010), henceforth referred to as the Pew survey, 34% of the general public use search engines for news, and three of the top four most frequented websites for news are search engine websites. This suggests that, even though there are several dedicated news portals, consumption of online news is triggered primarily through search queries.

Search engine companies take advantage of this user behavior by integrating relevant news results with Web search results for news-related queries and by providing dedicated news verticals and topic-specific news web pages. Treating a news query as just another query, however, restricts users to a set of ten (usually recent) news articles. Yet news is a living entity with a rich past and future. It deals with interactions between entities such as people, places, events, and topics. Any good news presentation must encourage users to explore these dimensions.

In this paper, we propose a novel use-case-driven approach to presenting news and list key design goals for such a system. We describe the design decisions we took to build a prototype, called News Sync, to explore an archive of twenty years of New York Times articles. We study how users interact with the system, using a combination of usability studies and user feedback, and show that users prefer an exploratory interface in terms of task success and user satisfaction.

SURVEY OF ONLINE NEWS SOURCES

Traditional search companies form the largest and most frequently visited online news destinations. According to the Pew survey, Yahoo!
(28%) is the most used website for news, ahead of the television news leader, CNN (16%). Other search portals, Google (15%) and MSN (14%), compete for the top positions among news websites. Over the past two years, search engine websites have gained in popularity over traditional news media conglomerates. Two-thirds of those polled in the Pew survey said they used search engines to find news on a particular subject, but only 10% regularly got news from customized webpages (e.g., iGoogle or My Yahoo!) or through RSS feeds. Social networking sites such as Facebook, MySpace, and LinkedIn also played a role in disseminating news headlines: 19% of those polled got news from social networking sites regularly or sometimes, and some also got news occasionally from Twitter.

In the following sections, we briefly survey different online news sites and present a summary of some of their features.

News Aggregators

Traditional search engines aggregate news from many sources and categorize them by news categories, such as World (News), Business, Sports, etc. Aggregating news from multiple sources helps them present a multimodal view of a news story that includes videos, photos, and live feeds. Sites such as Yahoo! News prioritize news sources and differentiate news coming from different sources. Users can select sources for each category, and news stories from just these sources are displayed. Other sites, such as Bing News and Google News, present news from many sources as clusters. This allows them to also incorporate stories from other sources, such as blogs, Twitter, and Wikipedia.

Traditional News Media Online

While readership and viewership of traditional newspapers and television news diminished over the past decade, their online versions have seen an increase. The average monthly reach of web newspapers among Internet households increased from 27.4% in 2004 to 40.9% in 2008 (Nielsen/Net Ratings).
Many television channels now make news clips available online either on their sites or on other video-sharing sites like YouTube. Newspapers have augmented their online content with videos and photos to visually appeal to younger readers on the Web (e.g., see Figure 1). With news consumption moving away from print media, some newspapers, such as the Seattle Post-Intelligencer (http://www.seattlepi.com/), have gone web-only, while other news outlets, such as The Huffington Post (http://www.huffingtonpost.com/), Newser (http://www.newser.com/), and Seven-Sided Cube (http://www.sevensidedcube.net/), present editorial and blog content as independent news online.

In addition, a significant portion of news is community-generated. Sites such as NewsVine (http://www.newsvine.com/) and GlobalReporter (http://globalreporter.com/) allow users to post (and rate) news local to their community, while news outlets, such as CNN iReport (http://ireport.cnn.com/), have accepted this notion of grass-roots journalism and allow users to post news videos.

News on-the-fly

News consumption patterns have also changed over the past decade. Rather than setting apart time to access news, users consume news throughout the day. The Pew survey found that 76% of all online users (62% of the public) say they come across news online when they have been on the Web for another purpose. This is especially true of younger consumers, who typically follow links to news stories rather than going directly to news sources themselves.

Social networking sites have played an important role in pushing news content to web users. Sites such as StumbleUpon (http://www.stumbleupon.com/), Digg (http://digg.com/), Reddit (http://www.reddit.com/), Yahoo! Buzz (http://buzz.yahoo.com/), Delicious, Facebook, and Twitter allow users to tag news stories and recommend or share them with friends on their social network. Such news recommendation services not only rate mainstream news, but also help users stumble upon unique news stories that they may not otherwise have a chance to see. Many traditional news sites have plug-in interfaces to these bookmarking sites to enable readers to share news freely. The bookmarking also helps present "popular" or "upcoming" news, as shown by news ranking sites like NewsPulse (http://newspulse.cnn.com/), BuzzFeed (http://www.buzzfeed.com/), Digg, and Reddit.

Figure 1. The Washington Post online edition (retrieved June 10th, 2010). Note the integration of (a) video, (b) images, (c) live feeds, and (d) social networking sites.

Analysis of Online News

As illustrated above, many online news sites augment news with videos, images, and user comments to enrich the news consumption experience. Search-based news aggregator sites cluster and categorize news stories and allow users to customize and personalize what they want to read. Services such as email/mobile news alerts and RSS feeds, and customized web pages, such as My MSN, My Yahoo!, and iGoogle, allow users to get news on demand. Readers are encouraged to share news with others, either in their social network or the online community at large. News portals use this sharing activity to assess the popularity of news stories and surface articles that seem to be generating a lot of interest ("buzz").

The algorithmic aggregation of news across sources used by news aggregators, however, appears to treat all news sources equally, especially when selecting which news item to show. Often, recent updates supersede earlier reports, even if the earlier reports were from reputable sources.
On the other hand, users may have other preferences, such as local news sources for community news or, more generally, specific news sources for preferred genres of news. Local news, or news from semi-urban regions, is often poorly represented, either because fewer sources report on regional news or because limited space is allocated to local news on news sites.

The "one size fits all" approach of keyword-query-based retrieval is not optimal for news. Query-based triggering is often imperfect, and searching for news using just keyword queries often limits expressivity. News demands a ranking different from web search. News dissemination is more than just selecting a list of news articles about popular events from well-known news sources. Online news must instead cater to specific use-cases and should ideally be personalized to users.

BEYOND SEARCH: NEWS EXPLORATION SCENARIOS

In this section, we propose techniques to select and present news according to user needs and preferences. We present three specific scenarios for a user-driven news digest to illustrate our ideas.

Scenario 1: Catching up on News

Consider the following scenario: Katie is an avid news reader who tracks news on a daily basis, often following up on specific news events several times a day. At times, Katie may be cut off from news, for example, when she goes on a long vacation. When she is back online, she may want to know what happened while she was away. She may want to skim through the major news stories that took place, including updates on the news she was following regularly before going on vacation. This caters to a common, specific need of a news consumer wanting to catch up on news.

Scenario 2: Diaspora Digest

It has become fairly common for people to migrate to another country or city for work or studies. Though most of these expatriates try to keep abreast of the news from their country of origin, they lose touch with traditional sources of news.
They may visit news websites from their home country or city periodically to do so, but this may not be easy or convenient. If Katie is from Berlin and residing in the US, she may be interested in a summarized view of key events in Germany from the past week. She might be interested in the German soccer team's performance throughout the year and in country-wide soccer competitions such as the German Cup. This caters to the need of expatriates who either live in another country or have migrated to a new city within the country and want to access news about their home city.

Scenario 3: Following Celebrities

A longitudinal look at news is of great value for specific needs, such as following the activities of celebrities. Assume Katie is an admirer of Princess Diana and wants to get a perspective on Princess Diana's life history as described in the news. She would be interested in key events such as her marriage, her time as the princess, her divorce, and her death and subsequent investigations. This caters to the user need to get a historic perspective on key people or events in an archival news corpus.

REQUIREMENTS FOR NEWS SYNC

To address the above scenarios, we propose a system called News Sync. This system allows Katie and similar news consumers to get adaptive, personalized news digests on a topic, region, time period, or a combination of these. We list the following requirements for News Sync:

1. Choice of news categories, topics, and sources: Users should be able to specify the time period of interest. In addition, users may specify if they are interested in news from particular sources, specific news categories, locations/regions, and/or specific topics.
2. Personalized news feed: The system should identify stories that are currently the most relevant to the user, based on past user behavior and user preferences, similar in spirit to work by Billsus & Pazzani (2000).
3. Variety in news content: The system should show a variety of content across diverse categories, instead of, say, returning a list of ten "most popular" news links which may be restricted to one or two topics. Users can thus get an overall picture of key events first, before they delve into specific stories.
4. Multimodal and adaptive news presentation: The news interface needs to adapt to the nature of the news topic presented and to the availability of multiple modes of news content. For example, a search for news about "Harry Potter" over summer 2007 should result in, among other stories, movie trailers (video) of "Harry Potter and the Order of the Phoenix" and book reviews (blogs) of "Harry Potter and the Deathly Hallows", which were both released in July '07, and news about the Harry Potter theme park announced in May '07.
5. Interactive and exploratory user interface: Users should be able to interactively modify time, location, and other parameters and have the system respond immediately with updated views of relevant news.
6. Parameterized interface design: Users should be able to set parameters to get results at different specificities.
7. Support source-tracing and finding related news: The system should allow users to go from a news summary to the original news article. Further, the system should suggest other related articles based on the news items viewed.
8. Ability to share news: Users should be able to comment on and share interesting news articles over their social network or over the Web via email.
9. Support news analyses by sentiment and points of view: Users should be able to view stories summarized by sentiment or different points of view.
10. Support the familiar "list view" as back-off: Finally, even as the news interface gets a facelift, it may be prudent to support the list view as a back-off option to take advantage of users' familiarity with the concept.
Although these requirements have been specified with the news domain in mind, they are generally applicable to other domains such as legal search, patent search, and intelligence-gathering tasks that aim at gleaning information from large archival text corpora.

THE NEWS SYNC SYSTEM

In this section, we present a brief description of News Sync, the system we developed based on the requirements listed in the previous section. We identified three key dimensions of news that users can control during search (namely, the time period, location, and category of news articles), in addition to search via keywords. The following sections describe system features, assuming control over only these dimensions. The system can easily be extended to other dimensions. Further, while the individual techniques used may not themselves be new, our proposed integration leads to a better news experience.

System Description

Figure 2 gives a schematic diagram of the News Sync system. The key steps in the system are:

1. Deciding on a news corpus: In this prototype, we use the New York Times corpus, released as part of the HCIR 2010 Challenge (Fourth Workshop on Human-Computer Interaction and Information Retrieval, 2010, http://www.hcir2010.org/). It consists of all articles published (or posted online) by the New York Times from 1987 to 2007. This is analogous to an archival dump of news articles that can be augmented incrementally, if desired. The corpus also contains fairly rich meta-data annotations such as normalized names of people, locations, organizations, and key concepts found in the articles.
2. Indexing the corpus: The New York Times corpus was indexed using Lucene.Net (http://lucene.apache.org/lucene.net/), with support for field-level queries. This required the removal of stop words and additional pre-processing to normalize some fields (such as publication date) to make them searchable.
3. Retrieving relevant news results: When a user issues a news query, the system converts it to a suitable Lucene (http://lucene.apache.org/java/docs/index.html) query and retrieves several hundred relevant news results. If a date range is specified, only results from that date range are retrieved. If a category or location is specified, it must appear in all result articles.
4. Grouping news articles: News needs to be presented in a manner that is easy to consume. This involves selecting the content to present and deciding how best to present it. In this work, we cluster articles returned by the search system to find related groups of articles. Each group may not be a single story thread, but this clustering-based dimensionality reduction offers a more structured view into the articles. Recursive clustering can help us get to news stories, which are collections of strongly related articles. We currently cluster on key concepts from articles, including named entities, descriptors, categories, and section headings obtained from the article meta-data. We use the Hierarchical Agglomerative Clustering algorithm (Hastie et al., 2009) to find clusters and threshold it based on similarity between news articles computed over the key concepts listed above. These news clusters may be created adaptively based on the individual user models built from the user profile, explicit user preferences, and implicit interest tracking.
5. Summarizing news clusters: We also adaptively summarize the clusters to provide some insight into the articles within a cluster. Summarization is performed using a modified version of SumBasic (Nenkova & Vanderwende, 2005). SumBasic is an extractive summarization system which iteratively selects a few of the most significant sentences from one or more articles. The set of sentences extracted does not necessarily form a cogent paragraph.
However, just as result snippets provide some insight into individual results on a search result page, we expect these summaries to provide an indication of what the created clusters are about.
6. Adding aggregated meta-data about the clusters: Each news cluster is annotated with additional meta-data such as the news timeline, relevant categories, locations, and key concepts from the articles.
7. Presenting and visualizing news: Once the news clusters are annotated, they are presented to the user along with relevant meta-data. The meta-data, presented in the form of sparklines and tag clouds, can be used by the users to further explore and refine the clusters. This is described in detail in the next section on User Interaction. Figure 3 shows a screenshot of the results for a catching-up scenario query, "Watergate".

Figure 2. System diagram of News Sync.

Figure 3. Screenshot of News Sync showing results for the query "Watergate" in the catching-up scenario.

The system is developed in C#. The interface is developed using Microsoft Silverlight (http://www.silverlight.net/), since it gives us browser independence and access to animation and interactivity.

USER INTERACTION

The system interaction flow is sketched below:

1. Providing search parameters: When Katie logs in, she is shown a tag cloud of key topics from the corpus. She can browse for other news by providing one or more of four input parameters in the input panel (see Figure 4): the news category, topics of interest (keywords), location(s), and a date range of interest.
2. Viewing news clusters: When Katie enters a news query consisting of one or more parameters, several hundred (currently 1000) relevant results are retrieved from the indexed article store and the articles are dynamically clustered. The left panel of the result screen (see Figure 5) lists the clusters, ordered by popularity and relevance.
The top-most cluster is highlighted, and the left panel displays additional properties of the cluster, such as tag clouds of key concepts and locations mentioned in the news articles. A sparkline shows the distribution of articles over time. The right panel (see Figure 6) gives additional information about the highlighted cluster. It shows a brief summary, followed by a list of relevant articles. The list shows the publication date, headline, and the lead paragraph for each article.
3. Browsing news results: Katie can either explore the articles in the current cluster or look into other clusters from the left panel. If she clicks on an article headline, the article is displayed with all relevant meta-data (see Figure 7). If she clicks on another cluster in the left panel, the section with additional information on the first cluster shrinks, and the newly selected cluster expands to show its information. The right panel also shows results for this selected cluster.
Katie can also interact with the sparkline by hovering over or clicking it. The sparkline spans the period from the date of the first (earliest) story in the cluster through the date of the last (latest) story in the cluster. This period is divided into 25 segments, each mapped onto the closest logical time period (viz., day, half-week, week, half-month, month, 3 months, and year). As the hover point is moved across the sparkline, different date ranges can be selected, and the number of articles from that date range is displayed. The articles from the selected date range are also highlighted in real time in the right panel. Clicking on a time segment filters the result set to articles from that time period only, allowing Katie to zoom into news from specific time periods. Typically, this is used to focus on articles corresponding to the peaks or valleys in the sparkline.
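The sparkline segmentation described above can be sketched as follows. This is a minimal Python illustration, not the prototype's C# code: the number of days assigned to each logical period, the use of absolute difference to pick the "closest" period, and the choice to let the number of bins vary once the width has been snapped to a logical period are all assumptions made for the example.

```python
from datetime import date

# Logical periods named in the text; the day counts are assumed approximations.
LOGICAL_PERIODS = [
    ("day", 1.0), ("half-week", 3.5), ("week", 7.0), ("half-month", 15.0),
    ("month", 30.0), ("3 months", 91.0), ("year", 365.0),
]

def choose_period(first, last, segments=25):
    """Pick the logical period closest to (time span / segments)."""
    span_days = max((last - first).days, 1)
    raw_width = span_days / segments
    return min(LOGICAL_PERIODS, key=lambda p: abs(p[1] - raw_width))

def sparkline_counts(pub_dates, segments=25):
    """Return the chosen period name and the article count per time bin."""
    first, last = min(pub_dates), max(pub_dates)
    name, width = choose_period(first, last, segments)
    # Assumption: after snapping, use as many bins as needed to cover the span.
    n_bins = int((last - first).days / width) + 1
    counts = [0] * n_bins
    for d in pub_dates:
        counts[int((d - first).days / width)] += 1
    return name, counts

if __name__ == "__main__":
    dates = [date(2000, 1, 1), date(2000, 1, 1),
             date(2000, 1, 20), date(2000, 2, 19)]
    period, counts = sparkline_counts(dates)
    print(period, counts)
```

Filtering on a clicked segment then amounts to keeping only the articles whose bin index matches the selected one.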
Katie can also click on one or more keyword or location tags and reduce the result set to only those articles that contain the selected tags. If Katie is interested in exploring a particular cluster in detail, she can select the cluster and choose to dig deeper. A new query is then issued based on the chosen cluster, the news articles in the cluster, and the original query to obtain a refined, second-level clustering that Katie can further explore.
4. Sharing results: The interface also allows Katie to share the summary, articles, or stories with her friends on popular social networking sites. She can also save her queries and results for future recall.
5. Following user actions: As Katie interacts with the system, her actions, queries, and parameter settings are stored. When Katie reads articles and shares them with her friends, the key concepts from those articles are recorded in user models maintained per user. Result ranking, clustering, and summarization of clusters are continuously adapted based on the user model. Katie can also explicitly restrict her results to particular regions or categories. These customization preferences are recorded, and subsequent results are tuned to these preferences.

Figure 4. Top portion of the interface, showing the input panel for entering search parameters.

Figure 5. The left panel is used to browse relevant clusters and to interact with the sparkline and the keyword and location clouds.

Figure 6. The right panel shows the abstract for the selected cluster, followed by a list of articles in the cluster.

Figure 7. News Sync article obtained after clicking an article in the summary view (Figure 6).

CORPUS CHARACTERISTICS

The system was developed as part of HCIR Challenge 2010, a shared-data initiative run over the summer of 2010, associated with the Human Computer Interaction and Information Retrieval workshop.
As part of the challenge, the organizers released a dataset consisting of over 1.8 million articles published by the New York Times from 1987 to 2007. The corpus had only textual content and no multimedia content such as images or videos. The dataset, however, included rich, manually annotated meta-data. All news articles were tagged with the section title and the page numbers or URL where they appeared in the print and online versions of the New York Times. The named entities in these articles, such as person names, locations, and organizations, were identified and normalized. Further, articles were tagged with other general descriptors that could be used as keywords describing the main themes of the article.

The lack of multimedia content presented some challenges during the design of a news interface. The focus of the interface design was directed away from integrating multimedia content (such as photos and videos) with news, and towards enhancing user experience by allowing news exploration. The result section was designed to project the navigational aspects of news, allowing users to zoom in and out of news clusters based on time, location, entities, and other key concepts. A version of News Sync was demonstrated at HCIR 2010 (Vydiswaran et al., 2010).

EVALUATION

We piloted an initial prototype of News Sync internally within our organization. In the first phase, we released it to a small user base to understand how users interact with the system, using implicit and explicit feedback. Usability studies and a follow-up survey helped us understand popular features and usage patterns. The next phase was a wider deployment of two competing interfaces with the same look-and-feel and result set, where the exploration features were enabled in one and disabled in the other. This study helped us evaluate the impact of news exploration on user satisfaction and task completion.
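Since the exploration features under evaluation rest on the article-grouping step from the System Description (hierarchical agglomerative clustering, thresholded on similarity computed over key concepts), a minimal Python sketch of that step may help make it concrete. The paper does not state the similarity measure, the linkage criterion, or the threshold value: the Jaccard similarity, average linkage, and 0.3 cutoff below are assumptions chosen for illustration, and the prototype itself is written in C#.

```python
def jaccard(a, b):
    """Similarity of two key-concept sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def average_link(c1, c2, concepts):
    """Mean pairwise similarity between two clusters of article indices."""
    sims = [jaccard(concepts[i], concepts[j]) for i in c1 for j in c2]
    return sum(sims) / len(sims)

def cluster_articles(concepts, threshold=0.3):
    """Agglomerative clustering: merge the most similar pair of clusters
    until no pair reaches the (assumed) similarity threshold."""
    clusters = [[i] for i in range(len(concepts))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = average_link(clusters[x], clusters[y], concepts)
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break  # remaining clusters are too dissimilar to merge
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

if __name__ == "__main__":
    # Hypothetical key-concept sets standing in for corpus meta-data.
    articles = [
        {"Watergate", "Nixon"},
        {"Watergate", "Nixon", "Impeachment"},
        {"Soccer", "Germany"},
        {"Soccer", "Germany", "World Cup"},
    ]
    print(cluster_articles(articles))
```

Raising the threshold yields more, tighter clusters; re-running the same function on the articles of one cluster with a higher threshold is one way to approximate the recursive, second-level clustering the text mentions.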
Usability Study

In the first evaluation phase, we conducted a boxed usability study. Users were invited to take part in the study of a news site. They were first asked if they had heard of an archival search system and if they read news online. Almost all users displayed some understanding of online news systems, even if they had not used an archival search system before. This helped set the News Sync system in perspective. None of the users had seen the prototype before, and no user manual was given on how to interact with the system. Users were asked to complete a set of tasks one-by-one, as they explored the system interface and various options
Reference(s)