Webpage understanding

2009; Association for Computing Machinery; Volume: 37; Issue: 4; Language: English

10.1145/1519103.1519111

ISSN

1943-5835

Authors

Zaiqing Nie, Ji-Rong Wen, Wei-Ying Ma

Topic(s)

Spam and Phishing Detection

Abstract

In this paper, we introduce the webpage understanding problem, which consists of three subtasks: webpage segmentation, webpage structure labeling, and webpage text segmentation and labeling. The problem is motivated by the search applications we have been working on, including Microsoft Academic Search, Windows Live Product Search, and Renlifang Entity Relationship Search. We believe that integrated webpage understanding will be an important direction for future research in Web mining.
