Construction of web search engine supporting intelligent Chinese word segmentation
2006; China Aerospace Science and Industry Group; Language: English
ISSN: 1000-7024
Topic(s): Advanced Computational Techniques and Applications
Abstract: Chinese word segmentation has a vital effect on the precision and recall of a web search engine for Chinese. By analyzing the open source web search engine Nutch, a scalable lexical analyzer is implemented based on JavaCC. By integrating it with Nutch, a web search engine, NutchEnhanced, which supports intelligent Chinese word segmentation, is constructed and used as a platform for testing the effect of various Chinese word segmentation algorithms in a search engine. The experimental results show that, for Chinese queries, NutchEnhanced outperforms Nutch on precision. With a recall of 0.74 and a precision of 0.86 over the top 30 results, its Chinese search quality is generally as good as that of Google and Baidu.
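For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how recall and precision over the top k results are commonly computed. It is not taken from the paper: the function names, the document identifiers, and the toy relevance judgments are all hypothetical, and the choice to divide by the number of results actually returned when fewer than k are available is an assumption (conventions differ).

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked results that are relevant.

    Assumption: if fewer than k results were returned, divide by the
    number actually returned rather than by k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked[:k]
    if not top:
        return 0.0
    hits = sum(1 for doc in top if doc in relevant)
    return hits / len(top)


def recall(retrieved, relevant):
    """Fraction of all relevant documents that appear in the retrieved set."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


# Hypothetical toy data: a ranked result list and the set of documents
# judged relevant for one query.
ranked = ["d1", "d7", "d3", "d9", "d4"]
relevant = {"d1", "d3", "d4", "d8"}

p3 = precision_at_k(ranked, relevant, 3)  # d1 and d3 are relevant among the top 3
r = recall(ranked, relevant)              # d1, d3, d4 found out of 4 relevant
```

Under these definitions, the abstract's "precision of top 30 results" would correspond to `precision_at_k(..., k=30)` averaged over the test queries, assuming the paper uses the usual definitions.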