Gleaning Information from the Web: Using Syntax to Filter Out Irrelevant Information

Authors: R. Chandrasekar, B. Srinivas

Keywords: Computer science, Syntax, Artificial intelligence, Natural language processing, Web search engine, Cognitive models of information retrieval, Search engine, Relevance (information retrieval), Filter (video), Domain (software engineering), Human–computer information retrieval, Information retrieval, World Wide Web

Abstract: In this paper, we describe a system called Glean, which is predicated on the idea that any coherent text contains significant latent information, such as syntactic structure and patterns of language use, which can be used to enhance the performance of Information Retrieval systems. We propose an approach to information retrieval that makes use of syntactic information obtained using a tool called a supertagger. A supertagger is used on a corpus of training material to semi-automatically induce patterns that we call augmented-patterns. We show how these augmented patterns may be used along with a standard Web search engine or an IR system to retrieve information, identify relevant information, and filter out irrelevant items. We describe an experiment in the domain of official appointments, where such patterns are shown to reduce the number of potentially irrelevant documents by upwards of 80%.
