Gleaning Information from the Web: Using Syntax to Filter Out Irrelevant Information

Authors: R. Chandrasekar, B. Srinivas

DOI:

Keywords: Computer science, Syntax, Artificial intelligence, Natural language processing, Web search engine, Cognitive models of information retrieval, Search engine, Relevance (information retrieval), Filter (video), Domain (software engineering), Human–computer information retrieval, Information retrieval, World Wide Web

Abstract: In this paper, we describe a system called Glean, which is predicated on the idea that any coherent text contains significant latent information, such as syntactic structure and patterns of language use, which can be used to enhance the performance of Information Retrieval systems. We propose an approach to information retrieval that makes use of syntactic information obtained using a tool called a supertagger. A supertagger is used on a corpus of training material to semi-automatically induce patterns that we call augmented-patterns. We show how these augmented patterns may be used along with a standard Web search engine or an IR system to retrieve information, identify relevant information, and filter out irrelevant items. We describe an experiment in the domain of official appointments, where such patterns are shown to reduce the number of potentially irrelevant documents by upwards of 80%.
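
As a rough illustration of the filtering idea in the abstract, the sketch below matches hand-written patterns against tagged sentences and discards sentences that match none of them. It is not the authors' implementation: the tag names (NP, V_TRANS, ADJ, and so on), the pattern format, and the functions matches_at, is_relevant, and filter_documents are all hypothetical, and real supertags and semi-automatically induced augmented-patterns would be considerably richer.

```python
from typing import List, Optional, Tuple

# Assumption: a token is a (word, tag) pair. The tags are made-up placeholders,
# not real supertags produced by a supertagger.
Token = Tuple[str, str]
# Assumption: one pattern element is (required word, required tag); None means "any".
Constraint = Tuple[Optional[str], Optional[str]]
Pattern = List[Constraint]


def matches_at(tokens: List[Token], start: int, pattern: Pattern) -> bool:
    """Return True if the pattern matches consecutive tokens beginning at `start`."""
    if start + len(pattern) > len(tokens):
        return False
    for (word, tag), (want_word, want_tag) in zip(tokens[start:], pattern):
        if want_word is not None and word.lower() != want_word:
            return False
        if want_tag is not None and tag != want_tag:
            return False
    return True


def is_relevant(tokens: List[Token], patterns: List[Pattern]) -> bool:
    """A tagged sentence is relevant if any pattern matches at any position."""
    return any(
        matches_at(tokens, i, p) for p in patterns for i in range(len(tokens))
    )


def filter_documents(docs: List[List[Token]], patterns: List[Pattern]) -> List[List[Token]]:
    """Keep only the tagged sentences that match at least one pattern."""
    return [d for d in docs if is_relevant(d, patterns)]


# Hypothetical pattern for the appointments domain: NP "appointed" (as a verb) NP.
PATTERNS: List[Pattern] = [[(None, "NP"), ("appointed", "V_TRANS"), (None, "NP")]]

relevant = [("Board", "NP"), ("appointed", "V_TRANS"), ("Smith", "NP")]
irrelevant = [("the", "DET"), ("appointed", "ADJ"), ("hour", "N")]

# Both sentences contain the keyword "appointed"; only the first survives the filter.
kept = filter_documents([relevant, irrelevant], PATTERNS)
```

A plain keyword search would return both sentences because each contains "appointed". The tag constraint is what separates the verbal use from the adjectival one, which is the kind of syntactic evidence the abstract proposes to exploit.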

References (8)
Srinivas Bangalore, Performance Evaluation of Supertagging for Partial Parsing. International workshop/conference on parsing technologies, pp. 187-198 (2000). DOI: 10.1007/978-94-015-9470-7_11
R. Chandrasekar, B. Srinivas, Using syntactic information in document filtering: a comparative study of part-of-speech tagging and supertagging. RIAO '97 Computer-Assisted Information Searching on Internet, pp. 531-545 (1997).
Gerard Salton, Michael J. McGill, Introduction to Modern Information Retrieval (1983).
William B. Frakes, Ricardo Baeza-Yates, Information Retrieval: Data Structures and Algorithms (1992).
Aravind K. Joshi, B. Srinivas, Disambiguation of super parts of speech (or supertags). Proceedings of the 15th conference on Computational linguistics, vol. 1, pp. 154-160 (1994). DOI: 10.3115/991886.991912
Kenneth Ward Church, A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text. Conference on applied natural language processing, pp. 136-143 (1988). DOI: 10.3115/974235.974260
Christy Doran, Dania Egedi, Beth Ann Hockey, B. Srinivas, Martin Zaidel, XTAG system: a wide coverage grammar for English. International conference on computational linguistics, pp. 922-928 (1994). DOI: 10.3115/991250.991297
Yves Schabes, Anne Abeille, Aravind K. Joshi, Parsing strategies with 'lexicalized' grammars. Proceedings of the 12th conference on Computational linguistics, pp. 578-583 (1988). DOI: 10.3115/991719.991757