Using syntactic information in document filtering: a comparative study of part-of-speech tagging and supertagging

Authors: R. Chandrasekar, B. Srinivas

Keywords: Precision and recall, Natural language processing, Computer science, Representation (mathematics), Modularity, Information filtering system, Document filtering, Filter (signal processing), Vector space model, Information retrieval, Artificial intelligence, Exploit

Abstract: Any coherent text contains significant latent information, such as syntactic structure and patterns of language use. This information can be exploited to overcome the inadequacies of keyword-based retrieval and make information retrieval more effective. In this paper, we demonstrate quantitatively how syntactic information is useful in filtering out irrelevant documents. We also compare two different syntactic labelings, simple Part-of-Speech (POS) labeling and Supertag labeling, and show how the richer (more fine-grained) representation of supertags leads to more effective document filtering. We have implemented a system which exploits syntactic information in a flexible manner to filter documents. The system has been tested on a large collection of newswire sentences, and achieves recall and precision figures of 86% and 97%, respectively, for filtering. Its performance and modularity make it a promising postprocessing addition to any Information Retrieval system.
