Using syntactic information in document filtering: a comparative study of part-of-speech tagging and supertagging

Authors: R. Chandrasekar, B. Srinivas

Keywords: Precision and recall, Natural language processing, Computer science, Representation (mathematics), Modularity, Information filtering system, Document filtering, Filter (signal processing), Vector space model, Information retrieval, Artificial intelligence, Exploit

Abstract: Any coherent text contains significant latent information, such as syntactic structure and patterns of language use. This information can be exploited to overcome the inadequacies of keyword-based retrieval and to make retrieval more effective. In this paper, we demonstrate quantitatively how syntactic information is useful in filtering out irrelevant documents. We also compare two different syntactic labelings, simple Part-of-Speech (POS) labeling and Supertag labeling, and show that the richer (more fine-grained) representation of supertags leads to more effective document filtering. We have implemented a system that exploits syntactic information in a flexible manner to filter documents. The system has been tested on a large collection of newswire sentences and achieves recall and precision figures of 86% and 97%, respectively. Its performance and modularity make it a promising postprocessing addition to any Information Retrieval system.
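The abstract does not spell out how the filter uses the tags. The following is therefore a minimal, hypothetical sketch and not the authors' system. It assumes a filter that learns the tag sequences found in a small window around a query keyword in relevant training sentences. A new sentence is kept only if its own window matches one of those learned sequences. The sketch also shows how filtering precision and recall are computed (precision = TP / (TP + FP), recall = TP / (TP + FN)). The function names, the window size `k`, and the toy data are illustrative assumptions.

```python
from typing import List, Optional, Set, Tuple

# A tagged sentence is a list of (word, tag) pairs. The tags may be POS tags
# or supertags; this hypothetical sketch treats them as opaque labels.
Tagged = List[Tuple[str, str]]
Pattern = Tuple[str, ...]


def window_pattern(sent: Tagged, keyword: str, k: int = 1) -> Optional[Pattern]:
    """Tags in a window of up to k tokens on each side of the first
    occurrence of `keyword` (case-insensitive), or None if it is absent."""
    words = [w.lower() for w, _ in sent]
    if keyword not in words:
        return None
    i = words.index(keyword)
    lo, hi = max(0, i - k), min(len(sent), i + k + 1)
    return tuple(tag for _, tag in sent[lo:hi])


def learn_patterns(relevant: List[Tagged], keyword: str, k: int = 1) -> Set[Pattern]:
    """Collect the window patterns seen around `keyword` in relevant examples."""
    patterns: Set[Pattern] = set()
    for sent in relevant:
        p = window_pattern(sent, keyword, k)
        if p is not None:
            patterns.add(p)
    return patterns


def keep(sent: Tagged, keyword: str, patterns: Set[Pattern], k: int = 1) -> bool:
    """Keep a sentence only if its window pattern was seen in training."""
    p = window_pattern(sent, keyword, k)
    return p is not None and p in patterns


def precision_recall(predicted: List[bool], gold: List[bool]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data with Penn Treebank-style POS tags: "plant" as a noun
# (a factory) is relevant, "plant" as a verb is not.
train = [[("The", "DT"), ("plant", "NN"), ("closed", "VBD")]]
test = [
    [("A", "DT"), ("plant", "NN"), ("closed", "VBD")],     # relevant
    [("We", "PRP"), ("plant", "VBP"), ("seeds", "NNS")],   # irrelevant
    [("The", "DT"), ("plant", "NN"), ("shut", "VBD")],     # relevant
    [("The", "DT"), ("plant", "NN"), ("employs", "VBZ")],  # relevant, unseen pattern
]
gold = [True, False, True, True]

patterns = learn_patterns(train, "plant")
predicted = [keep(s, "plant", patterns) for s in test]
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=1.00 recall=0.67
```

In this toy run the filter rejects the verb use correctly. It also misses one relevant sentence whose tag window was never seen in training, so recall drops while precision stays perfect. Moving from POS tags to supertags would only change the tag strings in this sketch. Per the abstract, the finer-grained supertag labels are what make such filtering more effective.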
