Authors: R. Chandrasekar, B. Srinivas
DOI:
Keywords: Precision and recall, Natural language processing, Computer science, Representation (mathematics), Modularity, Information filtering system, Document filtering, Filter (signal processing), Vector space model, Information retrieval, Artificial intelligence, Exploit
Abstract: Any coherent text contains significant latent information, such as syntactic structure and patterns of language use. This information can be exploited to overcome the inadequacies of keyword-based retrieval and make retrieval more effective. In this paper, we demonstrate quantitatively how this information is useful in filtering out irrelevant documents. We also compare two different labelings, simple Part-of-Speech (POS) labeling and Supertag labeling, and show that the richer (more fine-grained) representation of supertags leads to more effective document filtering. We have implemented a system which exploits this information in a flexible manner to filter documents. The system has been tested on a large collection of newswire sentences, and achieves recall and precision figures of 86% and 97%. Its performance and modularity make it a promising postprocessing addition to any Information Retrieval system.
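The idea of filtering on syntactic labels rather than on keywords alone can be illustrated with a toy sketch. It is not the authors' system: the sentences, the hand-assigned Penn Treebank-style POS tags, the relevance judgments, and the function names `keyword_match`, `syntactic_match`, and `precision_recall` are all assumptions made for illustration. The sketch compares a plain keyword filter with one that also requires the keyword to carry a verb tag, then scores both with precision and recall.

```python
from typing import List, Tuple

# A sentence is a list of (word, tag) pairs. Tags are hand-assigned here
# (hypothetical data); a real pipeline would obtain them from a tagger.
TaggedSentence = List[Tuple[str, str]]


def keyword_match(sentence: TaggedSentence, keyword: str) -> bool:
    """Keyword-only filter: keep the sentence if the keyword occurs at all."""
    return any(word.lower() == keyword for word, _ in sentence)


def syntactic_match(sentence: TaggedSentence, keyword: str, tag_prefix: str) -> bool:
    """Keep the sentence only if the keyword occurs with a matching tag."""
    return any(
        word.lower() == keyword and tag.startswith(tag_prefix)
        for word, tag in sentence
    )


def precision_recall(predicted: List[bool], relevant: List[bool]) -> Tuple[float, float]:
    """Compute (precision, recall) from parallel prediction and truth lists."""
    tp = sum(p and r for p, r in zip(predicted, relevant))
    fp = sum(p and not r for p, r in zip(predicted, relevant))
    fn = sum(r and not p for p, r in zip(predicted, relevant))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy collection: the user wants sentences about making reservations.
sentences: List[TaggedSentence] = [
    [("They", "PRP"), ("book", "VBP"), ("flights", "NNS")],
    [("The", "DT"), ("book", "NN"), ("sold", "VBD")],
    [("Please", "UH"), ("book", "VB"), ("early", "RB")],
    [("She", "PRP"), ("reserved", "VBD"), ("seats", "NNS")],
]
relevant = [True, False, True, True]  # hypothetical relevance judgments

kw_pred = [keyword_match(s, "book") for s in sentences]
syn_pred = [syntactic_match(s, "book", "VB") for s in sentences]

kw_scores = precision_recall(kw_pred, relevant)
syn_scores = precision_recall(syn_pred, relevant)
print("keyword filter   (precision, recall):", kw_scores)
print("syntactic filter (precision, recall):", syn_scores)
```

In this toy data the tag constraint removes the noun reading of "book", which raises precision without changing recall. The sentence that never uses the keyword is missed by both filters.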