Author: Christopher Soo-Guan Khoo
DOI:
Keywords: Identification (information), Natural language processing, Document retrieval, Information retrieval, Wildcard, Artificial intelligence, Computational linguistics, Thesaurus (information retrieval), Information extraction, Matching (statistics), Sentence, Computer science
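Abstract: This study represents one attempt to make use of relations expressed in text to improve information retrieval effectiveness. In particular, the study investigated whether the information obtained by matching causal relations expressed in documents with the causal relations expressed in users' queries could be used to improve document retrieval results, in comparison with using just term matching without considering relations. An automatic method for identifying and extracting cause-effect information in Wall Street Journal text was developed. The method uses linguistic clues to identify causal relations without recourse to knowledge-based inferencing. The method was successful in identifying and extracting about 68% of the causal relations that were clearly expressed within a sentence or between adjacent sentences in the text. Of the instances that the computer program identified as causal relations, 72% can be considered correct. The automatic method was used in an experimental information retrieval system to identify causal relations in a database of full-text Wall Street Journal documents. Causal relation matching was found to yield a small but significant improvement in retrieval results when the weights used for combining the scores from the different types of matching were customized for each query, as in an SDI or routing situation. The best results were obtained when causal relation matching was combined with word proximity matching (matching pairs of causally related words in the query with pairs of words that co-occur within document sentences). An analysis using manually identified causal relations indicated that bigger retrieval improvements can be expected with more accurate identification of causal relations. The best kind of causal relation matching was found to be one in which one member of the relation (either the cause or the effect) was represented as a wildcard that could match any term. The study also investigated whether using Roget's International Thesaurus (3rd ed.) to expand query terms with synonymous and related terms would improve retrieval effectiveness. Using Roget category codes in addition to keywords did give better retrieval results. However, the Roget codes were better at identifying the nonrelevant documents than the relevant ones.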