Distributional term representations: an experimental comparison

Authors: Alberto Lavelli, Fabrizio Sebastiani, Roberto Zanoli

Keywords: Compound term processing, Natural language, Categorization, Artificial intelligence, Representation (mathematics), Cluster analysis, Information retrieval, Natural language processing, Computational linguistics, Computer science, Noun phrase, Thesaurus (information retrieval), Term (time), Index term

Abstract: A number of content management tasks, including term categorization, term clustering, and automated thesaurus generation, view natural language terms (e.g. words, noun phrases) as first-class objects, i.e. as objects endowed with an internal representation which makes them suitable for explicit manipulation by the corresponding algorithms. The information retrieval (IR) literature has traditionally used an extensional (aka distributional) representation for terms, according to which a term is represented by the "bag of documents" in which the term occurs. The computational linguistics (CL) literature has independently developed an alternative distributional representation for terms, according to which a term is represented by the "bag of terms" that co-occur with it in some document. This paper aims at discovering which of the two representations is most effective, i.e. brings about higher effectiveness once used in tasks that require terms to be explicitly represented and manipulated. We carry out experiments on (i) a term categorization task, and (ii) a term clustering task; this allows us to compare the two different representations in closely controlled experimental conditions. We report the results of experiments in which we categorize/cluster under 42 different classes the terms extracted from a corpus of more than 65,000 documents. Our results show a substantial difference in effectiveness between the two representation styles; we give both an intuitive explanation and an information-theoretic justification for these different behaviours.
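The two representations contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `bag_of_documents` and `bag_of_terms`, the toy corpus, and the use of raw counts as weights are all assumptions made for illustration, since the abstract does not specify a weighting scheme.

```python
from collections import Counter, defaultdict

def bag_of_documents(docs):
    """Represent each term by the documents it occurs in.

    Returns {term: Counter({doc_id: occurrences of term in that doc})}.
    Raw counts are an illustrative assumption, not the paper's weighting.
    """
    rep = defaultdict(Counter)
    for doc_id, tokens in docs.items():
        for tok in tokens:
            rep[tok][doc_id] += 1
    return dict(rep)

def bag_of_terms(docs):
    """Represent each term by the terms that co-occur with it in some document.

    Returns {term: Counter({other_term: number of docs containing both})}.
    """
    rep = defaultdict(Counter)
    for tokens in docs.values():
        vocab = set(tokens)  # count each co-occurring pair once per document
        for t in vocab:
            for u in vocab:
                if u != t:
                    rep[t][u] += 1
    return dict(rep)

# Hypothetical toy corpus: document id -> list of tokens.
docs = {
    "d1": ["bank", "loan", "interest"],
    "d2": ["bank", "river"],
    "d3": ["loan", "interest", "interest"],
}

doc_rep = bag_of_documents(docs)
term_rep = bag_of_terms(docs)
```

In this sketch, `doc_rep["interest"]` is a vector over documents, while `term_rep["loan"]` is a vector over other terms. Either kind of vector could then be passed to a term classifier or clustering algorithm, which is the setting the abstract compares.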

