NEW DIRECTIONS IN TEXT CATEGORIZATION

Author: Richard S. Forsyth

DOI: 10.1007/978-3-642-58648-4_11

Abstract: As more and more documents are held in machine-readable form, problems of efficient text processing and text analysis become more pressing. An important kind of text processing, which has recently attracted the attention of researchers in Artificial Intelligence (AI), is text categorization, e.g. automatically assigning news stories [11.5] or medical case notes [11.46] to a suitable category code. However, classifying texts is not a new problem: workers in the field of stylometry have been grappling with it for more than a century. Typically, stylometers have given most attention to authorship attribution and have used statistical methods, while AI-based research has concentrated on discrimination by subject matter, using machine-learning techniques. The present chapter reports several recent studies drawing on both these traditions. In addition, it investigates various methods of Textual Feature-Finding, i.e. methods of choosing textual features or attributes that: (1) do not depend on subjective judgement; (2) do not need knowledge sources external to the texts being analyzed, such as a computerized lexicon; (3) do not presume that the texts being studied are in English; (4) do not assume that the word is the only possible textual unit.

References (63)
Heikki Mannila, Erja Nikunen, Helena Ahonen, Forming grammars for structured documents. AAAIWS'93: Proceedings of the 2nd International Conference on Knowledge Discovery in Databases, pp. 314-325 (1993)
F. N. Teskey, Principles of text processing (1982)
Richard Forsyth, Stylistic structures: a computational approach to text classification. University of Nottingham (1996)
Sholom M. Weiss, Computer systems that learn (1990)