Feature instability as a criterion for selecting potential style markers

Authors: Moshe Koppel, Navot Akiva, Ido Dagan

DOI: 10.1002/ASI.20428

Abstract: We introduce a new measure on linguistic features, called stability, which captures the extent to which a language element such as a word or a syntactic construct is replaceable by semantically equivalent elements. This measure may be perceived as quantifying the degree of available "synonymy" for a language item. We show that frequent, but unstable, features are especially useful as discriminators of an author's writing style.