Stylometric Analysis of Scientific Articles

Authors: Shane Bergsma, David Yarowsky, Matt Post

DOI:

Keywords: Natural language processing, Artificial intelligence, Baseline (configuration management), Syntax, Computational linguistics, Domain (software engineering), Computer science, Reduction (complexity), Stylometry

Abstract: We present an approach to automatically recover hidden attributes of scientific articles, such as whether the author is a native English speaker, whether the author is male or female, and whether the paper was published in a conference or workshop proceedings. We train classifiers to predict these attributes in computational linguistics papers. The classifiers perform well in this challenging domain, identifying non-native writing with 95% accuracy (over a baseline of 67%). We show the benefits of using syntactic features in stylometry; syntax leads to significant improvements over bag-of-words models on all three tasks, achieving 10% to 25% relative error reduction. We give a detailed analysis of which words and syntax most predict a particular attribute, and we show a strong correlation between our predictions and a paper's number of citations.
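
The "relative error reduction" quoted in the abstract can be made concrete with a small worked calculation. The sketch below is illustrative only: the function name `relative_error_reduction` is an assumption, not something from the paper. It applies the standard definition (the fraction of the reference system's errors that the new system removes) to the two accuracy figures the abstract gives, 67% for the baseline and 95% for the non-native-writing classifier. Note that the 10% to 25% figure in the abstract compares syntactic features against bag-of-words models, whose accuracies are not stated here, so that comparison is not reproduced.

```python
def relative_error_reduction(reference_accuracy: float, system_accuracy: float) -> float:
    """Return the fraction of the reference system's errors that the new system removes.

    Both arguments are accuracies in [0, 1]. This is the standard definition;
    the function name is hypothetical and chosen for illustration.
    """
    if not (0.0 <= reference_accuracy <= 1.0 and 0.0 <= system_accuracy <= 1.0):
        raise ValueError("accuracies must lie in [0, 1]")
    reference_error = 1.0 - reference_accuracy
    if reference_error == 0.0:
        raise ValueError("reference system makes no errors; reduction is undefined")
    system_error = 1.0 - system_accuracy
    return (reference_error - system_error) / reference_error


# Figures from the abstract: 67% baseline accuracy vs. 95% classifier accuracy.
reduction = relative_error_reduction(0.67, 0.95)
print(f"{reduction:.1%}")  # prints 84.8%
```

Read this way, moving from 67% to 95% accuracy removes roughly 85% of the baseline's errors, which is a different (and much larger) comparison than the 10% to 25% gain reported for syntax over bag-of-words.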
