Authors: Shane Bergsma, David Yarowsky, Matt Post
DOI:
Keywords: Natural language processing, Artificial intelligence, Baseline (configuration management), Syntax, Computational linguistics, Domain (software engineering), Computer science, Reduction (complexity), Stylometry
Abstract: We present an approach to automatically recover hidden attributes of scientific articles, such as whether the author is a native English speaker, whether the author is male or female, and whether the paper was published in a conference or workshop proceedings. We train classifiers to predict these attributes in computational linguistics papers. The classifiers perform well in this challenging domain, identifying non-native writing with 95% accuracy (over a baseline of 67%). We show the benefits of using syntactic features in stylometry; syntax leads to significant improvements over bag-of-words models on all three tasks, achieving 10% to 25% relative error reduction. We give a detailed analysis of which words and syntax most predict a particular attribute, and we show a strong correlation between our predictions and a paper's number of citations.
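
The abstract reports gains as relative error reduction, which is easy to confuse with a difference in accuracy. The sketch below is a minimal illustration of the usual definition (the fraction of the reference system's errors that the new system eliminates); the function name `relative_error_reduction` is hypothetical and is not taken from the paper. It is applied here to the two accuracy figures quoted in the abstract, 67% for the baseline and 95% for the classifier. That is a different comparison from the 10% to 25% figure, which measures syntactic features against bag-of-words models whose accuracies the abstract does not give.

```python
def relative_error_reduction(reference_acc: float, system_acc: float) -> float:
    """Fraction of the reference system's errors that the new system removes.

    Both arguments are accuracies in [0, 1]. This is the conventional
    definition; the paper's exact computation is assumed, not confirmed.
    """
    reference_err = 1.0 - reference_acc
    system_err = 1.0 - system_acc
    if reference_err <= 0.0:
        raise ValueError("reference system has no errors to reduce")
    return (reference_err - system_err) / reference_err


# Accuracy figures quoted in the abstract for non-native writing detection:
# 67% baseline, 95% classifier.
rer = relative_error_reduction(0.67, 0.95)
print(f"relative error reduction over the baseline: {rer:.1%}")
```

On these figures the error rate falls from 33% to 5%, so roughly 85% of the baseline's errors are removed, even though the accuracy difference is 28 percentage points.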