Authors: Grigori Sidorov, Francisco Velasquez, Efstathios Stamatatos, Alexander Gelbukh, Liliana Chanona-Hernández
DOI: 10.1016/J.ESWA.2013.08.015
Keywords: Natural language processing, Classifier (linguistics), Syntax, Computer science, C4.5 algorithm, Part of speech, Support vector machine, Classifier (UML), Machine learning, Tree (data structure), Syntactic predicate, Parsing, Naive Bayes classifier, Artificial intelligence
Abstract: In this paper we introduce and discuss the concept of syntactic n-grams (sn-grams). Sn-grams differ from traditional n-grams in how they are constructed, i.e., in which elements are considered neighbors. In the case of sn-grams, the neighbors are taken by following syntactic relations in syntactic trees, not by taking words as they appear in a text; that is, sn-grams are constructed by following paths in syntactic trees. In this manner, sn-grams allow bringing syntactic knowledge into machine learning methods; still, prior parsing is necessary for their construction. Sn-grams can be applied in any natural language processing (NLP) task where traditional n-grams are used. We describe how sn-grams were applied to authorship attribution. As a baseline we used traditional n-grams of words, part-of-speech (POS) tags, and characters; three classifiers were applied: support vector machines (SVM), naive Bayes (NB), and the tree classifier J48. Sn-grams give better results with the SVM classifier.
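
The difference between following paths in a syntactic tree and reading words in surface order can be illustrated with a minimal Python sketch. It is not the authors' implementation: the example sentence, its hand-written dependency heads, and the helper names `build_children` and `sn_grams` are all assumptions made for illustration, and only head-to-dependent paths are enumerated. In practice the tree would come from a parser.

```python
from collections import defaultdict

# Hypothetical example sentence with a hand-written dependency tree.
# heads[i] is the index of the head of word i; None marks the root.
words = ["John", "eats", "red", "apples"]
heads = {0: 1, 1: None, 2: 3, 3: 1}


def build_children(heads):
    """Map each head index to the list of its dependents' indices."""
    children = defaultdict(list)
    for dep, head in heads.items():
        if head is not None:
            children[head].append(dep)
    return children


def sn_grams(words, heads, n):
    """Collect every downward path of n words (head -> dependent -> ...)."""
    children = build_children(heads)
    results = []

    def extend(path):
        # A path of length n is one sn-gram; otherwise keep descending.
        if len(path) == n:
            results.append(tuple(words[i] for i in path))
            return
        for child in children.get(path[-1], []):
            extend(path + [child])

    for start in range(len(words)):
        extend([start])
    return results


def surface_n_grams(words, n):
    """Traditional n-grams: neighbors are adjacent words in the text."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


print(sn_grams(words, heads, 2))
# [('eats', 'John'), ('eats', 'apples'), ('apples', 'red')]
print(surface_n_grams(words, 2))
# [('John', 'eats'), ('eats', 'red'), ('red', 'apples')]
```

In this toy tree the surface bigram ("eats", "red") pairs two words with no direct syntactic relation, while the tree-based bigram ("eats", "apples") links the verb to its object even though the two words are not adjacent in the text.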