Syntactic N-grams as machine learning features for natural language processing

Authors: Grigori Sidorov, Francisco Velasquez, Efstathios Stamatatos, Alexander Gelbukh, Liliana Chanona-Hernández

DOI: 10.1016/J.ESWA.2013.08.015

Keywords: Natural language processing, Classifier (linguistics), Syntax, Computer science, C4.5 algorithm, Part of speech, Support vector machine, Classifier (UML), Machine learning, Tree (data structure), Syntactic predicate, Parsing, Naive Bayes classifier, Artificial intelligence

Abstract: In this paper we introduce and discuss the concept of syntactic n-grams (sn-grams). Sn-grams differ from traditional n-grams in the manner in which we construct them, i.e., in what elements are considered neighbors. In the case of sn-grams, the neighbors are taken by following syntactic relations in syntactic trees, and not by taking words as they appear in a text; that is, sn-grams are constructed by following paths in syntactic trees. In this manner, sn-grams allow bringing syntactic knowledge into machine learning methods; still, previous parsing is necessary for their construction. Sn-grams can be applied in any natural language processing (NLP) task where traditional n-grams are used. We describe how sn-grams were applied to authorship attribution. We used as a baseline traditional n-grams of words, part of speech (POS) tags, and characters; three classifiers were applied: support vector machines (SVM), naive Bayes (NB), and the tree classifier J48. Sn-grams give better results with the SVM classifier.
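The difference between the two kinds of neighborhood can be illustrated with a minimal Python sketch. It is not the authors' implementation: the example sentence, its hand-written dependency tree (the `heads` list), and the function names `word_ngrams` and `syntactic_ngrams` are all assumptions made for illustration, and the sketch covers only simple head-to-dependent paths. In practice the tree would come from a syntactic parser, which the abstract notes is a prerequisite.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Traditional n-grams: neighbors are adjacent in surface order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def syntactic_ngrams(tokens, heads, n):
    """sn-grams (sketch): neighbors are linked by a head -> dependent
    relation, so each n-gram is a downward path of n nodes in the tree."""
    # Build a child list for every node from the head index of each token.
    children = {i: [] for i in range(len(tokens))}
    for dep, head in enumerate(heads):
        if head is not None:
            children[head].append(dep)

    def paths_from(node, length):
        # All downward paths with `length` nodes starting at `node`.
        if length == 1:
            return [[node]]
        return [[node] + rest
                for child in children[node]
                for rest in paths_from(child, length - 1)]

    return [tuple(tokens[i] for i in path)
            for start in range(len(tokens))
            for path in paths_from(start, n)]


# Hypothetical example sentence with a hand-written dependency tree.
# heads[i] is the index of the head of token i; None marks the root.
tokens = ["the", "old", "man", "eats", "fresh", "fish"]
heads = [2, 2, 3, None, 5, 3]

print(word_ngrams(tokens, 2))
print(syntactic_ngrams(tokens, heads, 2))
print(syntactic_ngrams(tokens, heads, 3))

# Frequency counts of sn-grams can then serve as classifier features.
features = Counter(syntactic_ngrams(tokens, heads, 2))
```

In this example the word bigram ("eats", "fresh") has no sn-gram counterpart, because "fresh" depends on "fish" rather than on "eats", while the sn-gram ("eats", "fish") links two words that are not adjacent in the text.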

