Robust French syntax analysis: reconciling statistical methods and linguistic knowledge in the Talismane toolkit

Author: Assaf Urieli

Keywords: Baseline (configuration management), Statistical model, Feature (machine learning), Annotation, Beam search, Sentence, Natural language processing, Artificial intelligence, Parsing, Dependency (UML), Linguistics, Computer science

Abstract: In this thesis we explore robust statistical syntax analysis for French. Our main concern is to explore methods whereby the linguist can inject linguistic knowledge and/or resources into the robust statistical engine in order to improve results for specific phenomena. We first explore the dependency annotation schema for French, concentrating on certain phenomena. Next, we look into the various algorithms capable of producing this annotation, and in particular the transition-based parsing algorithm used in the rest of this thesis. After exploring supervised machine learning algorithms for NLP classification problems, we present the Talismane toolkit for syntax analysis, built within the framework of this thesis, including four statistical modules - sentence boundary detection, tokenisation, pos-tagging and parsing - as well as the various linguistic resources used for the baseline model, including corpora, lexicons and feature sets. Our first experiments attempt various machine learning configurations in order to identify the best baseline. We then look into improvements made possible by beam search and beam propagation. Finally, we present a series of experiments aimed at correcting errors related to specific linguistic phenomena, using targeted features. One of our innovations is the introduction of rules that can impose or prohibit certain decisions locally, thus bypassing the statistical model. We explore the usage of rules for errors that the features are unable to correct. Finally, we look into the enhancement of targeted features by large scale linguistic resources, and in particular a semi-supervised approach using a distributional semantic resource.
