Author: Assaf Urieli
DOI:
Keywords: Baseline (configuration management), Statistical model, Feature (machine learning), Annotation, Beam search, Sentence, Natural language processing, Artificial intelligence, Parsing, Dependency (UML), Linguistics, Computer science
Abstract: In this thesis we explore robust statistical syntax analysis for French. Our main concern is to explore methods whereby the linguist can inject linguistic knowledge and/or resources into the robust statistical engine in order to improve results for specific phenomena. We first explore the dependency annotation schema for French, concentrating on certain phenomena. Next, we look into the various algorithms capable of producing this annotation, and in particular the transition-based parsing algorithm used in the rest of this thesis. After exploring supervised machine learning algorithms for NLP classification problems, we present the Talismane toolkit for syntax analysis, built within the framework of this thesis, including four statistical modules (sentence boundary detection, tokenisation, pos-tagging and parsing) as well as the various linguistic resources used for the baseline model, including corpora, lexicons and feature sets. Our first experiments attempt various machine learning configurations in order to identify the best baseline. We then look into improvements made possible by beam search and beam propagation. Finally, we present a series of experiments aimed at correcting errors related to specific linguistic phenomena, using targeted features. One of our innovations is the introduction of rules that can impose or prohibit certain decisions locally, thus bypassing the statistical model. We explore the usage of rules for errors that the features are unable to correct. Finally, we look into the enhancement of targeted features by large scale linguistic resources, and in particular a semi-supervised approach using a distributional semantic resource.