Authors: Alena Böhmová, Jan Hajič, Eva Hajičová, Barbora Hladká
DOI: 10.1007/978-94-010-0201-1_7
Keywords: Dependency (UML), Syntax (programming languages), Natural language processing, Scheme (programming language), Annotation, Czech, Treebank, Artificial intelligence, Field (computer science), Computational linguistics, Computer science
Abstract: The availability of annotated data (with as rich and “deep” annotation as possible) is desirable in any new developments. Textual data are being used for the so-called training phase of various empirical methods solving various problems in the field of computational linguistics. While there are many methods that use texts in their plain (or raw) form (in most cases for unsupervised training), more accurate results may be obtained if annotated corpora are available. The annotation itself is a complex task. While morphologically annotated corpora (pioneered by Henry Kucera in the 1960s) are now available for English and other languages, syntactically annotated corpora are rare. Inspired by the Penn Treebank, the most widely used syntactically annotated corpus of English, we decided to develop a similarly sized corpus of Czech with a rich annotation scheme.