Authors: Darja Fišer, Damjan Popič, Teja Kavčič, Polona Logar, Tomaž Erjavec
DOI:
Keywords: Natural language processing, Sentence segmentation, Linguistics, On Language, Word (computer architecture), Computer-mediated communication, Manual annotation, Computer science, Typology, Artificial intelligence
Abstract: Janes-Vejica is a corpus of Slovene tweets in which commas are annotated with the reason for their (in)correct use, according to the supplied typology. The corpus was sampled from Janes-Norm (http://hdl.handle.net/11356/1084), which has manually annotated tokenisation, sentence segmentation, and word normalisation, and automatically assigned morphosyntactic descriptions and lemmas. The corpus is further described in: POPIČ, Damjan, FIŠER, Darja, ZUPAN, Katja, LOGAR, Polona. Raba vejice v uporabniških spletnih vsebinah. Proceedings of the Conference on Language Technologies & Digital Humanities, Ljubljana, Slovenia. 2016, pp. 149-153. http://www.sdjt.si/wp/dogodki/konference/jtdh-2016/zbornik/