Tweet comma corpus Janes-Vejica 1.0

Authors: Darja Fišer, Damjan Popič, Teja Kavčič, Polona Logar, Tomaž Erjavec

DOI:

Keywords: Natural language processing, Sentence segmentation, Linguistics, On Language, Word (computer architecture), Computer-mediated communication, Manual annotation, Computer science, Typology, Artificial intelligence

Abstract: Janes-Vejica is a corpus of Slovene tweets in which commas are annotated with the reason for their (in)correct use, according to the supplied typology. The corpus was sampled from Janes-Norm (http://hdl.handle.net/11356/1084), which contains manually annotated tokenisation, sentence segmentation, and word normalisation, and automatically annotated morphosyntactic descriptions and lemmas. The corpus is further described in: POPIČ, Damjan, FIŠER, Darja, ZUPAN, Katja, LOGAR, Polona. Raba vejice v uporabniških spletnih vsebinah. Proceedings of the Conference on Language Technologies & Digital Humanities, Ljubljana, Slovenia. 2016, pp. 149-153. http://www.sdjt.si/wp/dogodki/konference/jtdh-2016/zbornik/
