Authors: Darja Fišer, Tomaž Erjavec, Teja Goli, Eneja Osrajnik
DOI:
Keywords:
Abstract: Janes-Kratko is a corpus of Slovene tweets manually annotated with shortening phenomena according to the supplied typology, which covers different types of spelling, lexical and syntactic shortenings. The corpus was sampled from Janes-Norm (http://hdl.handle.net/11356/1084), which is annotated for tokenisation, sentence segmentation and normalisation of non-standard words, and automatically annotated with morphosyntactic descriptions and lemmas. The corpus is further described in: GOLI, Teja, OSRAJNIK, Eneja, FIŠER, Darja. Analiza krajšanja slovenskih sporočil na družbenem omrežju Twitter. Proceedings of the Conference on Language Technologies & Digital Humanities, Ljubljana, Slovenia. 2016, pp. 77-82. http://www.sdjt.si/wp/dogodki/konference/jtdh-2016/zbornik/