Authors: Anton Karl Ingason, Joel Wallenberg, Eiríkur Rögnvaldsson, Einar Freyr Sigurðsson
DOI:
Keywords:
Abstract: The Icelandic Parsed Historical Corpus (IcePaHC) is a manually corrected treebank, parsed according to the annotation guidelines of the Penn Parsed Corpora of Historical English (PPCHE), with minor modifications that are specific to Icelandic. It consists of about 1 million words from the 12th century to the 21st century. The samples in the corpus are close to being evenly distributed over this period. Most of the texts are narratives or religious material, but some other genres are also included. The file format is labeled bracketing, as in the Penn Treebank, and the text is encoded in UTF-8. The corpus is released under the CC BY 4.0 license.
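To illustrate what "labeled bracketing" means in practice, here is a minimal Python sketch that reads one bracketed tree into nested tuples and extracts its terminal tokens. The sample tree is a toy example invented for illustration, not taken from the corpus. Its tags only imitate the PPCHE style, and the function names `parse_brackets` and `leaves` are hypothetical. Actual corpus files may carry extra material (for example, sentence IDs or lemma annotations) that this sketch does not handle.

```python
def parse_brackets(text):
    """Parse one labeled-bracketing tree into nested (label, children) tuples."""
    # Pad parentheses with spaces so a plain split() yields clean tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]          # first token after '(' is the node label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())      # nested constituent
            else:
                children.append(tokens[pos])  # terminal word
                pos += 1
        pos += 1                     # consume the closing ')'
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def leaves(tree):
    """Return the terminal words of a parsed tree, left to right."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


# Toy tree (hypothetical, illustrative tags only).
SAMPLE = "(IP-MAT (NP-SBJ (PRO hann)) (VBDI kom) (ADVP (ADV heim)))"
tree = parse_brackets(SAMPLE)
print(tree[0])        # root label
print(leaves(tree))   # terminal words
```

Because the files are UTF-8, opening them with `open(path, encoding="utf-8")` before passing the text to a parser like this one keeps Icelandic characters such as ð and þ intact.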