ANNIS3: A new architecture for generic corpus query and visualization

Authors: Thomas Krause, Amir Zeldes

DOI: 10.1093/LLC/FQU057

Abstract: This article is concerned with the data structures, properties of query languages, and visualization facilities required for the generic representation of richly annotated, heterogeneous linguistic corpora. We propose that above and beyond a general graph-based data model, which is becoming increasingly popular in many complex annotation formats, a well-defined concept of multiple, potentially conflicting segmentation layers must be introduced to deal with different sources and applications of corpus data flexibly. We also propose a solution for specialized corpus visualizations in a Web interface using annotation-triggered style sheets, which leverage the power of modern browsers and CSS for multiple highly customizable views of primary data. We offer an implementation and evaluation of our architecture in ANNIS3, an open-source browser-based architecture for corpus search and visualization. We present three case studies to test the coverage of the system, encompassing core digital humanities use-cases including richly annotated newspaper treebanks, multilingual diplomatic and normalized manuscript materials edited in TEI, and the analysis of multimodal recordings of spoken language.
