Computational Analysis of Medieval Manuscripts: A New Tool for Analysis and Mapping of Medieval Documents to Modern Orthography

Authors: Mushtaq Ahmad, Stefan Gruner, Muhammad Tanvir Afzal

Abstract: Medieval manuscripts and other written documents from that period contain valuable information about the people, religion, and politics of the medieval period. This makes their study a necessary prerequisite for gaining in-depth knowledge of medieval history. Although tool-less study of such documents is possible and has been ongoing for centuries, much subtle information remains locked in these manuscripts unless it is revealed by effective means of computational analysis. Automatic analysis of medieval manuscripts is a non-trivial task, mainly because of non-conforming writing styles, spelling peculiarities, and the lack of relational structures (hyperlinks) that could be used to answer meaningful queries. Natural Language Processing (NLP) tools and algorithms are used to carry out computational analysis of text data. However, because of the high percentage of spelling variations in medieval manuscripts, NLP algorithms cannot be applied to them directly. If the spelling variations are first mapped to standard dictionary words, applying NLP algorithms becomes possible. In this paper we describe a web-based software tool, CAMM (Computational Analysis of Medieval Manuscripts), that maps medieval spelling variations to a modern German dictionary. We describe the steps taken to acquire, reformat, and analyze the data, to produce putative mappings, and to evaluate the findings. At the time of writing, CAMM provides access to 11275 manuscripts organized into 54 collections, containing a total of 242446 distinctly spelled words. CAMM accurately corrects 55% of the verifiable words. CAMM is freely available at http://researchworks.cs.athabascau.ca/
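
The abstract does not say how CAMM chooses a modern form for a medieval spelling. As a hypothetical illustration only, the Python sketch below maps a variant to the closest entry of a small modern German word list by Levenshtein edit distance. The function names `levenshtein` and `map_to_dictionary`, the `max_distance` threshold, and the toy word list are all assumptions for this example, not CAMM's actual method or data.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions (each costing 1) that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the inner dimension
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def map_to_dictionary(variant: str, dictionary: Iterable[str],
                      max_distance: int = 2) -> Optional[str]:
    """Hypothetical mapping step: return the dictionary word closest to
    `variant`, or None if no word is within `max_distance` edits.
    Ties are broken alphabetically because the words are scanned in sorted order."""
    variant = variant.lower()
    best: Optional[str] = None
    best_dist = max_distance + 1
    for word in sorted(dictionary):
        d = levenshtein(variant, word.lower())
        if d < best_dist:
            best, best_dist = word, d
    return best


if __name__ == "__main__":
    # Toy modern German word list (an assumption for illustration only).
    modern = {"und", "vater", "heute", "jahr", "mit"}
    for old in ["vnd", "vatter", "hewte", "jar", "xyzzy"]:
        print(old, "->", map_to_dictionary(old, modern))
```

With this toy word list, the variants "vnd" and "jar" map to "und" and "jahr", while "xyzzy" returns None because no entry lies within two edits. A real system would likely need a much larger dictionary and a ranking that accounts for systematic medieval substitutions (such as v for u), which plain edit distance treats as ordinary substitutions.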

Similar articles (1)

Life still goes on: Analysing Australian WW1 Diaries through Distant Reading