Computational Analysis of Medieval Manuscripts: A New Tool for Analysis and Mapping of Medieval Documents to Modern Orthography

Authors: Mushtaq Ahmad, Stefan Gruner, Muhammad Tanvir Afzal

Abstract: Medieval manuscripts and other written documents from that period contain valuable information about the people, religion, and politics of the medieval period, making the study of medieval documents a necessary prerequisite to gaining in-depth knowledge of medieval history. Although tool-less study of such documents is possible and has been ongoing for centuries, much subtle information remains locked in such manuscripts unless it is revealed by effective means of computational analysis. Automatic analysis of medieval manuscripts is a non-trivial task, mainly due to non-conforming styles, spelling peculiarities, and the lack of relational structures (hyperlinks) which could be used to answer meaningful queries. Natural Language Processing (NLP) tools and algorithms are used to carry out computational analysis of text data. However, due to the high percentage of spelling variations in medieval manuscripts, NLP tools and algorithms cannot be applied directly. If the spelling variations are mapped to standard dictionary words, then the application of NLP tools and algorithms becomes possible. In this paper we describe a web-based software tool, CAMM (Computational Analysis of Medieval Manuscripts), that maps medieval spelling variations to a modern German dictionary. Here we describe the steps taken to acquire, reformat, and analyze the data, to produce putative mappings, and to evaluate the findings. At the time of writing of this paper, CAMM provides access to 11275 manuscripts organized into 54 collections containing a total of 242446 distinctly spelled words. CAMM accurately corrects the spelling of 55% of the verifiable words. CAMM is freely available at http://researchworks.cs.athabascau.ca/
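
The idea of mapping a spelling variation to a standard dictionary word can be illustrated with a minimal sketch. The abstract does not say which matching method CAMM uses, so the sketch below simply assumes edit distance as the similarity measure. The function names (`levenshtein`, `map_to_modern`), the `max_distance` threshold, and the four-word lexicon are all hypothetical and chosen for illustration only; they are not taken from the CAMM tool or its data.

```python
from typing import Iterable, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def map_to_modern(
    variant: str,
    lexicon: Iterable[str],
    max_distance: int = 2,  # assumed cut-off, not a value from the paper
) -> Optional[Tuple[str, int]]:
    """Return the closest lexicon word and its distance, or None if nothing is close."""
    best: Optional[Tuple[str, int]] = None
    for word in lexicon:
        distance = levenshtein(variant.lower(), word.lower())
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (word, distance)
    return best


# Toy stand-in for a modern German dictionary (illustrative only).
TOY_LEXICON = ["und", "Jahr", "Stadt", "Teil"]

if __name__ == "__main__":
    for variant in ["vnd", "jar", "stat", "theil", "xyzzy"]:
        print(variant, "->", map_to_modern(variant, TOY_LEXICON))
```

A putative mapping produced this way is only a candidate: ties are resolved by lexicon order, and a variant with no word inside the threshold is left unmapped, which is one reason such mappings need a separate evaluation step.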
