Transformation of Wiktionary entry structure into tables and relations in a relational database schema

Author: A. A. Krizhanovsky

Abstract: This paper addresses the question of automatic data extraction from Wiktionary, which is a multilingual and multifunctional dictionary. Wiktionary is a collaborative project working on the same principles as Wikipedia. From the text processing point of view, a Wiktionary entry is plain text. The Wiktionary guidelines prescribe the entry layout and rules, which should be followed by the editors of the dictionary. The presence of a fixed article structure and formatting rules allows the entry structure to be transformed into tables and relations in a relational database schema, which is part of a machine-readable dictionary (MRD). The paper describes how the flat text of a Wiktionary entry was extracted, converted, and stored in a specially designed relational database. The MRD contains the definitions, semantic relations, and translations extracted from the English and Russian Wiktionaries. The parser software is released under an open source license agreement (GPL) to facilitate its dissemination, modification, and upgrades, and to draw researchers and programmers into parsing other Wiktionaries, not only the Russian and English ones.
