Author: A. A. Krizhanovsky
DOI:
Keywords:
Abstract: This paper addresses the question of automatic data extraction from Wiktionary, which is a multilingual and multifunctional dictionary. Wiktionary is a collaborative project working on the same principles as Wikipedia. From the text processing point of view, a Wiktionary entry is plain text. The Wiktionary guidelines prescribe the entry layout and rules, which should be followed by the editors of the dictionary. The presence of structure in a Wiktionary article, together with the formatting rules, allows transforming the Wiktionary entry structure into tables and relations in a relational database schema, which is part of a machine-readable dictionary (MRD). The paper describes how the flat text of a Wiktionary entry was extracted, converted, and stored in a specially designed relational database. The MRD contains definitions, semantic relations, and translations extracted from the English and Russian Wiktionaries. The parser software is released under an open source license agreement (GPL) to facilitate its dissemination, modification, and upgrades, and to draw researchers and programmers into parsing other Wiktionaries, not only the Russian and English ones.