Inflection-Tolerant Ontology-Based Named Entity Recognition for Real-Time Applications

Authors: Heiko Maus, Sven Schwarz, Christian Jilek, Andreas Dengel, Markus Schröder

DOI: 10.4230/OASICS.LDK.2019.11

Keywords: Ontology (information science); Process (engineering); Ontology; Named-entity recognition; Artificial intelligence; German; Information extraction; Quality (business); Natural language processing; Word (computer architecture); Precision and recall; Computer science; Task (project management)

Abstract: A growing number of the applications users interact with daily have to operate in (near) real-time: chatbots, digital companions, and knowledge work support systems, to name just a few. To perform the services desired by the user, these systems have to analyze user activity logs or explicit user input extremely fast. In particular, text content (e.g. in the form of snippets) needs to be processed in an information extraction task. Given the aforementioned temporal requirements, this has to be accomplished in just a few milliseconds, which limits the number of methods that can be applied. In practice, only very fast methods remain, which on the other hand deliver worse results than slower but more sophisticated Natural Language Processing (NLP) pipelines. In this paper, we investigate and propose methods for real-time capable Named Entity Recognition (NER). As a first improvement step, we address word variations induced by inflection, as present for example in the German language. Our approach is ontology-based and makes use of several language information sources like Wiktionary. We evaluated it using Wikipedia (about 9.4B characters), for which the whole NER process took considerably less than an hour. Since precision and recall are higher than with comparably fast methods, we conclude that the quality gap between high-speed methods and sophisticated NLP pipelines can be narrowed a bit more without losing too much runtime performance.
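The general idea of inflection-tolerant, ontology-based lookup can be illustrated with a minimal sketch. This is not the authors' implementation: the ontology entries, the `ex:` identifiers, the inflection table, and the function names `build_index` and `recognize` are all hypothetical, and the hand-written table merely stands in for a language resource such as Wiktionary. The sketch expands each ontology label into its inflected forms once, up front, so that recognition at run time reduces to one hash lookup per token.

```python
import re

# Hypothetical ontology: entity ID -> preferred label (illustrative only).
ONTOLOGY = {
    "ex:House": "Haus",
    "ex:ProjectPlan": "Projektplan",
}

# Hypothetical inflection table, standing in for a resource such as Wiktionary.
INFLECTIONS = {
    "Haus": ["Hauses", "Häuser", "Häusern"],
    "Projektplan": ["Projektplans", "Projektpläne", "Projektplänen"],
}


def build_index(ontology, inflections):
    """Map every lower-cased surface form (label or inflection) to its entity ID."""
    index = {}
    for entity_id, label in ontology.items():
        for form in [label, *inflections.get(label, [])]:
            index[form.lower()] = entity_id
    return index


def recognize(text, index):
    """Return (start, end, surface form, entity ID) for each single-token match."""
    hits = []
    for match in re.finditer(r"\w+", text):
        entity_id = index.get(match.group().lower())
        if entity_id is not None:
            hits.append((match.start(), match.end(), match.group(), entity_id))
    return hits


INDEX = build_index(ONTOLOGY, INFLECTIONS)
HITS = recognize("Die Projektpläne liegen in den Häusern.", INDEX)
```

Here `HITS` links the inflected forms "Projektpläne" and "Häusern" back to their ontology entities even though neither matches a preferred label exactly. The sketch handles single-token labels only; multi-word labels and ambiguous forms would need additional machinery that is outside its scope.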
