Authors: Albert Weichselbraun, Arno Scharl, Stefan Gindl
DOI:
Keywords: Artificial intelligence, Ambiguity, Natural language processing, Adverb, Noun, Set (abstract data type), Linguistics, Verb, Parsing, Computer science, Text processing, Database, Grammatical category
Abstract: Despite the obvious business value of visualizing similarities between elements of evolving information spaces and mapping these similarities, e.g., onto geospatial reference systems, analysts are often more interested in how the semantic orientation (sentiment) towards an organization, a product, or a particular technology is changing over time. Unfortunately, popular methods that process unstructured textual material to detect semantic orientation automatically based on tagged dictionaries (Scharl et al. 2003) are not capable of fulfilling this task, even when coupled with part-of-speech tagging, a standard component of most text processing toolkits that distinguishes grammatical categories such as article (AT), noun (NN), verb (VB), and adverb (RB). Small corpus size, ambiguity, and the subtle, incremental change of tonal expressions between different versions of a document complicate the detection of semantic orientation and prevent promising algorithms from being incorporated into commercial applications. Parsing structures, by contrast, outperforms dictionary-based approaches in terms of reliability, but usually suffers from poor scalability due to its computational complexity. This paper addresses this predicament by presenting an alternative approach: building Tagged Linguistic Unit (TLU) databases that overcome the restrictions of dictionaries with a limited set of tokens.
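For readers unfamiliar with the baseline the abstract criticizes, the following is a minimal sketch of dictionary-based orientation detection combined with part-of-speech tags. It assumes input that is already tagged, and a hypothetical toy dictionary keyed by (token, tag) pairs with made-up weights; the function name `orientation` and all entries are illustrative only. This is neither the authors' implementation nor the TLU approach.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: (lower-cased token, POS tag) -> orientation weight.
# Tags follow the abstract's examples (AT, NN, VB, RB); entries are illustrative only.
TAGGED_DICTIONARY: Dict[Tuple[str, str], float] = {
    ("like", "VB"): 1.0,      # "like" as a verb carries positive orientation
    ("well", "RB"): 1.0,
    ("poorly", "RB"): -1.0,
    ("failure", "NN"): -1.0,
}


def orientation(tagged_tokens: List[Tuple[str, str]]) -> float:
    """Sum the dictionary weights of (token, tag) pairs.

    Any pair missing from the dictionary contributes 0, so the score can
    only reflect the limited set of tokens the dictionary happens to cover.
    """
    return sum(
        TAGGED_DICTIONARY.get((token.lower(), tag), 0.0)
        for token, tag in tagged_tokens
    )


# "like" used as a verb: counted as positive.
sentence_a = [("Users", "NN"), ("like", "VB"), ("the", "AT"), ("update", "NN")]
# "like" used as a preposition (IN): the tag keeps it from being counted.
sentence_b = [("It", "PPS"), ("looks", "VB"), ("like", "IN"), ("a", "AT"), ("failure", "NN")]

print(orientation(sentence_a))
print(orientation(sentence_b))
```

The tag in the lookup key is what lets the sketch separate the verb "like" from the preposition "like". Any token absent from the dictionary still scores zero, which mirrors the limited-token restriction the abstract describes.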