Author: Christian Biemann
DOI:
Keywords: Language identification, Temporal annotation, Information extraction, Natural language processing, Universal Networking Language, Computational linguistics, Artificial intelligence, Natural language, Question answering, Deep linguistic processing, Computer science
Abstract: After almost 60 years of attempts to implement natural language competence on machines, there is still no automatic processing system that comes even close to human performance. The fields of Computational Linguistics and Natural Language Processing have predominantly sought to teach the machine a variety of subtasks of language understanding, either by explicitly stating processing rules or by providing annotations the machine should learn to reproduce. In contrast to this, human language acquisition largely happens in an unsupervised way: mere exposure to numerous language samples triggers the processes of generalisation and abstraction needed for speaking a language. Exactly this strategy is pursued in this work: rather than telling machines how to process language, one instructs them to discover structural regularities in text corpora on their own. The workload thus shifts from specifying rule-based systems or manually annotating data to creating processes that employ and utilise discovered structure. This builds an inventory of mechanisms that, once verified on a number of datasets and applications, are universal and allow their execution on unseen data with similar structure. This enormous alleviation of what is called the "acquisition bottleneck of language processing" gives rise to a unified treatment of language data and provides accelerated access to this part of our cultural memory. Now that computing power and storage capacities have reached a sufficient level for this undertaking, we for the first time find ourselves able to leave the bulk of the work to machines and to overcome data sparseness simply by using larger data. In Chapter 1, the Structure Discovery paradigm is introduced: a framework for learning structural regularities from large samples of text data and for making these regularities explicit by introducing them into the data via self-annotation. In contrast to the predominant paradigms, it involves neither language-specific knowledge nor supervision and is therefore independent of lan-
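The abstract's core idea — discovering structural regularities from raw text without rules or labels, then making them explicit via self-annotation — can be illustrated with a minimal, hypothetical sketch. This is not the book's actual algorithm; the toy corpus, the left/right-neighbour context representation, and the 0.5 similarity threshold are all illustrative assumptions. Words sharing many contexts are merged into unsupervised classes, and the raw text is then tagged with the discovered class IDs:

```python
# Hypothetical sketch of unsupervised structure discovery with self-annotation.
# No language-specific knowledge, no labels: words are grouped purely by the
# contexts they share in a raw corpus. Corpus and threshold are assumptions.
from collections import defaultdict

corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "a cat ran to the mat . a dog ran to the rug ."
).split()

# 1. Collect left/right neighbour contexts for every word (unsupervised).
contexts = defaultdict(set)
for i, word in enumerate(corpus):
    if i > 0:
        contexts[word].add(("L", corpus[i - 1]))
    if i < len(corpus) - 1:
        contexts[word].add(("R", corpus[i + 1]))

# 2. Merge words whose context sets overlap strongly (Jaccard similarity).
def jaccard(a, b):
    return len(a & b) / len(a | b)

classes = {}   # word -> discovered class id
next_id = 0
for w in sorted(contexts):
    for v, cid in list(classes.items()):
        if jaccard(contexts[w], contexts[v]) >= 0.5:   # assumed threshold
            classes[w] = cid
            break
    else:
        classes[w] = next_id
        next_id += 1

# 3. Self-annotation: tag the raw text with the discovered class IDs,
#    making the latent structure explicit in the data itself.
annotated = [(w, classes[w]) for w in corpus]
```

On this toy corpus, "cat" and "dog" (and likewise "mat" and "rug") occur in identical contexts and so fall into the same discovered class, with no lexicon or annotation supplied, which is the sense in which structure is discovered rather than taught.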