Unsupervised and Knowledge-free Natural Language Processing in the Structure Discovery Paradigm.

Author: Christian Biemann

Keywords: Language identification, Temporal annotation, Information extraction, Natural language processing, Universal Networking Language, Computational linguistics, Artificial intelligence, Natural language, Question answering, Deep linguistic processing, Computer science

Abstract: After almost 60 years of attempts to implement natural language competence on machines, there is still no automatic language processing system that comes even close to human language performance. The fields of Computational Linguistics and Natural Language Processing have predominantly sought to teach the machine a variety of subtasks of language understanding, either by explicitly stating processing rules or by providing annotations that the machine should learn to reproduce. In contrast to this, human language acquisition largely happens in an unsupervised way: the mere exposure to numerous language samples triggers acquisition processes that produce the generalisation and abstraction needed for understanding and speaking that language. Exactly this strategy is pursued in this work: rather than telling machines how to process language, one instructs them how to discover structural regularities in text corpora. Shifting the workload from specifying rule-based systems or manually annotating text to creating processes that employ and utilise structure in language, one builds an inventory of mechanisms that, once verified on a number of datasets and applications, are universal in a way that allows their execution on unseen data with similar structure. This enormous alleviation of what is called the "acquisition bottleneck of language processing" gives rise to a unified treatment of language data and provides accelerated access to this part of our cultural memory. Now that computing power and storage capacities have reached a sufficient level for this undertaking, we for the first time find ourselves able to leave the bulk of the work to machines and to overcome data sparseness by simply processing larger data. In Chapter 1, the Structure Discovery paradigm for Natural Language Processing is introduced. This is a framework for learning structural regularities from large samples of text data, and for making these regularities explicit by introducing them into the data via self-annotation. In contrast to the predominant paradigms, Structure Discovery involves neither language-specific knowledge nor supervision and is therefore independent of lan-
