Graph-Based Lexicon Expansion with Sparsity-Inducing Penalties

Authors: Dipanjan Das, Noah A. Smith

Abstract: We present novel methods to construct compact natural language lexicons within a graph-based semi-supervised learning framework, an attractive platform suited for propagating soft labels onto new types from seed data. To achieve compactness, we induce sparse measures at graph vertices by incorporating sparsity-inducing penalties in Gaussian and entropic pairwise Markov networks constructed from labeled and unlabeled data. Sparse measures are desirable for high-dimensional multi-class problems such as the induction of labels on types, which typically associate with only a few labels. Compared to standard methods, on two lexicon expansion problems, our approach produces significantly smaller lexicons and obtains better predictive performance.
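
The idea summarized in the abstract can be illustrated with a small sketch. It is not the authors' formulation or code: it assumes a squared-loss (Gaussian-style) objective with an L1 penalty and a nonnegativity constraint on the vertex scores, solved by plain projected gradient descent. The function name `propagate_sparse`, the parameters `mu`, `lam`, `step`, and `n_iters`, and the four-vertex toy graph are all hypothetical choices made for illustration.

```python
def propagate_sparse(edges, seeds, n_vertices, n_labels,
                     mu=0.5, lam=0.05, step=0.1, n_iters=500):
    """Projected-gradient sketch of sparse label propagation (illustrative only).

    Minimizes, over nonnegative scores q[v][y]:
        sum over seed vertices v of ||q[v] - r[v]||^2
      + mu  * sum over edges (u, v, w) of w * ||q[u] - q[v]||^2
      + lam * sum over v, y of q[v][y]   (equals the L1 norm because q >= 0)
    """
    q = [[0.0] * n_labels for _ in range(n_vertices)]
    for _ in range(n_iters):
        # The gradient of the L1 term is the constant lam wherever q >= 0.
        grad = [[lam] * n_labels for _ in range(n_vertices)]
        # Seed term: pull labeled vertices toward their observed distribution.
        for v, r in seeds.items():
            for y in range(n_labels):
                grad[v][y] += 2.0 * (q[v][y] - r[y])
        # Smoothness term: neighboring vertices should carry similar scores.
        for u, v, w in edges:
            for y in range(n_labels):
                d = 2.0 * mu * w * (q[u][y] - q[v][y])
                grad[u][y] += d
                grad[v][y] -= d
        # Gradient step followed by projection onto the nonnegative orthant;
        # the projection is what sets small scores to exactly zero.
        for v in range(n_vertices):
            for y in range(n_labels):
                q[v][y] = max(0.0, q[v][y] - step * grad[v][y])
    return q


# Toy chain graph 0 - 1 - 2 - 3 with a weak middle edge (hypothetical data).
# Vertex 0 is a seed for label 0, vertex 3 is a seed for label 1;
# label 2 never appears in the seed data.
edges = [(0, 1, 1.0), (1, 2, 0.2), (2, 3, 1.0)]
seeds = {0: [1.0, 0.0, 0.0], 3: [0.0, 1.0, 0.0]}
scores = propagate_sparse(edges, seeds, n_vertices=4, n_labels=3, lam=0.2)

for v, row in enumerate(scores):
    print(v, [round(x, 3) for x in row])
```

In this toy setup each unlabeled vertex ends up with a single nonzero label score, inherited from its strongly connected seed neighbor, while the remaining entries are exactly zero. Setting `lam=0.0` removes the penalty and lets small scores leak across the weak middle edge, which is the kind of dense output that sparsity-inducing penalties are meant to avoid.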
