Inducing features of random fields

Authors: S. Della Pietra, V. Della Pietra, J. Lafferty

DOI: 10.1109/34.588021

Keywords: Principle of maximum entropy, Decision tree, Greedy algorithm, Generalized iterative scaling, Random field, Feature extraction, Cluster analysis, Kullback–Leibler divergence, Training set, Stochastic process, Expectation–maximization algorithm, Empirical distribution function, Iterative method, Theoretical computer science, Computer science

Abstract: We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback–Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field, and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated. Relations to other learning approaches, including decision trees, are given. As a demonstration of the method, we describe its application to the problem of automatic word classification in natural language processing.
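To make the two steps summarized in the abstract concrete (fitting feature weights by iterative scaling, then greedily adding the candidate feature that most improves the model), here is a minimal Python sketch over a toy discrete sample space. The sample space, the candidate features, and the helper names (empirical, gis_fit, greedy_induction) are assumptions made for this illustration, not the authors' code; the weight update shown is standard generalized iterative scaling with a slack feature, and the greedy step here simply refits the whole model for each candidate.

"""Illustrative sketch only: exponential model over binary features,
weights fit by generalized iterative scaling (GIS), features added greedily
by training log-likelihood gain.  Toy data and names are assumptions."""

import math
from collections import Counter

OUTCOMES = ["the", "dog", "cat", "ran", "sat"]           # toy sample space (assumed)
SAMPLES  = ["the", "the", "dog", "cat", "ran", "the"]    # toy training data (assumed)


def empirical(samples):
    """Empirical distribution p~(x) over OUTCOMES."""
    counts = Counter(samples)
    return {x: counts[x] / len(samples) for x in OUTCOMES}


def model_dist(features, lambdas):
    """p(x) proportional to exp(sum_i lambda_i * f_i(x)), uniform reference measure."""
    scores = {x: math.exp(sum(l * f(x) for f, l in zip(features, lambdas)))
              for x in OUTCOMES}
    z = sum(scores.values())
    return {x: s / z for x, s in scores.items()}


def expectation(dist, f):
    return sum(dist[x] * f(x) for x in OUTCOMES)


def gis_fit(features, p_tilde, iters=500):
    """Generalized iterative scaling: a slack feature makes the feature sum
    equal to a constant C on every outcome, then each weight is updated by
    lambda_i += (1/C) * log(E_p~[f_i] / E_p[f_i])."""
    C = max(sum(f(x) for f in features) for x in OUTCOMES) or 1
    full = list(features) + [lambda x: C - sum(f(x) for f in features)]
    lambdas = [0.0] * len(full)
    for _ in range(iters):
        p = model_dist(full, lambdas)
        for i, f in enumerate(full):
            e_tilde, e_model = expectation(p_tilde, f), expectation(p, f)
            if e_tilde > 0 and e_model > 0:   # skip features unseen in the data
                lambdas[i] += math.log(e_tilde / e_model) / C
    return full, lambdas


def log_likelihood(dist, samples):
    return sum(math.log(dist[x]) for x in samples) / len(samples)


def greedy_induction(candidates, samples, n_features=2):
    """Greedily add the candidate whose inclusion, after refitting all
    weights, most improves the training log-likelihood."""
    p_tilde = empirical(samples)
    chosen = []
    for _ in range(n_features):
        best = None
        for name, f in candidates.items():
            if any(name == c[0] for c in chosen):
                continue
            feats = [c[1] for c in chosen] + [f]
            full, lambdas = gis_fit(feats, p_tilde)
            ll = log_likelihood(model_dist(full, lambdas), samples)
            if best is None or ll > best[2]:
                best = (name, f, ll)
        chosen.append((best[0], best[1]))
        print(f"added feature {best[0]!r}, log-likelihood {best[2]:.3f}")
    return chosen


if __name__ == "__main__":
    # Candidate binary features: indicator functions on the toy space (assumed).
    candidates = {
        "is_the":      lambda x: 1 if x == "the" else 0,
        "has_vowel_a": lambda x: 1 if "a" in x else 0,
        "ends_in_t":   lambda x: 1 if x.endswith("t") else 0,
    }
    greedy_induction(candidates, SAMPLES)

Running the script prints the feature chosen at each round with the resulting training log-likelihood. In the paper's method the gain of a candidate is estimated more cheaply, by optimizing only the new feature's weight while holding the existing weights fixed, rather than refitting the whole model as this sketch does.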
