Training conditional random fields via gradient tree boosting

Authors: Thomas G. Dietterich, Adam Ashenfelter, Yaroslav Bulatov

DOI: 10.1145/1015330.1015428

Keywords:

Abstract: Conditional Random Fields (CRFs; Lafferty, McCallum, & Pereira, 2001) provide a flexible and powerful model for learning to assign labels to elements of sequences in such applications as part-of-speech tagging, text-to-speech mapping, protein and DNA sequence analysis, and information extraction from web pages. However, existing learning algorithms are slow, particularly in problems with large numbers of potential input features. This paper describes a new method for training CRFs by applying Friedman's (1999) gradient tree boosting method. In tree boosting, the CRF potential functions are represented as weighted sums of regression trees. Regression trees are learned by stage-wise optimizations similar to Adaboost, but with the objective of maximizing the conditional likelihood P(Y|X) of the CRF model. By growing trees, interactions among features are introduced only as needed, so although the parameter space is potentially immense, the search algorithm does not explicitly consider it. As a result, the method scales linearly in the order of the Markov model and the order of the feature interactions, rather than exponentially like previous algorithms based on iterative scaling and gradient descent.
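The core idea in the abstract, fitting regression trees to the functional gradient of a likelihood in stage-wise fashion, can be illustrated on a much simpler problem than CRF training. The sketch below is a minimal, self-contained example of Friedman-style gradient tree boosting for binary classification with logistic loss, where the negative functional gradient at each example reduces to the residual y − p. It is only an illustration of the general technique, not the paper's CRF algorithm (which would fit trees to the gradient of log P(Y|X) over sequence potentials); all names here are illustrative.

```python
import math

def fit_stump(x, residuals):
    """Fit a depth-1 regression tree (stump) to the residuals by
    minimizing squared error over all candidate split points."""
    best = None
    for split in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= split]
        right = [r for xi, r in zip(x, residuals) if xi > split]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda xi: lmean if xi <= split else rmean

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def boost(x, y, rounds=20, lr=0.5):
    """Stage-wise additive modeling: F(x) is a weighted sum of stumps,
    each fit to the negative gradient of the logistic loss (y - p)."""
    stumps = []
    for _ in range(rounds):
        F = [sum(s(xi) for s in stumps) for xi in x]
        # Functional gradient of the conditional log-likelihood at each point:
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump = fit_stump(x, residuals)
        stumps.append(lambda xi, s=stump: lr * s(xi))
    return lambda xi: sigmoid(sum(s(xi) for s in stumps))

# Toy 1-D data: class 1 when x > 2.5.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 0, 1, 1, 1]
predict = boost(x, y)
print(predict(0.5), predict(4.5))  # low vs. high probability
```

Because each round fits a tree to the current gradient, feature interactions enter the model only when a tree actually splits on them, which is the property the abstract credits for avoiding an explicit search over the full parameter space.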

References (16)
Structural, Syntactic, and Statistical Pattern Recognition. Lecture Notes in Computer Science, vol. 6218 (2002). 10.1007/978-3-642-14980-1
Thomas G. Dietterich, Machine Learning for Sequential Data: A Review. Lecture Notes in Computer Science, pp. 15–30 (2002). 10.1007/3-540-70659-3_2
T. J. Sejnowski, Parallel networks that learn to pronounce English text. Complex Systems, vol. 1, pp. 145–168 (1987).
Richard A. Olshen, Charles J. Stone, Leo Breiman, Jerome H. Friedman, Classification and Regression Trees (1983).
D. Geman, Random fields and inverse problems in imaging. Springer, Berlin, Heidelberg, pp. 115–193 (1990). 10.1007/BFB0103042
Jerome H. Friedman, Greedy function approximation: A gradient boosting machine. Annals of Statistics, vol. 29, pp. 1189–1232 (2001). 10.1214/AOS/1013203451
Adwait Ratnaparkhi, A Maximum Entropy Model for Part-Of-Speech Tagging. Empirical Methods in Natural Language Processing (1996).
Andrew McCallum, Dayne Freitag, Fernando C. N. Pereira, Maximum Entropy Markov Models for Information Extraction and Segmentation. International Conference on Machine Learning, pp. 591–598 (2000).
Robert E. Schapire, Yoav Freund, Experiments with a new boosting algorithm. International Conference on Machine Learning, pp. 148–156 (1996).