Batch Mode Active Learning for Networked Data

Authors: Lixin Shi, Yuhang Zhao, Jie Tang

DOI: 10.1145/2089094.2089109

Keywords: Machine learning, Active learning (machine learning), Instance-based learning, Data mining, Batch processing, Bounded function, Exploit, Scale (descriptive set theory), Artificial intelligence, Redundancy (engineering), Computer science, Set (abstract data type)

Abstract: We study a novel problem of batch mode active learning for networked data. In this problem, data instances are connected with links and their labels are correlated with each other. The goal is to exploit the link-based dependencies and node-specific content information to actively select a batch of instances to query the user, in order to learn an accurate model to label unknown instances in the network. We present three criteria (i.e., minimum redundancy, maximum uncertainty, and maximum impact) to quantify the informativeness of a set of instances, and formalize the problem as selecting a set of instances by maximizing an objective function which combines both link and content information. As solving the objective function is NP-hard, we present an efficient algorithm to optimize the objective function with a bounded approximation rate. To scale to real large networks, we develop a parallel implementation of the algorithm. Experimental results on both synthetic datasets and real-world datasets demonstrate the effectiveness and efficiency of our approach.
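The abstract does not give the objective function or the selection algorithm, so the following is only a minimal sketch of the general idea: greedily building a batch that trades off uncertainty, impact, and redundancy on a graph. Every definition here is an assumption made for illustration, not the paper's formulation: the scoring terms (binary entropy for uncertainty, number of covered neighbors for impact, number of links inside the batch for redundancy), the weights `alpha` and `beta`, and the function names `batch_score` and `greedy_select`. The sketch also makes no claim about an approximation guarantee.

```python
import math
from typing import Dict, List, Set


def entropy(p: float) -> float:
    """Binary entropy (in bits) of a predicted positive-class probability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def batch_score(batch: Set[int], graph: Dict[int, Set[int]],
                probs: Dict[int, float], alpha: float, beta: float) -> float:
    """Hypothetical objective: uncertainty + alpha * impact - beta * redundancy."""
    # Uncertainty: total entropy of the current model's predictions on the batch.
    uncertainty = sum(entropy(probs[v]) for v in batch)
    # Impact: number of distinct nodes outside the batch linked to some batch node.
    covered: Set[int] = set()
    for v in batch:
        covered |= graph[v]
    impact = len(covered - batch)
    # Redundancy: number of links with both endpoints in the batch.
    # The adjacency is assumed symmetric, so each link is seen twice.
    redundancy = sum(1 for v in batch for u in graph[v] if u in batch) / 2
    return uncertainty + alpha * impact - beta * redundancy


def greedy_select(graph: Dict[int, Set[int]], probs: Dict[int, float],
                  k: int, alpha: float = 1.0, beta: float = 1.0) -> List[int]:
    """Add, one at a time, the node that yields the highest-scoring batch."""
    batch: Set[int] = set()
    order: List[int] = []
    for _ in range(min(k, len(graph))):
        candidates = sorted(set(graph) - batch)  # sorted for deterministic ties
        best = max(candidates,
                   key=lambda v: batch_score(batch | {v}, graph, probs, alpha, beta))
        batch.add(best)
        order.append(best)
    return order


if __name__ == "__main__":
    # Toy network: a star centered on node 0, plus a separate linked pair (4, 5).
    graph = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}, 4: {5}, 5: {4}}
    probs = {0: 0.5, 1: 0.9, 2: 0.9, 3: 0.9, 4: 0.5, 5: 0.6}
    print(greedy_select(graph, probs, k=2))  # prints [0, 4]
```

On the toy network, the hub node 0 is picked first because it is both uncertain and linked to three nodes. The second pick is node 4 rather than a neighbor of node 0, since a neighbor would add an in-batch link (redundancy) and cover no new nodes.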
