Authors: Lixin Shi, Yuhang Zhao, Jie Tang
Keywords: Machine learning, Active learning (machine learning), Instance-based learning, Data mining, Batch processing, Bounded function, Exploit, Scale (descriptive set theory), Artificial intelligence, Redundancy (engineering), Computer science, Set (abstract data type)
Abstract: We study a novel problem of batch mode active learning for networked data. In this problem, data instances are connected with links and their labels are correlated with each other, and the goal is to exploit the link-based dependencies and the node-specific content information to actively select a batch of instances to query the user for learning an accurate model to label unknown instances in the network. We present three criteria (i.e., minimum redundancy, maximum uncertainty, and maximum impact) to quantify the informativeness of a set of instances, and formalize the problem as selecting a set of instances by maximizing an objective function which combines both link and content information. As solving the objective function is NP-hard, we present an efficient algorithm to optimize the objective function with a bounded approximation rate. To scale to real large networks, we develop a parallel implementation of the algorithm. Experimental results on both synthetic datasets and real-world datasets demonstrate the effectiveness and efficiency of our approach.
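The selection step described in the abstract can be pictured with a generic greedy heuristic. The sketch below is an illustration only, not the authors' algorithm or objective: the scoring function, the weights `alpha` and `beta`, and the names `batch_score` and `greedy_batch` are all assumptions made for this example. It rewards per-node uncertainty and the number of outside neighbors a batch reaches (a stand-in for "impact"), and penalizes links inside the batch (a stand-in for "redundancy").

```python
from typing import Dict, List, Set


def batch_score(batch: List[str],
                neighbors: Dict[str, Set[str]],
                uncertainty: Dict[str, float],
                alpha: float = 1.0,
                beta: float = 1.0) -> float:
    """Hypothetical set score: uncertainty + alpha * impact - beta * redundancy."""
    chosen = set(batch)
    # Uncertainty: sum of per-node uncertainty values (assumed to be given).
    total_uncertainty = sum(uncertainty[v] for v in chosen)
    # Impact: number of distinct nodes outside the batch linked to the batch.
    reached: Set[str] = set()
    for v in chosen:
        reached |= neighbors[v]
    impact = len(reached - chosen)
    # Redundancy: number of links with both endpoints inside the batch.
    internal_links = sum(1 for v in chosen for u in neighbors[v] if u in chosen) / 2
    return total_uncertainty + alpha * impact - beta * internal_links


def greedy_batch(neighbors: Dict[str, Set[str]],
                 uncertainty: Dict[str, float],
                 k: int,
                 alpha: float = 1.0,
                 beta: float = 1.0) -> List[str]:
    """Pick up to k nodes, each time adding the node with the largest marginal gain."""
    selected: List[str] = []
    candidates = set(neighbors)
    for _ in range(min(k, len(candidates))):
        base = batch_score(selected, neighbors, uncertainty, alpha, beta)
        # sorted() makes tie-breaking deterministic (first name in sort order wins).
        best = max(
            sorted(candidates),
            key=lambda v: batch_score(selected + [v], neighbors, uncertainty, alpha, beta) - base,
        )
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy undirected network (made-up data): a star around "a" plus a separate pair e-f.
toy_neighbors = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a"},
    "e": {"f"},
    "f": {"e"},
}
toy_uncertainty = {"a": 0.5, "b": 0.9, "c": 0.1, "d": 0.1, "e": 0.6, "f": 0.4}

print(greedy_batch(toy_neighbors, toy_uncertainty, k=2))  # ['a', 'e']
```

On this toy network the heuristic first takes the hub `a`, then prefers `e` over the highly uncertain `b`, because `b` is linked to `a` and adds no new reach. Each round re-evaluates every remaining candidate, so the sketch is only practical for small graphs; it makes no claim about the approximation bound or the parallel implementation mentioned in the abstract.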