Authors: Kai Yu, Xiaowei Xu, Anton Schwaighofer, Volker Tresp, Hans-Peter Kriegel
Keywords: Instance selection, Set (abstract data type), Computer science, Scalability, Data set, Data mining, Machine learning, Redundancy (engineering), Artificial intelligence, Collaborative filtering, Range (mathematics)
Abstract: The application range of memory-based collaborative filtering (CF) is limited due to CF's high memory consumption and long runtime. The approach presented in this paper removes redundant and inconsistent instances (users) from the data. This aims to distinguish informative instances from a large raw user preference database and thus alleviate the memory and runtime cost of the widely used memory-based CF algorithm. Our work shows that a satisfactory accuracy can be achieved by using only a small portion of the original data set, thereby alleviating the storage and runtime cost of the CF algorithm. In our approach, we consider instance selection as the problem of selecting instances that increase [...]. We begin by discussing, in a general sense, the a posteriori probability of the optimal model, and then evaluate the empirical performance of PF on two real-world data sets and attain very promising experimental results. Data size and prediction time are significantly reduced, while accuracy remains on par, with almost the same results as the complete database.
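To make the setting concrete, the following is a minimal sketch of memory-based CF prediction over a list of stored instances (users), together with a deliberately naive reduction step. It is an illustration under stated assumptions, not the paper's method: the Pearson-weighted predictor is the common textbook form, and `drop_duplicate_profiles` is a hypothetical stand-in that only removes exact duplicate profiles, whereas the abstract describes a selection criterion based on the a posteriori probability of the optimal model. All function names and the rating-dictionary layout are assumptions made for this example.

```python
import math


def pearson(a, b):
    """Pearson correlation over items rated by both users; 0.0 if undefined."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    ma = sum(a[i] for i in common) / len(common)
    mb = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = math.sqrt(
        sum((a[i] - ma) ** 2 for i in common)
        * sum((b[i] - mb) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(active, item, instances):
    """Predict the active user's rating of `item` from the stored instances.

    Every stored instance that rated `item` is visited, so both memory use
    and prediction time grow with the number of instances kept.
    """
    mean_active = sum(active.values()) / len(active)
    num = den = 0.0
    for ratings in instances:
        if item not in ratings:
            continue
        w = pearson(active, ratings)
        mean_other = sum(ratings.values()) / len(ratings)
        num += w * (ratings[item] - mean_other)
        den += abs(w)
    # Fall back to the active user's mean when no instance is informative.
    return mean_active + num / den if den else mean_active


def drop_duplicate_profiles(instances):
    """Hypothetical placeholder for instance selection (NOT the paper's
    criterion): keep only the first copy of each identical rating profile."""
    seen, kept = set(), []
    for ratings in instances:
        key = frozenset(ratings.items())
        if key not in seen:
            seen.add(key)
            kept.append(ratings)
    return kept
```

Calling `predict(active, item, drop_duplicate_profiles(users))` instead of `predict(active, item, users)` scans fewer instances per prediction. Note that even this trivial reduction can change the predicted value, which is why the accuracy of a reduced database has to be checked empirically, as the abstract reports doing.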