Learning relaxed 3-clusters from pairs of related datasets

Authors: Jagadeesh Patchala, Raj Bhatnagar

DOI: 10.1109/BIGDATA.2015.7363916

Keywords: Big data, Biclustering, Cluster analysis, Dimension (graph theory), Algorithm design, Computer science, Data mining, Domain (software engineering)

Abstract: In many emerging data mining situations we encounter multiple large binary relational datasets that are generated independently but are semantically interconnected and must be mined simultaneously to obtain an integrated effect of the data residing in all of them. The idea of finding 3-clusters is increasingly used in situations where one has to concurrently mine two distinct datasets that share a common domain along one dimension. By discovering 3-clusters, one can obtain important insights on the underlying connections between objects of different domains. All 3-clustering algorithms for binary datasets presented till now are able to find 3-clusters in which the bi-clusters are strict, that is, the rectangle formed by each bi-cluster contains only ‘1’ entries. However, in real world applications the datasets are very sparse, and a relaxed bi-cluster, one that allows some zeros in the bi-clusters' rectangles, is valuable. In this paper, we present a novel search based algorithm that finds relaxed 3-clusters from two datasets that share a common domain. Each identified 3-cluster involves two bi-clusters whose overlap sets are maximal. Through our algorithm, we can also exert finer control over the percentage of 1s allowed in each of the bi-clusters. We validate the effectiveness of the algorithm using synthetic datasets. Our results show that the notion of relaxed 3-clusters produces more meaningful 3-clusters when compared with the strict requirement of all ones.
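
The difference between a strict and a relaxed 3-cluster can be illustrated with a small check on the density of 1s in each bi-cluster's rectangle. The following is a minimal sketch, not the search algorithm from the paper: the function names, the matrix layout (rows indexed by the shared domain), and the per-bi-cluster thresholds are all assumptions made for illustration.

```python
from typing import List, Sequence

Matrix = List[List[int]]  # binary relation; rows are objects of the shared domain (assumed layout)


def density(matrix: Matrix, rows: Sequence[int], cols: Sequence[int]) -> float:
    """Fraction of 1 entries in the rectangle selected by rows x cols."""
    if not rows or not cols:
        return 0.0
    ones = sum(matrix[r][c] for r in rows for c in cols)
    return ones / (len(rows) * len(cols))


def is_relaxed_3cluster(
    d1: Matrix,
    d2: Matrix,
    shared_objs: Sequence[int],
    d1_objs: Sequence[int],
    d2_objs: Sequence[int],
    min_density_1: float,
    min_density_2: float,
) -> bool:
    """Check a candidate 3-cluster: two bi-clusters that use the same
    shared-domain objects, each meeting its own minimum density of 1s.
    Setting both thresholds to 1.0 gives the strict (all-ones) case."""
    return (
        density(d1, shared_objs, d1_objs) >= min_density_1
        and density(d2, shared_objs, d2_objs) >= min_density_2
    )


# Hypothetical toy data: both datasets share the row domain (3 objects).
d1 = [[1, 1, 0],
      [1, 0, 0],
      [1, 1, 1]]
d2 = [[1, 1],
      [1, 1],
      [0, 1]]

shared, left, right = [0, 1], [0, 1], [0, 1]
print(density(d1, shared, left))                                       # 0.75
print(is_relaxed_3cluster(d1, d2, shared, left, right, 1.0, 1.0))      # False
print(is_relaxed_3cluster(d1, d2, shared, left, right, 0.75, 0.75))    # True
```

In this toy case the candidate fails the strict requirement because one entry of the first rectangle is 0, but it qualifies as a relaxed 3-cluster once 75% of 1s is accepted. Using a separate threshold for each bi-cluster mirrors the abstract's remark about controlling the percentage of 1s in each bi-cluster.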
