Constrained query of order-preserving submatrix in gene expression data

Authors: Tao Jiang, Zhanhuai Li, Xuequn Shang, Bolin Chen, Weibang Li

DOI: 10.1007/S11704-016-5487-5

Abstract: The order-preserving submatrix (OPSM) has become important in modelling biologically meaningful subspace clusters, capturing the general tendency of gene expressions across a subset of conditions. With the advance of microarray and analysis techniques, large volumes of gene expression datasets and OPSM mining results are being produced. An OPSM query can efficiently retrieve relevant OPSMs from this huge amount of data. However, improving the relevancy of OPSM queries remains a difficult task in real-life exploratory data processing. First, it is hard to capture subjective interestingness aspects, e.g., the analyst's expectation given her/his domain knowledge. Second, even when these expectations can be declaratively specified, it is still challenging to use them during the computational process of OPSM queries. To the best of our knowledge, existing methods mainly focus on batch OPSM mining, while few works involve OPSM queries. To solve the above problems, the paper proposes two constrained OPSM query methods, which exploit user-defined constraints to search the two kinds of indices introduced. Extensive experiments are conducted on datasets, and the experimental results demonstrate that queries based on the multi-dimensional index (cIndex) and the enumerating sequence index (esIndex) perform better than brute-force search.
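To make the brute-force baseline mentioned in the abstract concrete, the following is a minimal Python sketch, not the paper's cIndex or esIndex method. It assumes the common definition of an OPSM: a set of genes (rows) whose expression values strictly increase along the same ordering of a subset of conditions (columns). The function names, the `min_genes` constraint, and the toy matrix are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def supports_order(row: Sequence[float], columns: Sequence[int]) -> bool:
    """Return True if the row's values strictly increase along the column order."""
    values = [row[c] for c in columns]
    return all(a < b for a, b in zip(values, values[1:]))


def brute_force_query(
    matrix: Sequence[Sequence[float]],
    columns: Sequence[int],
    min_genes: int = 1,
) -> List[int]:
    """Scan every row and keep those that follow the queried column order.

    `min_genes` is an illustrative user-defined constraint (an assumption,
    not taken from the paper): if fewer rows match, the result is empty.
    """
    rows = [i for i, row in enumerate(matrix) if supports_order(row, columns)]
    return rows if len(rows) >= min_genes else []


# Toy expression matrix: 4 genes (rows) x 4 conditions (columns).
expression = [
    [0.2, 1.5, 0.9, 3.1],
    [1.0, 4.0, 2.5, 6.0],
    [5.0, 1.0, 3.0, 2.0],
    [0.1, 0.8, 0.3, 0.9],
]

# Query: which genes rise in the order condition 0, then 2, then 1?
matches = brute_force_query(expression, columns=[0, 2, 1], min_genes=2)
print(matches)
```

Every query in this sketch scans the full matrix, so its cost grows with the number of genes times the query length. That per-query cost is what an index-based approach aims to avoid.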
