Mining tree-structured data on multicore systems

Authors: Shirish Tatikonda, Srinivasan Parthasarathy

DOI: 10.14778/1687627.1687706

Abstract: Mining frequent subtrees in a database of rooted and labeled trees is an important problem in many domains, ranging from phylogenetic analysis to biochemistry and from linguistic parsing to XML data analysis. In this work we revisit this problem and develop an architecture-conscious solution targeting emerging multicore systems. Specifically, we identify a sequence of memory-related optimizations that significantly improve the spatial and temporal locality of a state-of-the-art sequential algorithm, alleviating the effects of memory latency. Additionally, these optimizations are shown to reduce the pressure on the front-side bus, an important consideration in the context of large-scale multicore architectures. We then demonstrate that these optimizations, while necessary, are not sufficient for efficient parallelization on multicores, primarily due to parametric and data-driven factors which make load balancing a significant challenge. To address this challenge, we present a methodology that adaptively and automatically modulates the type and granularity of the work being shared among different cores. The resulting algorithm achieves near-perfect parallel efficiency on up to 16 processors on challenging real-world applications. These techniques have general-purpose utility, and a key outcome is the development of a scheduling service for moldable tasks.

References (51)
Siegfried Nijssen, Joost Kok. Efficient discovery of frequent unordered trees. First International Workshop on Mining Graphs, Trees and Sequences (2003).
Setsuo Arikawa, Shinji Kawasoe, Kenji Abe, Hiroshi Sakamoto, Hiroki Arimura, Tatsuya Asai. Efficient Substructure Discovery from Large Semi-structured Data. DOI Technical Report, vol. 200 (2001).
Srinivasan Parthasarathy, Mitsunori Ogihara, Mohammed J Zaki, Wei Li. New algorithms for fast discovery of association rules. Knowledge Discovery and Data Mining, pp. 283-286 (1997).
Hiroshi Mamitsuka, Tatsuya Akutsu, Nobuhisa Ueda, Kiyoko F. Aoki, Yasushi Okuno, Minoru Kanehisa, Atsuko Yamaguchi. Efficient tree-matching methods for accurate carbohydrate database queries. Genome Informatics, vol. 14, pp. 134-143 (2003). DOI: 10.11234/GI1990.14.134
T. Asai. Efficient substructure discovery from large semi-structured data. SIAM International Conference on Data Mining, pp. 158-174 (2002).
Chen Wang, Mingsheng Hong, Jian Pei, Haofeng Zhou, Wei Wang, Baile Shi. Efficient Pattern-Growth Methods for Frequent Tree Pattern Mining. Advances in Knowledge Discovery and Data Mining, pp. 441-451 (2004). DOI: 10.1007/978-3-540-24775-3_54
Pavel Zezula, Giuseppe Amato, Franca Debole, Fausto Rabitti. Tree Signatures for XML Querying and Navigation. International XML Database Symposium, pp. 149-163 (2003). DOI: 10.1007/978-3-540-39429-7_10
James Clifford, Donald J. Berndt. Finding patterns in time series: a dynamic programming approach. Knowledge Discovery and Data Mining, pp. 229-248 (1996).
A. Termier, M.-C. Rousset, M. Sebag. Dryade: a new approach for discovering closed frequent trees in heterogeneous tree databases. International Conference on Data Mining, pp. 543-546 (2004). DOI: 10.1109/ICDM.2004.10078
Yun Chi, Yirong Yang, Yi Xia, Richard R. Muntz. CMTreeMiner: Mining Both Closed and Maximal Frequent Subtrees. Advances in Knowledge Discovery and Data Mining, pp. 63-73 (2004). DOI: 10.1007/978-3-540-24775-3_9