Fast network discovery on sequence data via time-aware hashing

Authors: Tara Safavi, Chandra Sripada, Danai Koutra

DOI: 10.1007/S10115-018-1293-8

Abstract: Discovering and analyzing networks from non-network data is a task with applications in fields as diverse as neuroscience, genomics, climate science, economics, and more. In domains where networks are discovered on multiple time series, the most common approach is to compute measures of association or similarity between all pairs of time series. The nodes in the resultant network correspond to time series, which are linked by edges weighted according to the association scores of their endpoints. Finally, the fully connected network is thresholded such that only the edges with stronger weights remain and the desired sparsity level is achieved. While this approach is feasible for small datasets, its quadratic (or higher) time complexity does not scale as the individual time series length and the number of compared series increase. Thus, to circumvent the inefficient and wasteful intermediary step of building a fully connected graph before network sparsification, we propose a fast network discovery approach based on probabilistic hashing. Our methods emphasize consecutiveness, or the intuition that time series following similar fluctuations in longer time-consecutive intervals are more similar overall. Evaluation on real data shows that our method can build graphs nearly 15 times faster than baselines (when the baselines do not run out of memory), while achieving accuracy comparable to, or better than, baselines in task-based evaluation. Furthermore, our proposals are general, modular, and may be applied to a variety of sequence similarity search tasks.
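The all-pairs baseline that the abstract describes (score every pair of series, then threshold the fully connected graph) can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function name `correlation_network`, the `keep_fraction` parameter, and the choice of absolute Pearson correlation as the association measure are all assumptions made for the example. It does not implement the proposed hashing approach.

```python
import numpy as np

def correlation_network(series, keep_fraction=0.1):
    """Build a sparse network from time series via all-pairs correlation.

    series: array of shape (n_series, length), one time series per row.
    keep_fraction: fraction of the strongest edges to keep (hypothetical knob).
    Returns a list of (i, j, weight) edges, strongest first.
    """
    n = series.shape[0]
    # Quadratic step: association score for every pair of series.
    corr = np.corrcoef(series)
    # Each unordered pair (i < j) is one edge of the fully connected graph.
    iu, ju = np.triu_indices(n, k=1)
    # Assumption: edge weight is the absolute Pearson correlation.
    weights = np.abs(corr[iu, ju])
    # Thresholding step: keep only the strongest edges.
    n_keep = max(1, int(round(keep_fraction * weights.size)))
    top = np.argsort(weights)[::-1][:n_keep]
    return [(int(iu[t]), int(ju[t]), float(weights[t])) for t in top]
```

The `np.corrcoef` call materializes the full n-by-n score matrix before any edge is discarded, which is the quadratic intermediate step the abstract calls inefficient and wasteful.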

References (43)
Karl J. Friston, Functional and effective connectivity: a review. Brain Connectivity, vol. 1, pp. 13-36 (2011), 10.1089/BRAIN.2011.0008
David C. Kale, Dian Gong, Zhengping Che, Yan Liu, Gerard Medioni, Randall Wetzel, Patrick Ross, An Examination of Multivariate Time Series Hashing with Applications to Health Care. 2014 IEEE International Conference on Data Mining, pp. 260-269 (2014), 10.1109/ICDM.2014.153
Theodore D. Satterthwaite, Mark A. Elliott, Kosha Ruparel, James Loughead, Karthik Prabhakaran, Monica E. Calkins, Ryan Hopson, Chad Jackson, Jack Keefe, Marisa Riley, Frank D. Mentch, Patrick Sleiman, Ragini Verma, Christos Davatzikos, Hakon Hakonarson, Ruben C. Gur, Raquel E. Gur, Neuroimaging of the Philadelphia neurodevelopmental cohort. NeuroImage, vol. 86, pp. 544-553 (2014), 10.1016/J.NEUROIMAGE.2013.07.064
Kimmo Kaski, Janos Kertész, Jukka-Pekka Onnela, Clustering and information in correlation based financial networks. European Physical Journal B, vol. 38, pp. 353-362 (2004), 10.1140/EPJB/E2004-00128-7
Sen Yang, Qian Sun, Shuiwang Ji, Peter Wonka, Ian Davidson, Jieping Ye, Structural Graphical Lasso for Learning Mouse Brain Connectivity. Knowledge Discovery and Data Mining, pp. 1385-1394 (2015), 10.1145/2783258.2783391
Leman Akoglu, Hanghang Tong, Danai Koutra, Graph based anomaly detection and description: a survey. Data Mining and Knowledge Discovery, vol. 29, pp. 626-688 (2015), 10.1007/S10618-014-0365-Y
Roberto J. Bayardo, Yiming Ma, Ramakrishnan Srikant, Scaling up all pairs similarity search. The Web Conference, pp. 131-140 (2007), 10.1145/1242572.1242591
Pierre Vandergheynst, Pascal Frossard, Sunil K. Narang, Antonio Ortega, David I. Shuman, The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, vol. 30, pp. 83-98 (2013), 10.1109/MSP.2012.2235192
Wei Dong, Moses Charikar, Kai Li, Efficient k-nearest neighbor graph construction for generic similarity measures. Proceedings of the 20th International Conference on World Wide Web (WWW '11), pp. 577-586 (2011), 10.1145/1963405.1963487
S. Chaudhuri, V. Ganti, R. Kaushik, A Primitive Operator for Similarity Joins in Data Cleaning. International Conference on Data Engineering, pp. 5-5 (2006), 10.1109/ICDE.2006.9