Size Matters: Finding the Most Informative Set of Window Lengths

Authors: Jefrey Lijffijt, Panagiotis Papapetrou, Kai Puolamäki

DOI: 10.1007/978-3-642-33486-3_29

Abstract: Event sequences often contain continuous variability at different levels. In other words, their properties and characteristics change at different rates, concurrently. For example, the sales of a product may slowly become more frequent over a period of several weeks, but there may be interesting variation within a week at the same time. To provide an accurate and robust "view" of such multi-level structural behavior, one needs to determine the appropriate levels of granularity for analyzing the underlying sequence. We introduce the novel problem of finding the best set of window lengths for analyzing discrete event sequences. We define suitable criteria for choosing window lengths and propose an efficient method to solve the problem. We give examples of tasks that demonstrate the applicability of the problem and present extensive experiments on both synthetic data and real data from two domains: text and DNA. We find that the optimal sets of window lengths themselves can provide new insight into the data, e.g., the burstiness of events affects the optimal window lengths for measuring event frequencies.
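The abstract does not spell out the selection criteria, so the following is only an illustration of the underlying issue and not the authors' method. It is a minimal Python sketch, assuming events are given as a sequence of symbols. The code computes an event's relative frequency in every sliding window of length w and then summarizes how much that frequency varies for several candidate lengths. The names `window_frequencies` and `spread_by_window_length` are hypothetical, and using the population variance as the summary is an assumption made for this example.

```python
from statistics import pvariance


def window_frequencies(seq, event, w):
    """Relative frequency of `event` in every contiguous window of length w.

    Hypothetical helper for illustration; uses an O(n) running count.
    """
    if w <= 0 or w > len(seq):
        raise ValueError("window length must be in 1..len(seq)")
    count = sum(1 for x in seq[:w] if x == event)
    freqs = [count / w]
    for i in range(w, len(seq)):
        # Slide the window one step: add the new symbol, drop the oldest one.
        count += (seq[i] == event) - (seq[i - w] == event)
        freqs.append(count / w)
    return freqs


def spread_by_window_length(seq, event, lengths):
    """Variance of the windowed frequency for each candidate window length.

    Assumption for this sketch: variance is used as a simple measure of how
    much the frequency estimate fluctuates at a given granularity.
    """
    return {w: pvariance(window_frequencies(seq, event, w)) for w in lengths}


if __name__ == "__main__":
    evenly_spaced = "ab" * 10
    bursty = "a" * 5 + "b" * 10 + "a" * 5
    print(spread_by_window_length(evenly_spaced, "a", [1, 2, 4, 8]))
    print(spread_by_window_length(bursty, "a", [1, 2, 4, 8]))
```

For an evenly spaced event such as `"ab" * 10`, every window of even length sees the same frequency, so its variance is zero. For a bursty sequence such as `"a" * 5 + "b" * 10 + "a" * 5`, short windows swing between 0 and 1. In a toy setting, this mirrors the abstract's observation that burstiness affects which window lengths are informative for measuring event frequencies.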
