Classification and alignment of gene-expression time-series data

Authors: Mark Craven, Adam Allen Smith

Abstract: We present methods for comparing and performing similarity queries on gene-expression time-series data. Such data is usually gathered via microarrays or related technologies. In the studies with which we work, microarrays are used to compare the gene activity of mice after exposure to different treatments, or with specific genes knocked out. This lets us study the effects of the treatments or knockout at a molecular level. The data tends to be sparse in time, but it represents measurements of thousands or tens of thousands of separate genes, each of which constitutes a separate dimension. The data is also subject to technical noise and biological variability. Our approach involves three key steps. The first step is to reconstruct continuous time series from the discrete observations. We use B-splines to accomplish this. Unlike previous methods, we relax the fit of the splines so that they are less prone to overfitting the data, and we place the points of discontinuity of each spline in such a way that the spline is well-defined over the whole length of the series. The second step is to align pairs of time series in order to find the time-by-time correspondence that maximizes the similarity between them. We present two segment-based algorithms specially designed for this task, and we develop heuristics to speed up the alignment computations without adversely affecting the quality of the alignments found. Finally, we present an approach for computing clustered alignments, in which the genes are split into a small number of clusters, each of which is aligned independently. The final step is to score the alignments found. A score based on the alignment allows us to conduct similarity searches, in which a query of unknown character is associated with other treatments that have been well-studied. One of our high-level goals is to create a BLAST-like tool, which will allow biologists to enter their own studies, and which will return treatments that affect gene expression in similar ways.
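The alignment and similarity-query steps described in the abstract can be illustrated with a minimal sketch. It does not reproduce the authors' segment-based algorithms, which the abstract does not specify; it substitutes classic dynamic time warping (DTW) as a stand-in, skips the B-spline reconstruction step, and operates directly on discrete observations. The function names `dtw_align` and `rank_treatments`, the squared-Euclidean distance, and the toy data are all assumptions made for illustration.

```python
from math import inf


def dtw_align(query, reference):
    """Return (cost, path) for a classic DTW alignment (a stand-in technique).

    query, reference: lists of time points, each a list of per-gene
    expression values with the same number of genes in both series.
    """
    def dist(a, b):
        # Squared Euclidean distance across genes at one pair of time points.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    n, m = len(query), len(reference)
    # cost[i][j] = best cost of aligning query[:i] with reference[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
            cost[i][j] = dist(query[i - 1], reference[j - 1]) + best_prev

    # Trace back from the end to recover the time-by-time correspondence.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


def rank_treatments(query, database):
    """Order reference treatments from most to least similar to the query."""
    scored = [(dtw_align(query, series)[0], name)
              for name, series in database.items()]
    return [name for _, name in sorted(scored)]


# Hypothetical toy data: one gene, a handful of time points per series.
query = [[0.0], [1.0], [2.0], [1.0]]
database = {
    "treatment_A": [[0.0], [0.0], [1.0], [2.0], [1.0]],  # same shape, delayed
    "treatment_B": [[2.0], [1.0], [0.0], [0.0]],         # different shape
}
ranking = rank_treatments(query, database)
```

Here a lower alignment cost plays the role of a higher similarity score, so the delayed copy of the query ranks ahead of the differently shaped series. A real tool of the kind the abstract describes would align reconstructed continuous curves over thousands of genes rather than raw points for a single gene.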
