Authors: Mark Craven, Adam Allen Smith
DOI:
Keywords:
Abstract: We present methods for comparing and performing similarity queries on gene-expression time-series data. Such data is usually gathered via microarrays or related technologies. In the studies with which we work, microarrays are used to compare the gene activity of mice after exposure to different treatments, or with specific genes knocked out. This lets us study the effects of the treatments and knockouts at a molecular level. The data tends to be sparse in time, but it represents measurements of thousands or tens of thousands of separate genes, each of which constitutes a separate dimension. It is also subject to technical noise and biological variability. Our approach involves three key steps. The first step is to reconstruct continuous time series from the discrete observations. We use B-splines to accomplish this. Unlike previous methods, we relax the fit of the splines so that they are less prone to overfitting, and we place the points of discontinuity of the spline in such a way that the spline is well-defined over the whole length of the series. The second step is to align pairs of time series in order to find the time-by-time correspondence that maximizes the similarity between them. We present two segment-based algorithms specially designed for such data. We also develop heuristics to speed up the alignment computations without adversely affecting the quality of the alignments found. Finally, we present an algorithm for computing clustered alignments, in which the genes are split into a small number of clusters, each of which is aligned independently. The final score found, based on the alignment, allows us to conduct similarity searches, in which a query series of unknown character is associated with other series that have been well-studied. One of our high-level goals is to create a BLAST-like tool, which will allow biologists to enter their own studies and return studies that affect gene expression in similar ways.
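To make the alignment step concrete, the following is a minimal sketch of pairwise time-series alignment. It is an illustration only and not the segment-based algorithms of this work: it substitutes classic dynamic time warping, operates on a single gene's discrete samples rather than on spline reconstructions, and minimizes an absolute-difference cost instead of maximizing a similarity score. The function name `dtw_align` and its return format are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def dtw_align(
    a: Sequence[float], b: Sequence[float]
) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two series with classic dynamic time warping (illustrative only).

    Returns the total absolute-difference cost and the warping path as
    (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance in both series
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )
    # Trace back from the end to recover the time-by-time correspondence.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path
```

For example, `dtw_align([0, 1, 2, 3], [0, 0, 1, 2, 3])` pairs the first point of the shorter series with the first two points of the longer one and then proceeds one-to-one. A multi-gene version would presumably sum a per-gene cost at each pair of time points, and a clustered variant would run the alignment separately on each group of genes.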