Authors: Fang-Xiang Wu, W. J. Zhang, Anthony J. Kusalik
Keywords: Similarity measure, DNA microarray, Feature (machine learning), Sample size determination, Computer science, Cluster analysis, Data mining, Hierarchical clustering, Set (abstract data type), Pattern recognition, Artificial intelligence, Function (mathematics)
Abstract: DNA microarray technologies allow for the simultaneous monitoring of thousands of genes, which reveals important information about cellular and tissue expression phenotypes. From a data analysis viewpoint, microarray experiments may be classified into (1) classification of patients or non-patients, or of more subtypes, in terms of gene expressions; (2) discovery of expression patterns over a set of different conditions; and (3) discovery of expression patterns within one and the same series of time points while the underlying biological process evolves. This article concerns the third class of problems. An important feature of this class of problems is the dependency among the experiments corresponding to the time points. One of the important issues here is the specification of the time points, including the number of time points and the time span between them. In the absence of knowledge from the biologist about this specification, one naturally turns to the question of whether the behaviour of the progressively generated data can by itself help determine a "cut-off" line beyond which further microarray experiments do not contribute to pattern discovery. Additionally, such a cut-off line implies a minimum sample size, which matters because these experiments are rather costly in terms of the reagents required. We have developed a method for determining the minimum sample size (or number of time points) for the discovery of temporal gene expression patterns, assuming that a given hierarchical clustering technique is used. Our basic idea was to develop a similarity measure between two clusterings, expressed as a function of the progressively generated data. While the experiment is going on, the similarity measure is evaluated to see whether it reaches a "saturated" state in which further experiments no longer contribute to the discrimination of patterns. The method has been verified with previously published datasets; specifically, for both experiments, the number of time points determined with our method is less than in the published experiments. Although the method at present employs a hierarchical clustering technique, the overall idea is applicable to other clustering techniques.
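
Illustrative sketch (not taken from the paper): the abstract does not state which similarity measure or stopping rule the authors use, so the following Python example only illustrates the general idea under stated assumptions. It clusters the genes with a simple average-linkage hierarchical clustering using the first t time points, compares the clusterings obtained at consecutive values of t with the Rand index (an assumed stand-in for the authors' similarity measure), and reports the first t after which the similarity stays at or above a threshold. The function names, the `threshold` and `patience` parameters, and the toy data are all hypothetical.

```python
from itertools import combinations
from math import dist


def hierarchical_clusters(profiles, k):
    """Average-linkage agglomerative clustering; returns one label per profile."""
    clusters = [[i] for i in range(len(profiles))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between the members of two clusters.
        total = sum(dist(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Find and merge the closest pair of clusters (x < y always holds).
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(y)
        clusters[x].extend(merged)

    labels = [0] * len(profiles)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of gene pairs on which two clusterings agree (same/different cluster)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


def minimum_time_points(series, k, threshold=0.95, patience=2, start=2):
    """Assumed saturation rule: return the smallest t such that the next
    `patience` consecutive clusterings are all at least `threshold`-similar
    to their predecessor, or None if that never happens."""
    n_points = len(series[0])
    previous = hierarchical_clusters([row[:start] for row in series], k)
    stable = 0
    for t in range(start + 1, n_points + 1):
        current = hierarchical_clusters([row[:t] for row in series], k)
        stable = stable + 1 if rand_index(previous, current) >= threshold else 0
        if stable >= patience:
            return t - patience
        previous = current
    return None


# Hypothetical toy data: 4 genes x 6 time points. The grouping suggested by the
# first two time points changes once the third time point is observed.
toy_series = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 5.0, 5.0, 5.0, 5.0],
    [1.0, 1.1, 5.0, 5.0, 5.0, 5.0],
]

print(minimum_time_points(toy_series, k=2))  # prints 3
```

In this toy run the clustering changes between two and three time points and is unchanged afterwards, so three time points would be reported as sufficient. In practice the number of clusters, the linkage, the similarity measure, and the saturation criterion would all have to follow the paper itself rather than these placeholder choices.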