On Determination of Minimum Sample Size for Discovery of Temporal Gene Expression Patterns

Authors: Fang-Xiang Wu, W. J. Zhang, Anthony J. Kusalik

DOI: 10.1109/IMSCCS.2006.95

Keywords: Similarity measure, DNA microarray, Feature (machine learning), Sample size determination, Computer science, Cluster analysis, Data mining, Hierarchical clustering, Set (abstract data type), Pattern recognition, Artificial intelligence, Function (mathematics)

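Abstract: DNA microarray technologies allow for the simultaneous monitoring of the expression of thousands of genes, which reveals important information about cellular and tissue expression phenotypes. From the viewpoint of data analysis, microarray experiments may be classified into (1) classification of patients or non-patients, or of more subtypes, in terms of gene expressions; (2) discovery of gene expression patterns over a set of different conditions; and (3) discovery of gene expression patterns under one and the same condition over a series of time points while the underlying biological process evolves. This article concerns class (3) problems. An important feature of this class of problems is the dependency among the gene expression data corresponding to the time points. One of the issues here is the specification of the time points, including the number of time points and the time span between time points. In the absence of knowledge from the biologist about this specification, the question naturally becomes whether the behaviour of the progressively generated data can by itself help determine the "cut-off" line beyond which further microarray experiments do not contribute to pattern discovery. Additionally, such a cut-off line implies the minimum sample size, which is of interest because these experiments are rather costly in terms of the reagents required. We have developed a method for determination of the minimum sample size (or number of time points) for discovery of temporal gene expression patterns, assuming that a given hierarchical clustering technique is used for pattern discovery. Our basic idea was to develop a similarity measure of two clusterings, expressed as a function of the number of time points as the data are progressively generated. While the experiment is going on, the similarity measure is evaluated to see whether it reaches a "saturated" state where further time points do not improve discrimination any more. The method has been verified on previously published gene expression datasets; specifically, for both experiments, the minimum sample size determined with our method is less than the number of time points used in the experiments. Although at present the method employs a hierarchical clustering technique, the overall idea is applicable to other clustering techniques.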
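The saturation idea in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, not the authors' procedure: the clustering is single linkage on Euclidean distances, the similarity between two hierarchical clusterings is the Pearson correlation of their cophenetic distances, and the names `similarity_curve`, `minimum_sample_size`, `threshold`, and `patience`, as well as the toy data, are hypothetical. The abstract does not state which similarity measure or stopping rule the paper uses.

```python
import math
from itertools import combinations


def cophenetic_single_linkage(profiles):
    """Single-linkage cophenetic distances for all gene pairs.

    Relies on the fact that the single-linkage merge height of two items
    equals the minimax path distance between them, computed here with a
    Floyd-Warshall style update (fine for small toy inputs, O(n^3)).
    """
    n = len(profiles)
    c = [[math.dist(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(c[i][k], c[k][j])
                if via_k < c[i][j]:
                    c[i][j] = via_k
    return [c[i][j] for i, j in combinations(range(n), 2)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def similarity_curve(expression, min_points=2):
    """Similarity between the clustering on t-1 and on t time points.

    `expression` is a list of gene profiles (one list of values per gene).
    Returns a list of (t, similarity) pairs for t = min_points+1 .. T.
    """
    n_points = len(expression[0])
    curve = []
    prev = cophenetic_single_linkage([row[:min_points] for row in expression])
    for t in range(min_points + 1, n_points + 1):
        cur = cophenetic_single_linkage([row[:t] for row in expression])
        curve.append((t, pearson(prev, cur)))
        prev = cur
    return curve


def minimum_sample_size(curve, threshold=0.99, patience=2):
    """Assumed stopping rule: the similarity is 'saturated' once it stays
    at or above `threshold` for `patience` consecutive added time points.
    Returns the number of time points that sufficed, or None.
    """
    run = 0
    for t, s in curve:
        run = run + 1 if s >= threshold else 0
        if run >= patience:
            return t - patience
    return None


# Hypothetical toy data: 4 genes, 6 time points. The third time point
# changes the grouping; the later ones carry no new information.
EXPRESSION = [
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 5.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 5.0, 0.0, 0.0, 0.0],
]

curve = similarity_curve(EXPRESSION)
for t, s in curve:
    print(f"time points {t - 1} -> {t}: similarity {s:.3f}")
print("minimum sample size:", minimum_sample_size(curve))  # 3 for this toy data
```

In this toy case the similarity drops when the third time point is added and stays near 1 afterwards, so the sketch reports three time points as sufficient. For real datasets a library implementation of hierarchical clustering, such as `linkage` and `cophenet` in `scipy.cluster.hierarchy`, would be a more practical choice than the cubic-time loop used here.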
