Clustering Longitudinal Health Data Using Hidden Markov Models

Author: Shima Ghassem Pour

Abstract: This thesis describes an approach for clustering multivariate time series with variables taking both categorical and continuous values. Time series of this type are frequent in healthcare, where they represent the health trajectories of individuals. The problem is challenging because categorical variables make it difficult to define a meaningful distance between trajectories. Clustering is one of the most common and useful tasks in data mining, and so is a well-studied problem. However, clustering sequential or longitudinal data is more challenging than traditional clustering, as a sequence of observations should be processed rather than a single data point. Longitudinal data analysis is an interesting application area in epidemiology and clinical research, since it allows researchers to observe individual patterns of change and to capture the relationship between exposure and outcome. A typical analysis in health services research uses K-means clustering to form health states (conditions) and a first-order Markov chain to describe transitions between states. This procedure ignores information from temporally-adjacent data and prevents uncertainty from parameter estimation and cluster assignments from being incorporated into the analysis. The approach proposed here is based on the incorporation of Hidden Markov Models (HMMs), using the following steps: first, map each trajectory into an HMM; then, define a suitable distance between HMMs; finally, proceed to cluster the HMMs with a method based on a distance matrix. The assumption was made that the health conditions observed are just manifestations of a true health state that cannot be observed directly and so remains hidden. Therefore, in modelling the health state, the hidden states were modelled, as well as the probabilities of observing certain conditions in each state. The approach was tested on a simulated, but realistic, data set of 1,255 individuals (the 45 and Up set), on a synthetic validation set with known structure, and on a smaller data set of 268 individuals extracted from the Health and Retirement Survey. The approach can be implemented quite simply using standard packages in R and Matlab, and may be a good candidate for solving this problem with tools that do not require advanced statistical knowledge and are therefore accessible to a wide range of researchers.
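
The three steps named in the abstract (one HMM per trajectory, a distance between HMMs, clustering from a distance matrix) can be illustrated with a minimal sketch. It is not the thesis's implementation, and it rests on several assumptions that the abstract does not confirm: observations are a single categorical variable, step one (fitting an HMM to each trajectory) has already been done elsewhere, the distance is a symmetrised Monte Carlo log-likelihood ratio, and average-linkage agglomerative clustering stands in for whichever distance-matrix method the thesis uses. Python is used here only for illustration, although the abstract mentions R and Matlab. All parameter values and the names `sample`, `log_likelihood`, `hmm_distance` and `cluster` are hypothetical.

```python
import math
import random

# An HMM is a tuple (pi, A, B): initial state probabilities, state transition
# matrix, and per-state emission probabilities over categorical symbols.

def sample(hmm, length, rng):
    """Draw an observation sequence of the given length from a discrete HMM."""
    pi, A, B = hmm
    state = rng.choices(range(len(pi)), weights=pi)[0]
    obs = []
    for _ in range(length):
        obs.append(rng.choices(range(len(B[state])), weights=B[state])[0])
        state = rng.choices(range(len(A[state])), weights=A[state])[0]
    return obs

def log_likelihood(hmm, obs):
    """Scaled forward algorithm: returns log P(obs | hmm)."""
    pi, A, B = hmm
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[r] * A[r][s] for r in range(n)) * B[s][obs[t]]
                     for s in range(n)]
        scale = sum(alpha)          # rescale to avoid numerical underflow
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

def hmm_distance(hmm_a, hmm_b, length=500, seed=0):
    """Assumed distance: symmetrised per-symbol log-likelihood ratio,
    estimated on one sequence sampled from each model."""
    rng = random.Random(seed)
    obs_a = sample(hmm_a, length, rng)
    obs_b = sample(hmm_b, length, rng)
    d_ab = (log_likelihood(hmm_a, obs_a) - log_likelihood(hmm_b, obs_a)) / length
    d_ba = (log_likelihood(hmm_b, obs_b) - log_likelihood(hmm_a, obs_b)) / length
    return 0.5 * (d_ab + d_ba)

def cluster(dist, k):
    """Average-linkage agglomerative clustering on a distance matrix."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters

# Hypothetical fitted HMMs: two with persistent hidden states, two volatile.
hmms = [
    ([0.9, 0.1], [[0.95, 0.05], [0.05, 0.95]], [[0.90, 0.10], [0.10, 0.90]]),
    ([0.8, 0.2], [[0.93, 0.07], [0.07, 0.93]], [[0.88, 0.12], [0.12, 0.88]]),
    ([0.5, 0.5], [[0.50, 0.50], [0.50, 0.50]], [[0.60, 0.40], [0.40, 0.60]]),
    ([0.5, 0.5], [[0.45, 0.55], [0.55, 0.45]], [[0.62, 0.38], [0.38, 0.62]]),
]

# Build a symmetric distance matrix from the upper triangle.
n = len(hmms)
dist = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        dist[i][j] = dist[j][i] = hmm_distance(hmms[i], hmms[j])

clusters = cluster(dist, k=2)
print(clusters)
```

Because the clustering step only needs the distance matrix, the same structure would carry over to mixed categorical and continuous observations by changing the emission model inside `log_likelihood`. In practice the HMM parameters would come from a fitting routine such as Baum-Welch in a standard package rather than being written by hand.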
