Author: Shima Ghassem Pour
DOI:
Keywords:
Abstract: This thesis describes an approach for clustering multivariate time series with variables taking both categorical and continuous values. Time series of this type are frequent in healthcare, where they represent the health trajectories of individuals. The problem is challenging because categorical variables make it difficult to define a meaningful distance between trajectories. Clustering is one of the most common and useful tasks in data mining, and so it is a well-studied problem. However, clustering sequential or longitudinal data is more challenging than traditional clustering, as a sequence of observations should be processed rather than a single data point. The analysis of longitudinal data is an interesting application area in epidemiology and clinical research, since it allows researchers to observe individual patterns of change and to capture the relationship between exposure and outcome. A typical approach in health services research uses K-means clustering to form health states (conditions) and a first-order Markov chain to describe transitions between states. This procedure ignores information from temporally adjacent observations and prevents uncertainty from parameter estimation and cluster assignments from being incorporated into the analysis. The approach proposed here was based on the incorporation of Hidden Markov Models (HMMs), using the following steps: first, map each trajectory into an HMM; then define a suitable distance between HMMs; and finally proceed to cluster the HMMs with a method based on a distance matrix. The assumption was made that the health conditions observed are just manifestations of the true health state, which cannot be observed directly and so remains hidden. Therefore, rather than modelling the observed state, the hidden states were modelled, as well as the probabilities of observing a certain condition in a certain state. The approach was tested on a simulated, but realistic, data set of 1,255 trajectories of individuals from the 45 and Up data set, on a synthetic validation set with a known clustering structure, and on a smaller set of 268 trajectories extracted from the Health and Retirement Survey. The method can be implemented quite simply using standard packages in R and Matlab, and may be a good candidate for solving this problem with tools that do not require advanced statistical knowledge, and is therefore accessible to a wide range of researchers.
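The last two steps named in the abstract (define a distance between HMMs, then cluster from the distance matrix) can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the thesis's own method: the abstract does not say which distance or clustering algorithm was used, so the sketch swaps in a symmetrised Monte Carlo log-likelihood distance and average-linkage agglomerative clustering. The names `make_hmm`, `sample`, `loglik`, `distance` and `cluster` are hypothetical, and the four hand-set HMMs stand in for models that would normally be fitted to individual trajectories (step one, not shown).

```python
import math
import random

def make_hmm(stay, emit):
    """Hypothetical 2-state, 2-symbol HMM as (initial, transition, emission)."""
    pi = [0.5, 0.5]
    A = [[stay, 1 - stay], [1 - stay, stay]]
    B = [[emit, 1 - emit], [1 - emit, emit]]
    return pi, A, B

def sample(hmm, length, rng):
    """Draw an observation sequence of the given length from the HMM."""
    pi, A, B = hmm
    draw = lambda p: rng.choices(range(len(p)), weights=p)[0]
    state, obs = draw(pi), []
    for _ in range(length):
        obs.append(draw(B[state]))
        state = draw(A[state])
    return obs

def loglik(hmm, obs):
    """Scaled forward algorithm: log P(obs | hmm)."""
    pi, A, B = hmm
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        total += math.log(scale)
        alpha = [a / scale for a in alpha]
    return total

def distance(h1, h2, length=500, seed=0):
    """Assumed distance: symmetrised per-symbol log-likelihood gap,
    estimated on one sequence sampled from each model."""
    rng = random.Random(seed)
    o1, o2 = sample(h1, length, rng), sample(h2, length, rng)
    d12 = (loglik(h1, o1) - loglik(h2, o1)) / length
    d21 = (loglik(h2, o2) - loglik(h1, o2)) / length
    return 0.5 * (d12 + d21)

def cluster(D, k):
    """Average-linkage agglomerative clustering on a distance matrix."""
    groups = [[i] for i in range(len(D))]
    while len(groups) > k:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = sum(D[i][j] for i in groups[a] for j in groups[b])
                d /= len(groups[a]) * len(groups[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a] += groups.pop(b)  # b > a, so index a is unaffected
    return groups

# Two persistent models and two nearly memoryless ones (illustrative values).
hmms = [make_hmm(0.90, 0.90), make_hmm(0.88, 0.92),
        make_hmm(0.55, 0.60), make_hmm(0.50, 0.62)]

# Build a symmetric distance matrix, estimating each pair once.
D = [[0.0] * len(hmms) for _ in hmms]
for i in range(len(hmms)):
    for j in range(i + 1, len(hmms)):
        D[i][j] = D[j][i] = distance(hmms[i], hmms[j])

clusters = cluster(D, k=2)
print(clusters)
```

In a real analysis each HMM would first be estimated from one individual's trajectory, for example with the Baum-Welch algorithm available in standard R or Matlab packages. The distance and clustering steps would then operate on those fitted models in the same way.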