Authors: Rosemary M McCloskey, Art FY Poon
DOI: 10.1101/165357
Keywords: Data mining, Cluster analysis, Infectious disease transmission, Model based clustering, Biology, Outbreak, Nonparametric statistics, Infectious disease (medical specialty), CURE data clustering algorithm, Sequence variation
Abstract: Clustering infections by genetic similarity is a popular technique for identifying potential outbreaks of infectious disease, in part because sequences are now routinely collected for the clinical management of many diseases. A diverse number of nonparametric clustering methods have been developed for this purpose. These methods are generally intuitive, rapid to compute, and readily scale with large data sets. However, we found that nonparametric clustering methods can be biased towards clusters of diagnosis (where individuals are sampled sooner post-infection) rather than the clusters of rapid transmission that are meant to be the foci of public health efforts. We develop a fundamentally new approach based on fitting a Markov-modulated Poisson process (MMPP), which represents the evolution of transmission rates along the tree relating different infections. We evaluated this model-based method alongside five nonparametric clustering methods using both simulated and actual HIV sequence data sets. For simulated clusters of rapid transmission, the MMPP method obtained higher mean sensitivity (85%) and specificity (91%) than the nonparametric methods. When we applied these methods to published HIV-1 sequences from a study cohort of men who have sex with men in Seattle, USA, the MMPP method categorized about half (46%) as many individuals into clusters compared to the other methods, and its clusters were more consistent with transmission outbreaks. This has significant implications for the application of pathogen sequence analysis to public health, where it is critical to robustly and accurately identify outbreaks for the most cost-effective deployment of prevention services and resources.
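
To make the MMPP idea in the abstract more concrete, the following is a minimal sketch of a two-state Markov-modulated Poisson process simulated along a single time line. It is not the authors' method, which fits the model to branching events on a phylogenetic tree; it only illustrates how a hidden state that switches at random can modulate the rate of observed events. The function name `simulate_mmpp`, its parameters, and the example rates are all hypothetical.

```python
import random


def simulate_mmpp(event_rates, switch_rates, t_end, seed=0):
    """Simulate a two-state MMPP on [0, t_end) (illustrative sketch only).

    event_rates[s]  : Poisson event rate while the hidden state is s (0 or 1)
    switch_rates[s] : rate at which the hidden state leaves s
    Returns (event_times, event_states), where event_states[i] is the
    hidden state in force when event i occurred.
    """
    rng = random.Random(seed)
    t, state = 0.0, 0  # assumption: the process starts in state 0
    times, states = [], []
    while True:
        # Events and state switches compete as independent exponential clocks,
        # so the waiting time to whichever comes first has the summed rate.
        total = event_rates[state] + switch_rates[state]
        t += rng.expovariate(total)
        if t >= t_end:
            break
        if rng.random() < event_rates[state] / total:
            times.append(t)          # an observed event
            states.append(state)
        else:
            state = 1 - state        # a hidden switch between slow and fast
    return times, states


# Hypothetical rates: state 0 is "slow", state 1 is "fast" (outbreak-like).
times, states = simulate_mmpp((0.5, 5.0), (0.2, 0.2), t_end=200.0, seed=1)
fast_fraction = sum(states) / len(states)
print(f"{len(times)} events, {fast_fraction:.0%} generated in the fast state")
```

In a clustering setting, the inferential task runs in the opposite direction: given only the event times (branching events in the tree), estimate the rates and decide which stretches were most likely generated in the fast state.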