Authors: Hitoshi Koyano, Morihiro Hayashida, Tatsuya Akutsu
DOI: 10.1016/J.JCSS.2019.07.003
Keywords:
Abstract: In this study, we address the problem of clustering string data in an unsupervised manner by developing a theory of a mixture model and an EM algorithm for strings, based on the probability theory on a topological monoid of strings developed in our previous studies. We begin by introducing a parametric distribution on a set of strings, which has location and dispersion parameters that are a string and a positive real number, respectively. We develop an iterative algorithm for estimating the parameters of a mixture of the distributions introduced, and demonstrate that it converges to the EM algorithm, which cannot be written explicitly for this mixture model, and that one of its outputs strongly consistently estimates the parameters as the numbers of observed strings and iterations increase. We finally derive a clustering procedure that is asymptotically optimal in the sense that the posterior probability of making correct classifications is maximized.
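The overall workflow described above (fit a mixture of location/dispersion distributions on strings by EM, then assign each string to the component with the largest posterior probability) can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not the authors' construction: the component density is taken to be proportional to exp(-d(x, mu)/lam) with d the Levenshtein (edit) distance, its normalizing constant is summed over the distinct observed strings only, the location mu is searched among the observed strings, and the dispersion lam is searched over a fixed grid. The function name `fit_and_classify` and its parameters are hypothetical.

```python
import math
from functools import lru_cache


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fit_and_classify(xs, init_centers, lam_grid=(0.25, 0.5, 1.0, 2.0, 4.0), n_iter=20):
    """Hypothetical EM sketch for a K-component mixture on strings, then MAP labels.

    Assumed component model (an illustration, not the paper's distribution):
        f(x; mu, lam) = exp(-d(x, mu) / lam) / Z(mu, lam),
    with d the edit distance and Z summed over the finite support S of distinct
    observed strings. mu is searched over S and lam over lam_grid.
    """
    support = sorted(set(xs))
    pos = {s: i for i, s in enumerate(support)}
    dist = [[levenshtein(s, t) for t in support] for s in support]
    obs = [pos[x] for x in xs]

    @lru_cache(maxsize=None)
    def log_z(mu, lam):
        # Log normalizing constant over the assumed finite support.
        return math.log(sum(math.exp(-d / lam) for d in dist[mu]))

    def log_f(i, mu, lam):
        return -dist[i][mu] / lam - log_z(mu, lam)

    K = len(init_centers)
    centers = [pos[c] for c in init_centers]  # initial locations must occur in xs
    lams = [1.0] * K
    weights = [1.0 / K] * K

    def posterior(i):
        # Posterior probability of each component for observed string i.
        logs = [math.log(w) if w > 0 else -math.inf for w in weights]
        logs = [lw + log_f(i, centers[k], lams[k]) for k, lw in enumerate(logs)]
        m = max(logs)
        ps = [math.exp(v - m) for v in logs]
        total = sum(ps)
        return [p / total for p in ps]

    for _ in range(n_iter):
        resp = [posterior(i) for i in obs]  # E-step: responsibilities
        for k in range(K):  # M-step, one component at a time
            rk = [r[k] for r in resp]
            weights[k] = sum(rk) / len(xs)  # closed-form mixing weight
            # Exhaustive search over (mu, lam) maximizing the weighted log-likelihood.
            _, centers[k], lams[k] = max(
                (sum(r * log_f(i, mu, lam) for r, i in zip(rk, obs)), mu, lam)
                for mu in range(len(support))
                for lam in lam_grid
            )

    # Classification: each string goes to the component with the largest posterior.
    labels = [max(range(K), key=lambda k: p[k]) for p in map(posterior, obs)]
    return [support[c] for c in centers], lams, weights, labels
```

For example, `fit_and_classify(["AAAA", "AAAB", "AABA", "TTTT", "TTTG", "TGTT"], ["AAAA", "TTTT"])` should place the three A-rich strings in one cluster and the three T-rich strings in the other. Restricting both parameters to finite sets makes each M-step an exact maximization over that restricted set; this sidesteps, rather than reproduces, the paper's treatment of an EM step that cannot be written explicitly.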