Author: Ola Amayri
DOI:
Keywords: Statistical model, Feature selection, Cluster analysis, Probabilistic logic, Mixture model, Data modeling, Machine learning, Knowledge extraction, Data mining, Artificial intelligence, Computer science, Bayesian inference
Abstract: In the light of interdisciplinary applications, the data to be studied and analyzed have witnessed a growth in volume and a change in their intrinsic structure and type. In other words, in practice the diversity of resources generating objects has imposed several challenges for decision makers in determining informative data in terms of time, model capability, scalability, and knowledge discovery. Thus, it is highly desirable to be able to extract patterns of interest that support data management. Clustering, among machine learning approaches, is an important data engineering technique that empowers the automatic discovery of clusters of similar objects and the consequent assignment of new unseen objects to appropriate clusters. In this context, the majority of current research does not completely address the true nature of the data in the particular application at hand. In contrast to most previous research, our proposed work focuses on the modeling and classification of spherical data, which are naturally generated in many data mining and knowledge discovery applications. Thus, in this thesis we propose several estimation and feature selection frameworks based on the Langevin distribution, which are devoted to spherical patterns in offline and online settings. We first formulate a unified probabilistic framework, where we build probabilistic kernels based on the Fisher score and information divergences from finite Langevin mixtures for Support Vector Machines. We are motivated by the fact that blending generative and discriminative approaches has prevailed by exploring and adopting the distinct characteristics of each approach toward constructing a complementary system combining the best of both. Due to the high demand to construct compact and accurate statistical models that are automatically adjustable to dynamic changes, we next propose frameworks based on Langevin mixtures for high-dimensional spherical data that allow simultaneous clustering and feature selection in offline and online settings. To this end, we adopted finite mixture models, which have long relied heavily on deterministic learning approaches such as maximum likelihood estimation. Despite their successful utilization in a wide spectrum of areas, these approaches have several drawbacks, as we will discuss in this thesis. An alternative approach is the adoption of Bayesian inference, which addresses data uncertainty while ensuring good generalization. To address this issue, we also propose a Bayesian approach to Langevin mixture model estimation and selection. When data change dynamically and grow drastically, a finite mixture is not always a feasible solution. In contrast with previous approaches, which suppose an unknown finite number of mixture components, we finally propose a nonparametric Bayesian approach that assumes an infinite number of components. We further enhance this model by simultaneously detecting informative features in the process of clustering. Through extensive empirical experiments, we demonstrate the merits of the proposed learning frameworks on diverse high-dimensional datasets and challenging real-world applications.