Author: Edward Herskovits
DOI:
Keywords:
Abstract: Faced with increasing amounts of data that they cannot analyze manually, biomedical researchers have turned increasingly to computational methods for exploring large databases. In particular, researchers might benefit from a nonparametric, efficient, computer-based method for determining the important associations among variables in a domain, particularly when human expertise is not readily available. In this dissertation, I demonstrate that such algorithms are conceptually feasible, robust to noise, and computationally and theoretically sound, and that they generate models that can classify new cases accurately. I first describe two algorithms that take as input a database and optional user-supplied prior knowledge, and that produce a probabilistic network (or belief network) as output. The database may contain incomplete data and noise. The resulting network can be used to determine associations in a poorly understood domain, or as a classifier for the domain from which the network was learned. After describing these algorithms, I present simple examples of how these programs learn a network from a database. I then present the results of evaluating the algorithms on databases from several domains, including gynecologic pathology, lymph-node pathology, DNA-sequence analysis, and poisonous-mushroom classification. In most cases, the learned networks classify test cases with high accuracy. In addition to discussing these empirical results, I present an overview of proofs that the network-scoring metrics on which the algorithms are based will, as the number of cases in the database increases without limit, always prefer networks that more closely approximate the true underlying distribution of the database; that is, the metrics are asymptotically correct. I conclude with a discussion of this work's contributions and a list of open research problems.