Author: Vincent Yan Fu Tan
DOI:
Keywords:
Abstract: The design and analysis of complexity-reduced representations for multivariate data is important in many scientific and engineering domains. This thesis explores such representations from two different perspectives: deriving and analyzing performance measures for learning tree-structured graphical models, and salient feature subset selection for discrimination. Graphical models have proven to be a flexible class of probabilistic models for approximating high-dimensional data. Learning the structure of such models from data is an important generic task. It is known that if the data are drawn from tree-structured distributions, then the algorithm of Chow and Liu (1968) provides an efficient method for finding the tree that maximizes the likelihood of the data. We leverage this algorithm and the theory of large deviations to derive the error exponent of structure learning for discrete and Gaussian graphical models. We determine the extremal tree structures for learning, that is, the structures that lead to the highest and lowest exponents. We prove that the star minimizes the exponent and the chain maximizes the exponent, which means that among all unlabeled trees, the star and the chain are the worst and best for learning, respectively. The analysis is also extended to learning forest-structured graphical models by augmenting the Chow-Liu algorithm with a thresholding procedure. We prove scaling laws on the number of samples and the number of variables for structure learning to remain consistent in high dimensions. The next part of the thesis is concerned with discrimination. We design computationally efficient tree-based algorithms to learn pairs of distributions that are specifically adapted to the task of discrimination, and show that they perform well on various datasets vis-à-vis existing tree-based algorithms. We define the notion of a salient feature set using information-theoretic quantities so that it can be recovered asymptotically.
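As an illustration of the Chow-Liu step mentioned in the abstract, the following is a minimal sketch for discrete data, not the thesis's own implementation. It estimates pairwise mutual information from samples and then builds a maximum-weight spanning tree with Kruskal's algorithm. The function names and the optional `threshold` argument are hypothetical; the threshold is a simple fixed cutoff that yields a forest and is only assumed to resemble the thresholding procedure the abstract refers to.

```python
import math
from collections import Counter
from itertools import combinations


def empirical_mutual_information(samples, i, j):
    """Plug-in estimate of I(X_i; X_j) in nats from discrete samples."""
    n = len(samples)
    joint = Counter((s[i], s[j]) for s in samples)
    marg_i = Counter(s[i] for s in samples)
    marg_j = Counter(s[j] for s in samples)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (count / n) * math.log(count * n / (marg_i[a] * marg_j[b]))
    return mi


def chow_liu_edges(samples, threshold=None):
    """Return the edges of a maximum-weight spanning tree under empirical MI.

    If `threshold` is given (an assumption of this sketch), edges whose
    empirical MI falls below it are dropped, so the result may be a forest.
    """
    d = len(samples[0])
    weighted = sorted(
        ((empirical_mutual_information(samples, i, j), i, j)
         for i, j in combinations(range(d), 2)),
        reverse=True,
    )
    parent = list(range(d))  # union-find forest for cycle detection

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = []
    for weight, i, j in weighted:
        if threshold is not None and weight < threshold:
            break  # all remaining edges are weaker still
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges


# Toy data: X1 copies X0, X3 copies X2, and the two pairs are independent.
samples = [(a, a, b, b) for a in (0, 1) for b in (0, 1)]
forest = chow_liu_edges(samples, threshold=0.1)
tree = chow_liu_edges(samples)
```

On the toy data, the thresholded call keeps only the two strongly dependent pairs, while the unthresholded call adds one zero-weight edge to complete a spanning tree.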