Authors: Nima Taghipour, Daan Fierens, Hendrik Blockeel
DOI:
Keywords:
Abstract: We consider the problem of discovering biclusters in gene expression data by means of machine learning. The data contains the measured expression levels of the genes of a particular organism under a number of varying conditions. Given such a dataset, the learning task is to find subsets of genes that are co-expressed under subsets of conditions (such a subset of genes, together with the corresponding subset of conditions, is called a bicluster). The problem of biclustering gene expression data has already been tackled using probabilistic model-based biclustering. So far, this approach has been implemented only in a special-purpose system [1], although a number of general-purpose probabilistic modelling systems also appear suitable for solving this problem. A solution in a general-purpose system would have the advantage of being easily adaptable and extensible, for instance to incorporate additional data sources about the genes under consideration [1]. The goal of this work is to investigate how well the problem of biclustering gene expression data can be solved with a number of general-purpose probabilistic modelling systems. Concretely, we consider so-called probabilistic logic learning (PLL) systems, which use elements of first-order logic for the sake of expressivity. PLL is currently a very popular approach in the artificial intelligence and machine learning communities. In our work, we first analysed the modelling and learning features required to solve the biclustering problem (such as the ability to deal with numerical data and with overlapping clusters). Next, we made an overview of which of these features are supported by …
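To make the task definition concrete, the following is a minimal Python sketch of what a bicluster is: a subset of genes (rows of the expression matrix) paired with a subset of conditions (columns). It also shows what it means for two biclusters to overlap. The `Bicluster` type, the helper names `submatrix` and `overlaps`, and the toy matrix are hypothetical illustrations, not part of the special-purpose system of [1] or of any PLL system. The sketch captures only the combinatorial structure of the problem; it does not implement probabilistic model-based biclustering or any particular co-expression criterion.

```python
from typing import FrozenSet, List, NamedTuple

# Hypothetical toy representation: rows = genes, columns = conditions.
Matrix = List[List[float]]


class Bicluster(NamedTuple):
    """A subset of genes paired with a subset of conditions (illustrative only)."""
    genes: FrozenSet[int]
    conditions: FrozenSet[int]


def submatrix(data: Matrix, b: Bicluster) -> Matrix:
    """Return the expression levels of the bicluster's genes under its conditions."""
    cols = sorted(b.conditions)
    return [[data[g][c] for c in cols] for g in sorted(b.genes)]


def overlaps(a: Bicluster, b: Bicluster) -> bool:
    """Two biclusters overlap if they share at least one (gene, condition) cell."""
    return bool(a.genes & b.genes) and bool(a.conditions & b.conditions)


if __name__ == "__main__":
    data = [
        [2.0, 2.1, 0.3, 5.0],
        [1.9, 2.0, 4.2, 5.1],
        [0.1, 3.3, 4.0, 0.2],
    ]
    b1 = Bicluster(frozenset({0, 1}), frozenset({0, 1}))
    b2 = Bicluster(frozenset({1, 2}), frozenset({1, 2}))
    print(submatrix(data, b1))  # [[2.0, 2.1], [1.9, 2.0]]
    print(overlaps(b1, b2))     # True: gene 1 under condition 1 lies in both
```

Representing a bicluster as two independent index sets, rather than as a partition of the matrix, is what allows one gene or condition to belong to several biclusters at once; this is the overlapping-cluster capability the abstract lists among the required features.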