Molecular feature mining in HIV data

Authors: Stefan Kramer, Luc De Raedt, Christoph Helma

DOI: 10.1145/502512.502533

Abstract: We present the application of feature mining techniques to the Developmental Therapeutics Program's AIDS antiviral screen database. The database consists of 43,576 compounds, which were measured for their capability to protect human cells from HIV-1 infection. According to these measurements, the compounds were classified as either active, moderately active or inactive. The distribution of the classes is extremely skewed: only 1.3 % of the molecules are known to be active, and 2.7 % are known to be moderately active.

Given this database, we were interested in molecular substructures (i.e., features) that are frequent in the active molecules and infrequent in the inactive ones. In data mining terms, we focused on features with a minimum support in the active compounds and a maximum support in the inactive compounds. The database was analyzed using the levelwise version space algorithm, which forms the basis of the inductive query system MOLFEA (Molecular Feature Miner). Within this framework, it is possible to declaratively specify the features of interest, for instance constraints on their frequency in (possibly different) datasets as well as on their generality and syntax. Assuming that the detected features are causally related to biochemical mechanisms, they should facilitate the development of new pharmaceuticals with improved activities.
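The combination of a minimum-support constraint on the actives and a maximum-support constraint on the inactives can be illustrated with a small levelwise (Apriori-style) search. This is a simplified sketch, not MOLFEA and not the levelwise version space algorithm itself, which represents the solutions by borders of most general and most specific fragments. The sketch assumes that molecules are reduced to hypothetical strings of one-character atom symbols and that a linear fragment occurs in a molecule if it or its reverse is a contiguous substring; real linear fragments are paths of atoms and bonds in the molecular graph. The names `canonical`, `occurs`, `frequency` and `levelwise_mine`, as well as the toy data, are illustrative only.

```python
from typing import List, Set


def canonical(fragment: str) -> str:
    # Linear fragments are undirected, so "CO" and "OC" denote the same feature;
    # the lexicographically smaller orientation is used as the canonical form.
    return min(fragment, fragment[::-1])


def occurs(fragment: str, molecule: str) -> bool:
    # Simplifying assumption: a molecule is a string of one-character atom symbols,
    # and a fragment occurs if it (or its reverse) is a contiguous substring.
    # Real linear fragments are paths of atoms and bonds in the molecular graph.
    return fragment in molecule or fragment[::-1] in molecule


def frequency(fragment: str, molecules: List[str]) -> float:
    # Relative support: fraction of molecules that contain the fragment.
    if not molecules:
        return 0.0
    return sum(occurs(fragment, m) for m in molecules) / len(molecules)


def levelwise_mine(actives: List[str], inactives: List[str],
                   min_freq: float, max_freq: float) -> Set[str]:
    """Fragments with frequency >= min_freq on actives and <= max_freq on inactives."""
    if min_freq <= 0:
        # A positive minimum support bounds fragment length and ends the search.
        raise ValueError("min_freq must be positive")
    alphabet = sorted({atom for m in actives for atom in m})
    solutions: Set[str] = set()
    level = {canonical(atom) for atom in alphabet}  # level 1: single atoms
    while level:
        # Minimum frequency is anti-monotone: an infrequent fragment is pruned,
        # since no extension of it can be frequent either.
        frequent = {f for f in level if frequency(f, actives) >= min_freq}
        # Maximum frequency is monotone; here it is only applied as a filter.
        solutions |= {f for f in frequent if frequency(f, inactives) <= max_freq}
        # Generate the next level by extending each frequent fragment at either end.
        candidates: Set[str] = set()
        for f in frequent:
            for atom in alphabet:
                for ext in (f + atom, atom + f):
                    c = canonical(ext)
                    # Apriori-style pruning: both shorter sub-fragments must be frequent.
                    if canonical(c[1:]) in frequent and canonical(c[:-1]) in frequent:
                        candidates.add(c)
        level = candidates
    return solutions


if __name__ == "__main__":
    actives = ["COC", "CCOC", "COCN"]  # toy data, not taken from the screen
    inactives = ["CCC", "CCO", "NCC"]
    print(levelwise_mine(actives, inactives, min_freq=1.0, max_freq=0.0))  # prints {'COC'}
```

Pruning on the actives is safe because if a fragment occurs in too few actives, every longer fragment that contains it also occurs in too few actives. The maximum-support constraint behaves the opposite way (it can only become easier to satisfy as fragments grow), which is why this sketch uses it as a filter rather than for pruning.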
