A needle in a haystack

Authors: Koby Crammer, Gal Chechik

DOI: 10.1145/1015330.1015399

Abstract: This paper addresses the problem of finding a small and coherent subset of points in given data. This problem, sometimes referred to as one-class or set covering, requires finding a small-radius ball that covers as many data points as possible. It arises naturally in a wide range of applications, from finding gene modules to extracting documents' topics, where many data points are irrelevant to the task at hand, or in applications where only positive examples are available. Most previous approaches to this problem focus on identifying and discarding a possible set of outliers. In this paper we adopt the opposite approach, which directly aims to find a small set of coherently structured regions by using a loss function that focuses on local properties of the data. We formalize the learning task as an optimization problem using the Information-Bottleneck principle. An algorithm to solve this optimization problem is then derived and analyzed. Experiments on gene expression data and a text document corpus demonstrate the merits of our approach.
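The abstract frames the one-class task as finding a small-radius ball that covers as many data points as possible. To make that objective concrete, here is a minimal sketch under stated assumptions: the radius is fixed in advance, distances are Euclidean, and the search is a swapped-in flat-kernel mean-shift heuristic started from every data point. This is not the authors' Information-Bottleneck algorithm. The functions `covered`, `mean`, `densest_ball` and the parameters `radius` and `max_iter` are hypothetical names chosen for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def covered(points: Sequence[Point], center: Point, radius: float) -> List[Point]:
    """Points lying inside the closed ball of the given radius around center."""
    return [p for p in points if math.dist(p, center) <= radius]


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def densest_ball(
    points: Sequence[Point], radius: float, max_iter: int = 50
) -> Tuple[Optional[Point], int]:
    """Hypothetical baseline (not the paper's method): look for a ball of the
    given radius that covers many points.

    Runs a flat-kernel mean-shift iteration from every data point and keeps
    the center whose ball covered the most points at any step.
    Returns (None, 0) for empty input.
    """
    best_center: Optional[Point] = None
    best_count = 0
    for start in points:
        center = tuple(start)
        for _ in range(max_iter):
            inside = covered(points, center, radius)
            # Never empty: the start point covers itself, and the mean of the
            # points within `radius` of a center always has at least one of
            # those points within `radius` of itself.
            if len(inside) > best_count:
                best_center, best_count = center, len(inside)
            new_center = mean(inside)
            if math.dist(new_center, center) < 1e-9:
                break  # reached a fixed point of the mean-shift update
            center = new_center
    return best_center, best_count


if __name__ == "__main__":
    # A tight cluster of four points plus two far-away points.
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0), (-5.0, 3.0)]
    center, count = densest_ball(pts, radius=0.5)
    print(center, count)
```

With `radius=0.5` the ball settles on the four clustered points and ignores the two distant ones. This illustrates the "local" view the abstract describes: the far points are never modeled, rather than being explicitly flagged as outliers. Unlike the paper's formulation, this heuristic needs the radius as an input and carries no optimality guarantee.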
