Similarity measures for categorical data: A comparative evaluation

Authors: Shyam Boriah, Varun Chandola, Vipin Kumar

DOI: 10.1137/1.9781611972788.22

Abstract: Measuring similarity or distance between two entities is a key step for several data mining and knowledge discovery tasks. The notion of similarity for continuous data is relatively well-understood, but for categorical data, the similarity computation is not straightforward. Several data-driven similarity measures have been proposed in the literature to compute the similarity between two categorical data instances, but their relative performance has not been evaluated. In this paper we study the performance of a variety of similarity measures in the context of a specific data mining task: outlier detection. Results on a variety of data sets show that while no one measure dominates others for all types of problems, some measures are able to have consistently high performance.

References (33)

Christos Faloutsos, Christopher R. Palmer. Electricity based external similarity of categorical attributes. Knowledge Discovery and Data Mining, pp. 486-500 (2003). DOI: 10.5555/1760894.1760959

Yoram Biberman. A context similarity measure. European Conference on Machine Learning, pp. 49-63 (1994). DOI: 10.1007/3-540-57868-4_50

Eleazar Eskin, Andrew Arnold, Michael Prerau, Leonid Portnoy, Sal Stolfo. A Geometric Framework for Unsupervised Anomaly Detection. Applications of Data Mining in Computer Security, pp. 77-101 (2002). DOI: 10.1007/978-1-4615-0953-0_4

Vipin Kumar, Pang-Ning Tan, Michael M. Steinbach. Introduction to Data Mining (2013).

Keki B. Irani, Usama M. Fayyad. Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. International Joint Conference on Artificial Intelligence, vol. 2, pp. 1022-1027 (1993).

Gautam Das, Heikki Mannila. Context-Based Similarity Measures for Categorical Databases. European Conference on Principles of Data Mining and Knowledge Discovery, pp. 201-210 (2000). DOI: 10.1007/3-540-45372-5_20

Dekang Lin. An Information-Theoretic Definition of Similarity. International Conference on Machine Learning, pp. 296-304 (1998).

David W. Goodall. A New Similarity Index Based on Probability. Biometrics, vol. 22, pp. 882- (1966). DOI: 10.2307/2528080

T. P. Burnaby. On a method for character weighting a similarity coefficient, employing the concept of information. Mathematical Geosciences, vol. 2, pp. 25-38 (1970). DOI: 10.1007/BF02332078

Richard C. Dubes, Anil K. Jain. Algorithms for clustering data (1988).
