Neighbourhood Counting Metric for Sequences

作者: Chang Liu , Hui Wang

DOI:

关键词: AlgorithmMathematicsLongest common subsequence problemSimilarity measureProbabilistic logicNeighbourhood (mathematics)Bayes classifier

摘要: The longest common subsequence (LCS) is a well known and popular method for measuring similarity between sequences. In this paper we consider all subsequences (ACS) as measure of sequence with the view that information captured. ACS inspired derived from generic measure, neighbourhood counting metric (NCM). close connection NCM probability Bayes classifier helps gain an insight probabilistic perspective into ACS. We also design algorithm to calculate carry out experiment in framework k-nearest neighbours on gene classification task. shows LCS have little difference small k values, but differ significantly large values. ACS's performance remains steady gets larger, LCS's drops sharply larger. Such property may be useful some tasks.

参考文章(10)
Werner Dubitzky, Hui Wang, A flexible and robust similarity measure based on contextual probability international joint conference on artificial intelligence. pp. 27- 32 ,(2005)
Hui Wang, Ivo Düntsch, Günther Gediga, Andrzej Skowron, Hyperrelations in version space International Journal of Approximate Reasoning. ,vol. 36, pp. 223- 241 ,(2004) , 10.1016/J.IJAR.2003.10.007
Sergei Bespamyatnikh, Michael Segal, Enumerating longest increasing subsequences and patience sorting Information Processing Letters. ,vol. 76, pp. 7- 11 ,(2000) , 10.1016/S0020-0190(00)00124-1
James W. Hunt, Thomas G. Szymanski, A fast algorithm for computing longest common subsequences Communications of the ACM. ,vol. 20, pp. 350- 353 ,(1977) , 10.1145/359581.359603
Michael L. Fredman, On computing the length of longest increasing subsequences Discrete Mathematics. ,vol. 11, pp. 29- 35 ,(1975) , 10.1016/0012-365X(75)90103-X
C. L. Blake, UCI Repository of machine learning databases www.ics.uci.edu/〜mlearn/MLRepository.html. ,(1998)
Lasse Bergroth, Harri Hakonen, Timo Raita, None, A survey of longest common subsequence algorithms string processing and information retrieval. pp. 39- 48 ,(2000) , 10.1109/SPIRE.2000.878178
Hui Wang, Nearest neighbors by neighborhood counting IEEE Transactions on Pattern Analysis and Machine Intelligence. ,vol. 28, pp. 942- 953 ,(2006) , 10.1109/TPAMI.2006.126
I-Hsuan Yang, Chien-Pin Huang, Kun-Mao Chao, A fast algorithm for computing a longest common increasing subsequence Information Processing Letters. ,vol. 93, pp. 249- 253 ,(2005) , 10.1016/J.IPL.2004.10.014
Daniel S. Hirschberg, Algorithms for the Longest Common Subsequence Problem Journal of the ACM. ,vol. 24, pp. 664- 675 ,(1977) , 10.1145/322033.322044