Authors: Christina Leslie, Rui Kuang
DOI: 10.1007/978-3-540-45167-9_10
Keywords:
Abstract: We introduce several new families of string kernels designed in particular for use with support vector machines (SVMs) for classification of protein sequence data. These kernels (restricted gappy kernels, substitution kernels, and wildcard kernels) are based on feature spaces indexed by k-length subsequences from the string alphabet Σ (or the alphabet augmented by a wildcard character), and hence they are related to the recently presented (k,m)-mismatch kernel and string kernels used in text classification. However, for all kernels we define here, the kernel value \(K(x,y)\) can be computed in \(O(c_K(|x| + |y|))\) time, where the constant \(c_K\) depends on the parameters of the kernel but is independent of the size \(|\Sigma|\) of the alphabet. Thus the computation of these kernels is linear in the length of the sequences, like the mismatch kernel, but we improve upon the parameter-dependent constant \(c_K = k^{m+1} |\Sigma|^m\) of the mismatch kernel. We compute the kernels efficiently using a recursive function based on a trie data structure and relate our new kernels to the recently described transducer formalism. Finally, we report protein classification experiments on a benchmark SCOP dataset, where we show that our new faster kernels achieve SVM classification performance comparable to the mismatch kernel and the Fisher kernel derived from profile hidden Markov models.
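
To make the idea of a feature space indexed by k-length subsequences concrete, here is a minimal Python sketch of a gappy-style kernel. It is an illustration under stated assumptions, not the authors' method. The abstract does not give the feature map, so the sketch assumes that each length-g window of a sequence contributes 1 to every distinct k-length subsequence (gaps allowed) that it contains. The names `gappy_features`, `gappy_kernel`, and the window parameter `g` are hypothetical. The sketch enumerates features directly rather than using the trie-based recursion the abstract mentions. The work per window depends only on g and k, not on the alphabet size.

```python
from collections import Counter
from itertools import combinations


def gappy_features(seq: str, g: int, k: int) -> Counter:
    """Sum, over all length-g windows of seq, an indicator for each
    distinct k-length subsequence (gaps allowed) found in that window.

    Assumed definition for illustration; not taken from the abstract.
    """
    feats = Counter()
    for start in range(len(seq) - g + 1):
        window = seq[start:start + g]
        # Use a set so a k-mer counts at most once per window.
        kmers = {
            "".join(window[i] for i in idx)
            for idx in combinations(range(g), k)
        }
        feats.update(kmers)
    return feats


def gappy_kernel(x: str, y: str, g: int, k: int) -> int:
    """Inner product of the two sparse feature vectors."""
    fx = gappy_features(x, g, k)
    fy = gappy_features(y, g, k)
    # Counter returns 0 for k-mers that are absent from fy.
    return sum(count * fy[kmer] for kmer, count in fx.items())
```

For example, `gappy_kernel("ABC", "ACB", g=3, k=2)` compares the feature sets {AB, AC, BC} and {AC, AB, CB}, which share two features. Each window costs at most C(g, k) subsequence enumerations, so the total cost is linear in |x| + |y| for fixed g and k.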