Authors: Nakamasa Inoue, Shanshan Hao, Koichi Shinoda, Tatsuhiko Saito, Chin-Hui Lee
DOI:
Keywords:
Abstract: 1. SIFT GMMs: First, we extract SIFT features from all the image frames in each shot. This multi-frame technique is expected to perform well, especially when objects are captured from different angles. Then, we model the SIFT features extracted from each shot by a Gaussian mixture model (GMM). We call the resulting GMMs SIFT GMMs. They can be more robust against the quantization errors that occur in the hard-assignment clustering of the Bag-of-Keypoints approach. Furthermore, they also carry variance information about the features. The expectation-maximization (EM) algorithm is often used to estimate GMM parameters. However, there may not be enough features to estimate the parameters precisely. Hence, we estimate the GMM parameters using maximum a posteriori (MAP) adaptation, in which the a priori distribution is estimated from videos. We classify shots using support vector machines (SVMs) with a radial basis function (RBF) kernel, where the distance between shots is defined as a weighted sum of Mahalanobis distances between corresponding mixture components. 2. Acoustic features: As acoustic features, we use mel-frequency cepstrum coefficients (MFCCs), which are widely used in speech recognition. We model each high-level feature (HLF) by an ergodic hidden Markov model (HMM). We also make an HMM for all HLFs as a universal background model (UBM) and use the likelihood ratio between the target HMM and the UBM for detection.
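
The MAP adaptation and kernel steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions, since the abstract does not specify them: only the component means are adapted, covariances are diagonal and shared with the prior, the relevance factor `tau` and kernel width `gamma` are hypothetical parameters, and the per-component distance is the squared Mahalanobis distance. All function names are illustrative.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def responsibilities(x, weights, means, variances):
    """Posterior probability of each prior component given feature x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the max for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(features, weights, means, variances, tau=10.0):
    """Shift the prior means toward one shot's features (means-only MAP).

    tau is an assumed relevance factor: larger values keep the adapted
    means closer to the prior when a shot has few features.
    """
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp
    sums = [[0.0] * dim for _ in range(n_comp)]
    for x in features:
        post = responsibilities(x, weights, means, variances)
        for k, r in enumerate(post):
            counts[k] += r
            for d in range(dim):
                sums[k][d] += r * x[d]
    return [
        [(tau * means[k][d] + sums[k][d]) / (tau + counts[k]) for d in range(dim)]
        for k in range(n_comp)
    ]


def gmm_distance(means_a, means_b, weights, variances):
    """Weighted sum of squared Mahalanobis distances between matched components."""
    return sum(
        w * sum((a - b) ** 2 / v for a, b, v in zip(mean_a, mean_b, var))
        for w, mean_a, mean_b, var in zip(weights, means_a, means_b, variances)
    )


def rbf_kernel(means_a, means_b, weights, variances, gamma=1.0):
    """RBF kernel value between two shots, each represented by adapted means."""
    return math.exp(-gamma * gmm_distance(means_a, means_b, weights, variances))


# Toy prior (stands in for a GMM estimated from many videos) and two toy shots.
prior_w = [0.5, 0.5]
prior_mu = [[0.0, 0.0], [4.0, 4.0]]
prior_var = [[1.0, 1.0], [1.0, 1.0]]
shot_a = [[0.2, -0.1], [0.1, 0.3], [4.2, 3.9]]
shot_b = [[3.8, 4.1], [4.3, 4.4], [-0.2, 0.1]]

mu_a = map_adapt_means(shot_a, prior_w, prior_mu, prior_var)
mu_b = map_adapt_means(shot_b, prior_w, prior_mu, prior_var)
k_ab = rbf_kernel(mu_a, mu_b, prior_w, prior_var)
```

Because every shot is adapted from the same prior, component k of one shot corresponds to component k of another, which is what makes a component-wise distance meaningful. A matrix of such kernel values could then be passed to any SVM implementation that accepts a precomputed kernel.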