Learning to geolocate videos

Authors: Luciano Sbaiz, Jasper Snoek, Hrishikesh Aradhye, George Toderici

Keywords: Classifier (UML), Electronic map, Pattern recognition, Training system, Artificial intelligence, Geolocation, Training set, Landmark matching, Landmark, Computer science, Computer vision

Abstract: A classifier training system trains classifiers for inferring the geographic locations of videos. A number of classifiers are provided, where each classifier corresponds to a particular location and is trained from a training set of videos that have been labeled as representing that location. In one embodiment, the training set is further restricted to those videos in which a landmark matching the location label is detected. The system extracts, from each of these videos, features that characterize the video, such as audiovisual features, text features, address features, and category features. Based on these features, a classifier is trained for each of the corresponding locations. Each classifier can be applied to videos without associated location labels to predict whether, or how strongly, a video represents the location. The prediction can be used for a variety of purposes, such as automatic labeling of videos with locations, presentation of location-specific advertisements in association with videos, and display of videos in association with data for relevant portions of an electronic map.
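
The workflow described in the abstract (one classifier per location, trained on labeled videos and then applied to an unlabeled video to score how strongly it represents each location) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the source: the two-dimensional feature vectors, the location names, the function names, the choice of logistic regression as the classifier, and the use of videos labeled with other locations as negative examples.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_location_classifier(features, targets, epochs=200, lr=0.5):
    """Fit a small logistic-regression model with stochastic gradient descent.

    Logistic regression is an illustrative choice; the source does not
    name a specific classifier type.
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, targets):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            grad = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias

def train_all(labeled_videos):
    """Train one binary classifier per location label.

    Assumption: videos labeled with any other location serve as negatives.
    """
    features = [f for f, _ in labeled_videos]
    locations = sorted({loc for _, loc in labeled_videos})
    return {
        loc: train_location_classifier(
            features, [1 if label == loc else 0 for _, label in labeled_videos]
        )
        for loc in locations
    }

def predict(models, x):
    """Score how strongly an unlabeled video represents each location."""
    return {
        loc: sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for loc, (weights, bias) in models.items()
    }

# Hypothetical toy data: (feature vector, location label) per video.
training_videos = [
    ([1.0, 0.0], "Paris"),
    ([0.9, 0.1], "Paris"),
    ([0.0, 1.0], "Rome"),
    ([0.1, 0.9], "Rome"),
]

models = train_all(training_videos)
scores = predict(models, [0.95, 0.05])  # a video with no location label
best_location = max(scores, key=scores.get)
print(best_location, scores)
```

In a real system the feature vectors would be much higher-dimensional combinations of the audiovisual, text, address, and category features the abstract mentions, and the per-location scores could be thresholded to decide whether to attach a location label.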
