A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching

Authors: Pradipto Das, Chenliang Xu, Richard F. Doell, Jason J. Corso

DOI: 10.1109/CVPR.2013.340

Keywords: Image stitching, Object (computer science), Topic model, Semantics, Language model, Video tracking, Computer science, Object detection, Natural language, Artificial intelligence, Natural language processing

Abstract: The problem of describing images through natural language has gained importance in the computer vision community. Solutions to image description have either focused on a top-down approach of generating language through combinations of object detections and language models, or on bottom-up propagation of keyword tags from training images to test images through probabilistic or nearest neighbor techniques. In contrast, describing videos with natural language is a less studied problem. In this paper, we combine ideas from the bottom-up and top-down approaches to image description and propose a method for video description that captures the most relevant contents of a video in a natural language description. We propose a hybrid system consisting of a low level multimodal latent topic model for initial keyword annotation, a middle level of concept detectors, and a high level module to produce final lingual descriptions. We compare the results of our system to human descriptions in both short and long forms on two datasets, and demonstrate that the final system output has greater agreement with the human descriptions than any single level.
