Abstract: Nowadays, a massive amount of web video data is emerging on the Internet. To achieve effective and efficient retrieval, it is critical to automatically assign semantic keywords to videos via content analysis. However, most existing video tagging methods suffer from the problem of lacking sufficient tagged training videos due to the high labor cost of manual tagging. Inspired by the observation that there is much more well-labeled data in other yet relevant types of media (e.g., images), in this paper we study how to build a "cross-media tunnel" to transfer external tag knowledge from images to videos. Meanwhile, the intrinsic structures of both the image and video spaces are well explored for inferring video tags. We propose a Cross-Media Tag Transfer (CMTT) paradigm which is able to: 1) bridge the two spaces by minimizing their distribution difference; 2) infer video tags by revealing the underlying manifold embedded within both spaces. We also learn an explicit mapping function to handle unseen videos. Experimental results are reported and analyzed to illustrate the superiority of our proposal.
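The abstract mentions bridging the image and video feature spaces by minimizing their distribution difference. The paper's exact objective is not given here; one widely used measure of distribution difference between two sample sets is the (squared) Maximum Mean Discrepancy, sketched below on synthetic "image" and "video" feature matrices. The RBF bandwidth `gamma`, the feature dimensionality, and the synthetic data are all illustrative assumptions, not details from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Pairwise RBF kernel matrix between rows of X and rows of Y."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-gamma * sq_dists)

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X and Y.

    A small value indicates the two feature distributions are close;
    minimizing it over a shared embedding is one way to 'bridge' the
    image and video spaces (an assumed instantiation, not necessarily
    the formulation used in the paper).
    """
    Kxx = rbf_kernel(X, X, gamma)
    Kyy = rbf_kernel(Y, Y, gamma)
    Kxy = rbf_kernel(X, Y, gamma)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

# Hypothetical features: 200 labeled images vs. 150 unlabeled videos,
# drawn from slightly shifted Gaussians to mimic a domain gap.
rng = np.random.default_rng(0)
image_feats = rng.normal(0.0, 1.0, size=(200, 16))
video_feats = rng.normal(0.5, 1.0, size=(150, 16))

gap = mmd2(image_feats, video_feats, gamma=0.1)
print(f"squared MMD (image vs. video features): {gap:.4f}")
```

Identical sample sets give a squared MMD of exactly zero, and the value grows as the two distributions drift apart, which is what makes it usable as a minimization target for a cross-media tunnel.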