Scalable Deep Multimodal Learning for Cross-Modal Retrieval

Authors: Peng Hu, Liangli Zhen, Dezhong Peng, Pei Liu

DOI: 10.1145/3331184.3331213

Keywords: Machine learning, Scalability, Artificial intelligence, Subspace topology, Modality (human–computer interaction), Benchmark (computing), Multimodal learning, Modal, Feature learning, Computer science

Abstract: … In this section, we first introduce the problem definition for the multimodal learning. Then, we present the Scalable Deep Multimodal Learning (SDML) algorithm, which trains the models …

References (43)
Shotaro Akaho. A kernel method for canonical correlation analysis. arXiv: Learning (2006).
Diederik P. Kingma, Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv: Learning (2014).
Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri. Learning Spatiotemporal Features with 3D Convolutional Networks. 2015 IEEE International Conference on Computer Vision (ICCV), pp. 4489-4497 (2015). DOI: 10.1109/ICCV.2015.510
Galen Andrew, Raman Arora, Jeff Bilmes, Karen Livescu. Deep Canonical Correlation Analysis. International Conference on Machine Learning, pp. 1247-1255 (2013).
Vinod Nair, Geoffrey E. Hinton. Rectified Linear Units Improve Restricted Boltzmann Machines. International Conference on Machine Learning, pp. 807-814 (2010).
Karen Simonyan, Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. Computer Vision and Pattern Recognition (2014).
Weiran Wang, Raman Arora, Karen Livescu, Jeff Bilmes. On Deep Multi-View Representation Learning. International Conference on Machine Learning, pp. 1083-1092 (2015).
Fangxiang Feng, Xiaojie Wang, Ruifan Li. Cross-modal Retrieval with Correspondence Autoencoder. ACM Multimedia, pp. 7-16 (2014). DOI: 10.1145/2647868.2654902
Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, Yantao Zheng. NUS-WIDE. Proceeding of the ACM International Conference on Image and Video Retrieval (CIVR '09), pp. 48- (2009). DOI: 10.1145/1646396.1646452
Xiaohua Zhai, Yuxin Peng, Jianguo Xiao. Learning Cross-Media Joint Representation With Sparse and Semisupervised Regularization. IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, pp. 965-978 (2014). DOI: 10.1109/TCSVT.2013.2276704