Optimal Bidding Strategy without Exploration in Real-time Bidding

Authors: Saayan Mitra, Somdeb Sarkhel, Aritra Ghosh, Viswanathan Swaminathan

Abstract: Maximizing utility with a budget constraint is the primary goal for advertisers in real-time bidding (RTB) systems. The policy maximizing the utility is referred to as the optimal bidding strategy. Earlier works on optimal bidding strategy apply model-based batch reinforcement learning methods, which cannot generalize to an unknown budget and time constraint. Further, the advertiser observes a censored market price, which makes direct evaluation infeasible on batch test datasets. Previous works ignore the losing auctions to alleviate the difficulty with censored states, thus significantly modifying the test distribution. We address the challenge of lacking a clear evaluation procedure as well as the error propagated through batch reinforcement learning methods in RTB systems. We exploit two conditional independence structures in the sequential bidding process that allow us to propose a novel practical framework using the maximum entropy principle to imitate the behavior of the true distribution observed in real-time traffic. Moreover, the framework allows us to train a model that can generalize to unseen budget conditions rather than being limited to those observed in history. We compare our methods on real-world RTB datasets with several baselines and demonstrate improved performance under various settings.
