Contextual Markov Decision Processes

作者: Assaf Hallak , Dotan Di Castro , Shie Mannor

DOI:

关键词:

摘要: We consider a planning problem where the dynamics and rewards of environment depend on hidden static parameter referred to as context. The objective is learn strategy that maximizes accumulated reward across all contexts. new model, called Contextual Markov Decision Process (CMDP), can model customer's behavior when interacting with website (the learner). depends gender, age, location, device, etc. Based behavior, determine customer characteristics, optimize interaction between them. Our work focuses one basic scenario--finite horizon small known number possible suggest family algorithms provable guarantees underlying models latent contexts, CMDPs. Bounds are obtained for specific naive implementations, extensions framework discussed, laying ground future research.

参考文章(24)
Gadiel Seroussi, Erik Ordentlich, Sergio Verdu, Tsachy Weissman, Marcelo J. Weinberger, Inequalities for the L1 Deviation of the Empirical Distribution ,(2003)
Damien Ernst, Arthur Louette, Introduction to Reinforcement Learning MIT Press. ,(1998)
Arnab Nilim, Laurent El Ghaoui, Robust Control of Markov Decision Processes with Uncertain Transition Matrices Operations Research. ,vol. 53, pp. 780- 798 ,(2005) , 10.1287/OPRE.1050.0216
D. J. White, A Survey of Applications of Markov Decision Processes Journal of the Operational Research Society. ,vol. 44, pp. 1073- 1096 ,(1993) , 10.1057/JORS.1993.181
Mathieu Radenen, Thierry Artières, Handling signal variability with contextual markovian models Pattern Recognition Letters. ,vol. 35, pp. 236- 245 ,(2014) , 10.1016/J.PATREC.2013.08.015
Richard L. Tweedie, Sean Meyn, Markov Chains and Stochastic Stability ,(1993)
Rafail Ostrovsky, Yuval Rabani, Leonard Schulman, Chaitanya Swamy, The Effectiveness of Lloyd-Type Methods for the k-Means Problem foundations of computer science. pp. 165- 176 ,(2006) , 10.1109/FOCS.2006.75
Sariel Har-Peled, Soham Mazumdar, On coresets for k-means and k-median clustering symposium on the theory of computing. pp. 291- 300 ,(2004) , 10.1145/1007352.1007400