Learning Multiple Markov Chains via Adaptive Allocation

Odalric-Ambrym Maillard, Mohammad Sadegh Talebi
neural information processing systems 32 13343-13353

2019
Budgeted Reinforcement Learning in Continuous State Space

Olivier Pietquin, Odalric-Ambrym Maillard, Tanguy Urvoy, Romain Laroche
neural information processing systems 32 9295-9305

2
2019
Regret Bounds for Learning State Representations in Reinforcement Learning

Alessandro Lazaric, Odalric-Ambrym Maillard, Matteo Pirotta, Ronald Ortner
neural information processing systems 32 12738-12748

1
2019
Model-Based Reinforcement Learning Exploiting State-Action Equivalence

Odalric-Ambrym Maillard, Mahsa Asadi, Hippolyte Bourel, Mohammad Sadegh Talebi
arXiv: Learning

2019
Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs

Odalric-Ambrym Maillard, Mohammad Sadegh Talebi
algorithmic learning theory 770-805

12
2018
Active Roll-outs in MDP with Irreversible Dynamics

Odalric-Ambrym Maillard, Ronald Ortner, Timothy Mann, Shie Mannor

2019
Tightening Exploration in Upper Confidence Reinforcement Learning

Odalric-Ambrym Maillard, Hippolyte Bourel, Mohammad Sadegh Talebi
international conference on machine learning 1 1056-1066

1
2020
Restarted Bayesian Online Change-point Detector achieves Optimal Detection Delay

Odalric-Ambrym Maillard, Raphaël Féraud, Reda Alami
international conference on machine learning 1 211-221

2020
Improved Exploration in Factored Average-Reward MDPs

Odalric-Ambrym Maillard, Anders Jonsson, Mohammad Sadegh Talebi
arXiv: Learning

1
2020
Robust-Adaptive Interval Predictive Control for Linear Uncertain Systems

Odalric-Ambrym Maillard, Denis Efimov, Edouard Leurent
conference on decision and control 1429-1434

1
2020
Sub-sampling for Efficient Non-Parametric Bandit Exploration

Odalric-Ambrym Maillard, Emilie Kaufmann, Dorian Baudry
neural information processing systems 33 5468-5478

1
2020
Robust-Adaptive Control of Linear Systems: beyond Quadratic Costs

Edouard Leurent, Odalric-Ambrym Maillard, Denis V. Efimov
neural information processing systems 33 3220-3231

2
2020
Concentration inequalities for sampling without replacement

Rémi Bardenet, Odalric-Ambrym Maillard
Bernoulli 21 (3) 1361-1385

157
2015
Memory Bandits: Towards the Switching Bandit Problem Best Resolution

Odalric-Ambrym Maillard, Raphaël Féraud, Réda Alami
MLSS 2018 - Machine Learning Summer School

2
2018
Monte-Carlo Graph Search: the Value of Merging Similar States

Odalric-Ambrym Maillard, Edouard Leurent
asian conference on machine learning 129 577-592

2020
Reinforcement Learning in Parametric MDPs with Exponential Families

Odalric-Ambrym Maillard, Aditya Gopalan, Sayak Ray Chowdhury
international conference on artificial intelligence and statistics 1855-1863

2021
Kullback-Leibler upper confidence bounds for optimal sequential allocation

Olivier Cappé, Aurélien Garivier, Odalric-Ambrym Maillard, Rémi Munos
arXiv: Probability

375
2012
Online learning in adversarial Lipschitz environments

Odalric-Ambrym Maillard, Rémi Munos
european conference on machine learning 305-320

22
2010
The non-stationary stochastic multi-armed bandit problem

Robin Allesiardo, Raphaël Féraud, Odalric-Ambrym Maillard
Journal of data science 3 (4) 267-283

69
2017
Robust Risk-Averse Stochastic Multi-armed Bandits

Odalric-Ambrym Maillard
algorithmic learning theory 218-233

23
2013