Authors: Andrew Perrault, Haifeng Xu, Milind Tambe, Jackson A. Killian, Aditya Mate
DOI:
Keywords:
Abstract: We propose and study Collapsing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, its state is fully observed, thus "collapsing" any uncertainty, but when an arm is passive, no observation is made, allowing uncertainty to evolve. The goal is to keep as many arms in the "good" state as possible by planning a limited budget of actions per round. Such Collapsing Bandits are natural models for healthcare domains in which workers must simultaneously monitor patients and deliver interventions in a way that maximizes the health of their patient cohort. Our main contributions are as follows: (i) Building on the Whittle index technique for RMABs, we derive conditions under which the problem is indexable. Our derivation hinges on novel conditions that characterize when optimal policies may take the form of either "forward" or "reverse" threshold policies. (ii) We exploit this optimality to build fast algorithms for computing the index, including a closed-form computation. (iii) We evaluate our algorithm on several data distributions, including data from a real-world task in which a worker must help maximize patients' adherence to tuberculosis medication. Our algorithm achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB techniques while achieving similar performance.
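The belief dynamics described in the abstract admit a compact illustration. Below is a minimal Python sketch, assuming illustrative transition probabilities p11 (a "good" arm stays good) and p01 (a "bad" arm becomes good); these parameter names and values are hypothetical, not from the paper. A passive arm's belief of being in the good state evolves under the two-state Markov chain, while playing an arm reveals its true state and collapses the belief to 0 or 1.

```python
import random

def evolve_belief(b, p11, p01):
    # Passive arm: no observation is made, so the belief of being in the
    # "good" state drifts under the two-state Markov chain.
    return b * p11 + (1.0 - b) * p01

def play_arm(b):
    # Active arm: the state is fully observed, "collapsing" the belief.
    # Here the true state is simulated from the current belief.
    observed_good = random.random() < b
    return 1.0 if observed_good else 0.0

b = 1.0  # arm last observed in the good state
for t in range(5):
    b = evolve_belief(b, p11=0.9, p01=0.3)  # uncertainty grows while passive
    print(f"round {t}: belief = {b:.3f}")
b = play_arm(b)
print(f"after play: belief collapsed to {b}")
```

Under these dynamics, an unobserved arm's belief drifts toward the chain's stationary probability of being good; the planner's problem, as the abstract describes, is to allocate a limited per-round budget of plays across many such arms.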