Authors: Chaima Dhahri, Tomoaki Ohtsuki
DOI: 10.1016/J.PHYCOM.2014.04.008
Keywords:
Abstract: This paper addresses the problem of cell selection in dynamic open-access femtocell networks. We model this problem as a decentralized restless multi-armed bandit (MAB) problem with unknown dynamics and multiple players. Each channel is modeled as an arbitrary finite-state Markov chain with a different state space and different statistics. Each user tries to learn the best cell, i.e., the one that maximizes its capacity while reducing the number of handovers. This is a classic exploration/exploitation problem, where the reward of each cell is considered to be Markovian. In addition, the process is restless because each cell's state evolves independently of the user's action, which leads to a restless MAB problem. To solve it, we first refer to the restless upper confidence bound (RUCB) algorithm, which achieves logarithmic regret over time for the restless MAB problem (proposal 1). Then, we extend this approach to cope with a changing environment by applying a change-point detection test based on the Page-Hinkley test (PHT) (proposal 2). However, this would entail some waste if a detected change point was actually a false alarm. To face this issue, we improve our previous proposal by referring to a meta-bandit formulation (proposal 3) that handles the dilemma between Exploration and Exploitation after a change occurs. Simulation results show that our proposals come close to the performance of the opportunistic method in terms of capacity, while requiring fewer handovers on average. The use of the meta-bandit allows better performance than RUCB, particularly in a changing environment.
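The abstract refers to the Page-Hinkley test as the change-point detector that triggers re-exploration when a cell's reward statistics change. Below is a minimal sketch of a standard one-sided Page-Hinkley detector for an upward shift in the reward mean, which is the textbook form of the test rather than the paper's exact formulation; the parameter names (delta, lam), their default values, and the reset-on-alarm policy are illustrative assumptions.

```python
class PageHinkley:
    """One-sided Page-Hinkley change-point detector (illustrative sketch).

    Detects an upward shift in the mean of a stream of reward samples,
    e.g., the observed capacity of a candidate femtocell.
    """

    def __init__(self, delta=0.005, lam=50.0):
        self.delta = delta   # tolerance for small fluctuations around the mean
        self.lam = lam       # detection threshold lambda
        self.reset()

    def reset(self):
        self.n = 0           # number of samples seen since last reset
        self.mean = 0.0      # running mean of the samples
        self.cum = 0.0       # cumulative deviation m_t = sum(x_i - mean_i - delta)
        self.cum_min = 0.0   # running minimum M_t of m_t

    def update(self, x):
        """Feed one reward sample; return True if a change is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n        # incremental mean update
        self.cum += x - self.mean - self.delta       # update m_t
        self.cum_min = min(self.cum_min, self.cum)   # update M_t
        if self.cum - self.cum_min > self.lam:       # PH alarm condition
            self.reset()                             # restart statistics after alarm
            return True
        return False
```

In a cell-selection loop, each candidate cell (arm) could carry its own detector fed with its observed rewards; an alarm would then prompt the learner to reset that arm's bandit statistics and re-explore, which is the role the abstract assigns to the PHT-based extension of RUCB.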