Authors: Parisa Mansourifard, Farrokh Jazizadeh, Bhaskar Krishnamachari, Burcin Becerik-Gerber
DOI:
Keywords:
Abstract: We consider the problem of automatically learning the optimal thermal control in a room in order to maximize the expected average satisfaction among occupants who provide stochastic feedback on their comfort through a participatory sensing application. Without assuming any prior knowledge or model of user comfort, we first apply the classic UCB1 online learning policy for multi-armed bandits (MAB), which combines exploration (testing out certain temperatures to better understand user preferences) with exploitation (spending more time at temperatures that maximize average satisfaction), for the case when the total occupancy is constant. When occupancy is time-varying, the number of possible scenarios (i.e., which particular set of occupants is present in the room) becomes exponentially large, posing a combinatorial challenge. However, we show that LLR, a recently developed combinatorial MAB online …
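
The exploration/exploitation trade-off that the abstract attributes to UCB1 can be illustrated with a minimal sketch of the standard UCB1 index rule (empirical mean plus a confidence bonus of sqrt(2 ln n / n_j)). This is not the authors' implementation: the function names `ucb1` and `bernoulli`, the candidate temperatures, and the satisfaction probabilities are all hypothetical values chosen for illustration, and feedback is assumed to be a 0/1 satisfaction signal.

```python
import math
import random


def ucb1(reward_fns, horizon, seed=0):
    """Run UCB1 for `horizon` rounds; rewards are assumed to lie in [0, 1]."""
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms    # number of times each arm was played
    means = [0.0] * n_arms   # running mean reward of each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: play every arm once.
            arm = t - 1
        else:
            total = t - 1  # total number of plays so far
            # Exploitation term (mean) plus exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda i: means[i]
                + math.sqrt(2.0 * math.log(total) / counts[i]),
            )
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


def bernoulli(p):
    """Hypothetical feedback model: satisfied (1.0) with probability p."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


# Hypothetical setpoints (degrees Celsius) and assumed satisfaction rates.
temperatures = [20, 22, 24, 26]
satisfaction = [0.2, 0.5, 0.9, 0.4]

counts, means = ucb1([bernoulli(p) for p in satisfaction], horizon=5000)
best = temperatures[counts.index(max(counts))]
print("plays per temperature:", dict(zip(temperatures, counts)))
print("most-played temperature:", best)
```

In this sketch each temperature is one arm, so the policy corresponds to the constant-occupancy case only; the time-varying, combinatorial case mentioned in the abstract is not covered.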