Authors: Parisa Mansourifard, Farrokh Jazizadeh, Bhaskar Krishnamachari, Burcin Becerik-Gerber
Keywords:
Abstract: We consider the problem of automatically learning the optimal thermal control in a room in order to maximize the expected average satisfaction among occupants who provide stochastic feedback on their comfort through a participatory sensing application. Without assuming any prior knowledge or model of user comfort, we first apply the classic UCB1 online learning policy for multi-armed bandits (MAB), which combines exploration (testing out certain temperatures to better understand user preferences) with exploitation (spending more time at the setting that maximizes average satisfaction), for the case when the total occupancy is constant. When occupancy is time-varying, the number of possible scenarios (i.e., which particular set of occupants is present in the room) becomes exponentially large, posing a combinatorial challenge. However, we show that LLR, a recently developed combinatorial MAB online learning algorithm that requires recording and computation of only a polynomial number of quantities, can be applied to this setting, yielding a regret (the cumulative gap in average satisfaction with respect to a distribution-aware genie) that grows polynomially in the number of users and logarithmically with time. This in turn indicates that the difference in unit-time satisfaction obtained by the learning policy compared to the optimal tends to 0. We quantify the performance of these online learning algorithms using real data collected from users of a participatory sensing iPhone app in a multi-occupancy room in an office building in Southern California.
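The exploration/exploitation trade-off that UCB1 makes in the constant-occupancy case can be illustrated with a minimal sketch. It uses the standard UCB1 index (empirical mean plus sqrt(2 ln t / n)), treats each temperature setpoint as an arm, and models occupant feedback as a Bernoulli "satisfied / not satisfied" reward. The setpoints in `TEMPS`, the probabilities in `P_SATISFIED`, and the function names are hypothetical values chosen for illustration; they are not taken from the paper or its data, and this is not the authors' implementation.

```python
import math
import random

# Hypothetical setpoints (deg C) and satisfaction probabilities.
# These are illustrative assumptions, not values from the paper.
TEMPS = [20, 21, 22, 23, 24, 25]
P_SATISFIED = [0.2, 0.4, 0.6, 0.9, 0.6, 0.3]


def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` rounds; `pull(arm)` returns a reward in [0, 1]."""
    counts = [0] * n_arms    # number of times each arm was played
    means = [0.0] * n_arms   # running average reward of each arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: play every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


def simulate(horizon=5000, seed=0):
    """Simulate Bernoulli comfort feedback for the hypothetical setpoints."""
    rng = random.Random(seed)

    def pull(arm):
        return 1.0 if rng.random() < P_SATISFIED[arm] else 0.0

    return ucb1(pull, len(TEMPS), horizon)


if __name__ == "__main__":
    counts, means = simulate()
    best = max(range(len(TEMPS)), key=lambda i: counts[i])
    print("most-used setpoint:", TEMPS[best])
    print("plays per setpoint:", dict(zip(TEMPS, counts)))
```

In this sketch the policy spends a logarithmically growing number of rounds on the less comfortable setpoints and most rounds on the one with the highest simulated satisfaction, which is the behavior behind the logarithmic-in-time regret mentioned in the abstract.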