Authors: Mari Ostendorf, Li Deng, Jianfeng Gao, Xiaodong He, Lihong Li
DOI:
Keywords: Bellman equation, Task (computing), Reinforcement learning, Natural language, Computer science, Action (philosophy), Benchmark (computing), Artificial intelligence, Space (commercial competition)
Abstract: We introduce an online popularity prediction and tracking task as a benchmark for reinforcement learning with a combinatorial, natural language action space. A specified number of discussion threads predicted to be popular are recommended, chosen from a fixed window of recent comments to track. Novel deep architectures are studied for effective modeling of the value function associated with actions comprised of interdependent sub-actions. The proposed model, which represents dependence between sub-actions through a bi-directional LSTM, gives the best performance across different experimental configurations and domains, and it also generalizes well to varying numbers of recommendation requests.