Authors: Ryan Hayward, Chao Gao, Martin Mueller
DOI:
Keywords:
Abstract: Policy gradient reinforcement learning has been applied to two-player alternate-turn zero-sum games; e.g., in AlphaGo, self-play REINFORCE was used to improve the neural net …
ICGA Journal, 2019, Citations: 0
National Conference on Artificial Intelligence, 2019, Citations: 198
arXiv: Learning, 2019, Citations: 28
arXiv: Learning, 2019, Citations: 708
, 2019, Citations: 0
2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2020, Citations: 1