Making Policy Gradient Estimators for Softmax Policies More Robust to Non-stationarities

Authors: Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A Rupam Mahmood

DOI:

Keywords:

Abstract: Policy gradient (PG) estimators are ineffective for softmax policies that are sub-optimally saturated, that is, when the policy concentrates its probability …
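
To make the saturation problem in the abstract concrete, the following is a minimal sketch, not the authors' method. It assumes a hypothetical three-armed bandit with fixed rewards and uses the standard exact gradient of the expected reward with respect to the softmax logits, pi(a) * (r(a) - sum_b pi(b) r(b)). The reward values, logits, and function names are illustrative assumptions.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def expected_policy_gradient(logits, rewards):
    """Exact gradient of J = sum_a pi(a) r(a) with respect to the logits.

    For a softmax policy, dJ/dtheta_a = pi(a) * (r(a) - sum_b pi(b) r(b)).
    """
    pi = softmax(logits)
    expected_reward = pi @ rewards
    return pi * (rewards - expected_reward)


# Hypothetical bandit: action 0 is optimal, the other actions give no reward.
rewards = np.array([1.0, 0.0, 0.0])

# A uniform policy versus one saturated on the suboptimal action 1.
uniform_logits = np.zeros(3)
saturated_logits = np.array([0.0, 10.0, 0.0])

g_uniform = expected_policy_gradient(uniform_logits, rewards)
g_saturated = expected_policy_gradient(saturated_logits, rewards)

print("uniform gradient norm:  ", np.linalg.norm(g_uniform))
print("saturated gradient norm:", np.linalg.norm(g_saturated))
```

In this sketch the gradient under the saturated policy is several orders of magnitude smaller than under the uniform policy, even though the saturated policy is far from optimal. This is one way to see why a gradient-based update can be slow to move probability away from a suboptimal action.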
