Making Policy Gradient Estimators for Softmax Policies More Robust to Non-stationarities

Authors: Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A Rupam Mahmood

Abstract: Policy gradient (PG) estimators are ineffective in dealing with softmax policies that are sub-optimally saturated, which refers to the situation when the policy concentrates its probability …
