[Policy Gradients and Actor-Critic Training] DeepMind × UCL 2021 Reinforcement Learning Course, Lecture 9
Created by qxiao, last edited by qxiao. Viewed by 132 users.
Lecture 9: Policy-Gradient & Actor-Critic Methods. Research Scientist Hado van Hasselt covers policy-gradient algorithms, which learn policies directly, and actor-critic algorithms, which combine them with value predictions for more efficient learning.
https://www.youtube.com/watch?v=y3oqOjHilio
/wiki/static/upload/08/08a8f342-7822-4c7d-b7b6-4dd4809e1949.pdf