
[Policy Gradient and Actor-Critic Training] DeepMind × UCL 2021 Reinforcement Learning Course, Lecture 9

Created by qxiao, last edited by qxiao, viewed by 132 users

Lecture 9: Policy-Gradient & Actor-Critic Methods. Research Scientist Hado van Hasselt covers policy-gradient algorithms, which learn policies directly, and actor-critic algorithms, which combine policy learning with value predictions for more efficient learning.

https://www.youtube.com/watch?v=y3oqOjHilio

/wiki/static/upload/08/08a8f342-7822-4c7d-b7b6-4dd4809e1949.pdf


Tags

Deep Learning