A Natural Policy Gradient | TransferLab

Reference

A Natural Policy Gradient, Sham M. Kakade. Advances in Neural Information Processing Systems (2001)

Abstract

We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
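
To make the abstract's central claim concrete, here is a minimal sketch of a natural gradient step. It is not the paper's implementation and makes several simplifying assumptions: a single-state problem (a bandit), a tabular softmax policy with one parameter per action, known rewards, and exact expectations instead of samples. The function names and the example numbers are hypothetical. The natural gradient is computed as the ordinary policy gradient preconditioned by the (pseudo-)inverse of the Fisher information matrix of the policy.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over action preferences.
    z = theta - np.max(theta)
    e = np.exp(z)
    return e / e.sum()

def vanilla_gradient(theta, rewards):
    # Gradient of the expected reward sum_a pi(a) r(a) with respect to theta.
    pi = softmax(theta)
    return pi * (rewards - pi @ rewards)

def fisher_matrix(theta):
    # F = E_a[grad log pi(a) grad log pi(a)^T] = diag(pi) - pi pi^T for softmax.
    pi = softmax(theta)
    return np.diag(pi) - np.outer(pi, pi)

def natural_gradient(theta, rewards):
    # F is singular here (adding a constant to theta leaves pi unchanged),
    # so the pseudo-inverse is used; rcond discards the numerically zero mode.
    F = fisher_matrix(theta)
    g = vanilla_gradient(theta, rewards)
    return np.linalg.pinv(F, rcond=1e-10) @ g

# Hypothetical example: the policy currently prefers action 0,
# but action 1 has the highest reward.
theta = np.array([2.0, 0.0, -1.0])
rewards = np.array([0.0, 1.0, 0.5])

g = vanilla_gradient(theta, rewards)
ng = natural_gradient(theta, rewards)
print("vanilla gradient:", g)
print("natural gradient:", ng)
```

Under these assumptions the natural gradient equals the rewards with their mean subtracted, independent of the current action probabilities, so its largest component always belongs to the greedy action. The vanilla gradient, by contrast, is scaled by the current probabilities and shrinks for actions the policy rarely takes.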

Content citing this item

Blog

Natural, Trust Region and Proximal Policy Optimization

We present an overview of the theory behind three popular and related algorithms for gradient based policy optimization: natural policy …

Reinforcement Learning

Aug 10, 2021
