Double Gumbel Q-Learning

David Yu-Tung Hui (Mila) will present his NeurIPS 2023 work, Double Gumbel Q-Learning, a Deep Q-Learning algorithm applicable to both discrete and continuous control.

Abstract

We show that Deep Neural Networks introduce two heteroscedastic Gumbel noise sources into Q-Learning. To account for these noise sources, we propose Double Gumbel Q-Learning (DoubleGum), a Deep Q-Learning algorithm applicable to both discrete and continuous control. In discrete control, we derive a closed-form expression for the loss function of our algorithm. In continuous control, this loss function is intractable, so we derive an approximation with a hyperparameter whose value regulates pessimism in Q-Learning. We present a default value for our pessimism hyperparameter that enables DoubleGum to outperform DDPG, TD3, SAC, XQL, quantile regression, and Mixture-of-Gaussian Critics in aggregate over 33 tasks from DeepMind Control, MuJoCo, MetaWorld, and Box2D, and show that tuning this hyperparameter may further improve sample efficiency.
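For intuition only, here is a minimal sketch of the general idea the abstract describes: a pessimism hyperparameter scaling a per-state estimate of heteroscedastic noise when forming a TD target. This is not the paper's actual loss or notation; the function name, the `noise_scales` input, and the hyperparameter value are all illustrative assumptions.

```python
import numpy as np

def pessimistic_td_target(rewards, next_q_values, noise_scales,
                          gamma=0.99, c=-0.1):
    """Sketch of a pessimism-adjusted TD target.

    `noise_scales` stands in for a learned per-state estimate of the
    heteroscedastic noise in the critic, and `c` is a pessimism
    hyperparameter: negative values subtract the noise estimate from
    the bootstrapped value, making the target more pessimistic where
    the critic is noisier. Names and the default value of `c` are
    illustrative, not taken from the paper.
    """
    return rewards + gamma * (next_q_values + c * noise_scales)

# Toy usage on a batch of three transitions.
r = np.array([1.0, 0.0, 0.5])
q_next = np.array([10.0, 8.0, 9.0])
sigma = np.array([0.5, 1.2, 0.3])  # larger scale -> larger pessimistic correction
print(pessimistic_td_target(r, q_next, sigma))
```

The point of the sketch is only that pessimism here is state-dependent: transitions with a larger estimated noise scale receive a larger downward correction, rather than a single uniform offset.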
