Writings

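Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables

PEARL uses a probabilistic encoder to infer a belief over the current task from recent context. Separating task inference from control lets meta-training run off-policy, which improves sample efficiency, and sampling from the task belief lets the policy adapt quickly to new tasks.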

19 Jan 2021

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

MAML optimizes a model's initial parameters so that a few gradient steps on a new task are enough to perform well, in both supervised learning and RL.

16 Jan 2021

Duelling Architecture for DQNs

Duelling networks split the Q-function into separate state-value and advantage streams, so the network can learn how valuable a state is without having to learn the effect of every action in it.

8 Oct 2020
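
As a rough illustration of the duelling idea, the sketch below combines a state-value estimate and per-action advantages into Q-values. It assumes the mean-subtracted aggregation, which is one common choice; the function name `dueling_q_values` and the array shapes are made up for this example and are not taken from the post.

```python
import numpy as np

def dueling_q_values(value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) with mean-subtracted advantages.

    value:      shape (batch, 1), one state value per sample
    advantages: shape (batch, n_actions), one advantage per action
    """
    value = np.asarray(value, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    # Subtracting the mean advantage keeps V and A identifiable:
    # without it, a constant could be shifted freely between the two streams.
    return value + advantages - advantages.mean(axis=1, keepdims=True)

# One state with value 2.0 and three actions (hypothetical numbers).
q = dueling_q_values([[2.0]], [[1.0, 0.0, -1.0]])
```

In a real network the two inputs would be the outputs of two heads that share a feature extractor; here they are plain arrays so that only the aggregation step is visible.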

Double DQNs

Double DQN reduces Q-value overestimation by decoupling action selection from action evaluation: the online network picks the greedy next action and the target network evaluates it.

5 Oct 2020
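
The Double DQN target can be sketched in a few lines. This is a minimal NumPy version under stated assumptions: the function name `double_dqn_targets`, the argument names, and the toy numbers are illustrative, and the two Q-value arrays stand in for the outputs of the online and target networks on the next states.

```python
import numpy as np

def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Compute Double DQN regression targets for a batch of transitions.

    q_online_next, q_target_next: shape (batch, n_actions), Q-values for the
    next state from the online and target networks respectively.
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)

    # Selection: the online network chooses the greedy next action.
    best_actions = q_online_next.argmax(axis=1)
    # Evaluation: the target network scores that chosen action.
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    # Terminal transitions (done = 1) do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * evaluated

targets = double_dqn_targets(
    rewards=[1.0, 0.0],
    dones=[0.0, 1.0],
    q_online_next=[[0.5, 2.0], [1.0, 0.0]],
    q_target_next=[[3.0, 1.0], [4.0, 5.0]],
    gamma=0.9,
)
```

In the first transition the online network prefers action 1, so the target uses the target network's value for action 1 (1.0) rather than its own maximum (3.0), which is where the reduction in overestimation comes from.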

Playing Atari with Deep Reinforcement Learning

The original DQN setup for learning Atari control directly from pixels: Q-learning with a deep network, experience replay, and fixed targets.

3 Oct 2020